[ATU Book-MemryX Series] Step-by-step Guide to AI: Orange Pi 5 Plus (Rockchip RK3588) combined with the MemryX MX3 chip for easy AI setup (Tools Edition)

Keywords: MemryX, NPU, AI accelerator card, 20 TOPS, M.2 PCIe Gen3, YOLO, object detection

1. Overview

 

Today, AI technology is everywhere, from smart cities and Industry 4.0 to autonomous driving and smart healthcare. Artificial intelligence is no longer just a theory; it has become a core engine driving global progress. The real challenge for AI, however, lies in real-time response and low-power computing, which is also the key reason for the rise of edge computing. Although cloud-based AI offers powerful processing capabilities, it faces bottlenecks such as transmission latency and high bandwidth requirements, making it unsuitable for applications that demand millisecond-level decisions, such as vehicle recognition in autonomous driving, industrial robotic-arm control, and real-time alerts in surveillance systems.

 

To bring AI computation closer to the data source and improve real-time performance, MemryX has proposed an AI acceleration solution designed specifically for edge computing. The MemryX MX3 AI accelerator card adopts a BF16 floating-point computation architecture, breaking the limitation of traditional edge devices that support only integer (INT8) operations, and it excels in high-precision AI tasks such as image recognition, speech processing, and object detection. It delivers an energy-efficiency ratio of 5 TFLOPS/W and up to 20 TFLOPS of compute, enabling low-latency, high-precision AI inference.

 

In addition, MemryX has built a comprehensive development ecosystem that lets developers seamlessly integrate the design, compilation, deployment, and optimization of AI models. It includes the Neural Compiler (which converts AI models into the DFP format), the Simulator (for predicting throughput and latency), the Benchmark tools (for performance measurement), and the Viewer (a GUI visualization tool), all of which make AI application development more intuitive and efficient. With these plug-and-play development tools, MemryX lets developers quickly deploy AI models and flexibly adapt mainstream frameworks such as TensorFlow, PyTorch, and ONNX, without retraining, making them applicable to a wide range of edge scenarios.

 

As the integration of AI and IoT technologies accelerates, MemryX leverages its powerful computing capabilities and low-power design to drive a new era of AI edge computing, providing innovative solutions for smart cities, industrial automation, and AIoT devices. MemryX's goal is not only to enhance AI inference performance but also to make AI computing simple, flexible, and efficient, becoming a key tool for AI developers and advancing the rapid evolution of edge intelligence technology.

 

2. Development Kit

 

MemryX offers a Software Development Kit (SDK) that includes the compilation tool (Neural Compiler), the chip simulation tool (Simulator), the accelerator application tool (Accelerator), and the visualization interface (Viewer). Currently, this suite can only be used on a PC and supports the Ubuntu and Windows operating systems. Please follow the steps below for installation:

SDK Software Development Kit Diagram – Source: Official Website


1. Compiler (Neural Compiler)

 

The Neural Compiler is MemryX's standard compilation tool (recommended for use on a PC). It converts models in various formats into the DFP (Dataflow Program) format, which configures the MX3 chip with the incoming model's architecture and parameter information. It supports multiple machine-learning frameworks, such as TensorFlow, Keras, ONNX, and PyTorch.

Diagram of converting each model format to the MemryX DFP format

 

The compiler is organized into four layers, applied in the following order, as shown in the diagram below:

(1) Framework Interface: converts the source model format into an internal graph representation.

(2) Graph Processing: restructures and optimizes the internal graph.

(3) Mapper: maps the internal graph onto the configurable MX3 hardware resources, aiming for maximum throughput (FPS).

(4) Assembler: generates the DFP file.

DFP Document Generation Diagram

Source: MemryX documentation
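As a mental model, the four layers above can be sketched as a simple function pipeline. This is purely illustrative: the stage names follow the documentation, but the function bodies and data structures are hypothetical stand-ins, not the real compiler internals.

```python
# Illustrative sketch of the Neural Compiler's four stages.
# Stage names come from the docs; the bodies are hypothetical mocks.

def framework_interface(model_path: str) -> dict:
    """Stage 1: convert the source model format into an internal graph."""
    return {"graph": f"graph({model_path})", "optimized": False, "chips": 0}

def graph_processing(graph: dict) -> dict:
    """Stage 2: restructure and optimize the internal graph."""
    graph["optimized"] = True
    return graph

def mapper(graph: dict, num_chips: int = 1) -> dict:
    """Stage 3: map the graph onto MX3 hardware resources for maximum FPS."""
    graph["chips"] = num_chips
    return graph

def assembler(graph: dict) -> str:
    """Stage 4: emit the Dataflow Program (DFP) file."""
    return f"model.dfp ({graph['chips']} chip(s), optimized={graph['optimized']})"

dfp = assembler(mapper(graph_processing(framework_interface("model.onnx"))))
print(dfp)  # model.dfp (1 chip(s), optimized=True)
```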

 

Single-Model Application

 

Usage:

Command: $ mx_nc -v -m <model>

-m, --model: set the model path; supports the .h5 / .pb / .py / .onnx / .tflite formats

-g, --chip_gen: set the chip generation (default: mx3)

-c, --num_chips: set the number of chips (default: 1)

-v: enable verbose compiler output

※ One MX3 chip can hold approximately 10 MB of model data.

※ For more options, please refer to the official Software Development Kit documentation.
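To make the flags concrete, here is a small Python sketch that assembles an mx_nc command line and applies the 10-MB-per-chip rule of thumb above to estimate a chip count. The helper names and the example model file are hypothetical; only the flags (-v, -g, -c, -m) come from the documentation.

```python
import math
import shlex

MB_PER_CHIP = 10  # per the note above: one MX3 chip holds ~10 MB of model data

def chips_needed(model_size_mb: float) -> int:
    """Estimate how many MX3 chips a model of the given size requires."""
    return max(1, math.ceil(model_size_mb / MB_PER_CHIP))

def build_mx_nc_cmd(model: str, num_chips: int = 1,
                    chip_gen: str = "mx3", verbose: bool = True) -> list:
    """Assemble an mx_nc invocation as an argument list (flags per the docs)."""
    cmd = ["mx_nc"]
    if verbose:
        cmd.append("-v")
    cmd += ["-g", chip_gen, "-c", str(num_chips), "-m", model]
    return cmd

# Hypothetical 18.5 MB model: needs 2 chips under the 10 MB rule of thumb.
cmd = build_mx_nc_cmd("mobilenet_v2.onnx", num_chips=chips_needed(18.5))
print(shlex.join(cmd))  # mx_nc -v -g mx3 -c 2 -m mobilenet_v2.onnx
```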

 

Multi-Model Application

 

In many AI application scenarios, it is necessary to chain multiple models together in a single pipeline. For example, to detect facial expressions, you must first locate the face and then classify the emotion (such as happiness, anger, or sadness).

Usage instructions:

$ mx_nc -v -m <model_1> <model_2> <model_3>

Source: MemryX Developer Website
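As a sketch of the two-stage example above, the command for compiling a detector and a classifier together can be assembled like this (the model file names are hypothetical placeholders):

```python
import shlex

# Hypothetical file names for a two-stage pipeline: face detection
# followed by expression classification, compiled into one DFP.
models = ["face_detector.onnx", "emotion_classifier.onnx"]

# Multiple models are listed after a single -m flag.
cmd = ["mx_nc", "-v", "-m", *models]
print(shlex.join(cmd))  # mx_nc -v -m face_detector.onnx emotion_classifier.onnx
```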

 

Multi-Chip Application

 

The compiler automatically distributes the workload of the given models across the available chips.

Usage instructions:

$ mx_nc -v -c 2 -m <model_1> <model_2>

Source: MemryX Developer Website
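The compiler handles this allocation itself, but as rough intuition, the toy sketch below assigns each model to the currently least-loaded chip. This is an illustrative simplification, not MemryX's actual mapping algorithm.

```python
MB_PER_CHIP = 10  # rule of thumb from earlier: ~10 MB of model data per MX3 chip

def allocate(model_sizes_mb, num_chips):
    """Toy allocation: place each model on the currently least-loaded chip."""
    loads = [0.0] * num_chips
    placement = []
    for size in model_sizes_mb:
        chip = loads.index(min(loads))  # pick the least-loaded chip
        loads[chip] += size
        placement.append(chip)
    return placement, loads

# Two hypothetical models of 7 MB and 9 MB spread across two chips.
placement, loads = allocate([7.0, 9.0], num_chips=2)
for load in loads:
    assert load <= MB_PER_CHIP, "model data exceeds per-chip capacity"
print(placement, loads)  # [0, 1] [7.0, 9.0]
```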

 

Applications of Multiple Input Streams & Shared Input Stream

 

Typically, each model uses a separate data stream independently.

Source: Memry Developer Website

 

In the case of multiple models with the same input stream, the compiler allows the shared use of the same input.

Source: Memry Developer Website
 

Usage instructions:

$ mx_nc -v -m <model_1> <model_2> --models_share_ifmap
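Conceptually, a shared input stream means one captured frame is fanned out to every model, instead of each model reading its own source. The sketch below illustrates this with stand-in callables rather than the real MemryX runtime.

```python
# Illustrative fan-out of a shared input stream to multiple models.
# The "models" here are stand-in functions, not the MemryX runtime.

def run_shared_stream(frames, models):
    """Feed each frame to all models and collect per-model results."""
    results = []
    for frame in frames:
        results.append([model(frame) for model in models])
    return results

def detector(frame):
    return f"boxes({frame})"

def classifier(frame):
    return f"labels({frame})"

out = run_shared_stream(["frame0", "frame1"], [detector, classifier])
print(out)
```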

 

Change input shape (reshape)

 

The following example demonstrates the typical case of providing input shapes on the command line for a single-input model passed to the Neural Compiler.

 

Usage (single model):

$ mx_nc -m <model> -is '300,300,3'
 

Usage (multi-model):

$ mx_nc -m <model_1> <model_2> -is '224,224,3' '300,300,3'
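A quick sketch of how such shape strings can be validated before being passed to -is (the helper is hypothetical, not part of the SDK):

```python
def parse_shape(text: str) -> tuple:
    """Parse an -is style shape string like '300,300,3' into a tuple of ints."""
    dims = tuple(int(d) for d in text.split(","))
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid shape: {text!r}")
    return dims

# One shape string per model, in the same order the models are passed to -m.
shapes = [parse_shape(s) for s in ("224,224,3", "300,300,3")]
print(shapes)  # [(224, 224, 3), (300, 300, 3)]
```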
 

Model Cropping

 

When deploying models to an AI chip, it is often necessary to remove specific layers or computational units (operators) so the model can run more efficiently. MemryX therefore provides a cropping feature that splits a model into stages such as image pre-processing, the neural network itself, and image post-processing. The pre- and post-processing stages can then be delegated to the image signal processor (ISP), graphics processing unit (GPU), or central processing unit (CPU), achieving more efficient heterogeneous multi-core computation.

 

Automatic cropping usage:

$ mx_nc -m <model> -v --autocrop -so

--autocrop: automatically crop the pre- and post-processing stages

 

Manual cropping usage:

$ mx_nc -m <model> -v --outputs <layer> -so

-is, --input_shapes: set the input shape(s)

--input_format: set the input format (default: BF16)

--inputs: specify the layer name(s) at which to crop away pre-processing

--outputs: specify the layer name(s) at which to crop away post-processing

-so: display the compiler's optimization steps

 

The results are shown below; from the figure you can check the usage of compute units, weight memory, and so on.

Diagram of model cropping
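To illustrate what manual cropping with --outputs amounts to, the toy function below splits an ordered list of layer names into an on-chip portion and a host-side post-processing portion. The layer names are hypothetical, and the real compiler works on a graph rather than a flat list.

```python
def crop_at_output(layers, output_layer):
    """Split a layer list into (on-chip core, host-side post-processing).

    Everything after `output_layer` is treated as post-processing to be
    run on the CPU/GPU instead of the MX3.
    """
    idx = layers.index(output_layer)
    return layers[: idx + 1], layers[idx + 1 :]

# Hypothetical detection model: NMS is cropped off to run on the host.
layers = ["preproc", "conv1", "conv2", "concat", "nms"]
core, post = crop_at_output(layers, "concat")
print(core, post)  # ['preproc', 'conv1', 'conv2', 'concat'] ['nms']
```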

2. Benchmark

 

A benchmark is one of the standard tools for AI chips, used to test the performance of running models. MemryX provides benchmarking tools for C/C++ and Python, namely acclBench and mx_bench, which measure FPS and latency.


(1) Download the test model SSDlite_MobileNet_v2_300_300_3_tensorflow.zip

$ unzip SSDlite_MobileNet_v2_300_300_3_tensorflow.zip
 

(2) acclBench (C++)

acclBench [-h] [-v] [-d] [-m] [-n] [-f] [-iw] [-ow] [-device_ids] [-ls]

Command:
      $ acclBench -d SSDlite_MobileNet_v2_300_300_3_tensorflow.dfp -f 100

(3) mx_bench (Python)

$ mx_bench [-h] [-v] [-d] [-f]
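What these benchmark tools report ultimately boils down to throughput (FPS) and per-frame latency. The toy calculation below shows the arithmetic on synthetic timestamps; it is not how acclBench or mx_bench are implemented internally.

```python
def fps_and_latency(start_times, end_times):
    """Compute throughput (FPS) and average per-frame latency in milliseconds."""
    n = len(start_times)
    span = end_times[-1] - start_times[0]  # total wall-clock time in seconds
    fps = n / span
    latency_ms = sum(e - s for s, e in zip(start_times, end_times)) / n * 1000
    return fps, latency_ms

# Four synthetic frames, pipelined: each takes 20 ms, one finishes every 10 ms.
starts = [0.00, 0.01, 0.02, 0.03]
ends   = [0.02, 0.03, 0.04, 0.05]
fps, lat = fps_and_latency(starts, ends)
print(round(fps, 1), round(lat, 1))  # 80.0 20.0
```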

3. Simulator
 

The Simulator is a standard MemryX tool (for use on a PC) that provides high-precision simulation: it accurately models the performance of MemryX AI chips and reports FPS (frames per second) and latency figures.

Usage instructions:

$ mx_sim -v -d <dfp> -f 4

-d, --dfp: set the DFP file path

-f, --frames: set the number of frames to simulate (any value)

-v: enable verbose output

--no_progress_bar: disable the progress bar

--sim_directory: path to the simulation folder (default: ./simdir)

※ The simulator cannot be given a chip count directly; the number of chips is determined by the DFP.

Diagram of the simulation tool
Source: MemryX documentation
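As with the compiler, the simulator invocation can be assembled programmatically. The sketch below builds an mx_sim command line from the options listed above; the helper name is hypothetical.

```python
import shlex

def build_mx_sim_cmd(dfp: str, frames: int = 4, verbose: bool = True,
                     progress_bar: bool = True, sim_directory: str = None) -> list:
    """Assemble an mx_sim invocation from the documented flags."""
    cmd = ["mx_sim"]
    if verbose:
        cmd.append("-v")
    cmd += ["-d", dfp, "-f", str(frames)]
    if not progress_bar:
        cmd.append("--no_progress_bar")
    if sim_directory is not None:
        cmd += ["--sim_directory", sim_directory]
    return cmd

# Hypothetical DFP file, simulating 4 frames.
sim_cmd = build_mx_sim_cmd("model.dfp", frames=4)
print(shlex.join(sim_cmd))  # mx_sim -v -d model.dfp -f 4
```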

4. Visualization Tool (Viewer)
 

The visualization tool (Viewer) is a GUI provided by MemryX that integrates the aforementioned compiler, simulator, and accelerator.
 

Usage instructions:

$ mx_viewer
 

Compiler:

 

Step 1: Select a neural network model

Step 2: Select the target system

Step 3: Compile the model

Step 4: Review the results

Simulator:

Step 1: Set the number of frames

Step 2: Run the simulation 

Step 3: Review the results 

Accelerator

It requires a connection to a physical MX3 EVK; its operation is otherwise similar to the simulator's.

5. Inspector (DFP Inspect)

 

The Inspector (dfp_inspect) is a tool provided by MemryX for examining DFP files.

Usage instructions:

$ dfp_inspect <dfp>
 

Output information

● DFP

■ Compiler version used

■ Compilation date and time

■ Target chip quantity

■ Target Architecture Generation

■ File size (in MB) of the emulator configuration and MXA hardware configuration

● The filename of the compiled model

● Active input and output port configuration

 

Example:
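One of the fields dfp_inspect reports is the file size in MB, which you can also check directly from Python. The helper below is a hypothetical convenience for scripting, not part of the SDK.

```python
import os

def dfp_size_mb(path: str) -> float:
    """Return a file's size in MB, one of the fields dfp_inspect prints."""
    return os.path.getsize(path) / (1024 * 1024)

# Hypothetical usage (requires a compiled .dfp on disk):
# print(f"{dfp_size_mb('model.dfp'):.2f} MB")
```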

6. Open-source module resources (Model Zoo)
 

The official MemryX website also provides a wealth of open-source model resources and analyses, as shown in the figures below.

 

Model analysis

Source: MemryX Developer Website

Model resources

Source: MemryX Developer Website

3. Conclusion

 

The MemryX MX3 AI accelerator card, with its outstanding computational performance and low power consumption, provides AI developers with a powerful and flexible solution. More importantly, its comprehensive built-in software development toolchain enables developers to quickly deploy AI models while easily adjusting pre- and post-processing workflows to achieve optimized AI inference performance. From model conversion to performance optimization, MemryX provides one-stop development support, making AI development more efficient and intuitive.

 

To meet the needs of developers, MemryX has meticulously crafted a suite of professional tools, including the Neural Compiler, Simulator, Benchmark, and Viewer. These tools are not only powerful but are also designed with simplicity and ease of use at their core. The Neural Compiler enables fast and seamless model conversion; the Simulator allows developers to simulate runtime performance before deployment, helping them predict real-world application behavior; the Benchmark provides detailed throughput and latency analysis; and the Viewer presents data through a visual interface, making the development process more intuitive. The integration of these tools enables developers to focus on innovation without being bogged down by complex technical details.

 

In real-world testing, MemryX chips demonstrated their exceptional performance and flexibility. During C/C++ and Python DEMO tests, a single chip was able to simultaneously process multiple camera streams and support the parallel execution of multiple AI models, fully showcasing its advantages in edge computing scenarios. Additionally, MemryX's automated model pruning and compilation process allows developers to deploy models directly without modifying the original ones, significantly lowering the development barrier and greatly improving development efficiency.

 

With the rapid evolution of AI technology, MemryX is leading the trend in edge computing, providing high-performance, low-power, and flexible AI solutions for various industries. This article introduced the tools and application examples to help developers quickly master the MemryX MX3, making AI technology more accessible and advancing smart living. If you are interested in MemryX products or would like more technical support and collaboration opportunities, please feel free to contact the Eevee editor! Thank you.

 

4. Reference Documents

 

[1] MemryX Official Website

[2] MemryX Developer Center Technical Website

[3] EE Awards 2022 Asia Gold Selection Award

[4] MemryX_example

[5] PR Newswire - MemryX Announces the Official Production of the MX3 Edge AI Accelerator

 

If you have any MemryX-related technical questions, feel free to leave a comment under this blog post!

More MemryX technical articles are coming soon. Stay tuned for the 【ATU Book-MemryX Series】!!

