1. Overview
In today's environment of rapid technological development, artificial intelligence (AI) is becoming increasingly widespread, and edge computing has become a core pillar of AI applications. While traditional cloud computing offers powerful centralized processing capabilities, its latency and bandwidth bottlenecks have become increasingly apparent under the demands of massive data transmission and real-time responsiveness. Edge computing emerged in response: by moving part of the computation to where the data is generated, it significantly reduces latency, eases the load on the network, and improves an application's real-time performance and privacy.
In this wave of edge computing, MemryX accelerator cards stand out for their floating-point computing capability (BF16) and comprehensive software support. Traditional edge computing has focused primarily on integer operations, yet many practical tasks still demand higher numerical precision, making chips capable of floating-point operations the ideal choice for edge applications such as object detection, image recognition, and natural language processing. In 2024, MemryX introduced a new accelerator card that delivers strong AI computing performance (20 TFLOPS) at low power (5 TFLOPS/W), and it is becoming a key driver of edge intelligence applications.
Beyond hardware performance, MemryX's broad software support is a major highlight. It includes module evaluation tools, API interfaces, drivers, and various development utilities, making it easy for developers to integrate quickly and adjust AI workloads flexibly. The software support also covers an MX3+ chip performance simulator, weight-precision adjustment, model pruning, and a model library, all of which help the MX3+ reach its best performance.
In the future, MemryX will not only play a key role in upgrading existing systems but also become the core engine for deep integration of edge computing and AI. Its powerful floating-point computation capabilities and comprehensive software services provide users with plug-and-play AI solutions, ushering in a new era of edge intelligence.
This chapter introduces how to install the MemryX SDK and run the example programs in C/C++.
2. Quickly Set Up MemryX
(1) Hardware setup
Connect the MemryX MX3+ module (M.2 2280 form factor) to the M.2 slot of the Orange Pi, then install the heat sink and connect the display, USB camera, mouse, keyboard, and Ethernet cable.
(2) Download Orange Pi 5 Plus prebuilt image (Ubuntu)
Please download the prebuilt image from the official website.
Download Orangepi5plus_1.0.8_ubuntu_focal_desktop_xfce_linux5.10.160.7z and extract it.
Supported Ubuntu versions: 18.04 (Bionic Beaver), 20.04 (Focal Fossa), 22.04 (Jammy Jellyfish)
Linux kernel version: 5.10.x ~ 6.1.x
(3) Burn the Ubuntu system onto the SD card
Please insert the SD card (it is recommended to use one with a capacity of 16GB or more) into the PC and use Rufus for flashing.
(4) Enter the Ubuntu system and connect to the network
After flashing is complete, insert the SD card into the Orange Pi 5 Plus, then connect the power to boot into the system and connect to the network.
(5) Install kernel-header files
$ sudo apt install linux-headers-$(uname -r)
(6) Install the MemryX SDK package (C/C++)
▲ Add GPG key
$ wget -qO- https://developer.memryx.com/deb/memryx.asc | sudo tee /etc/apt/trusted.gpg.d/memryx.asc >/dev/null
▲ Add software to the APT list
$ echo 'deb https://developer.memryx.com/deb stable main' | sudo tee /etc/apt/sources.list.d/memryx.list >/dev/null
▲ Install MemryX MX3+ NPU drivers
$ sudo apt update
$ sudo apt install memx-drivers
▲ Install MemryX MX3+ Runtime (C/C++)
$ sudo apt install memx-accl
▲ Install MemryX MX3+ package
$ sudo apt install memx-accl-plugins
$ sudo apt install memx-utils-gui
$ sudo apt install qtbase5-dev qt5-qmake
$ sudo apt install cmake
$ sudo apt install libopencv-dev
$ sudo apt install libssl-dev
▲ Optimize hardware settings
The vendor's ARM setup script currently supports the Raspberry Pi 5, Orange Pi 5 Plus, and Radxa Rock 5B EVKs. If you are using an Intel (x86) platform, skip this step.
$ sudo mx_arm_setup
▲ Verification Environment
After rebooting the system, confirm that the installation succeeded with the following command.
$ apt policy memx-drivers
3. DEMO Implementation Showcase (C/C++)
Please visit the official website and navigate to Tutorials for the DEMO walkthroughs. Connect a USB camera for the demonstrations.
(1) Depth Estimation
Depth estimation generates, from a color image, a depth map whose pixel values correspond to distance.
● Download and extract depthEstimation.zip
$ unzip depthEstimation.zip
● Modify permissions
$ sudo chmod -R 777 depthEstimation/
● Compile
$ cd depthEstimation
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./depthEstimation --cam /dev/video0
Running at approximately 29.81 frames per second, with CPU usage around 200% and memory usage approximately 0.1% (0.016 GB).
(2) Object Detection (CenterNet)
Object detection with CenterNet, a classic object detection algorithm proposed in 2019. [PDF]
● Download and extract centernet_sample.zip
$ unzip centernet_sample.zip
● Modify permissions
$ sudo chmod -R 777 CenterNet/
● Compile
$ cd centernet_sample/CenterNet
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./CenterNet
Running at approximately 23.6 frames per second, with a CPU usage of about 493.4% and memory usage of approximately 4.4% (0.7 GB).
Image source:https://www.pexels.com/
(3) Pose Estimation (YOLOv8)
Pose estimation with YOLOv8, currently one of the most popular DNN algorithms, proposed by Ultralytics in 2023 and designed to estimate the positions of, and connections between, human body keypoints.
● Download and extract poseEstimation_sample.zip
$ unzip poseEstimation_sample.zip
● Modify permissions
$ sudo chmod -R 777 poseEstimation/
● Compile
$ cd poseEstimation
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./poseEstimation --cam /dev/video0
Running at approximately 22.4 frames per second, with CPU usage around 155.4% and memory usage approximately 1.6% (0.25 GB).
(4) Object Detection (YOLOv7t)
Object detection with YOLOv7 Tiny, one of the most popular DNN algorithms, proposed in 2022. [PDF] It is designed to locate and classify various objects.
● Download and extract objectDetection_sample.zip
$ unzip objectDetection_sample.zip
● Modify permissions
$ sudo chmod -R 777 objectDetection/
● Compile
$ cd objectDetection/
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./objectDetection
Running at approximately 45.5 frames per second, CPU usage is around 445%, and memory usage is approximately 2.9% (0.46 GB).
Image source:https://www.pexels.com/
(5) Object Detection (YOLOv8s)
Object detection with YOLOv8, currently one of the most popular DNN algorithms, proposed by Ultralytics in 2023 and designed to locate and classify various objects.
● Download and extract objectDetection_sample.zip
$ unzip objectDetection_sample.zip
● Modify permissions
$ sudo chmod -R 777 yolov8_objectDetection
● Compile
$ cd yolov8_objectDetection/
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./yolov8_objectDetection
Running at approximately 42.8 frames per second, with a CPU usage of about 225% and memory usage of around 4.2% (0.67 GB).
Image source:https://www.pexels.com/
(6) Multi-Stream Object Detection
A multi-stream object detection demonstration using the currently popular YOLOv8 DNN algorithm.
● Download and extract MX_DEMOS_20241029.tgz
If you would like to obtain this DEMO, please contact MemryX or a WPI representative.
$ tar zxvf MX_DEMOS_20241029.tgz
● Modify permissions
$ sudo chmod -R 777 MX_DEMOS/
● Compile
$ cd MX_DEMOS/
$ mkdir build && cd build
$ cmake ..
$ make -j4
● Run
$ ./demoVMS
Running at approximately 28.4 frames per second, with a CPU usage of about 615.5% and memory usage of around 20.0% (3.2 GB).
4. Conclusion
The MemryX MX3+ AI accelerator card offers a high-performance, low-power, and flexible AI edge computing solution, particularly suitable for applications such as object detection, visual analysis, and real-time monitoring. With floating-point operations (BF16) and 10.5 MB of on-chip SRAM, it ensures computational accuracy and enhances the performance and scalability of AI models without occupying the host system's memory.
In the C/C++ DEMO tests, object detection with a single camera required only about one CPU core to process video, while system memory usage was as low as 1.6%, demonstrating the MemryX chip's high computational efficiency and very low resource consumption. For deeper work, MemryX provides powerful development tools that let developers flexibly partition an AI model's pre- and post-processing; image pre-processing can even be offloaded to an ISP (Image Signal Processor) or DSP (Digital Signal Processor) to further optimize efficiency.
The Core Advantages of MemryX MX3+
● High frame rate processing: A single low-power M.2 card can simultaneously handle 10 camera streams, supporting parallel operation of multiple AI models.
● High precision and automatic compilation: Compile BF16 floating-point models with one click, ensuring AI accuracy without the need for additional adjustments or retraining.
● The original model remains intact: It can be deployed directly without modifying the AI model, with options for model pruning and compression to optimize the design.
● Automated pre/post-processing: Automatically identify and integrate pre- and post-processing code, reducing development and debugging time while improving deployment efficiency.
● Excellent scalability: Can operate as a single chip or combine 16 chips into a logical unit without the need for an additional PCIe switch.
● Low-power design: A single MX3 chip consumes only 0.5 W to 2.0 W, and a 4-chip module's power consumption is less than one-tenth that of mainstream GPUs.
● Extensive hardware and software support: Compatible with x86, ARM, and RISC-V platforms as well as various operating systems, offering exceptional development flexibility.
With artificial intelligence spreading across industries such as retail, automotive, industrial, agriculture, and robotics, MemryX stands at the forefront of edge computing technology, delivering exceptional performance and greater value to its customers. Going forward, MemryX will continue to drive innovation and become an indispensable partner in AI edge computing. With the tools and examples provided by the manufacturer, AI is no longer out of reach: simply follow the example steps to quickly implement your own intelligent application. If you are a new partner interested in trying or purchasing MemryX products, please contact Editor Eevee directly. Thank you!
If you have any MemryX-related technical questions, feel free to leave a comment under this blog post!
More MemryX technical articles will be shared soon. Stay tuned for 【ATU Book-MemryX Series】!!