[ATU Book-MemryX Series] A Step-by-Step Guide to AI - Kasumi Style Orange Pi 5 Plus (Rockchip RK3588) Combined with MemryX MX3 Chip for Easy AI Hands-On - Python Edition

Keywords: MemryX, NPU, AI accelerator card, 20 TOPS, M.2, PCIe Gen3, YOLO

1. Overview

As technology develops rapidly, artificial intelligence (AI) is becoming increasingly widespread, and edge computing has become a core pillar of AI applications. Traditional cloud computing offers powerful centralized processing, but its latency and bandwidth bottlenecks have become increasingly apparent as demands for massive data transmission and real-time response grow. Edge computing emerged in response: it moves part of the computation to the devices where the data is generated. This significantly reduces latency, eases the load on the network, and improves the real-time performance and privacy of applications.

In this wave of edge computing, the MemryX accelerator card stands out for its floating-point computation capability (BF16) and comprehensive software services. Edge computing has traditionally focused on integer operations, yet some practical tasks still require higher precision. Chips capable of floating-point operations are therefore a strong choice for edge applications such as object detection, image recognition, and natural language processing. In 2024, MemryX introduced a new accelerator card solution that delivers 20 TFLOPS of AI computing performance at low power consumption (5 TFLOPS/W), and it is gradually becoming a key driver for edge intelligence applications.

Beyond hardware performance, MemryX's wide range of software services is a major highlight. Its software support includes module evaluation, APIs, drivers, and various development tools, so developers can integrate quickly and adjust AI computation requirements flexibly. It covers MX3+ chip performance simulation (Simulator), weight precision adjustment (Weight Precision), model cropping (Model Cropping) tools, and a model library, all of which help the MX3+ achieve optimal performance.

In the future, MemryX will not only play a key role in upgrading existing systems but also serve as the core engine for the deep integration of edge computing and AI. Its powerful floating-point computation capabilities and comprehensive software services provide users with plug-and-play AI solutions, ushering in a new era of edge intelligence.

This chapter shows how to install MemryX and walks through example Python applications.

2. Quickly set up MemryX

(1) Hardware architecture

Connect the MemryX MX3+ 2280-sized module to the M.2 slot of the Orange Pi, and attach the heat sink, screen, USB camera, mouse, keyboard, and Ethernet cable.

(2) Download Orange Pi 5 Plus pre-built image (Ubuntu)

Please go to the official website and download the prebuilt image.

Download Orangepi5plus_1.0.8_ubuntu_focal_desktop_xfce_linux5.10.160.7z and extract it.

Supported Ubuntu versions: 18.04 (Bionic Beaver), 20.04 (Focal Fossa), 22.04 (Jammy Jellyfish)
Linux kernel version: 5.10.x ~ 6.1.x
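The kernel range above can be checked from Python before installing anything. The following is a minimal sketch, assuming only the major and minor numbers matter for the range; the function names and the sample release string are illustrative, not part of the MemryX tooling.

```python
import platform
import re
from typing import Tuple

# Range quoted in the text: Linux kernel 5.10.x through 6.1.x.
MIN_KERNEL = (5, 10)
MAX_KERNEL = (6, 1)


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.10.160'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_in_supported_range(release: str) -> bool:
    """True when major.minor lies inside the range quoted above."""
    return MIN_KERNEL <= kernel_major_minor(release) <= MAX_KERNEL


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints on Linux
    try:
        print(release, "supported:", kernel_in_supported_range(release))
    except ValueError as err:
        print("could not check kernel version:", err)
```

Run on the board itself, this reports whether the running kernel falls inside the quoted range; it does not check the Ubuntu release.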

(3) Flash the Ubuntu system onto the SD card

Insert the SD card (16 GB or larger is recommended) into the PC and use Rufus to flash the image.

(4) Enter the Ubuntu system and connect to the network

After flashing is complete, insert the SD card into the Orange Pi 5 Plus. Then connect the power to boot into the system, and connect to the network.

(5) Install kernel-header files

$ sudo apt install linux-headers-$(uname -r)

(6) Install the MemryX SDK package

▲ Add GPG key

$ wget -qO- https://developer.memryx.com/deb/memryx.asc | sudo tee /etc/apt/trusted.gpg.d/memryx.asc >/dev/null

▲ Add software to the APT list

$ echo 'deb https://developer.memryx.com/deb stable main' | sudo tee /etc/apt/sources.list.d/memryx.list >/dev/null

▲ Install MemryX MX3+ NPU drivers

$ sudo apt update

$ sudo apt install memx-drivers

▲ Install MemryX MX3+ Runtime (C/C++)

$ sudo apt install memx-accl

▲ Install the MemryX SDK package (Python)

(1) Install the necessary packages

$ sudo apt install python3.12-venv

(2) Create a virtual environment

$ python3 -m venv ~/mx

(3) Activate the virtual environment

$ . ~/mx/bin/activate

(4) Install the necessary packages into the virtual environment.

$ sudo apt install python3-pip

$ pip3 install --upgrade pip wheel

$ sudo apt install libhdf5-dev python3-dev cmake python3-venv

$ sudo apt install python3-numpy

$ sudo apt install python3-opencv

$ sudo apt install python3-matplotlib

$ pip install opencv-python

(5) Install MemryX MX3+ Runtime (Python)

$ pip3 install --extra-index-url https://developer.memryx.com/pip memryx

(6) Verify Environment

Use the following command to verify if the version is correct.

$ mx_nc --version

The following command is used to verify the chip status.

$ mx_bench --hello
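Before running the two verification commands, it can help to confirm that the tools and the Python package are visible at all. The sketch below uses only the standard library; the tool names and the `memryx` package name come from the steps above, while `check_environment` is a hypothetical helper written for this article.

```python
import importlib.util
import shutil

# Command names and package name as used in the installation steps above.
CLI_TOOLS = ("mx_nc", "mx_bench")
PYTHON_PACKAGE = "memryx"


def check_environment(tools=CLI_TOOLS, package=PYTHON_PACKAGE):
    """Return a dict mapping each tool or package name to True when it is found."""
    # shutil.which searches PATH, the same way the shell resolves a command.
    status = {tool: shutil.which(tool) is not None for tool in tools}
    # find_spec locates a top-level package without importing it.
    status[package] = importlib.util.find_spec(package) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

If the command-line tools are reported missing while the virtual environment is active, the pip installation step is the first place to look; this check says nothing about whether the chip itself responds.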

3. DEMO Implementation Showcase (Python)

Please visit the official website and open the Tutorials section for the DEMO walkthroughs. Connect a USB camera for the demonstrations, and remember to activate the Python virtual environment first.
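Forgetting to activate the virtual environment is an easy mistake, so a demo script could check for it at startup. This is a small sketch based on the standard `sys.prefix` and `sys.base_prefix` attributes; the `~/mx` path in the message is the one created earlier in this article.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtual_environment():
        print("virtual environment active:", sys.prefix)
    else:
        print("no virtual environment active; run '. ~/mx/bin/activate' first")
```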

(1) Depth estimation

Depth Estimation demonstrates how a color image can be used to generate a depth map that encodes distance.

● Download depth_estimate.py and midas_v2_small.dfp and place them in the folder.

● Create a folder and copy files

$ mkdir DepthEstimation_Python && cd DepthEstimation_Python

●Run

$ python3 depth_estimate.py

Running at approximately 29 frames per second, with a CPU usage of about 200.3% and memory usage of approximately 0.1% (0.25 GB).
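The article does not say how the frame rate was measured. One common approach is a sliding-window counter over recent frame timestamps, sketched below; `FpsMeter` is a hypothetical helper, not something taken from the demo scripts.

```python
import time
from collections import deque


class FpsMeter:
    """Sliding-window frame-rate estimate over the most recent frame timestamps."""

    def __init__(self, window: int = 30):
        self._stamps = deque(maxlen=window)

    def tick(self, now=None) -> float:
        """Record one frame and return the current estimate in frames per second."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        # N timestamps span N-1 frame intervals.
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = FpsMeter()
    fps = 0.0
    for _ in range(10):
        time.sleep(0.01)  # stand-in for grabbing and processing one frame
        fps = meter.tick()
    print(f"approx. {fps:.1f} FPS")
```

Calling `tick()` once per displayed frame gives a smoothed figure comparable to the one quoted above. With a USB camera, the result is often capped by the camera's own frame rate rather than by the accelerator.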

(2) Pose Estimation (YOLOv8)

Pose Estimation uses YOLOv8, one of the most popular DNN algorithms, introduced by Ultralytics in 2023, to calculate the positions of human body keypoints and the connections between them.

● Download and extract Pose_Estimation_Python.zip

$ unzip Pose_Estimaton_Python.zip

●Run

$ cd Pose_Estimaton_Python
$ python3 app.py

The demo runs at approximately 30 frames per second, with CPU usage of around 260.0% and memory usage of about 5.2% (0.83 GB).
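To make the idea of keypoints and their connections concrete, the sketch below turns one person's keypoints into drawable line segments. It assumes the 17-keypoint COCO layout that YOLOv8 pose models output, with each keypoint given as (x, y, confidence). The `LIMBS` list and the 0.5 threshold are illustrative choices, not values taken from the demo's app.py.

```python
# COCO keypoint order used by YOLOv8 pose models.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Illustrative subset of connections (arms, torso, legs).
LIMBS = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]


def visible_limbs(keypoints, min_conf=0.5):
    """Return ((x1, y1), (x2, y2)) segments whose two endpoints are both confident.

    keypoints: 17 (x, y, confidence) tuples in KEYPOINT_NAMES order.
    """
    index = {name: i for i, name in enumerate(KEYPOINT_NAMES)}
    segments = []
    for a, b in LIMBS:
        xa, ya, ca = keypoints[index[a]]
        xb, yb, cb = keypoints[index[b]]
        if ca >= min_conf and cb >= min_conf:
            segments.append(((xa, ya), (xb, yb)))
    return segments
```

Each returned segment can be passed to a line-drawing call to overlay the skeleton on the camera frame. Limbs with a low-confidence endpoint, such as an occluded wrist, are simply skipped.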

(3) Object Detection (YOLOv7t)

Object Detection uses YOLOv7 Tiny, one of the most popular DNN algorithms, proposed in a 2022 paper (PDF), to locate and classify various objects.

● Download and extract object_detection_multistream_python_on_mx3.zip

$ unzip object_detection_multistream_python_on_mx3.zip

●Run

$ cd object_detection_multistream_python_on_mx3
$ python3 app.py

Running at approximately 29.8 frames per second, with a CPU usage of about 198.2% and memory usage of approximately 4.7% (0.83GB).

Note: If you encounter the error "ImportError: cannot import name 'Simulator' from 'memryx'", open yolov7.py and remove the line "from memryx import Benchmark, Simulator".
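An alternative to deleting the import is to make it tolerant of missing names, so the same script works across SDK versions. The sketch below is one way to do that with the standard library; `import_available` is a hypothetical helper, and it is demonstrated on the `math` module so that it runs without the MemryX SDK installed.

```python
import importlib


def import_available(module_name, names):
    """Import a module and return the requested attributes that actually exist.

    Names the module does not provide map to None instead of raising ImportError.
    """
    module = importlib.import_module(module_name)
    return {name: getattr(module, name, None) for name in names}


if __name__ == "__main__":
    # Demonstrated on the standard library so the sketch runs anywhere.
    # In yolov7.py the call would be:
    #   import_available("memryx", ["Benchmark", "Simulator"])
    found = import_available("math", ["sqrt", "no_such_function"])
    print({name: obj is not None for name, obj in found.items()})
```

This only helps if the script does not actually use the missing class. If it does, the code that uses it needs to be removed or guarded as well.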

(4) Object Detection (YOLOv8S)

Object Detection uses YOLOv8, one of the most popular DNN algorithms, introduced by Ultralytics in 2023, to locate and classify various objects.

● Download and extract yolov8_object_detection_python.zip

$ unzip yolov8_object_detection_python.zip

●Run

$ cd yolov8_object_detection_python
$ python3 app.py

Running at approximately 30 frames per second, with a CPU usage of about 192.4% and memory usage of around 5.2% (0.83GB).
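Part of the CPU load in a detection demo typically comes from post-processing on the host, where overlapping candidate boxes are merged with non-maximum suppression (NMS). The following is a plain-Python sketch of greedy per-class NMS. The 0.45 IoU threshold and the (box, score, class_id) tuple layout are assumptions for illustration; the demo's app.py may do this differently.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Greedy NMS over (box, score, class_id) tuples; returns the kept detections.

    Detections are visited from highest to lowest score. A detection is kept
    unless it overlaps an already kept box of the same class too strongly.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        if all(cls != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

For example, two boxes of the same class at (0, 0, 10, 10) and (1, 1, 11, 11) have an IoU of 81/119, so only the higher-scoring one survives.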

(5) Face Detection and Emotion Classification

Face Detection and Emotion Classification uses the MobileNet algorithm to analyze facial expressions and classify them into different emotions.

● Download multimodel_python.tar.xz and models.dfp and put them in the folder.

$ tar -xvf multimodel_python.tar.xz --xz

●Rename the model to face_det_emotion_recog.dfp

$ mv models.dfp multimodel_python/face_det_emotion_recog.dfp

●Run

$ cd multimodel_python
$ python3 app.py

Running at approximately 30 frames per second, with a CPU usage of around 200% and memory usage of about 0.1% (0.25 GB).
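The final step of an emotion classifier is usually to turn the network's raw scores into a label. The sketch below shows that step with a numerically stable softmax. The label list is hypothetical and only for illustration; the classes and output format of the demo's model are not stated in this article.

```python
import math

# Hypothetical label set for illustration; the demo's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad", "surprised"]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_emotion(logits, labels=EMOTIONS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

In a two-model pipeline like this one, the face detector would first supply a face crop, and the classifier's scores for that crop would then be passed to `classify_emotion`.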

4. Conclusion

The MemryX MX3+ AI accelerator card offers a high-performance, low-power, and flexible AI edge computing solution. It is particularly suitable for applications such as object detection, visual analysis, and real-time monitoring. By using floating-point operations (BF16) and 10 MB of built-in SRAM, it preserves computational precision and improves the performance and scalability of AI models without occupying the main system's memory.

In the Python DEMO tests, object detection with a single camera needs only about two CPU cores to process the video, and system memory usage is as low as about 5%, which shows the high computational efficiency and low resource consumption of the MemryX chip. Looking deeper, MemryX provides powerful development tools that let developers flexibly partition the pre- and post-processing of AI models. Developers can even offload image pre-processing to an ISP (Image Signal Processor) or DSP (Digital Signal Processor) to further optimize efficiency. Python is undoubtedly convenient, though its performance differs slightly from C++.
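The "two CPUs" reading follows from the measured percentages with a little arithmetic. The sketch below assumes the CPU figures are per-process values in the style of `top`, where 100% means one fully busy core, and uses the eight cores of the RK3588. The helper names are illustrative.

```python
RK3588_CORES = 8  # four Cortex-A76 plus four Cortex-A55 cores


def cores_busy(cpu_percent):
    """Number of fully busy cores, assuming 100% equals one core."""
    return cpu_percent / 100.0


def share_of_all_cores(cpu_percent, cores=RK3588_CORES):
    """Fraction of the whole SoC's CPU capacity in use."""
    return cpu_percent / (100.0 * cores)


def implied_total_memory_gb(used_gb, used_percent):
    """Total RAM implied by an absolute usage figure and its percentage."""
    return used_gb / (used_percent / 100.0)


if __name__ == "__main__":
    # Figures reported above for the YOLOv8S object detection demo.
    print(f"{cores_busy(192.4):.2f} cores busy")
    print(f"{share_of_all_cores(192.4):.1%} of all CPU capacity")
    print(f"{implied_total_memory_gb(0.83, 5.2):.1f} GB total RAM implied")
```

Under that assumption, 192.4% corresponds to just under two cores, or roughly a quarter of the SoC's CPU capacity. The memory figures (0.83 GB at 5.2%) imply a board with about 16 GB of RAM.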

Core advantages of MemryX MX3+

● High frame rate processing: A single low-power M.2 card can simultaneously handle footage from 10 cameras and supports parallel operation of multiple AI models.

● High precision and automatic compilation: One-click BF16 floating-point model compilation ensures AI accuracy without the need for additional adjustments or retraining.

● The original model remains intact: It can be deployed directly without modifying the AI model, with options for model pruning and compression to optimize the design.

● Automated pre/post-processing: Automatically identify and integrate pre- and post-processing code to reduce development and debugging time, and improve deployment efficiency.

● Exceptional scalability: Can operate as a single chip or combine up to 16 chips into a logical unit without requiring additional PCIe switches.

● Low Power Consumption Design: A single MX3 chip consumes only 0.5W to 2.0W, and the power consumption of a 4-chip module is less than 1/10 of mainstream GPUs.

● Extensive hardware and software support: Compatible with x86, ARM, RISC-V platforms and various operating systems, offering exceptional development flexibility.
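The power figures in the list above can be cross-checked against the performance figures quoted in the overview. The sketch below is only arithmetic on numbers already given in this article (0.5 W to 2.0 W per chip, a 4-chip module, 20 TFLOPS, 5 TFLOPS/W); the function names are illustrative.

```python
CHIP_POWER_W = (0.5, 2.0)  # per-chip range quoted in the list above
CHIPS_PER_MODULE = 4       # the 4-chip module mentioned in the list above


def module_power_range(chips=CHIPS_PER_MODULE, per_chip=CHIP_POWER_W):
    """Lowest and highest module power in watts, scaling the per-chip range."""
    low, high = per_chip
    return chips * low, chips * high


def power_at_efficiency(tflops, tflops_per_watt):
    """Power in watts implied by a throughput and an efficiency figure."""
    return tflops / tflops_per_watt


if __name__ == "__main__":
    low, high = module_power_range()
    print(f"4-chip module: {low:.1f} W to {high:.1f} W")
    print(f"20 TFLOPS at 5 TFLOPS/W: {power_at_efficiency(20, 5):.1f} W")
```

The two sets of figures agree: 20 TFLOPS at 5 TFLOPS/W implies 4 W, which sits inside the 2 W to 8 W range obtained from four chips.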

With the widespread application of artificial intelligence in industries such as retail, automotive, industrial, agriculture, and robotics, MemryX stands at the forefront of edge computing technology, delivering exceptional performance and greater value to its customers. In the future, MemryX will continue to drive technological innovation and become an indispensable partner in the field of AI edge computing. With the tools and examples provided by the original manufacturer, AI is no longer an unattainable dream. By following the example steps one by one, you can quickly implement intelligent applications. If you are a new partner interested in trying or purchasing MemryX products, please contact Editor Eevee directly! Thank you.

If you have any technical questions about MemryX, feel free to leave them in the comments below!

More MemryX technical articles will be shared next. Stay tuned for 【ATU Book-MemryX Series】!
