Use the OpenMV Cam to Quickly Start Applying Machine Learning to Object Detection

By Jacob Beningo

Contributed By Digi-Key's North American Editors

2020-02-25

The application of machine learning (ML) for object detection and classification is becoming an urgent need within the embedded systems industry, especially for Internet of Things (IoT), security, advanced driver assistance systems (ADAS), and industrial automation systems. However, object detection is a complex topic and ML is relatively new, so developing ML applications to detect objects can be difficult and cumbersome.

For example, object detection has traditionally required developers to learn a framework like OpenCV and to purchase thousands of dollars in computer equipment in order to be successful. As such, traditional approaches to object detection and machine vision are not just time consuming, they are also expensive.

For engineers looking to apply ML to object detection and machine vision applications without becoming ML experts or spending a small fortune on equipment, the Python-programmable OpenMV H7 camera module from SparkFun Electronics is an innovative solution. The module is designed to be a low-cost, “Arduino”-like board for image processing and detection. Together, the module and its software ecosystem make object detection and classification with ML both easy and inexpensive.

This article introduces the OpenMV H7 camera module and shows how developers can get started applying ML to object detection using the CIFAR-10 computer vision image dataset.

The OpenMV H7 camera module

With its feature-rich software libraries, the OpenMV H7 camera module gives developers a lot of help in quickly creating an ML application. For example, developers can use the OpenMV camera for face and eye detection, and even to precisely track pupils. It can also detect and track color blobs or markers. There are even examples of how to use ML to detect and track custom objects.

The OpenMV H7 camera module is a single integrated development board that includes all the hardware components necessary for ML-based object detection and classification at a cost that is orders of magnitude less than traditional machine vision systems. The module is relatively small, measuring only 1.4 x 1.75 inches (in.), and integrates the camera, microcontroller, and expansion I/O on a single board.

The expansion I/O provides developers with a range of peripheral features from the microcontroller (Figure 2). These features include communication interfaces such as:

UART

SPI

I2C

CAN

The expansion I/O also includes control and data channels for driving servos, generating signals through a digital-to-analog converter (DAC), and reading sensor values through an analog-to-digital converter (ADC). These expansion I/Os make the OpenMV module well suited to vision applications in the home automation, robot guidance, industrial automation, and object detection and tracking spaces.

Figure 2: The OpenMV H7 camera module comes with a plethora of expandable I/O pins. These pins can be used to control servo motors, sample sensors, or to communicate with a Wi-Fi module to create an IoT device. (Image source: SparkFun)
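As a sketch of how these expansion pins might be driven, the snippet below positions a servo and reads a sensor voltage through MicroPython's `pyb` board-support module. The servo channel (1) and pin name ("P6") are assumptions for illustration only; the import is guarded so the ADC scaling helper also runs off-board.

```python
# Hedged sketch: drive a servo and read an analog sensor through the
# OpenMV expansion I/O. Servo channel 1 and pin "P6" are assumptions.
try:
    import pyb  # MicroPython board support (present on the camera)
    ON_BOARD = True
except ImportError:
    ON_BOARD = False  # running on a PC; exercise only the math below

def adc_to_volts(raw, vref=3.3, bits=12):
    """Convert a raw ADC reading to volts for a given reference and resolution."""
    return raw * vref / ((1 << bits) - 1)

if ON_BOARD:
    servo = pyb.Servo(1)             # servo channel on the expansion header
    servo.angle(45)                  # move the servo to 45 degrees
    adc = pyb.ADC(pyb.Pin('P6'))     # assumed ADC-capable expansion pin
    print(adc_to_volts(adc.read()))  # print the measured sensor voltage
else:
    print(round(adc_to_volts(2048), 2))  # mid-scale 12-bit reading -> 1.65
```

Off-board, the helper simply demonstrates the 12-bit scaling: a mid-scale reading of 2048 corresponds to roughly half of the 3.3 volt reference.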

The onboard microcontroller is STMicroelectronics’ STM32F765VIT6, which includes an Arm Cortex-M7 processor in a 100-pin LQFP. The processor runs at 216 megahertz (MHz) and has 2 Megabytes (Mbytes) of Flash and 512 kilobytes (Kbytes) of RAM. The processor is extremely capable and well suited to machine vision applications thanks to its double-precision floating-point unit (FPU) and full DSP instruction set. The microcontroller also includes a hardware-based JPEG encoder that can accelerate imaging applications. The general block diagram for the STM32F765VIT6 is shown in Figure 3.

Figure 3: The STM32F765VIT includes 2 Mbytes of Flash, 512 Kbytes of RAM, and peripherals such as hardware-based JPEG encoding and DSP instructions that make it a perfect fit for machine vision applications. (Image source: STMicroelectronics)

The OpenMV H7 camera module is unique in that it supports several different camera modules. For example, a developer who didn’t want to use the onboard camera, which has a resolution of 640 x 480, could switch to a module that supports the MT9V034 image sensor from ON Semiconductor. The MT9V034 is a 1/3-inch wide-VGA format CMOS active-pixel digital imaging sensor that includes a global shutter and a high dynamic range (HDR) mode. The sensor has an image resolution of 752 x 480 and is designed to operate over a wide temperature range of -30˚C to +70˚C. ON Semiconductor provides a development board for this image sensor, the MT9V034C12STCH-GEVB (Figure 4).

Figure 4: The MT9V034C12STCH-GEVB is a development board for the MT9V034 image sensor that includes a built-in lens to accelerate development and testing. (Image source: ON Semiconductor)

Developing a first object detection application

Application development for the OpenMV H7 camera module is all done through the OpenMV IDE, which provides a Python interface for application development (Figure 5). With Python, there is no need to know a low-level programming language. In fact, not only are applications written in Python, the OpenMV H7 camera module natively runs MicroPython. This gives developers an extremely easy way to start writing machine vision applications and running ML inferences with minimal effort.

Figure 5: The OpenMV IDE provides a Python-based interface to develop application code for the OpenMV H7 camera module. The application is then sent as a script to the camera module, which is running MicroPython. (Image source: Beningo Embedded Group)

One of the first things a developer does once set up is run the basic hello_world.py, which contains the code shown in Listing 1. The Python script shows a developer how to enable the OpenMV camera and take snapshots continuously. This allows the developer to get live video and measure the frame rate. While connected to a PC, the frame rate can vary from as low as 25 frames per second (fps) to around 60 fps. The application is executed by simply connecting the OpenMV camera to the OpenMV IDE using the connect button in the lower left corner of the screen, and then clicking on the green run button.

# Hello World Example
#
# Welcome to the OpenMV IDE! Click on the green run arrow button below to run the script!

import sensor, image, time

sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.skip_frames(time = 2000)     # Wait for settings to take effect.
clock = time.clock()                # Create a clock object to track the FPS.

while(True):
    clock.tick()                    # Update the FPS clock.
    img = sensor.snapshot()         # Take a picture and return the image.
    print(clock.fps())              # Note: OpenMV Cam runs about half as fast when connected
                                    # to the IDE. The FPS should increase once disconnected.

Listing 1: The hello_world.py example script initializes the camera and continuously takes snapshots so the developer can view live video and measure the frame rate. (Code source: OpenMV)
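The clock.tick()/clock.fps() pattern in Listing 1 simply measures how long each pass through the loop takes. A minimal plain-Python equivalent, written here purely for illustration (the FPSClock class below is a stand-in, not part of the OpenMV API), shows the idea:

```python
import time

class FPSClock:
    """Minimal stand-in for MicroPython's time.clock() FPS helper."""
    def __init__(self):
        self._last = time.monotonic()

    def tick(self):
        # Mark the start of a new frame.
        self._last = time.monotonic()

    def fps(self):
        # Frames per second implied by the time since the last tick().
        elapsed = time.monotonic() - self._last
        return 1.0 / elapsed if elapsed > 0 else float("inf")

clock = FPSClock()
clock.tick()
time.sleep(0.02)    # stand-in for sensor.snapshot() taking ~20 ms
print(clock.fps())  # prints the measured frame rate (at most ~50 here)
```

Each frame that takes longer to capture or process lowers the reported fps, which is why the frame rate drops when the IDE is streaming preview images from the camera.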

In order to perform a first object detection and classification test, an ML network needs to be trained with the desired object recognition classes. A commonly used image dataset for training object detection models and testing how well a model works is the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60,000 32 x 32 color images spanning the following 10 image classes:

Airplane

Automobile

Bird

Cat

Deer

Dog

Frog

Horse

Ship

Truck
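The network reports a match as a class index, so the OpenMV example scripts keep these labels in a plain list in exactly this order and index into it. A minimal illustration:

```python
# CIFAR-10 class labels in the order used by the OpenMV example script.
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

def label_for(index):
    """Map a classifier output index back to its human-readable class name."""
    return labels[index]

print(label_for(3))  # → cat
```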

Describing the process used to train the model and convert it into an inference model that can run on the OpenMV camera is beyond the scope of this article. However, it’s possible to run a CIFAR-10-trained network without going through model development: the OpenMV IDE already includes a trained model for CIFAR-10 that just needs to be loaded onto the camera.

To use this model, connect the OpenMV camera to the PC and the OpenMV IDE. From within the OpenMV IDE, click on Tools -> Machine Learning -> CNN Network Library. A window will open with the OpenMV qtcreator models folder. There are two options:

cmsisnn

tensorflow

Under cmsisnn, navigate to the cifar10 folder, select cifar10.network, and click Open. Another window will then open, prompting for a location to save the trained network file. Select the USB mass storage drive that appears for the camera to save the network onto the OpenMV camera.

Once the network has been saved, load the CIFAR-10 machine learning example by going to File -> Examples -> 25-Machine-Learning -> nn_cifar10_search_whole_window.py. This loads an example script that can be seen below (Listing 2).

# CIFAR-10 Search Whole Window Example
#
# CIFAR is a convolutional neural network designed to classify its field of view into several
# different object types and works on RGB video data.
#
# In this example we slide the LeNet detector window over the image and get a list of activations
# where there might be an object. Note that using a CNN with a sliding window is extremely compute
# expensive, so for an exhaustive search do not expect the CNN to be real-time.

import sensor, image, time, os, nn

sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.set_windowing((128, 128))    # Set 128x128 window.
sensor.skip_frames(time=750)        # Don't let autogain run very long.
sensor.set_auto_gain(False)         # Turn off autogain.
sensor.set_auto_exposure(False)     # Turn off auto exposure.

# Load the cifar10 network (you can get the network from the OpenMV IDE).
net = nn.load('/cifar10.network')
# Faster, smaller, and less accurate.
# net = nn.load('/cifar10_fast.network')
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    # net.search() will search an roi in the image for the network (or the whole image if the roi is not
    # specified). At each location in the image, if one of the classifier outputs is larger than
    # threshold, the location and label will be stored in an object list and returned. At each scale the
    # detection window is moved around in the ROI using x_overlap (0-1) and y_overlap (0-1) as a guide.
    # If you set the overlap to 0.5 then each detection window will overlap the previous one by 50%. Note
    # the computational workload goes WAY up the more overlap. Finally, for multi-scale matching, after
    # sliding the network around in the x/y dimensions the detection window will shrink by scale_mul (0-1)
    # down to min_scale (0-1). For example, if scale_mul is 0.5 the detection window will shrink by 50%.
    # Note that at a lower scale there's even more area to search if x_overlap and y_overlap are small...
    # contrast_threshold skips running the CNN in areas that are flat.
    for obj in net.search(img, threshold=0.6, min_scale=0.5, scale_mul=0.5, \
            x_overlap=0.5, y_overlap=0.5, contrast_threshold=0.5):
        print("Detected %s - Confidence %f%%" % (labels[obj.index()], obj.value()))
        img.draw_rectangle(obj.rect(), color=(255, 0, 0))
    print(clock.fps())

Listing 2: The OpenMV IDE nn_cifar10_search_whole_window.py example application is used to classify images and provide a confidence level measurement for the classification. (Code source: OpenMV)
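The comments in Listing 2 warn that the workload climbs quickly as the overlap grows and the detection window shrinks. A rough window count makes this concrete. Note that count_windows() below is a hypothetical helper written for illustration, under the simplifying assumption of a stride of window size x (1 - overlap); it is not part of the OpenMV API.

```python
def count_windows(img_w, img_h, win=128, x_overlap=0.5, y_overlap=0.5,
                  scale_mul=0.5, min_scale=0.5):
    """Estimate how many windows a multi-scale sliding-window search visits."""
    total = 0
    scale = 1.0
    while True:
        w = int(win * scale)                         # detection window shrinks
        h = int(win * scale)                         # by scale_mul each pass
        x_stride = max(1, int(w * (1 - x_overlap)))  # horizontal step size
        y_stride = max(1, int(h * (1 - y_overlap)))  # vertical step size
        nx = max(0, (img_w - w) // x_stride + 1)     # horizontal placements
        ny = max(0, (img_h - h) // y_stride + 1)     # vertical placements
        total += nx * ny
        scale *= scale_mul
        if scale < min_scale:
            break
    return total

# 128x128 sensor window, as configured in Listing 2:
print(count_windows(128, 128))                                  # → 10
print(count_windows(128, 128, x_overlap=0.75, y_overlap=0.75))  # → 26
```

Raising the overlap from 0.5 to 0.75 more than doubles the work in even this small example, which is why the example script keeps its search parameters modest.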

The test application is executed the same way as the hello_world.py script: connect the OpenMV IDE to the module by clicking connect in the lower left, and then click run. At this point, the camera runs the script and tries to classify the images it sees. The terminal window shows each classification result along with its confidence level.

At this point, a developer just needs to present objects from the CIFAR-10 dataset and let the camera classify them. For the purposes of this article, the camera was shown an image of a cat (Figure 6) and an image of an airplane (Figure 7). It’s difficult to see in the figures, but the confidence level was around 70%. The confidence level may be low due to differences between the training images and the test images, lighting conditions, and other factors. Higher levels are certainly achievable with additional training and tighter control over the camera environment.
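The threshold=0.6 argument in Listing 2 performs this confidence gating on the camera itself. The same filtering logic can be sketched in plain Python; the (label, confidence) tuples here are illustrative stand-ins, not the object type the OpenMV nn module returns.

```python
def filter_detections(detections, threshold=0.6):
    """Keep only (label, confidence) pairs at or above the threshold."""
    return [(label, conf) for label, conf in detections if conf >= threshold]

# Illustrative raw results, e.g. a ~70% cat match as described above.
raw = [('cat', 0.70), ('dog', 0.31), ('airplane', 0.68)]
print(filter_detections(raw))  # → [('cat', 0.7), ('airplane', 0.68)]
```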

Expanding the OpenMV H7’s capabilities

There are a lot of opportunities to expand the OpenMV module for use with different camera modules and an almost limitless number of external sensors.

While there is I/O expansion capability on the OpenMV module itself, an external expansion board can provide additional access to power, ground, and communication signals. One useful option is the DFR0578 Gravity expansion shield for the OpenMV M7 module from DFRobot (Figure 8). As the figure shows, the Gravity shield provides access to quite a few more power and ground pins, expands out additional I2C lines, and offers additional options for powering the module. This makes it much easier to interface external sensors and modules without having to use a breadboard or splice wires.

There are many other expansion boards that can be connected to the OpenMV H7 camera module depending on the end application. For example, the DFRobot FireBeetle DFR0498 adds support for media devices such as a microphone (Figure 9).

Figure 9: The DFRobot FireBeetle DFR0498 includes expansion for media devices such as a microphone. (Image source: DFRobot)

Tips and tricks for working with OpenMV

Getting started with the OpenMV H7 camera module is not difficult, but there are many nuances and decisions that developers working with it for the first time should be aware of. Here are a few “tips and tricks” for getting started with the module:

When first using the module, make sure to adjust the focus on the module using the procedure outlined in the OpenMV documentation.

From the Files -> Examples menu, access dozens of examples ranging from color detection to face recognition.

To add internet connectivity, consider using a Wi-Fi shield. A Wi-Fi shield can be enabled automatically on start-up in the OpenMV IDE from the Tools -> OpenMV Cam Settings option.

Consider using TensorFlow Lite to train an ML model for the object(s) of interest.

Developers that follow these “tips and tricks” will find that they can save quite a bit of time and grief when working with the OpenMV H7 camera module for the first time.

Conclusion

As shown, the OpenMV H7 camera module is uniquely suited to helping developers quickly get started applying ML principles to object detection and related applications. Not only are there example applications available for it that developers can leverage to accelerate their design, there are also numerous expansion options for cameras and sensors. In order to get started, developers just need to know how to write a few lines of Python code and they can have a working application within several hours, depending on the complexity.



About this author

Jacob Beningo

Jacob Beningo is an embedded software consultant. He has published more than 200 articles on embedded software development techniques, is a sought-after speaker and technical trainer, and holds three degrees, including a Master of Engineering from the University of Michigan.