Multimodal Input

Inputs as “signal”

RAPID-MIX API projects might take input from any combination of sensors, audio, or video. Generically, we could refer to streams of values from any of these inputs as a signal. Different signals will have different properties in terms of bandwidth and frame rate. RAPID-MIX API provides many processes to help manage signals and make them more useful. Some of these processes are general and some are specific to one kind of input.

Signal Processing vs Feature Extraction

Signal Processing refers to the general process of taking a signal as input, applying some operations to it, and outputting the result. Feature Extraction is a particular case of signal processing, in which the operations applied to the input signal aim at extracting high-level information from it.

For example, if you play a song into your system, you could use signal processing to apply a low-pass filter and remove all the high-pitched sounds, and you could use feature extraction to identify drum beats and detect the tempo.

Both are useful in an interactive machine learning workflow. Applying filters can reduce input noise and improve classification results, while feature extraction is a way to explicitly tell the system what it is supposed to recognise. For instance, if I feed a model raw audio samples and expect it to recognise the original sounds afterwards, the system will very likely perform badly. If I instead extract some spectral features from these audio samples and feed those to the model, recognition will usually work much better.

Useful signal processing

The processors in this section mostly provide basic mathematical functions that can be applied to any sort of input signal. RAPID-MIX API has a simple API for basic processes, called rapidStream. The pipo API lets you chain together processes, and is more efficient for higher bandwidth streams, like audio.

Windowing

Windowing refers to the process of grouping a specific number of consecutive frames or samples into a “window” that can be examined together. The term buffer is also used as a synonym for window. A window could be the direct input to interactive machine learning, or it could be the unit that is processed by further signal processing or feature extraction.

A rapidStream instance does windowing automatically, like this:

C++

rapidmix::rapidStream myProcessor(5); // create a processor with a window size of 5

Pipo’s windowing function is called slice.
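To make the idea concrete without depending on either API, here is a minimal sketch of a sliding window in standard C++. The `SlidingWindow` class and its method names are hypothetical and are not part of RAPID-MIX API or pipo; it only illustrates how the most recent samples of a stream can be kept together.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper, not part of RAPID-MIX API or pipo:
// keeps the most recent `capacity` samples of a stream.
class SlidingWindow {
public:
    explicit SlidingWindow(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Add a new sample, discarding the oldest one once the window is full.
    void push(double sample) {
        if (samples_.size() == capacity_) {
            samples_.pop_front();
        }
        samples_.push_back(sample);
    }

    // True once `capacity` samples have been collected.
    bool full() const { return samples_.size() == capacity_; }

    // The current window contents, oldest sample first.
    const std::deque<double>& samples() const { return samples_; }

private:
    std::size_t capacity_;
    std::deque<double> samples_;
};
```

Each incoming sensor value would be passed to `push`, and once `full()` returns true the contents of `samples()` could be handed to a model or to further processing.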

Low-pass filters

Low-pass filtering is a way to remove rapidly changing components of a signal while leaving slower changes in place. In music, this has the effect of removing higher frequencies while leaving the bass. For a sensor, this kind of filtering is a good way to remove noise. There are a few different techniques that work as low-pass filters.

Averaging (aka mean) — The classic low-pass filter is the average of a window of two or more samples. This can be a very effective noise reduction filter for sensor signals.

RMS — “Root Mean Square” processing takes the square of each of the samples in a window, finds the mean of those squares, and then returns the square root of that mean. This filter is also good for figuring out the “power” or “strength” of a window of signals, especially when that window has both negative and positive values. We’ve had good luck using RMS on EMG signals, for example, where we want to know how strongly a muscle is being activated.
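The two techniques above can be sketched in standard C++ as follows. This is an illustration only, not the RAPID-MIX API implementation: the function names `windowMean`, `windowRms`, and `movingAverage` are hypothetical, and the choice to average over a shorter window at the start of the stream is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of one window of samples (hypothetical helper).
double windowMean(const std::vector<double>& window) {
    assert(!window.empty());
    const double sum = std::accumulate(window.begin(), window.end(), 0.0);
    return sum / static_cast<double>(window.size());
}

// Root mean square of one window: square each sample, average the squares,
// then take the square root of that average.
double windowRms(const std::vector<double>& window) {
    assert(!window.empty());
    double sumOfSquares = 0.0;
    for (double sample : window) {
        sumOfSquares += sample * sample;
    }
    return std::sqrt(sumOfSquares / static_cast<double>(window.size()));
}

// Moving average: output[i] is the mean of the (up to) `size` samples
// ending at input[i]. Assumption for this sketch: the first size-1 outputs
// average over the shorter window that is available so far.
std::vector<double> movingAverage(const std::vector<double>& input,
                                  std::size_t size) {
    assert(size > 0);
    std::vector<double> output;
    output.reserve(input.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        sum += input[i];
        if (i >= size) {
            sum -= input[i - size];  // drop the sample that left the window
        }
        const std::size_t count = (i + 1 < size) ? i + 1 : size;
        output.push_back(sum / static_cast<double>(count));
    }
    return output;
}
```

For a window that swings evenly between 3 and -3, the mean is 0 while the RMS is 3, which is why RMS is the more useful measure of strength for signals such as EMG that have both negative and positive values.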

High-pass filters

High-pass filtering is good for focussing on fast moving signal components: sudden spikes rather than steady values.

Subtracting the low-pass filtered signal from the original leaves the high-frequency component.
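As a rough sketch of that subtraction in standard C++, the example below uses a moving average as the low-pass filter and subtracts it from each incoming sample. The function name `highPass` is hypothetical and not part of RAPID-MIX API, and the moving average is just one possible choice of low-pass filter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: high-pass a signal by subtracting a moving average
// (the low-pass part) from each sample. `size` is the averaging window.
std::vector<double> highPass(const std::vector<double>& input,
                             std::size_t size) {
    assert(size > 0);
    std::vector<double> output;
    output.reserve(input.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        sum += input[i];
        if (i >= size) {
            sum -= input[i - size];  // drop the sample that left the window
        }
        // Assumption: early samples are averaged over a shorter window.
        const std::size_t count = (i + 1 < size) ? i + 1 : size;
        const double lowPassed = sum / static_cast<double>(count);
        output.push_back(input[i] - lowPassed);
    }
    return output;
}
```

A steady input produces zeros, while a sudden spike produces a positive value followed by a negative one as the average catches up.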

Statistics on windows: the standard deviation, minimum, or maximum of a window can indicate whether an event has happened within that window.
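These window statistics could be computed along the following lines in standard C++. The `WindowStats` struct and `windowStats` function are hypothetical names for this sketch, and the standard deviation here is the population form (dividing by the number of samples), which is an assumption rather than a statement about what RAPID-MIX API computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for the statistics of one window.
struct WindowStats {
    double minimum;
    double maximum;
    double standardDeviation;
};

// Compute minimum, maximum, and population standard deviation of a window.
WindowStats windowStats(const std::vector<double>& window) {
    assert(!window.empty());
    const auto extremes = std::minmax_element(window.begin(), window.end());

    double sum = 0.0;
    for (double sample : window) {
        sum += sample;
    }
    const double mean = sum / static_cast<double>(window.size());

    double sumOfSquaredDeviations = 0.0;
    for (double sample : window) {
        const double deviation = sample - mean;
        sumOfSquaredDeviations += deviation * deviation;
    }
    const double variance =
        sumOfSquaredDeviations / static_cast<double>(window.size());

    return WindowStats{*extremes.first, *extremes.second, std::sqrt(variance)};
}
```

One simple way to use this is to compare `standardDeviation`, or the difference between `maximum` and `minimum`, against a threshold tuned for the sensor: a quiet window gives small values, and a window containing an event gives larger ones.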

Feature extractors

FFT

MFCC

Segmentation

Fundamental frequency

TODO:

Should we discuss some issues of time and timing here? Asynchronous vs synchronous? Fixed sample rate, time stamps, etc. Maybe too ambitious for this version