EMBEDDED DESIGN EMBEDDED AI
www.newelectronics.co.uk 23 February 2021 15
“Embedded ML is about applying a proven set of technologies to a new context that will enable many new applications that were not previously possible.”
Daniel Situnayake
or ternary calculations that can be performed using little more than a few gates each do not hurt overall accuracy in many cases. The potential performance gains are enormous, but the combination of hardware and software support needed to exploit them fully is still lacking, says Situnayake.
Though the tooling for the
TensorFlow Lite framework typically
supports int8 weights, support
for lower resolutions is far from
widespread. “This is changing
fast,” Situnayake notes, pointing to
accelerators such as Syntiant’s that
support binary, 2-bit and 4-bit weights
as well as work by Plumerai to train
binarised neural networks directly.
“While these technologies are
still on the cutting edge and have yet
to make it into the mainstream for
embedded ML developers, it won’t
be long before they are part of the
standard toolkit,” he adds.
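As an illustration of why binarised weights are so cheap in hardware, the dot product of two {-1, +1} vectors reduces to an XNOR plus a population count. The sketch below (plain Python, names hypothetical) mirrors what a few gates can do per weight:

```python
# Sketch: dot product of two {-1, +1} vectors packed as n-bit integers
# (bit set = +1, bit clear = -1), computed without a single multiply.
def binarised_dot(a_bits: int, b_bits: int, n: int) -> int:
    # XNOR marks positions where the signs agree; each agreement
    # contributes +1 and each disagreement -1, so
    # dot = matches - mismatches = 2 * matches - n.
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# Reference check against ordinary sign arithmetic
def to_signs(bits: int, n: int):
    return [1 if (bits >> i) & 1 else -1 for i in range(n)]

a, b, n = 0b10110, 0b11010, 5
assert binarised_dot(a, b, n) == sum(
    x * y for x, y in zip(to_signs(a, n), to_signs(b, n)))
```

In silicon, the XNOR-and-popcount replaces a full multiply-accumulate, which is where the gate-count savings come from.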
Reducing the arithmetic burden
There are other options for TinyML
work that reduce the arithmetic
burden. Speaking at the TinyML
Asia conference late last year, Jan
Jongboom, co-founder and CTO of
Edge Impulse, said the key attraction of ML is its ability to find correlations in data that conventional algorithms do not pick up. The issue lies in the sheer number of parameters most conventional models have to process to find those correlations if the inputs are raw samples.
“You want to lend your machine-learning algorithm a hand to make
its life easier,” Jongboom says. The
most helpful technique for typical
real-time signals is the use of feature
extraction: transforming the data into
representations that make it possible
to build neural networks with orders
of magnitude fewer parameters.
Taking speech as an example, a transformation into the mel-cepstrum domain yields a compact representation that efficiently encodes the changes in sound, massively reducing the number of input parameters.
In other sensor data, such as the
feed from an accelerometer used
for vibration detection in rotating
machinery, other forms of joint time-frequency representations will often
work.
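Feature extraction of this kind can be sketched in a few lines. The example below (a generic numpy sketch, not the pipeline from any specific product) turns a raw 1-D sensor stream into log-power short-time Fourier frames, so a model sees a modest vector of spectral features per window instead of thousands of raw samples:

```python
import numpy as np

def stft_features(signal, frame_len=256, hop=128):
    """Log-power short-time Fourier frames from a 1-D signal."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
        frames.append(np.log(spectrum + 1e-10))      # log compresses dynamics
    return np.array(frames)                          # (n_frames, frame_len//2 + 1)

# A 1 kHz tone sampled at 8 kHz should peak in bin 1000 / (8000/256) = 32
fs, f = 8000, 1000
t = np.arange(fs) / fs
feats = stft_features(np.sin(2 * np.pi * f * t))
assert feats.shape[1] == 129
assert np.argmax(feats[0]) == 32
```

Each 256-sample window collapses to 129 features; a mel filterbank or similar stage would reduce that further before the network sees it.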
This approach is used by John
Edwards, consultant and DSP
engineer at Sigma Numerix and a
visiting lecturer at the University
of Oxford, in a project for vibration
analysis.
In this case, a short-time Fourier transform offered the best trade-off,
coupled with transformations that
compensate for variable speed
motors. The feature extraction
reduced the size of the model to
just two layers that could easily be
processed on an NXP LPC55S69,
which combines Arm Cortex-M33
cores with a DSP accelerator.
Jongboom says though it may be
tempting to go down the route of
deep learning, other machine-learning
algorithms can deliver results. “Our best anomaly detection model is not a neural network: it’s basic k-means clustering.”
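A minimal version of that idea (a generic sketch, not Edge Impulse’s implementation) fits k-means centroids on normal data only, then scores a new sample by its distance to the nearest centroid; far from every centroid means anomalous:

```python
import numpy as np

def fit_centroids(X, k=2, iters=20, seed=0):
    """Plain k-means: iterate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def anomaly_score(x, centroids):
    """Distance to the nearest centroid of the 'normal' clusters."""
    return float(np.min(np.linalg.norm(centroids - x, axis=1)))

# Normal data forms two tight clusters; an outlier scores far higher.
X = np.concatenate([np.random.default_rng(1).normal(0, 0.1, (50, 2)),
                    np.random.default_rng(2).normal(5, 0.1, (50, 2))])
c = fit_centroids(X)
assert anomaly_score(np.array([20.0, 20.0]), c) > anomaly_score(np.array([0.0, 0.05]), c)
```

With only k centroids to store and a few distance computations per sample, this fits comfortably on a microcontroller.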
Where deep learning is a
requirement, sparsity provides a
further reduction in model overhead.
This can take the form of pruning,
in which weights that have little
effect on model output are simply
removed from the pipeline. Another
option is to focus effort on parts of
the data stream that demonstrate
changes over time. For example, in
surveillance videos this may mean
the use of image processing to detect
moving objects and separate them
from the background before feeding
the processed pixels to a model.
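Magnitude pruning can be sketched with a simple global threshold (an illustrative approach; real frameworks offer more sophisticated schedules): zero out the smallest weights and keep only the fraction that contributes most to the output.

```python
import numpy as np

def prune(weights, keep_fraction=0.25):
    """Zero all but the largest-magnitude fraction of the weights."""
    flat = np.sort(np.abs(weights).ravel())
    threshold = flat[int(len(flat) * (1 - keep_fraction)) - 1]
    mask = np.abs(weights) > threshold       # True where a weight survives
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_pruned, mask = prune(W, keep_fraction=0.25)

# Roughly 75% of the weights are now exactly zero, so a sparse runtime
# can skip their multiply-accumulates entirely.
assert abs((1 - mask.mean()) - 0.75) < 0.01
assert np.all(W_pruned[~mask] == 0)
```

In practice pruning is interleaved with fine-tuning so the remaining weights can compensate for those removed.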
It’s been a learning experience for Jongboom and others. Describing his progress through the stages of TinyML, he says that in the summer of 2017 he thought the whole concept was impossible. By the summer of 2020, having looked at ways to optimise application and model design together, his attitude had changed: he now believes real-time image classification on low-power hardware is feasible. As low-power accelerators that support low precision and sparsity more efficiently appear,
the range of models that can run at
micropower should expand.
The result, Situnayake claims,
is likely to be that “ML will end up
representing a larger fraction than
any other type of workload. The
advantages of on-device ML will
drive the industry towards creating
and deploying faster, more capable
low-power chips that will come
to represent the majority of all
embedded compute in the world”.
Though there will be plenty of devices that do not run these workloads, the need for speed as model sizes inevitably grow will focus attention on ML’s requirements, which will begin to dominate the development of software and hardware architectures, as long as the applications follow through.