EMBEDDED DESIGN EMBEDDED AI
www.newelectronics.co.uk 23 February 2021 15
“Embedded ML is about applying a proven set of technologies to a new context that will enable many new applications that were not previously possible.”
Daniel Situnayake
or ternary calculations that can be performed using little more than a few gates each do not hurt overall accuracy in many cases. The potential performance gains are enormous, but the combination of hardware and software support needed to exploit them fully is still lacking, says Situnayake.
Though the tooling for the
TensorFlow Lite framework typically
supports int8 weights, support
for lower resolutions is far from
widespread. “This is changing
fast,” Situnayake notes, pointing to
accelerators such as Syntiant’s that
support binary, 2-bit and 4-bit weights
as well as work by Plumerai to train
binarised neural networks directly.
“While these technologies are
still on the cutting edge and have yet
to make it into the mainstream for
embedded ML developers, it won’t
be long before they are part of the
standard toolkit,” he adds.
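As an illustration of why binarised weights are so cheap in hardware, the dot product of two {-1, +1} vectors reduces to an XNOR plus a population count. The sketch below (plain Python, names hypothetical) mirrors what a few gates can do per weight:

```python
# Sketch: dot product of two {-1, +1} vectors packed as n-bit integers
# (bit set = +1, bit clear = -1), computed without a single multiply.
def binarised_dot(a_bits: int, b_bits: int, n: int) -> int:
    # XNOR marks positions where the signs agree; each agreement
    # contributes +1 and each disagreement -1, so
    # dot = matches - mismatches = 2 * matches - n.
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# Reference check against ordinary sign arithmetic
def to_signs(bits: int, n: int):
    return [1 if (bits >> i) & 1 else -1 for i in range(n)]

a, b, n = 0b10110, 0b11010, 5
assert binarised_dot(a, b, n) == sum(
    x * y for x, y in zip(to_signs(a, n), to_signs(b, n)))
```

In silicon, the XNOR-and-popcount replaces a full multiply-accumulate, which is where the gate-count savings come from.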
Reducing the arithmetic burden
There are other options for TinyML
work that reduce the arithmetic
burden. Speaking at the TinyML
Asia conference late last year, Jan
Jongboom, co-founder and CTO of
Edge Impulse, said the key attraction of ML is its ability to find correlations in data that conventional algorithms do not pick up. The issue lies in the sheer number of parameters most conventional models have to process to find those correlations if the inputs are raw samples.
“You want to lend your machine-learning algorithm a hand to make
its life easier,” Jongboom says. The
most helpful technique for typical
real-time signals is the use of feature
extraction: transforming the data into
representations that make it possible
to build neural networks with orders
of magnitude fewer parameters.
Taking speech as an example, a transformation into the mel-cepstrum domain yields a compact representation that efficiently encodes the changes in sound, massively reducing the number of input parameters.
In other sensor data, such as the
feed from an accelerometer used
for vibration detection in rotating
machinery, other forms of joint time-frequency representations will often
work.
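Feature extraction of this kind can be sketched in a few lines. The example below (a generic numpy sketch, not the pipeline from any specific product) turns a raw 1-D sensor stream into log-power short-time Fourier frames, so a model sees a modest vector of spectral features per window instead of thousands of raw samples:

```python
import numpy as np

def stft_features(signal, frame_len=256, hop=128):
    """Log-power short-time Fourier frames from a 1-D signal."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
        frames.append(np.log(spectrum + 1e-10))      # log compresses dynamics
    return np.array(frames)                          # (n_frames, frame_len//2 + 1)

# A 1 kHz tone sampled at 8 kHz should peak in bin 1000 / (8000/256) = 32
fs, f = 8000, 1000
t = np.arange(fs) / fs
feats = stft_features(np.sin(2 * np.pi * f * t))
assert feats.shape[1] == 129
assert np.argmax(feats[0]) == 32
```

Each 256-sample window collapses to 129 features; a mel filterbank or similar stage would reduce that further before the network sees it.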
This approach is used by John
Edwards, consultant and DSP
engineer at Sigma Numerix and a
visiting lecturer at the University
of Oxford, in a project for vibration
analysis.
In this case, a short-time Fourier transform offered the best trade-off,
coupled with transformations that
compensate for variable speed
motors. The feature extraction
reduced the size of the model to
just two layers that could easily be
processed on an NXP LPC55S69,
which combines Arm Cortex-M33
cores with a DSP accelerator.
Jongboom says though it may be
tempting to go down the route of
deep learning, other machine-learning
algorithms can deliver results. “Our best anomaly detection model is not a neural network: it’s basic k-means clustering.”
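A minimal version of that idea (a generic sketch, not Edge Impulse’s implementation) fits k-means centroids on normal data only, then scores a new sample by its distance to the nearest centroid; far from every centroid means anomalous:

```python
import numpy as np

def fit_centroids(X, k=2, iters=20, seed=0):
    """Plain k-means: iterate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def anomaly_score(x, centroids):
    """Distance to the nearest centroid of the 'normal' clusters."""
    return float(np.min(np.linalg.norm(centroids - x, axis=1)))

# Normal data forms two tight clusters; an outlier scores far higher.
X = np.concatenate([np.random.default_rng(1).normal(0, 0.1, (50, 2)),
                    np.random.default_rng(2).normal(5, 0.1, (50, 2))])
c = fit_centroids(X)
assert anomaly_score(np.array([20.0, 20.0]), c) > anomaly_score(np.array([0.0, 0.05]), c)
```

With only k centroids to store and a few distance computations per sample, this fits comfortably on a microcontroller.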
Where deep learning is a
requirement, sparsity provides a
further reduction in model overhead.
This can take the form of pruning,
in which weights that have little
effect on model output are simply
removed from the pipeline. Another
option is to focus effort on parts of
the data stream that demonstrate
changes over time. For example, in
surveillance videos this may mean
the use of image processing to detect
moving objects and separate them
from the background before feeding
the processed pixels to a model.
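Magnitude pruning can be sketched with a simple global threshold (an illustrative approach; real frameworks offer more sophisticated schedules): zero out the smallest weights and keep only the fraction that contributes most to the output.

```python
import numpy as np

def prune(weights, keep_fraction=0.25):
    """Zero all but the largest-magnitude fraction of the weights."""
    flat = np.sort(np.abs(weights).ravel())
    threshold = flat[int(len(flat) * (1 - keep_fraction)) - 1]
    mask = np.abs(weights) > threshold       # True where a weight survives
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_pruned, mask = prune(W, keep_fraction=0.25)

# Roughly 75% of the weights are now exactly zero, so a sparse runtime
# can skip their multiply-accumulates entirely.
assert abs((1 - mask.mean()) - 0.75) < 0.01
assert np.all(W_pruned[~mask] == 0)
```

In practice pruning is interleaved with fine-tuning so the remaining weights can compensate for those removed.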
It’s been a learning experience for Jongboom and others. Describing his progress through the stages of TinyML, he says that in the summer of 2017 he thought the whole concept was impossible. By the summer of 2020, having looked at ways to optimise application and model design together, his attitude had changed: he now believes real-time image classification on low-power hardware is feasible. As low-power accelerators that support low precision and sparsity more efficiently appear,
the range of models that can run at
micropower should expand.
The result, Situnayake claims,
is likely to be that “ML will end up
representing a larger fraction than
any other type of workload. The
advantages of on-device ML will
drive the industry towards creating
and deploying faster, more capable
low-power chips that will come
to represent the majority of all
embedded compute in the world”.
Though there will be plenty of devices that do not run these workloads, the need for speed as model sizes inevitably grow will focus attention on ML’s requirements, which will begin to dominate the development of software and hardware architectures, as long as the applications follow through.