RESEARCH & DEVELOPMENT: PROCESSORS

RAM scoops up computation

Memory is shaping up to take some of the workload off processors. By Chris Edwards
It may be nothing more than a short-lived fad. But the rapid ascendance of machine learning in computing since deep learning became practical has pushed hardware development in a new direction, one that could have a major effect on how computer designers think about memory.
The focus in design has turned to
how to sift through large quantities of
data, using long-distance relationships
in information. In contrast to more
conventional algorithms that often
work on highly localised data
streaming through the system,
machine learning puts much more
pressure on memory transfer rates
and the energy cost shuffling data
from bulk memory in and out of
processors.
One option is to avoid moving the data and instead bring the computational engines into the memory arrays themselves. Although it was Professor Bill Dally’s group at Stanford University that provided solid evidence for memory access being a major contributor to the energy cost of computing, Nvidia’s chief scientist is not a believer in putting compute into memory to reduce that cost. He argues that locality has not gone away simply because data sets have exploded in size.
In keynote speeches at a string of conferences last year, Dally said that it is easy to lose sight of the fact that the most costly memory accesses in terms of energy are relatively infrequent, even in machine-learning applications.
“The gains are assuming you have to read memory for every operation, but you don’t. You do a read from memory maybe once every 64 ops. With a reuse of 64 or 128, the memory reference energy is actually in the noise,” he said in a keynote at the SysML conference.
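To see the arithmetic behind that claim, the sketch below charges a single off-chip fetch against the operations that reuse the fetched value. Both energy figures are illustrative assumptions, not numbers taken from the talk.

    # Back-of-envelope model of Dally's amortisation argument.
    # Energy values are assumed round magnitudes for illustration.
    DRAM_FETCH_PJ = 640.0  # assumed cost of one off-chip word fetch
    OP_PJ = 4.0            # assumed cost of one arithmetic operation

    for reuse in (1, 64, 128):
        per_op = DRAM_FETCH_PJ / reuse  # fetch cost amortised over ops
        print(f"reuse {reuse:3d}: {per_op:6.1f}pJ of memory energy per op "
              f"vs {OP_PJ}pJ for the op itself")

With these assumed figures, a reuse of 64 amortises the fetch down to 10pJ per operation and a reuse of 128 to 5pJ, comparable with the cost of the operation itself rather than dominating it.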
[Figure: annotated die plots of prototype test chips, panels (a) and (b), labelling the BIST, scan and test, control and timing, clock generation, precharge and header switches, bitcell array, sense amplifiers (SA) and wordline drivers, with block dimensions in micrometres.]

Chairing a session dedicated to in-memory computing research at the Design Automation Conference (DAC) in Las Vegas earlier in the summer, University of Michigan assistant professor Reetuparna Das agreed with Dally’s assessment on energy. “The biggest advantage of compute in memory is bandwidth, not so much energy, in my opinion.”
Bandwidth was also the target of the in-memory compute attempts some manufacturers made two decades ago. In 1994, Sun Microsystems made a brief foray into memory design with the help of Mitsubishi Electronics America to create what they called “3D RAM”. This put some of the processing needed for 3D rendering into the framebuffer memory itself, mostly performing simple pixel-by-pixel comparisons to determine whether one object in the scene occludes another. This removed the need for the graphics controller to read those pixels itself, freeing up bandwidth on the bus. At the time, the economics of memory did not favour the development of highly customised versions except in markets where the price premium was acceptable.
[Figure, above: prototype test chips (left) and an SRAM circuit for in-place operation, showing bitlines BL0/BLB0 to BLn/BLBn, wordlines WLi and WLj, a reference voltage Vref and NOR outputs.]
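The circuit gives a hint of how in-place operation works: raising two wordlines, WLi and WLj, at the same time and sensing the bitlines against the reference voltage Vref yields a logic function of the two stored rows without reading either one out. Below is a rough functional model of that behaviour; the bit-level mapping (AND on the true bitline, NOR on its complement) follows published compute-cache designs from Das’s group and is an assumption here, as the real mechanism is analogue.

    # Functional model only: a real in-SRAM operation senses analogue
    # bitline levels against Vref; this mimics the Boolean result.
    def in_place_sram(row_i, row_j):
        # With rows i and j active together, the true bitline stays
        # high only if both cells store 1 (AND); the complement
        # bitline gives NOR of the two stored bits.
        and_bits = [a & b for a, b in zip(row_i, row_j)]
        nor_bits = [(a | b) ^ 1 for a, b in zip(row_i, row_j)]
        return and_bits, nor_bits

    print(in_place_sram([1, 0, 1, 0], [1, 1, 0, 0]))
    # -> ([1, 0, 0, 0], [0, 0, 0, 1])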
Move to customisation
The economic argument is now shifting in favour of customisation as traditional 2D scaling in DRAM comes under pressure and memory makers look to novel architectures such as magnetic (MRAM) or resistive (ReRAM) technologies.

“These new memories will become the basis for in-memory computing,” says Kevin Moraes, vice president at Applied Materials.
Expecting the memories to be incorporated into SoC devices, such as low-energy sensor nodes, Applied is working more closely with specialised start-ups such as Crossbar and Spin Memory to develop production tools for fabs. “Now we do a lot of work on hardware and software co-optimisation, as well as manufacturing,” adds Moraes.