AI chip memory issues in 2021
Update Time: 2021-09-23 14:17:57
Several companies are currently developing AI chips for network edge systems, but vendors face a variety of challenges in choosing process nodes and memory, and those choices vary by application.
For example, products in the network edge category include cars, drones, surveillance cameras, smart speakers and even enterprise servers. All of these applications use low-power chips that run machine learning algorithms. While many of the components of these chips are no different from those in other digital chips, the main difference is that most of the processing takes place in, or near, memory.
Given this, manufacturers of AI edge chips are evaluating different types of memory for next-generation devices. Each type of memory has its own set of challenges. In many cases the chips are built on mature processes rather than leading-edge ones, but they must still use low-power architectures.
AI chips, sometimes called deep learning accelerators or processors, are optimized to handle machine learning workloads in a system. Machine learning is a subset of AI that uses neural networks to process data, identify and match patterns, and learn which attributes of the data are important.
These chips target the entire range of computing applications, but there are clear differences among the designs. For example, chips developed for the cloud are typically based on advanced processes and are expensive to design and manufacture. Edge devices, meanwhile, include chips developed for the automotive market, as well as drones, surveillance cameras, smartphones, smart doorbells and voice assistants. In this broad field, each application has different requirements. For example, a smartphone chip is very different from a chip for a smart doorbell.
For many edge products, the goal is to develop low-power devices with just enough computing power. "These types of products can't afford a 300-watt GPU. For many of these applications, even a 30-watt GPU is too big," said Linley Gwennap, principal analyst at The Linley Group. "But device makers still want to make some sophisticated devices. This requires more powerful AI capabilities than microcontrollers offer. You need powerful chips that don't drain the battery or cost too much, especially in consumer applications. So you have to consider some rather radical new solutions."
Most edge devices don't need chips built on advanced nodes, which are too expensive. There are exceptions, of course. In addition, many AI edge chips perform processing in or near memory, which speeds up the system while consuming less power.
Vendors are considering several memory approaches for current and future chips:
Using conventional memory such as SRAM.
Using NOR memory, or a new technology called analog in-memory computing.
Using phase-change memory, MRAM, ReRAM and other next-generation memories, which are starting to be adopted in AI edge chips.
The AI "explosion"
Machine learning has been around for decades. For most of that time, however, systems did not have enough computing power to run these algorithms.
In recent years, machine learning has begun to boom thanks to the advent of GPUs and other chips and machine-generated algorithms.
"Machine learning only began to gain traction in the 1990s," says Aki Fujimura, CEO of D2S. "But things have changed in recent years with the advent of GPUs, which have advanced the adoption of deep learning because today our computing power has been enhanced."
The goal of these and other devices is to process algorithms in neural networks, which essentially calculate matrix products and sum them. The data matrix is first loaded into the network. Then, each element is multiplied by a pre-determined weight and the result is passed to the next layer of the network, where it is multiplied by a new set of weights. After this step is repeated several times, the final result is a conclusion about the data.
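The multiply-and-sum flow described above can be sketched in a few lines of Python. This is only an illustration: the layer sizes, the weight values and the ReLU step between layers are assumptions made for the example, not details taken from any chip discussed here.

```python
# Minimal sketch of a neural network forward pass: multiply by weights,
# sum, and hand the result to the next layer. All numbers are made up.

def layer(inputs, weights):
    """Return one weighted sum per output neuron (one row of weights each)."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]

def relu(values):
    """Nonlinearity applied between layers (an assumption for this sketch)."""
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Pass the data through every layer; the last output is the result."""
    activations = inputs
    for weights in layers[:-1]:
        activations = relu(layer(activations, weights))
    return layer(activations, layers[-1])

data = [1.0, 2.0, 3.0]
network = [
    [[0.5, 1.0, 0.25], [1.0, 0.0, -0.5]],  # layer 1: 3 inputs, 2 outputs
    [[2.0, 1.0]],                          # layer 2: 2 inputs, 1 output
]
result = forward(data, network)
print(result)  # [6.5]
```

Almost all of the work is in the inner `sum(x * w ...)`, which is why these chips are built around multiply-accumulate hardware and why fetching the weights from memory dominates the cost.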
Machine learning has been used in many industries, including the semiconductor industry, where dozens of machine learning chip vendors have emerged. Many are companies that develop chips for the cloud. The chips for these systems are designed to accelerate Web search, language translation and other applications. According to the Linley Group, the market for these devices exceeded $3 billion in 2019.
In addition, dozens of AI edge chip vendors have emerged in the market, such as Ambient, BrainChip, GreenWaves, Flex Logix, Mythic, Syntiant, and others. In total, 1.6 billion edge devices are expected to be equipped with deep learning accelerators by 2024.
AI edge chips can use 8-bit computing to run machine learning algorithms. "You can generate, use and process data in the same place. This has a big advantage: we all face battery life issues. If you can do AI processing locally without having to turn on an Internet connection, that saves a lot of power. Responsiveness is also important, as well as reliability and, ultimately, privacy," said Kurt Busch, CEO of Syntiant. "In deep learning, the biggest problem lies in memory access. The battery and performance bottlenecks ultimately fall on memory. The second is parallel processing. In deep learning, I can do millions of multiplications and accumulations in parallel, and effectively scale linearly with parallel processing."
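To make the 8-bit multiply-accumulate idea above concrete, here is a rough sketch of quantized arithmetic. The scale factors and input values are hypothetical, and real chips use their own quantization schemes; the point is only that the multiplications happen on small integers and the result is rescaled once at the end.

```python
# Sketch of an 8-bit multiply-accumulate (MAC). Scales and values are
# hypothetical and chosen so that the arithmetic is exact.

def quantize(values, scale):
    """Map floats to signed 8-bit integers, clamping to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_dot(xq, wq):
    """Integer MAC; the accumulator is wider than 8 bits."""
    return sum(x * w for x, w in zip(xq, wq))

x = [0.5, -1.0, 0.25]   # activations
w = [1.0, 0.5, -2.0]    # weights
x_scale = 1.0 / 64      # assumed per-tensor scale for activations
w_scale = 1.0 / 32      # assumed per-tensor scale for weights

xq = quantize(x, x_scale)          # [32, -64, 16]
wq = quantize(w, w_scale)          # [32, 16, -64]
acc = int8_dot(xq, wq)             # -1024
approx = acc * x_scale * w_scale   # -0.5, same as the float dot product
print(xq, wq, acc, approx)
```

Each product in `int8_dot` is independent of the others, so hardware can compute many of them in parallel and add the results.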
AI edge chips have different requirements. For example, smartphones integrate leading application processors. But this is not the case for other edge products such as doorbells, surveillance cameras and speakers.
"Solutions for edge devices involve economics," said Walter Ng, vice president of business development at UMC. "It has to be very cost-sensitive. The overall aim is competitive cost, low power consumption, and a simplified distribution of computing."
In addition, there are other factors to consider. Many AI edge chip suppliers need to deliver products at a mature node around 40nm. This process is currently ideal and not costly. But looking ahead, suppliers want higher performance with lower power consumption. The next node is 28nm, which is also very mature and cheap. Recently, manufacturers have introduced a variety of 22nm processes, which are extensions of 28nm.
22nm is slightly faster than 28nm, but more expensive. Most suppliers will not migrate to 16nm/14nm finFETs because they are too expensive.
Migrating to the next node is not an easy decision. "Many of today's customers and their applications are on 40nm," Ng said. "When looking at the next node on the roadmap, will they be satisfied and get the best price/performance on 28nm? Or does 22nm look more attractive and offer more benefits than 28nm? That's a factor that many are considering."
Using Legacy Memory Technology
In legacy systems, the memory hierarchy is simple. SRAM is integrated into the processor as cache, where it gives fast access to commonly used programs. DRAM, which is used for main memory, is separate and located in memory modules.
In most systems, data moves back and forth between the memory and the processor. This shuttling increases latency and power consumption, a problem sometimes called the "memory wall," and it gets worse as the amount of data increases.
In-memory and near-memory computing are designed to address this problem. In-memory computing performs the processing tasks inside the memory itself, while near-memory computing places the memory close to the processing logic.
Not all chips use in-memory computing. However, AI edge chip vendors are using these methods to break down the memory wall. They are also offloading some processing functions from the cloud.
Last year, Syntiant introduced its first product, the Neural Decision Processor, which integrates a neural network architecture into a small, low-power chip. The 40nm audio device also integrates an Arm Cortex-M0 processor with 112KB of RAM.
Syntiant, which uses SRAM-based memory, categorizes its architecture as near-memory computing. The idea behind the chip is to make voice the primary interface in the system. Amazon's Alexa is a good example of an always-on voice interface.
"Voice is the next generation of interfaces," says Syntiant's Busch. "We've built these solutions specifically to add always-on voice interfaces to all battery-powered devices, from small hearing aids to large laptops or smart speakers."
Syntiant is developing new devices and is working on different memory types. "We are working on a number of emerging memory technologies, such as MRAM and ReRAM, primarily to increase memory density," said Jeremy Holleman, Syntiant's chief scientist. "The first issue is read power. The second is standby power, which is a big thing, because for large models you end up with a lot of memory. But you may only need to do a small portion of the computation at any given instant. The ability to reduce power consumption when memory cells are not in use is critical."
Advanced processes are not currently required. "For the foreseeable future, the leakage of advanced nodes is too high for ultra-low-power applications," says Syntiant's Busch. "Edge devices often have nothing to do. That is the opposite of devices in the data center, which you want to keep computing all the time once they're powered on. Edge devices are often waiting for something to happen. As a result, you need very low power consumption, which advanced nodes aren't very good at."
Today, most AI chips rely on fast built-in SRAM. "However, fitting millions of weights into a standalone digital edge processor using SRAM is very expensive, regardless of the technology used," said Vineet Kumar Agrawal, director of design for Cypress's IP business unit. "It's 500 times more expensive to get data from DRAM than it is from internal SRAM."
Meanwhile, many AI edge chip suppliers are using or looking for another memory type: NOR. NOR is a non-volatile flash memory used in standalone and embedded applications. NOR is typically used for code storage.
NOR technology is mature, but it requires additional, expensive mask steps at each node, and it is difficult to scale beyond 28nm/22nm. However, some companies are using today's NOR flash memory to develop a technology called analog in-memory computing. Most of these devices start at the 40nm node.
"Look at traditional digital AI architectures. There are two main sources of power consumption. The first is the computation itself: multiplication and addition. The second is moving data from memory to the compute unit and back again," explains Gwennap of the Linley Group. "People are trying to address both of those problems. They put the computation directly into the memory circuit, so the data doesn't have to move very far. And instead of using a traditional digital multiplier, they use an analog technique that runs current through a variable resistor. Ohm's law then gives the product of the current and the resistance."
Analog technology within the memory is expected to reduce power consumption. However, not all NORs are created equal. For example, some NOR technologies are based on floating-gate architectures.
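The analog multiply-accumulate idea described above can be modeled numerically. In this sketch each memory cell is treated as a programmable conductance G, the input is applied as a voltage V, and Ohm's law gives the cell current I = V × G; the currents from all cells on a shared column wire add up, which performs the accumulation. The conductance form of Ohm's law, the array layout and all values are assumptions for illustration, not a description of any vendor's circuit.

```python
# Idealized model of analog in-memory MAC: weights are stored as cell
# conductances, inputs are voltages, and each column wire sums the
# per-cell currents. No noise, wire resistance or ADC is modeled.

def column_current(voltages, conductances):
    """Sum of per-cell currents I = V * G on one shared column wire."""
    return sum(v * g for v, g in zip(voltages, conductances))

def crossbar_mac(voltages, conductance_columns):
    """One output current per column, all computed from the same inputs."""
    return [column_current(voltages, col) for col in conductance_columns]

inputs_v = [0.2, 0.1, 0.0]    # input voltages in volts (hypothetical)
weights_s = [
    [1e-6, 2e-6, 3e-6],       # column 0 conductances in siemens
    [4e-6, 0.0, 1e-6],        # column 1 conductances in siemens
]
currents_a = crossbar_mac(inputs_v, weights_s)
print(currents_a)  # roughly 0.4 and 0.8 microamps
```

Because the weights never leave the array, the only data that moves is the input voltages going in and the summed currents coming out.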
Microchip has developed an in-memory analog computing architecture for machine learning using a NOR-based floating-gate approach. The technology incorporates a multiply-accumulate (MAC) processing engine.
"With this approach, users do not need to store model parameters or weights in SRAM or external DRAM," said Vipin Tiwari, director of embedded memory product development for Microchip's SST division. "The input data is provided to the array for MAC calculations. Doing so eliminates the memory bottleneck in the MAC calculation because the calculation is done where the weights are stored."
There are other NOR approaches. For example, Cypress has long offered another embedded NOR flash technology called SONOS. SONOS is based on charge-trap flash, a two-transistor technology that changes the threshold voltage by adding or removing charge from a nitride layer, and it is available at a variety of nodes down to 28nm.
SONOS is optimized for use as an embedded memory for machine learning. "Two SONOS multi-bit embedded non-volatile memory cells can replace up to eight SRAM cells, or 48 transistors. This is very efficient, and you can also increase power efficiency and throughput by a factor of 50 to 100," said Cypress's Agrawal. "SONOS is programmed using a highly linear, low-power tunneling process that can target Vts with a high degree of control to produce nanoamp bit-cell current levels. This is in contrast to floating gates, which use hot electrons, where you can't control the amount of current flowing into the cell. Plus, you have a much higher cell current."
Using New Memory Technology
Since NOR is difficult to scale beyond 28nm/22nm, AI edge chip suppliers are working on several next-generation memory types, such as phase-change memory (PCM), STT-MRAM and ReRAM.
In AI chips, these memories can also be used to run neural-network-based machine learning workloads.
These memories are attractive because they combine the speed of SRAM with the non-volatility of flash memory and unlimited endurance. However, because the new memories use complex materials and switching schemes to store data, they take longer to develop.
"Semiconductor manufacturers face new challenges when migrating from charge-based memories (SRAM, NOR) to resistive memories (ReRAM, PCM)," said Masami Aoki, regional director for Asia at KLA Process Control Solutions. "These emerging memories consist of new elements that require precise control of material properties and new defect control strategies to ensure performance uniformity and reliability, especially for large-scale integration."
Intel has long shipped 3D XPoint, a type of PCM. Micron also sells PCM. A non-volatile memory, PCM stores data by changing the state of a material. It is faster than flash memory and has better endurance.
PCM is a challenging technology, although suppliers have been addressing its issues. "With 3D XPoint phase-change memory, the chalcogenides are exceptionally sensitive to environmental conditions and process chemistry," said Rick Gottscho, executive vice president and chief technology officer at Lam Research. "There are a variety of technical strategies to deal with all of these issues."
PCM is also a target for AI. In 2018, IBM published a paper on using PCM for in-memory multiplication with 8-bit precision. Although no one is selling products in volume yet, IBM and other companies are still developing PCM for AI edge applications.
STT-MRAM, which is also shipping, has the speed of SRAM with the non-volatility of flash memory and unlimited endurance. It uses the magnetism of electron spin to provide non-volatility in the chip.
STT-MRAM is ideal for embedded applications and is designed to replace NOR at 22nm and beyond. "Looking at the new memories, MRAM is the best choice for low density (less than 1Gb). MRAM is the best embedded memory. It is better than NOR, although you can use NOR on a 28nm or larger chip. NOR adds more than 12 masks, so MRAM is the preferred choice for embedded from a cost, density and performance perspective," said Mark Webb, head of MKW Ventures Consulting.
However, some experts believe that MRAM supports only two levels and is therefore not suitable for in-memory computing. Others take a different view. "It's true that an MRAM device can only store one bit," says Diederik Verkest, a distinguished member of the technical staff at Imec. "However, in in-memory computing, it is important to understand the difference between the storage device and the compute cell. The compute cell performs the multiplication of the stored weight and the input activation. In the best case, the storage device inside the compute cell can store multiple weight levels. However, it is also possible to use several storage devices to build a compute cell that stores a multilevel weight. If three weight levels are used (so the weights can be -1, 0 or 1), then two storage devices can be used, and the compute cell will consist of those two storage devices plus some analog circuitry around them to calculate the product of the weight value and the activation. Thus, MRAM devices can be used inside the compute cell to store multilevel weights and build in-memory computing solutions."
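The idea of building a three-level weight out of two one-bit devices can be sketched as follows. The specific encoding used here (one "positive" device and one "negative" device whose difference gives the weight) is an assumption chosen for illustration; the description above only says that two storage devices and some surrounding analog circuitry are needed.

```python
# Sketch: a ternary weight (-1, 0, +1) stored in two 1-bit devices.
# The (pos, neg) encoding is hypothetical; weight = pos - neg.

ENCODING = {1: (1, 0), 0: (0, 0), -1: (0, 1)}

def encode_weight(weight):
    """Return the two stored bits for a weight of -1, 0 or +1."""
    return ENCODING[weight]

def cell_product(activation, cell):
    """Model the compute cell: activation times (pos - neg)."""
    pos, neg = cell
    return activation * pos - activation * neg

def ternary_mac(activations, weights):
    """Multiply-accumulate using only the 1-bit device states."""
    cells = [encode_weight(w) for w in weights]
    return sum(cell_product(a, c) for a, c in zip(activations, cells))

print(ternary_mac([3, 5, 2], [1, -1, 0]))  # -2
```

In this encoding the state (1, 1) is unused. More devices per cell would allow more weight levels at the cost of area.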
ReRAM is another option. The technology offers lower read latency and faster write performance than flash memory. ReRAM applies a voltage to a material stack, which causes a change in resistance that records the data in memory.
At the recent IEDM conference, Leti presented a paper on an integrated spiking neural network (SNN) chip built with analog and ReRAM technology. The 130nm test chip dissipated 3.6pJ per spike. An R&D device uses 28nm FD-SOI.
"SNN is different from traditional neural networks," said Linley Group's Gwennap. "It doesn't have any power consumption unless the input changes. So, in theory, it's ideal if you have a security camera facing your front yard. Nothing changes unless someone walks by."
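The event-driven behavior described above can be illustrated with a very rough sketch. This is not a spiking neural network and not Leti's design; it only shows the underlying principle that work is generated when the input changes, so a static scene produces no events. The frame values and the threshold are made up.

```python
# Rough illustration of event-driven processing: only inputs that change
# produce events, so an unchanging scene generates no work at all.

def change_events(previous, current, threshold=0):
    """Return (index, delta) for each input that changed by more than threshold."""
    return [
        (i, cur - prev)
        for i, (prev, cur) in enumerate(zip(previous, current))
        if abs(cur - prev) > threshold
    ]

def events_per_step(frames, threshold=0):
    """Count events for each pair of consecutive frames."""
    return [
        len(change_events(prev, cur, threshold))
        for prev, cur in zip(frames, frames[1:])
    ]

frames = [
    [0, 0, 0, 0],   # empty front yard
    [0, 0, 0, 0],   # still empty: no events
    [0, 9, 9, 0],   # someone walks by: two inputs change
    [0, 9, 9, 0],   # person stands still: no events again
]
print(events_per_step(frames))  # [0, 2, 0]
```

A conventional network would run a full inference on all four frames, while an event-driven design only has work to do at the one step where something moved.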
Leti's SNN devices are ideal for the edge, says Alexandre Valentian, a research engineer at Leti: "Exactly what the edge means remains to be seen, but I can say that ReRAM and SNNs are specifically tailored to endpoint devices. ReRAM and spike coding are well suited because this coding strategy simplifies in-memory computing. There is no need for a DAC at the input (as in matrix-vector multiplication), and it can simplify the ADC at the output (fewer bits) or eventually remove it altogether if the neuron is analog."
However, ReRAM is difficult to develop, and only a few parts are available. "In our opinion, ReRAM is theoretically suitable for 1T1R designs (embedded) and, in the future, for 1TnR with a suitable crosspoint selector. The difficulty is that the development of actual products has been very slow over the last two years. We believe this is due to retention issues and disturb (versus cycling) in the storage elements themselves. These issues need to be addressed, and we need products with 64Mbit embedded and 1Gbit crosspoint," said MKW's Webb.
In summary, there is no consensus on which category of next-generation memory is better suited for AI edge applications. The industry continues to explore current and future options.
For example, Imec recently evaluated several options for a 10,000 TOPS/W matrix-vector multiplier built on an analog in-memory computing architecture called AiMC.
Imec evaluated three options: SOT-MRAM, IGZO DRAM, and projected PCM. Spin-orbit torque MRAM (SOT-MRAM) is the next generation of MRAM, and indium gallium zinc oxide (IGZO) is a novel crystalline oxide semiconductor.
"There are various devices that store the weights of DNNs," says Imec's Verkest. "These devices use different mechanisms to store the weight values (magnetic, resistive, capacitive) and lead to different implementations of AiMC arrays."
It is not clear which current or next-generation memory technology will be the winner. Perhaps all of them have a place, including SRAM, NOR and other conventional memories.
But there is little room for dozens of AI chip suppliers. There are already signs of major upheaval, with large companies starting to buy startups. As with all new chip sectors, some companies will succeed, some will be acquired, and some will fail.