Three Ways 3D Chip Technology Is Disrupting Computing
Update Time: 2022-07-08 18:20:32
AMD, Graphcore, and Intel show why the leading edge of this industry is going vertical.
A new way to continue Moore's Law is emerging. Each generation of processors needs to perform better than the last, which means integrating more logic onto the silicon. But two problems stand in the way. One is that our ability to shrink transistors, and the logic and memory blocks built from them, is slowing down. The other is that chips have reached their size limit: photolithography tools can pattern an area of only about 850 square millimeters, which is roughly the size of Nvidia's largest GPU.
For several years now, system-on-chip developers have been breaking their ever-larger designs into smaller and smaller chiplets and linking them together within the same package, effectively increasing the silicon area, among other advantages. In CPUs, most of these links are so-called 2.5D connections, in which the chiplets sit side by side and are joined by short, dense interconnects. This kind of integration will only gain momentum now that most major manufacturers have agreed on a standard for 2.5D chiplet-to-chiplet communication.
But moving as much data as flows within a single chip requires even shorter, denser connections, which can be achieved only by stacking one chip on top of another. Connecting two chips face to face can mean thousands of connections per square millimeter.
Making this work has required a great deal of innovation. Engineers have had to figure out how to keep heat from one chip in the stack from killing the other, decide which functions should go where and how they should be manufactured, prevent the occasional bad chiplet from producing a lot of expensive, useless systems, and cope with the complexity of solving all these problems at once.
Here are three examples, ranging from simple to complex, that show the current state of 3D stacking.
AMD's Zen 3
AMD's 3D V-Cache technology attaches a 64-megabyte SRAM cache [red] and two blank structural chiplets to a Zen 3 compute chip.
PCs have long offered the option of adding more memory to speed up oversized applications and data-heavy work. Thanks to 3D chip stacking, AMD's next-generation CPU chips offer that option too. It isn't an aftermarket add-on, of course, but if you want to build a computer with extra power, ordering a processor with expanded cache memory may be a good option.
Although both the Zen 2 and the new Zen 3 processor cores are built using the same TSMC manufacturing process (and therefore have the same size transistors, interconnects, and everything else), AMD made so many architectural changes that, even without the extra cache, Zen 3 delivers an average performance increase of 19 percent. One of the architectural gems is a set of through-silicon vias (TSVs), vertical interconnects that run directly through most of the silicon. The TSVs are built into Zen 3's highest-level cache, an SRAM block called L3, which sits in the middle of the compute chip and is shared among all eight cores.
In processors destined for data-heavy workloads, the back side of the Zen 3 wafer is thinned until the TSVs are exposed. Then a 64-megabyte SRAM chiplet is attached to the exposed TSVs using hybrid bonding, a process that is something like cold-welding the copper together. The result is a dense set of connections spaced as little as 9 micrometers apart. Finally, blank silicon chiplets are attached over the rest of the Zen 3 CPU die for structural stability and thermal conduction.
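That 9-micrometer spacing is what makes "thousands of connections per square millimeter" possible. A quick back-of-the-envelope sketch, assuming a uniform square grid at that pitch (a simplification, not AMD's actual layout):

```python
# Back-of-the-envelope: connection density of a hybrid-bonded interface,
# assuming contacts sit on a uniform square grid (a simplifying assumption).
pitch_um = 9.0                  # spacing between connections, in micrometers
per_mm = 1000.0 / pitch_um      # connections along one millimeter
density_per_mm2 = per_mm ** 2   # connections per square millimeter

print(f"{density_per_mm2:.0f} connections per mm^2")  # 12346 connections per mm^2
```

On the order of ten thousand connections per square millimeter, comfortably in the "thousands" range the article describes.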
Adding the extra memory by placing it beside the CPU chip is not an option, because data would take too long to reach the processor cores. "Despite tripling the L3 [cache] size, 3D V-Cache adds only four [clock] cycles of latency, which can only be achieved with 3D stacking," said John Wuu, senior design engineer at AMD, speaking at the IEEE International Solid-State Circuits Conference.
A bigger cache has its place in high-end gaming. Using a desktop Ryzen CPU with 3D V-Cache increases gaming speed by an average of 15 percent at 1080p. It is also good for more serious work, cutting the run time of a difficult semiconductor-design calculation by 66 percent.
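A 66 percent reduction in run time is easy to misread; it corresponds to roughly a threefold speedup. A small sketch of the conversion:

```python
# Convert a fractional run-time reduction into an equivalent speedup factor.
def speedup_from_reduction(reduction: float) -> float:
    """A job whose run time shrinks by `reduction` runs 1/(1 - reduction) times as fast."""
    return 1.0 / (1.0 - reduction)

print(round(speedup_from_reduction(0.66), 2))  # 2.94, i.e. nearly 3x faster
```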
Wuu notes that the industry's ability to shrink SRAM is slowing compared with its ability to shrink logic. As a result, you can expect future SRAM expansion chiplets to keep being made with more mature manufacturing processes, while the compute chiplets are pushed to the leading edge of Moore's Law.
Graphcore's Bow AI Processor
3D integration can speed up computing even when one chip in the stack contains no transistors at all. UK-based AI computer company Graphcore dramatically increased its system's performance simply by attaching a power-delivery chip to its AI processor. The added power-delivery silicon means the combined chip, called Bow, can run faster (1.85 gigahertz versus 1.35 GHz) and at a lower voltage than its predecessor. Compared with its predecessor, a computer built on Bow trains neural networks 40 percent faster while consuming 16 percent less energy. Importantly, users do not need to change their software to get this improvement.
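The clock figures alone account for most of that gain: 1.85 GHz over 1.35 GHz is about a 37 percent frequency increase, close to the quoted 40 percent training speedup. A quick check:

```python
# Compare Bow's clock frequency with its predecessor's.
bow_ghz = 1.85
predecessor_ghz = 1.35
clock_gain = bow_ghz / predecessor_ghz - 1.0

print(f"clock gain: {clock_gain:.0%}")  # clock gain: 37%
```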
The power-delivery chip consists only of capacitors and through-silicon vias. The latter simply carry power and data to the processor chip; it is the capacitors that make the difference. Like the bit-storage elements in DRAM, these capacitors are formed in deep, narrow trenches in the silicon. Because these reservoirs of charge sit so close to the processor's transistors, power delivery is smoothed, allowing the processor cores to run faster at a lower voltage. Without the power-delivery chip, the processor would have to raise its operating voltage above its nominal level to run at 1.85 GHz, consuming much more power. With the power chip, it reaches that clock frequency while consuming less power.
The manufacturing process used to make Bow is unique, but it is unlikely to stay that way. Most 3D stacks bond one chip to another while one of the two is still on the wafer, a process called chip-on-wafer (see "AMD's Zen 3" above). Bow instead uses TSMC's wafer-on-wafer process, in which one whole wafer is bonded to another and then diced into chips. Graphcore CTO and cofounder Simon Knowles says Bow is the first chip on the market to use the technology, which allows a higher density of connections between the two dies than a chip-on-wafer process can achieve.
Although the power-delivery chip has no transistors, they may be coming. "Using the technology only for power delivery is just a first step for us," Knowles said. "It will go much further in the near future."
Intel's Ponte Vecchio supercomputer chip
The Aurora supercomputer was designed to be the first U.S. high-performance computer (HPC) to break the exaflop barrier: a billion billion high-precision floating-point calculations per second. To lift Aurora to such heights, Intel's Ponte Vecchio packs more than 100 billion transistors across 47 pieces of silicon into a single processor. Using both 2.5D and 3D technologies, Intel squeezed 3,100 square millimeters of silicon into a 2,330-square-millimeter footprint, silicon nearly equivalent to four Nvidia A100 GPUs.
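Those figures can be checked against the roughly 850-square-millimeter reticle limit mentioned at the start of this article: 3,100 square millimeters of silicon is about 3.6 reticle-limited dies' worth, hence "nearly four" A100-class GPUs. A quick sketch:

```python
# Sanity-check Ponte Vecchio's silicon figures against the reticle limit.
total_silicon_mm2 = 3100   # silicon area across all 47 chiplets
reticle_limit_mm2 = 850    # approximate photolithography reticle limit
footprint_mm2 = 2330       # packaged processor footprint

print(round(total_silicon_mm2 / reticle_limit_mm2, 1))  # 3.6 reticle-size dies' worth
print(round(total_silicon_mm2 / footprint_mm2, 2))      # 1.33x more silicon than footprint
```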
Intel's Wilfred Gomes told engineers attending the IEEE International Solid-State Circuits Conference that the processor pushes Intel's 2D and 3D chip-integration technologies to their limits.
Each Ponte Vecchio is really two mirror-image sets of chips, tied together using Intel's 2.5D integration technology, Co-EMIB. Co-EMIB forms a bridge of high-density interconnects between the two 3D chip stacks. The bridge itself is a small sliver of silicon embedded in the package's organic substrate. Co-EMIB dies also connect high-bandwidth memory and an I/O chiplet to the "base tile," the largest chiplet, on which the rest are stacked.
The base tile uses Intel's 3D stacking technology, called Foveros, to stack compute and cache chiplets on top of it. The technology creates a dense array of die-to-die vertical connections between two chips. These connections can be just 36 micrometers apart, made via short copper pillars and solder microbumps. Signals and power enter the stack through through-silicon vias, fairly wide vertical interconnects that run directly through most of the silicon.
None of this was easy, says Gomes; it required innovations in yield management, clock circuitry, thermal regulation, and power delivery. For example, Intel engineers chose to supply the processor with a higher-than-normal voltage (1.8 volts) so that the current would be low enough to simplify the package. Circuits in the base tile reduce the voltage to about 0.7 V for use in the compute tiles, and each compute tile must have its own power domain in the base tile. Key to this ability are new high-efficiency inductors, called coaxial magnetic integrated inductors. Because these are built into the package substrate, the circuit actually snakes back and forth between the base tile and the package before supplying the voltage to the compute tile.
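The reason for delivering power at 1.8 V is simple arithmetic: for a given power draw, a higher supply voltage means proportionally less current flowing through the package. A sketch with a hypothetical 100-watt load (the wattage is an illustrative assumption, not a figure from Intel):

```python
# For a fixed power draw, current scales inversely with voltage: I = P / V.
power_w = 100.0   # hypothetical load in watts; illustrative assumption only
supply_v = 1.8    # voltage delivered through the package
core_v = 0.7      # voltage after down-conversion in the base tile

print(f"current at {supply_v} V: {power_w / supply_v:.1f} A")  # current at 1.8 V: 55.6 A
print(f"current at {core_v} V: {power_w / core_v:.1f} A")      # current at 0.7 V: 142.9 A
```

Carrying roughly a third of the current through the package means fewer power bumps and simpler routing; the base tile's regulators then pay the conversion cost close to the load.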
Gomes said it took a full 14 years to go from the first petaflop supercomputer in 2008 to this year's exaflop machine. Advanced packaging technologies, such as 3D stacking, could help shorten the path to the next thousandfold improvement in computing to just six years.
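Compressing a thousandfold gain from 14 years to 6 implies a much steeper compound growth rate, as a quick calculation shows:

```python
# Compound annual improvement implied by a 1,000x gain over N years.
def annual_factor(total_gain: float, years: float) -> float:
    return total_gain ** (1.0 / years)

print(round(annual_factor(1000, 14), 2))  # 1.64: petaflop (2008) to exaflop pace
print(round(annual_factor(1000, 6), 2))   # 3.16: the pace 3D stacking might enable
```

Roughly 1.6x per year got the industry from petaflop to exaflop; hitting the next thousandfold in six years would require more than 3x per year.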