What are the challenges of creating a keyboard-sized chip?
Published time: 2019-12-20 10:37:32
Today, even the largest computer chips usually fit in the palm of your hand, and some are small enough to sit on a fingertip. Making chips ever smaller appears to be the industry's prevailing trend and conventional wisdom.
Now Cerebras, a startup in Silicon Valley, is challenging this idea. On Monday, the company announced what it claims is the largest computer chip ever built. It is as big as a plate, about 100 times the size of a regular chip, and would barely fit on a person's lap.
The engineers who developed the chip believe it can be used in large data centers to help accelerate the development of artificial intelligence (AI) in everything from autonomous vehicles to Amazon's Alexa.
Many companies are manufacturing new chips for AI, including traditional chip makers such as Intel and Qualcomm, as well as other start-ups in the US, UK and China.
Google has already built such a chip and uses it in multiple artificial intelligence projects, including Google Assistant, which recognizes voice commands on Android phones, and Google Translate, which translates one language into another.
Cerebras CEO and founder Andrew Feldman said, "The growth in this area is amazing." He is a chip industry veteran who previously sold a company to chip giant AMD.
New AI systems rely on neural networks. These complex mathematical systems are loosely modeled on the networks of neurons in the brain and can learn tasks by analyzing large amounts of data. For example, a neural network can learn to recognize cats by pinpointing patterns in thousands of cat photos.
This requires a particular kind of computing power. Today, most companies analyze data with the help of graphics processing units (GPUs). These chips were originally designed to render images for games and other software, but they are also good at running the mathematical operations that drive neural networks.
About six years ago, as tech giants such as Google, Facebook and Microsoft began investing heavily in artificial intelligence, they started buying large numbers of NVIDIA GPUs. In the year to the summer of 2016, NVIDIA's average sales in the United States were $143 million, more than double the previous year's figure.
But these companies want more processing power. Google has developed a chip specifically for neural networks, the Tensor Processing Unit (TPU), and several other chip makers are pursuing the same goal.
AI systems run on many chips working together. The trouble is that moving large blocks of data between chips can be slow, which limits the speed at which the chips can analyze the information.
Subramanian Iyer, a professor of artificial intelligence chip design at UCLA, said, "Connecting all of these chips together will actually slow them down and consume a lot of energy."
Hardware manufacturers are exploring many different options. Some are trying to widen the pipes between chips.
Cerebras, a three-year-old company backed by more than $200 million in funding, has taken a novel approach. The idea is to keep all the data on one huge chip so that the system can run faster.
Working with a large chip is very difficult. Computer chips are typically built on circular silicon wafers approximately 12 inches in diameter. Each wafer typically contains approximately 100 chips.
Many of these chips, once cut from the wafer, are thrown away and never used. Etching circuits into silicon is such a complicated process that manufacturers cannot eliminate defects, and some circuits simply do not work. This is one of the reasons chip makers keep chips as small as possible: a smaller chip leaves less room for error, so less has to be discarded.
Cerebras says it has built a chip the size of an entire wafer.
Others have tried this approach, most notably a startup called Trilogy, which was founded in 1980 by the famous IBM chip engineer Gene Amdahl. Despite receiving more than 230 million U.S. dollars in funding, the company ultimately decided the task was too difficult and folded five years later.
Feldman said Cerebras plans to ship hardware to a few customers next month, and that the chip can train artificial intelligence systems 100 to 1,000 times faster than existing hardware.
He and his engineers have divided their giant chip into smaller sections, or cores, on the understanding that some cores will not work.
Significant questions remain about the company's hardware. Feldman's claims about chip performance have not been independently confirmed, and he did not disclose the price of the chip.
The price will depend on how efficiently Cerebras and its manufacturing partner TSMC can produce the chip.
Brad Paulsen, senior vice president of TSMC, said the process "needs more labor." A chip as large as this consumes a lot of energy, which means that keeping it cool will be difficult and expensive. In other words, building the chip is only part of the task.
"This is a challenge for us," Paulsen said. "This is also true for them."
Cerebras plans to sell the chip as part of a larger machine that includes precision equipment for cooling the silicon with chilled liquid. This is completely different from the hardware that large technology companies and government agencies are accustomed to working with.
"It's not that people haven't been able to make such chips," said Rakesh Kumar, a professor at the University of Illinois, who is also working on large chips for artificial intelligence. "The problem is that they didn't make a commercially viable chip."
Cerebras, a new-generation silicon company that was in stealth mode until today, wants to make training a deep learning model as quick as buying toothpaste from Amazon. After nearly three years of quiet research and development, Cerebras today launched its new chip, and it is a remarkable one. The "Wafer Scale Engine" has 1.2 trillion transistors (the most ever), measures 46,225 square millimeters (the largest ever), and includes 18 gigabytes of on-chip memory (the most of any chip on the market) and 400,000 processing cores (presumably also the most).
It caused quite a stir at the Hot Chips conference at Stanford University, one of the silicon industry's major conferences for product introductions and roadmaps, drawing "oohs" and "aahs" of varying intensity from attendees. You can learn more about this chip from Fortune magazine's Tiernan Ray, or read the Cerebras white paper.
This afternoon, I sat down with the company's founder and CEO, Andrew Feldman, to discuss what his 173 engineers had quietly been building just down the street over the past few years with $112 million in venture capital from Benchmark and other investors.
Being big brings challenges
First, let's take a brief look at how the chips in mobile phones and computers are made. Foundries like TSMC take standard-sized silicon wafers, use light to etch transistors onto them, and then cut each wafer into individual chips. The wafer is circular and the chips are square, so dividing the circle into a clean array of individual chips involves some basic geometry.
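The geometry can be sketched with a widely used rule-of-thumb estimate for how many whole dies fit on a round wafer: divide the wafer area by the die area, then subtract a correction for the partial dies lost along the curved edge. The die areas below are assumptions chosen for illustration, not figures from TSMC or Cerebras, and the function name is hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Rule-of-thumb estimate of whole dies on a round wafer.

    gross:     wafer area divided by die area
    edge_loss: correction for partial dies along the circular edge
    """
    d = wafer_diameter_mm
    s = die_area_mm2
    gross = math.pi * d * d / (4.0 * s)
    edge_loss = math.pi * d / math.sqrt(2.0 * s)
    return max(0, math.floor(gross - edge_loss))


WAFER_MM = 300.0  # a 12-inch wafer is nominally 300 mm across

# Assumed die sizes, for illustration only.
for area in (100.0, 400.0, 600.0, 800.0):
    count = dies_per_wafer(WAFER_MM, area)
    print(f"{area:6.0f} mm^2 die -> roughly {count} dies per wafer")
```

Under these assumptions, large dies of several hundred square millimeters land in the neighborhood of the "about 100 chips per wafer" figure mentioned earlier, while smaller dies yield many more.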
A major challenge in lithography is that errors can creep into the manufacturing process, which demands extensive testing to verify quality and forces the fab to throw away chips that do not perform well. The smaller and more compact the chip, the less likely any single chip is to fail and the higher the fab's yield. High yield equals high profits.
Cerebras throws that idea out: instead of etching a series of individual chips onto a single wafer, it uses the entire wafer itself as one huge chip. This allows all of the individual cores to be directly connected to each other, significantly speeding up the critical feedback loops in deep learning algorithms, but it comes at the cost of huge manufacturing and design challenges in creating and managing such chips.
Cerebras' technical architecture and design is led by co-founder Sean Lie. Feldman and Lie previously worked together at a company called SeaMicro, which was sold to AMD for $334 million in 2012.
According to Feldman, the first challenge the team encountered was handling communication across the "scribe lines." Although the Cerebras chip covers a complete wafer, today's lithography equipment still has to work as if it were etching individual chips onto the silicon wafer. As a result, the company had to invent new techniques that allow these individual chips to communicate with each other across the wafer. Working with TSMC, the team not only invented new communication channels but also had to write new software to handle chips with more than one trillion transistors.
The second challenge is yield. When a chip covers the entire silicon wafer, any single defect in the wafer etch may render the whole chip inoperable. This has been the stumbling block for whole-wafer technology for decades: given the laws of physics, it is almost impossible to etch a trillion transistors with perfect precision, again and again.
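To get a feel for the scale of the yield problem, here is a rough sketch using the simple Poisson yield model, in which the probability that a die has zero defects is exp(-area x defect density). The defect density and the 600 mm² comparison die are assumed, illustrative values, not numbers from Cerebras or TSMC; 46,225 mm² is the wafer-scale area quoted earlier.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Probability that a die of the given area contains zero defects,
    under the simple Poisson yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Assumed for illustration: 0.1 defects per cm^2 = 0.001 defects per mm^2.
DEFECT_DENSITY = 0.001

conventional = poisson_yield(600.0, DEFECT_DENSITY)    # assumed large die
wafer_scale = poisson_yield(46_225.0, DEFECT_DENSITY)  # area quoted in the article

print(f"600 mm^2 die, defect-free probability:    {conventional:.3f}")
print(f"46,225 mm^2 die, defect-free probability: {wafer_scale:.3e}")
```

Whatever defect density is assumed, the exponential dependence on area is the point: a chip that must be entirely defect-free becomes effectively impossible to produce at wafer scale, so the design has to tolerate defects rather than avoid them.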
Cerebras solves this problem by adding extra cores to the chip, which serve as backups when a neighboring core has a defect. Feldman explained to me: "You only need to hold 1% to 1.5% of the cores in reserve." Keeping spare cores makes the chip essentially self-healing: it routes around lithography errors, which makes a whole-wafer silicon chip feasible.
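The spare-core idea can be illustrated with a deliberately simplified sketch. It treats the cores as a one-dimensional list and maps each logical core to the next working physical core, skipping defective ones. This is not Cerebras' actual routing scheme, which operates across a two-dimensional fabric; the function name, core counts, and defect positions are all hypothetical.

```python
def build_core_map(num_logical: int, defective: set[int], num_physical: int) -> list[int]:
    """Map logical core i to a working physical core, skipping defects.

    Raises ValueError when the spares cannot cover the defects.
    """
    working = [p for p in range(num_physical) if p not in defective]
    if len(working) < num_logical:
        raise ValueError("not enough working cores to cover the logical array")
    return working[:num_logical]


# Hypothetical numbers: 1,000 logical cores plus 1.5% held in reserve.
logical_cores = 1000
spare_cores = 15
defects = {3, 500, 999}  # assumed defect positions found during testing

core_map = build_core_map(logical_cores, defects, logical_cores + spare_cores)
print("logical cores 0-4 run on physical cores:", core_map[:5])
print("spares left unused:", logical_cores + spare_cores - len(defects) - logical_cores)
```

Software sees a full array of working cores, while the defective ones are simply never addressed. The chip remains usable as long as the number of defects stays below the number of spares.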
Entering uncharted territory in chip design
The first two challenges, communication across the scribe lines between chips and yield, have plagued chip designers for decades. But they were known problems, Feldman said, and by approaching them again with modern tools, they proved easier to solve than expected.
He likened the situation to climbing Mount Everest. "It's like the first people who failed to climb Mount Everest said, 'Damn, the first part is really difficult.' Then the next group of people came along and said, 'That was nothing. The last hundred yards, that's the problem.'"
In fact, according to Feldman, the most difficult challenges for Cerebras were the next three, because no other chip designer had gotten past the scribe-line communication and yield problems to find out what came next.
The third challenge is thermal expansion. The chip becomes very hot during operation, and different materials expand at different rates. This means that the connector joining the chip to the motherboard also needs to expand at the same rate to avoid cracks forming between the two.
Feldman said: "How do you find a connector that can withstand this pressure? No one has ever done this before, so we need to invent a material. So we have people with Ph.D.s in materials science, and we had to invent a material that can absorb some of these differences."
The fourth challenge is packaging. Once a chip is manufactured, it needs to be tested, packaged, and shipped to original equipment manufacturers (OEMs), which build it into the products used by end customers (whether in a data center or a consumer laptop). Here, too, there is a challenge: absolutely nothing on the market is designed to handle a whole-wafer chip.
"At this stage, no one has a printed circuit board this large, or the connectors, or the cooling plates, and there are no software or tools to debug them," Feldman explained. "So we designed the entire production process, because no one has ever done this before." Cerebras' technology is not just the chip it sells; it also includes all of the related machinery used to make and package these chips.
The fifth challenge is power and cooling. Cerebras' chips run at 15 kW, which is a huge power draw for a single chip, though comparable to that of a modern AI cluster. All of that power also has to be cooled away, and Cerebras had to design a new way to deliver both power and cooling to such a large chip.
Cerebras essentially solves this problem by flipping the chip over, which Feldman calls "using the z dimension." "Our idea is that, unlike traditional designs in which power and cooling move horizontally across a chip, power and cooling are delivered vertically at all points on the chip to ensure uniform access."
So these are the three challenges the company has worked on over the past few years: thermal expansion, packaging, and power/cooling.
From theory to reality
Cerebras has a demo chip (it is about the size of a human head) and has reportedly begun delivering prototypes to customers. However, as with all new chips, the biggest challenge is scaling up production to meet customer demand.
For Cerebras, the situation is a bit unusual. Because it packs so much computing power onto a single wafer, customers don't have to buy dozens or hundreds of chips and stitch them together to create a compute cluster. Instead, they may need only a handful of Cerebras chips to meet their deep learning needs. The company's next phase is to reach scale and ensure stable delivery of its chips, which it packages as a complete system "appliance" that also includes its proprietary cooling technology.
Expect to hear more details about Cerebras' technology in the coming months, especially as the debate over the future of deep learning processing workflows continues to heat up.