Jotrin Electronics

Storage and GPU performance have both grown exponentially, but why is IO performance not improving?

Update Time: 2021-07-12 11:17:44


Demand for HPC, autonomous driving, deep learning and VR/AR keeps growing, and IO performance is gradually becoming the bottleneck, especially for reads and writes between the GPU and storage. Processor clock speeds have gone from kHz to GHz, VRAM capacity from KB to GB, and IO speeds from KB/s to GB/s, yet in practice those GB/s figures still feel more like MB/s.

For example, in tethered VR applications, graphics are processed by the computer and then sent over a cable to the VR display, which brings problems such as high latency and long load times. This raises the question of whether hardware performance is really being used effectively now that CPUs, GPUs and storage have all gone through generations of innovation. For this reason, both Microsoft and NVIDIA have proposed direct storage technologies to improve IO.

Microsoft: DirectStorage on Windows

Microsoft highlighted DirectStorage at its recent Windows 11 launch event. DirectStorage is a DirectX API originally designed for consoles, and Microsoft is now bringing it to PCs as well.

With the evolution of NVMe SSDs and PCIe, storage bandwidth now far exceeds that of older hard drives, rising from around 10 MB per second in the past to several gigabytes per second. But graphics workloads on the PC are evolving too, and the growing volume of data places higher demands on reads. In the past, a large amount of data could be read with a small number of IO requests. Today's renderers divide resources such as materials into small chunks and load only the parts a scene actually needs, which improves efficiency but generates far more IO requests.
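To make the effect of chunking concrete, here is a minimal sketch in Python. The 4 GiB asset size, the request sizes, the bandwidth and the fixed per-request overhead are all illustrative assumptions, not figures from Microsoft. The toy model simply adds a fixed, serialized cost per request to the raw transfer time.

```python
def request_count(total_bytes: int, request_bytes: int) -> int:
    """Number of IO requests needed to read total_bytes in request_bytes pieces."""
    return -(-total_bytes // request_bytes)  # integer ceiling division


def load_time_s(total_bytes: int, request_bytes: int,
                bandwidth_bps: float, overhead_s: float) -> float:
    """Toy model: raw transfer time plus a fixed cost for every request."""
    n = request_count(total_bytes, request_bytes)
    return total_bytes / bandwidth_bps + n * overhead_s


GIB = 1024 ** 3
TOTAL = 4 * GIB        # hypothetical amount of asset data to load
BANDWIDTH = 4 * GIB    # hypothetical NVMe bandwidth, bytes per second
OVERHEAD = 20e-6       # hypothetical per-request API overhead, seconds

# Same data, read as a few large requests or as many small chunks.
few_large = load_time_s(TOTAL, 64 * 1024 ** 2, BANDWIDTH, OVERHEAD)
many_small = load_time_s(TOTAL, 64 * 1024, BANDWIDTH, OVERHEAD)

if __name__ == "__main__":
    print(f"64 MiB requests: {few_large:.3f} s")
    print(f"64 KiB requests: {many_small:.3f} s")
```

Under these assumed numbers the raw transfer time is identical in both cases. Only the per-request cost differs, which is why the number of requests matters once the drive itself is fast.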

[Figure: The current GPU resource reading process / Microsoft]

Current storage APIs are not optimized for large numbers of IO requests, so they hold NVMe drives back and make the read/write bottleneck more and more obvious. Even on high-end PC hardware, the available storage bandwidth cannot be saturated. In addition, the data is usually stored compressed. After it is loaded into system memory, it has to be at least partially decompressed by the CPU before it is finally passed to GPU memory, and efficiency is lost at each of these steps.

DirectStorage uses a new path: data read from storage is passed to system memory and then directly on to GPU memory, where the GPU decompresses it. The GPU is much faster than the CPU at decompressing this data, so IO performance improves considerably.
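The difference between the two paths can be sketched as a small model. The stage comments, the 2:1 compression ratio and the asset size are assumptions chosen for illustration. The sketch only tracks where decompression happens and how many bytes are copied at each step, and it does not call any real DirectStorage API.

```python
from dataclasses import dataclass


@dataclass
class PathCost:
    bytes_from_storage: int      # compressed bytes read from the SSD
    bytes_to_gpu: int            # bytes copied from system memory to GPU memory
    cpu_decompressed_bytes: int  # output bytes the CPU has to decompress
    gpu_decompressed_bytes: int  # output bytes the GPU has to decompress


def traditional_path(uncompressed_bytes: int, ratio: float) -> PathCost:
    # storage -> system memory (compressed) -> CPU decompress -> GPU memory
    compressed = int(uncompressed_bytes / ratio)
    return PathCost(compressed, uncompressed_bytes, uncompressed_bytes, 0)


def gpu_decompress_path(uncompressed_bytes: int, ratio: float) -> PathCost:
    # storage -> system memory (compressed) -> GPU memory -> GPU decompress
    compressed = int(uncompressed_bytes / ratio)
    return PathCost(compressed, compressed, 0, uncompressed_bytes)


if __name__ == "__main__":
    asset = 1_000_000_000  # hypothetical 1 GB of uncompressed assets
    print(traditional_path(asset, 2.0))
    print(gpu_decompress_path(asset, 2.0))
```

In this model the GPU path removes the CPU decompression work entirely and copies only the compressed bytes into GPU memory. Real gains depend on the compression format and hardware.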

NVIDIA: RTX IO and Magnum IO GPUDirect Storage

NVIDIA has introduced RTX IO on RTX 30 series graphics cards for the consumer market to improve read speeds in gaming. NVIDIA says RTX IO, combined with Microsoft's DirectStorage, will increase IO performance by a factor of 100 compared with traditional hard drives and storage APIs. Work that used to require dozens of CPU cores is handed over entirely to the RTX GPU.

It's worth noting that while NVIDIA's RTX IO builds on Microsoft's DirectStorage, it does not stage data in system memory but moves it directly from the SSD to the GPU. A Microsoft graphics developer said at the GSL 2021 conference that the goal for future versions of DirectStorage is also to bypass system memory.

[Figure: GDS Technology / NVIDIA]

In addition to the consumer market, NVIDIA has launched a corresponding direct storage technology for the HPC market, Magnum IO GPUDirect Storage (GDS). GDS also bypasses the CPU. Unlike consumer systems, HPC deployments often use multiple GPUs and are therefore more sensitive to IO latency and CPU load. GDS establishes a direct data path between local storage and GPU memory, eliminating the latency and read/write bottlenecks introduced by the CPU.
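As a rough illustration of why the direct path helps, the sketch below counts the copies a payload makes on each route. The hop labels are descriptive names invented for this example, not identifiers from NVIDIA's software, and the model ignores everything except the number of hops.

```python
# Each hop is (source, destination). Labels are illustrative only.
BOUNCE_BUFFER_PATH = [
    ("storage", "system memory (CPU bounce buffer)"),
    ("system memory (CPU bounce buffer)", "GPU memory"),
]
DIRECT_PATH = [
    ("storage", "GPU memory"),
]


def bytes_moved(path, payload_bytes: int) -> int:
    """Total bytes copied when payload_bytes traverses every hop of the path."""
    return len(path) * payload_bytes


def touches_system_memory(path) -> bool:
    """True if any hop lands the data in system memory."""
    return any("system memory" in dst for _, dst in path)


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical payload size in bytes
    for name, path in (("bounce buffer", BOUNCE_BUFFER_PATH),
                       ("direct", DIRECT_PATH)):
        print(name, bytes_moved(path, payload), touches_system_memory(path))
```

With several GPUs sharing one CPU, every extra copy through system memory competes for the same memory bandwidth. Removing that hop is the idea behind a direct path.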

[Figure: GDS vs. CPU transfer to GPU read performance / NVIDIA]

With GDS, NVIDIA reports bandwidth gains of up to 1.5x and up to 2.8x lower CPU utilization compared with the traditional data path through a CPU bounce buffer.

NVIDIA has now added this technology to its HGX AI supercomputing platform. Three companies, DDN, VAST and WEKA, have started shipping related products in volume, and five vendors, including IBM and Micron, are actively adopting the technology. Vendors such as Samsung, Kioxia, Western Digital and Dell have also started early integration and certification programs for GDS.


Direct storage technology further amplifies the advantages of GPU vendors and storage vendors. The HPC market looks promising, and the profitability of NVIDIA's related businesses has shown the company the opportunity there. Beyond GPUs, NVIDIA's Arm-based Grace CPU also brings data transfer improvements such as NVLink. With such performance gains, NVIDIA's GPUs are likely to become the first choice for HPC applications regardless of the storage solution used.

