Design of a high-speed, large-capacity storage system based on FPGA logic
Published: 2020-07-22 11:11:47
In measurement technology, high-speed digital cameras capture large numbers of digital images that must be stored in real time, which calls for high-speed, large-capacity storage devices. Recording the data on traditional magnetic tape is neither efficient nor secure. Static RAM is easy to read and write, but it loses its contents when power fails, so it is unsuitable for long-term data storage. Flash memory, which has emerged in recent years, has gradually entered storage systems thanks to its large capacity, small size, and high reliability.
1. Design principle
In this design, the LVDS serial data output by the camera passes through receive-level conversion and serial-to-parallel conversion, producing a 10-channel × 8-bit parallel data stream at a maximum rate of 66 MHz. From the performance specifications of the MC1311 camera, the storage capacity and speed required of the Camera Link high-speed interface and of the data storage system can be calculated: a single frame is 1280 × 1024 × 8 bits; the maximum data volume per second is 500 × 1280 × 1024 × 8 bits (655.36 MB/s); and the data rate of a single Camera Link channel is 65.536 MB/s. Storing 60 seconds of continuous video therefore requires about 40 GB of memory, and a capacity of 100 GB can hold about 2.5 minutes of video. Figure 1 shows the system structure.
Figure 1 System structure diagram
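The capacity figures above follow from a few lines of arithmetic. The sketch below is a minimal check that uses only the numbers quoted in the text and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the function names are illustrative, not part of the original design. It shows that the 40 GB quoted for 60 seconds is a rounded-up 39.3 GB, and that 100 GB lasts about 152.6 s.

```python
# Storage budget for the recorder, using the figures quoted in the text:
# 1280 x 1024 pixels, 8 bit/pixel, 500 frames/s, 10 Camera Link channels.
# Decimal units throughout (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).

WIDTH, HEIGHT, BITS_PER_PIXEL = 1280, 1024, 8
FRAMES_PER_SECOND = 500
CHANNELS = 10


def frame_bytes() -> int:
    """Size of one frame in bytes."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL // 8


def stream_bytes_per_second() -> int:
    """Aggregate byte rate at the maximum frame rate."""
    return frame_bytes() * FRAMES_PER_SECOND


def channel_bytes_per_second() -> float:
    """Byte rate carried by each of the 10 parallel 8-bit channels."""
    return stream_bytes_per_second() / CHANNELS


def capacity_needed(seconds: float) -> float:
    """Bytes required to record `seconds` of continuous video."""
    return stream_bytes_per_second() * seconds


def recording_time(capacity_bytes: float) -> float:
    """Seconds of video that fit into `capacity_bytes`."""
    return capacity_bytes / stream_bytes_per_second()


if __name__ == "__main__":
    print(frame_bytes())                             # 1310720
    print(stream_bytes_per_second() / 1e6)           # 655.36
    print(channel_bytes_per_second() / 1e6)          # 65.536
    print(round(capacity_needed(60) / 1e9, 1))       # 39.3
    print(round(recording_time(100e9) / 60, 2))      # 2.54
```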
2. Core device selection
The memory chip used in the design is Samsung's NAND flash K9K8G08UOI. Its external interface runs at up to 40 MHz and is 8 bits wide. Each chip has 8192 blocks of 64 pages each, and each page is (2K + 64) bytes, of which 64 bytes form the spare area; the storage capacity is 8 Gbit. The chip is read and written in units of pages and erased in units of blocks. The control core is an EP2S30F672I4 FPGA from Altera's Stratix II series. Its abundant flip-flops and LUTs make it well suited to complex sequential logic, and its 1.3 Mbit of built-in RAM can buffer a certain amount of data.
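The page/block organization determines how a sequential writer walks through the chip. The sketch below, under the assumption that pages are numbered linearly across the whole chip (a hypothetical convention for illustration), checks the 8 Gbit figure and maps a linear page number to its erase block.

```python
# Geometry of one K9K8G08UOI as described in the text.
BLOCKS_PER_CHIP = 8192
PAGES_PER_BLOCK = 64
MAIN_BYTES_PER_PAGE = 2048
SPARE_BYTES_PER_PAGE = 64   # spare area, not counted in the 8 Gbit


def main_capacity_bits() -> int:
    """Main-array capacity in bits: 8192 x 64 x 2048 x 8 = 2**33 (8 Gbit)."""
    return BLOCKS_PER_CHIP * PAGES_PER_BLOCK * MAIN_BYTES_PER_PAGE * 8


def page_location(page_index: int) -> tuple:
    """Map a linear page number to (block, page within block).

    Pages are the unit of reading and programming; blocks of 64 pages
    are the unit of erasure, so a sequential writer enters a new erase
    block every 64 pages.
    """
    total_pages = BLOCKS_PER_CHIP * PAGES_PER_BLOCK
    if not 0 <= page_index < total_pages:
        raise ValueError(f"page index must be in [0, {total_pages})")
    return divmod(page_index, PAGES_PER_BLOCK)
```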
3. System design
Data is written to the NAND flash in page-programming mode. Figure 2 shows the page write timing of the K9K8G08UOI.
Figure 2 Page write timing of K9K8G08UOI
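To make the page write concrete, the sketch below lists the bus cycles of one page program as a Python list. It assumes the usual Samsung large-page command set (80h to start serial data input, 10h to start programming, 70h to read status) and a 5-cycle address (2 column bytes + 3 row bytes) for this 8 Gbit part; these codes and the address layout should be checked against the K9K8G08UOI datasheet. The function names and the ("CMD", value) tuple format are hypothetical.

```python
CMD_DATA_INPUT = 0x80    # begin page program: serial data input
CMD_PROGRAM = 0x10       # confirm: chip goes busy for tPROG (after tWB)
CMD_READ_STATUS = 0x70   # status register; I/O0 = 0 means program passed
PAGES_PER_BLOCK = 64
PAGE_BYTES = 2048 + 64


def address_cycles(block: int, page: int, column: int = 0) -> list:
    """Assumed 5-cycle address: 2 column bytes, then 3 row bytes."""
    row = block * PAGES_PER_BLOCK + page          # 19-bit row address
    return [column & 0xFF, (column >> 8) & 0x0F,  # column A0-A11
            row & 0xFF, (row >> 8) & 0xFF, (row >> 16) & 0xFF]


def page_program_cycles(block: int, page: int, data: bytes) -> list:
    """Bus cycles for one page program, in issue order."""
    if len(data) > PAGE_BYTES:
        raise ValueError("at most 2048 + 64 bytes fit in one page")
    cycles = [("CMD", CMD_DATA_INPUT)]
    cycles += [("ADDR", a) for a in address_cycles(block, page)]
    # tADL (75 ns) must pass between the last address cycle and the first
    # data byte; each data byte then takes one write cycle of at least tWC.
    cycles += [("DATA", b) for b in data]
    cycles.append(("CMD", CMD_PROGRAM))
    cycles.append(("WAIT_READY", None))           # R/B# low for about tPROG
    cycles.append(("CMD", CMD_READ_STATUS))
    return cycles
```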
From the timing in Figure 2, the data storage rate of a single K9K8G08UOI can be estimated. The minimum tWC is 25 ns, tADL is 75 ns, tWB is 100 ns, and tPROG is 200 μs. Writing one page therefore takes approximately 200 μs + 100 ns + (2048 + 64) × 25 ns + 75 ns = 252.975 μs. Since one page holds (2K + 64) bytes, the byte storage rate of a single K9K8G08UOI is (2048 + 64) B / 252.975 μs ≈ 8.35 MB/s, i.e. about 8.35 million bytes per second. The write speed of a single K9K8G08UOI therefore cannot keep up with a 66 MHz Camera Link channel, so several chips must be used in parallel to widen the data path. To overcome the slow access speed of NAND flash, the FPGA combines 10 consecutive 8-bit video bytes into one 80-bit word. This reduces the byte rate each NAND flash chip must sustain to 6.6 MHz (66 MHz / 10), which a single K9K8G08UOI can meet. Every 10 K9K8G08UOI chips in the system form a 1 G × 80-bit flash module, for a total of 10 flash modules. The chips in each flash module share one set of control lines, while their data lines are connected to the FPGA separately. Figure 3 shows how a single flash module is built.
Figure 3 Schematic diagram of single Flash module interface circuit
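The single-chip estimate and the case for parallel chips can be reproduced numerically. This sketch uses the timing values quoted above (including the typical 200 μs tPROG); `min_chips` is a hypothetical helper that computes how many chips must share one channel for the average rate to fit. It shows that 8 chips would be the bare minimum for a 66 MHz byte stream, so the 10-chip module leaves some margin.

```python
import math

# Timing values quoted in the text (nanoseconds).
T_WC_NS = 25          # minimum write-cycle time, i.e. a 40 MHz interface
T_ADL_NS = 75         # address-to-data-loading delay
T_WB_NS = 100         # WE# high to busy
T_PROG_NS = 200_000   # page program time (200 us)
PAGE_BYTES = 2048 + 64


def page_write_time_ns() -> int:
    """Total time to load and program one page."""
    return T_PROG_NS + T_WB_NS + PAGE_BYTES * T_WC_NS + T_ADL_NS


def single_chip_rate() -> float:
    """Sustained bytes per second one chip can absorb."""
    return PAGE_BYTES / (page_write_time_ns() * 1e-9)


def min_chips(channel_rate: float) -> int:
    """Fewest parallel chips that keep up with `channel_rate` bytes/s."""
    return math.ceil(channel_rate / single_chip_rate())


if __name__ == "__main__":
    print(page_write_time_ns())                 # 252975
    print(round(single_chip_rate() / 1e6, 2))   # 8.35
    print(min_chips(66e6))                      # 8
    print(66e6 / 10 / 1e6)                      # 6.6 (per-chip rate with 10 chips)
```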
Flash1 to Flash10 in the circuit are connected identically: each chip's data bus connects to the FPGA independently, while all chips share a common control bus driven through buffers. As Figure 3 shows, the storage rate required of each chip can be reduced further by adding more flash chips to a module. Because a single FPGA has a limited number of I/Os, the 10 flash modules are arranged on 5 expansion memory boards, with each module handling one 66 MHz × 8-bit channel obtained from the Camera Link serial-to-parallel conversion. Each memory board carries one FPGA and two flash modules and connects to the FPGA on the control board for data transfer. All five expansion memory boards use the same structure.
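The 8-bit-to-80-bit widening can be illustrated in software. This is a minimal sketch assuming that module i stores Camera Link channel i and that byte k of each 80-bit word goes to chip k of that module; the text does not specify the byte ordering, and all names are hypothetical. Each chip then sees every tenth byte of its channel, which is where the 6.6 MHz per-chip rate comes from.

```python
CHIPS_PER_MODULE = 10   # 10 chips x 8 bit = one 80-bit flash module
MODULES_PER_BOARD = 2   # 10 modules on 5 expansion boards


def board_of_module(module: int) -> int:
    """Expansion board (0-4) that holds a given flash module (0-9)."""
    return module // MODULES_PER_BOARD


def pack_words(channel_bytes: bytes) -> tuple:
    """Group consecutive bytes of one channel into 80-bit (10-byte) words.

    Returns (words, leftover); leftover bytes wait for the next call.
    """
    usable = len(channel_bytes) - len(channel_bytes) % CHIPS_PER_MODULE
    words = [channel_bytes[i:i + CHIPS_PER_MODULE]
             for i in range(0, usable, CHIPS_PER_MODULE)]
    return words, channel_bytes[usable:]


def chip_streams(words: list) -> list:
    """Byte k of every word goes to chip k of the module."""
    return [bytes(word[k] for word in words) for k in range(CHIPS_PER_MODULE)]
```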
4. FPGA logic design
A FLASH write consists of two stages: loading time and programming time. The write bottleneck is not the loading time but the FLASH programming time. To reconcile the high-speed data stream with the low-speed FLASH, the data stream is converted from serial to parallel and multiple modules work in parallel. Ten dual-port RAMs are built inside the FPGA to buffer data, each corresponding to one FLASH. Data from the camera is first buffered in the RAMs and then written to the FLASH. The RAMs are filled in a pipelined fashion: data goes into the first dual-port RAM, then the second, and so on. When the tenth RAM has been filled, the data in the ten RAM buffers is written to the corresponding FLASH chips. The FLASH is loaded at its maximum interface speed of 40 MHz (40 MB/s), so loading one 2048-byte page takes 51.2 μs. The FLASH then enters its programming phase, while incoming data is again buffered starting with the first RAM. With a camera clock of 66 MHz and a RAM depth of 2048 bytes, filling all 10 RAMs takes about 310 μs, so each FLASH has up to 310 − 51.2 = 258.8 μs for programming, which exceeds the typical FLASH programming time of 200 μs. Figure 4 shows the FLASH pipeline operation.
Figure 4 FLASH pipeline operation diagram
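The pipeline budget above can be checked with a short sketch. It assumes each RAM is filled at one byte per 66 MHz camera clock and that each FLASH gets a new page once per full round of 10 RAMs; the helper names are hypothetical. Computed exactly, the round takes 310.3 μs, so the programming window is about 259.1 μs (the text rounds the round time to 310 μs and gets 258.8 μs); either way it exceeds the 200 μs typical tPROG.

```python
CAMERA_CLOCK_HZ = 66e6      # one byte per clock on an 8-bit channel
RAM_DEPTH_BYTES = 2048      # depth of each dual-port RAM
NUM_RAMS = 10               # one RAM per FLASH
FLASH_CYCLE_S = 25e-9       # 40 MHz FLASH interface
T_PROG_TYPICAL_S = 200e-6


def ram_for_byte(byte_index: int) -> int:
    """RAM that receives a given byte: fill RAM 0, then RAM 1, ..., then wrap."""
    return (byte_index // RAM_DEPTH_BYTES) % NUM_RAMS


def refill_period_s() -> float:
    """Time to fill all 10 RAMs, i.e. how often each FLASH gets a new page."""
    return NUM_RAMS * RAM_DEPTH_BYTES / CAMERA_CLOCK_HZ


def load_time_s() -> float:
    """Time to copy one 2048-byte RAM into its FLASH at 40 MHz."""
    return RAM_DEPTH_BYTES * FLASH_CYCLE_S


def programming_window_s() -> float:
    """Time left for programming before the next page arrives."""
    return refill_period_s() - load_time_s()


if __name__ == "__main__":
    print(round(refill_period_s() * 1e6, 1))           # 310.3
    print(round(load_time_s() * 1e6, 1))               # 51.2
    print(round(programming_window_s() * 1e6, 1))      # 259.1
    print(programming_window_s() >= T_PROG_TYPICAL_S)  # True
```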
5. FLASH bad area management
Bad areas of the FLASH are managed dynamically: a 1 K × 8-bit bad-area address table is set up in the FPGA of each storage board, and the table is identical on all storage boards. Before each FLASH write operation, the logic checks whether the current area is bad; if it is, the area is skipped and the write moves on to the next area.
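One plausible reading of the 1 K × 8-bit table is a bitmap: 1024 × 8 = 8192 bits, exactly one bit per block of a K9K8G08UOI. The sketch below assumes that layout, which the text does not state, and models the check-and-skip step before a write. The `merge` method is likewise an assumption about how the tables are kept identical on all boards: OR-ing per-board results so every board skips the same blocks. All names are hypothetical.

```python
TABLE_BYTES = 1024            # the 1 K x 8 bit table
BLOCKS = TABLE_BYTES * 8      # assumed: one bit per block -> 8192 blocks


class BadBlockTable:
    """Bad-block bitmap: a set bit marks the block as bad (hypothetical layout)."""

    def __init__(self) -> None:
        self.bits = bytearray(TABLE_BYTES)

    def mark_bad(self, block: int) -> None:
        self.bits[block >> 3] |= 1 << (block & 7)

    def is_bad(self, block: int) -> bool:
        return bool(self.bits[block >> 3] & (1 << (block & 7)))

    def next_good(self, block: int) -> int:
        """First good block at or after `block`; called before each write."""
        for b in range(block, BLOCKS):
            if not self.is_bad(b):
                return b
        raise RuntimeError("no good blocks left")

    def merge(self, other: "BadBlockTable") -> None:
        """OR in another table so every board skips the same blocks."""
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte
```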
Bad areas can be detected by writing known data and then reading it back for verification. On command, the storage board writes 8-bit data across the entire array, in order and at the highest rate the system is designed for. It then reads the data back in the same order and verifies it. If any data error is found, that area is registered as bad. Figure 5 shows the workflow of bad block detection and management.
Figure 5 Workflow of bad block detection management
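The write-then-verify procedure can be sketched against a simulated chip. Here `FakeFlash` is a toy in-memory stand-in with injected faults, and the test pattern 0x55 is an arbitrary choice (the text only says a specific value is written); both are hypothetical. A real scan would also erase each block before programming it, which this sketch omits.

```python
from typing import Callable, List, Set


def scan_bad_blocks(write_page: Callable[[int, int, bytes], None],
                    read_page: Callable[[int, int], bytes],
                    n_blocks: int, pages_per_block: int, page_bytes: int,
                    pattern: int = 0x55) -> List[int]:
    """Write a known pattern to every page, read it all back, return bad blocks."""
    expected = bytes([pattern]) * page_bytes
    # Pass 1: write the whole array in order.
    for blk in range(n_blocks):
        for pg in range(pages_per_block):
            write_page(blk, pg, expected)
    # Pass 2: read back in the same order; any mismatch marks the block bad.
    bad = []
    for blk in range(n_blocks):
        if any(read_page(blk, pg) != expected for pg in range(pages_per_block)):
            bad.append(blk)
    return bad


class FakeFlash:
    """Toy in-memory chip; blocks in `faulty` corrupt bit 0 on every write."""

    def __init__(self, faulty: Set[int]) -> None:
        self.faulty = faulty
        self.store = {}

    def write_page(self, blk: int, pg: int, data: bytes) -> None:
        if blk in self.faulty:
            data = bytes([data[0] ^ 0x01]) + data[1:]
        self.store[(blk, pg)] = data

    def read_page(self, blk: int, pg: int) -> bytes:
        return self.store.get((blk, pg), b"")


if __name__ == "__main__":
    chip = FakeFlash(faulty={3, 7})
    print(scan_bad_blocks(chip.write_page, chip.read_page,
                          n_blocks=10, pages_per_block=4, page_bytes=16))  # [3, 7]
```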
This design combines multi-stage pipelining with parallel processing and uses the FPGA's internal buffers to make multiple FLASH memories work in parallel, which greatly increases the storage rate. With 100 FLASH chips working simultaneously, the system meets the required rate of 660 MB/s. Testing showed that the system stores digital images reliably and masks out bad areas.