Memory-based MemSeg to Detect Surface Defects in Industrial Products
Update Time: 2022-06-01 15:58:57
In a semi-supervised framework, researchers have proposed an end-to-end memory-based segmentation network (MemSeg) to detect surface defects of industrial products. Considering the slight intra-class variation of products in the same production line, MemSeg introduces artificially simulated anomaly samples and memory samples to assist the network's learning from the perspectives of differences and commonalities. In the training phase, MemSeg explicitly learns the potential differences between normal and simulated abnormal images to obtain a robust classification hyperplane. Meanwhile, inspired by the human memory mechanism, MemSeg uses a memory pool to store the general patterns of normal samples.
By comparing the similarities and differences between the input samples and the memory samples in the memory pool, the abnormal regions can be effectively localized; in the inference stage, MemSeg directly determines the abnormal areas of the input images in an end-to-end manner. Through experimental validation, MemSeg achieves state-of-the-art (SOTA) performance on the MVTec AD dataset, with AUC scores of 99.56% and 98.84% at the image level and pixel level, respectively. In addition, MemSeg benefits from a straightforward end-to-end network architecture, which also provides significant advantages in inference speed and better meets the real-time requirements of industrial scenarios.
Product surface anomaly detection in industrial scenarios is critical to developing industrial intelligence. Surface defect detection is the problem of locating abnormal areas in images, such as scratches and stains. However, in practical applications, anomaly detection by traditional supervised learning is difficult due to the low probability of abnormal samples and the diverse forms of anomalies. Therefore, surface defect detection methods based on semi-supervised techniques have more advantages in practical applications, since only normal samples are required in the training phase. Specifically, from the perspective of differences, similar to self-supervised learning, MemSeg introduces artificially simulated anomalies in the training phase, allowing the model to consciously distinguish between normal and abnormal without requiring the simulated anomalies to be consistent with those in the actual scene. MemSeg completes model training using both normal and simulated abnormal images, and in the inference phase directly determines the abnormal regions of the input images without any auxiliary tasks. The figure below shows the data usage during the training and inference phases.
Meanwhile, from the perspective of commonality, MemSeg introduces a memory pool to record the general pattern of normal samples. During the training and inference phases of the model, the similarities and differences between the input samples and the memory samples in the memory pool are compared to provide more effective information for the localization of abnormal regions. In addition, to coordinate the information from the memory pool and the input images more effectively, MemSeg introduces a multi-scale feature fusion module and a novel spatial attention module, which greatly improves the model's performance.
3. Analysis of the new framework
The above figure shows the overall framework of MemSeg. MemSeg is based on the U-Net architecture and uses a pre-trained ResNet18 as the encoder. MemSeg introduces simulated anomalous samples and a memory module from the perspectives of differences and commonalities to guide the model's learning more directionally and accomplish the semi-supervised surface defect task end-to-end. At the same time, to fully integrate the memory information with the high-level features of the input image, MemSeg introduces a Multi-Scale Feature Fusion Module (MSFF Module) and a novel Spatial Attention Module, which greatly improve the model's accuracy in anomaly localization.

Anomaly Simulation Strategy

In industrial scenarios, anomalies appear in many forms, and it is impossible to cover them all when collecting data, which limits the use of supervised learning methods for modelling. However, in a semi-supervised framework, using only normal samples without any comparison against non-normal samples is not sufficient for the model to understand what a normal pattern is. Inspired by DRAEM, the researchers therefore designed a more effective strategy to simulate abnormal samples and introduced them into the training process to accomplish self-supervised learning. MemSeg summarizes the patterns of normal samples by contrasting them with non-normal patterns, mitigating the drawbacks of semi-supervised learning. As the figure below shows, the proposed anomaly simulation strategy is divided into three main steps.
The two-dimensional Perlin noise P is binarized to generate the mask Mp, and the normal image I is binarized to generate the foreground mask MI. The two are combined element-wise to generate the final mask M. This processing makes the generated anomaly regions resemble those in real anomaly images.
A noise image is then fused with the normal image inside the region selected by M, with a transparency factor balancing the fusion so that the simulated anomaly is closer to a real one.
Finally, M is inverted (black to white, white to black) and multiplied element-wise with I to keep the unmasked region of the normal image; the result is added element-wise to the fused region to generate the simulated anomaly image IA.
By the above anomaly simulation strategy, the simulated anomaly samples are obtained from both texture and structure perspectives, and most of the anomaly regions are generated on the target foreground, which maximizes the similarity between simulated anomaly samples and real anomaly samples.
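The three steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions: a smoothed random field stands in for true Perlin noise, the thresholds and the transparency factor `delta` are illustrative, and `simulate_anomaly` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def simulate_anomaly(img, noise_img, delta=0.5, noise_thresh=0.5,
                     fg_thresh=0.1, rng=None):
    """Sketch of a MemSeg-style anomaly simulation (assumed details).

    img       -- normal image, float array in [0, 1], shape (H, W)
    noise_img -- texture/noise source image, same shape
    delta     -- transparency factor blending noise into the image
    """
    rng = rng or np.random.default_rng(0)
    # Stand-in for 2-D Perlin noise: a smoothed random field
    # (assumption; the original strategy uses true Perlin noise).
    field = rng.random(img.shape)
    k = np.ones((9, 9)) / 81.0
    field = np.real(np.fft.ifft2(np.fft.fft2(field) *
                                 np.fft.fft2(k, s=img.shape)))
    Mp = (field > noise_thresh).astype(img.dtype)  # binarized noise mask
    MI = (img > fg_thresh).astype(img.dtype)       # binarized foreground mask
    M = Mp * MI                                    # anomalies land on foreground
    # Blend the noise image into the masked region; the inverted mask
    # (1 - M) keeps the rest of the normal image I intact.
    I_A = (1 - M) * img + M * (delta * noise_img + (1 - delta) * img)
    return I_A, M
```

Restricting M to the foreground mask MI is what keeps most simulated defects on the object itself rather than on the background.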
N normal images are passed through the encoder, and their features are stored as the memory information. The parameters of block1/2/3 of ResNet are frozen to keep the high-dimensional features of the input aligned with the memory information, while the rest of the network remains trainable. In both the training and inference phases, the distances between the input features and the memory features are compared.
Each of the N memory items contains three feature maps generated by block1/2/3. The three feature maps of the input are compared against every memory item to find the item with the smallest distance. The input feature maps are then concatenated with these best-matching memory feature maps to form the concatenated information CI, which, after passing through the multi-scale feature fusion block, is connected to the decoder through the U-Net skip connections.
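The memory-matching step can be sketched roughly as follows; the per-item L2 distance summed over the three scales and the helper name `retrieve_memory` are assumptions for illustration.

```python
import numpy as np

def retrieve_memory(input_feats, memory_bank):
    """Sketch of the assumed memory-matching step.

    input_feats -- list of 3 arrays (one per ResNet block), each (C, H, W)
    memory_bank -- list of N memory items; each item is a list of 3 arrays
                   with the same shapes as input_feats
    Returns the concatenated information CI: at each scale, the input
    features concatenated with the best-matching memory features along
    the channel axis.
    """
    # Sum of L2 distances over the three scales scores each memory item
    # (the exact distance metric is an assumption).
    dists = [
        sum(np.linalg.norm(f - m) for f, m in zip(input_feats, item))
        for item in memory_bank
    ]
    best = memory_bank[int(np.argmin(dists))]
    # Concatenate along the channel dimension at every scale.
    return [np.concatenate([f, m], axis=0) for f, m in zip(input_feats, best)]
```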
Through the spatial attention blocks, weights are applied to the three feature maps to reduce feature redundancy.
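One plausible reading of this weighting step, sketched under the assumption that the spatial attention map is derived from the channel-wise mean of the memory difference (the exact equation is not reproduced here):

```python
import numpy as np

def spatial_attention(feat, diff):
    """Assumed sketch: derive a spatial weight map from the memory
    difference and apply it to the features.

    feat -- feature map, shape (C, H, W)
    diff -- difference information between input and memory, shape (C, H, W)
    """
    # Channel-wise mean of the absolute difference acts as an (H, W)
    # attention map: regions far from memory get larger weights.
    attn = np.abs(diff).mean(axis=0)
    attn = attn / (attn.max() + 1e-8)   # normalize weights to [0, 1]
    return feat * attn[None, :, :]      # broadcast over channels
```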
Considering that CI is a concatenation of two kinds of information along the channel dimension, and that features from different encoder stages carry different semantic and visual information, feature fusion is performed using the channel-attention CA block together with a multi-scale strategy.
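As a rough stand-in for the CA block, a squeeze-and-excitation-style channel gate illustrates the idea; the actual block design is an assumption, and random matrices stand in for trained layer weights.

```python
import numpy as np

def channel_attention(feat, reduction=4, rng=None):
    """Squeeze-and-excitation-style channel attention sketch.

    feat -- feature map, shape (C, H, W), with C divisible by reduction
    """
    rng = rng or np.random.default_rng(0)
    C = feat.shape[0]
    # Squeeze: global average pooling per channel.
    s = feat.mean(axis=(1, 2))
    # Excite: two linear layers (random here, learned in practice).
    w1 = rng.standard_normal((C // reduction, C)) / np.sqrt(C)
    w2 = rng.standard_normal((C, C // reduction)) / np.sqrt(C // reduction)
    z = np.maximum(w1 @ s, 0.0)                # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))     # sigmoid gate per channel
    # Re-weight each channel by its gate value.
    return feat * gate[:, None, None]
```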
The training objective combines L1 loss and focal loss. L1 preserves more edge information than L2, while focal loss alleviates the sample-imbalance problem and lets the model focus on segmenting the hard anomaly regions rather than the easy majority of pixels.
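A minimal sketch of this combined objective, assuming a simple sum of the two terms; the weight `lam` and the focal parameters `alpha`/`gamma` are illustrative defaults, not values from the paper.

```python
import numpy as np

def memseg_loss(pred, target, alpha=0.25, gamma=2.0, lam=1.0):
    """Combined L1 + focal loss on a predicted anomaly map (sketch).

    pred   -- predicted anomaly probability map in (0, 1), shape (H, W)
    target -- binary ground-truth mask, shape (H, W)
    """
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    # L1 term: preserves sharper segmentation edges than an L2 penalty.
    l1 = np.abs(pred - target).mean()
    # Focal term: (1 - pt)^gamma down-weights easy pixels so rare,
    # hard anomaly pixels dominate the loss.
    pt = np.where(target == 1, pred, 1 - pred)
    at = np.where(target == 1, alpha, 1 - alpha)
    focal = (-at * (1 - pt) ** gamma * np.log(pt)).mean()
    return l1 + lam * focal
```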
4. Experimental Visualization