1. Materials and Methods
1.1 Test Materials
The research subjects were normal capsules and capsule shells whose content of the heavy metal chromium severely exceeds the permitted limit. All samples were stored in sealed containers. Since no study correlating capsule color with heavy-metal content has been found to date, the influence of color on the spectra is not considered at this stage.
1.2 Hyperspectral Imaging System
Hyperspectral images were acquired with the FS13 hyperspectral camera from Hangzhou Caipu Technology Co., Ltd. Its spectral range is 400-1000 nm, with a wavelength resolution better than 2.5 nm and up to 1200 spectral channels. Full-spectrum acquisition reaches up to 128 FPS, and up to 3300 Hz after band selection (multi-region band selection is supported). To avoid interference from ambient light sources, the imaging system was placed in a sealed cabinet with black-painted surfaces.
2. Principal Component Analysis of Hyperspectral Images
Because hyperspectral data consist of multiple band images, each band image can be regarded as a feature. Principal component analysis (PCA) transforms the data into a new coordinate system whose axes successively capture the maximum variance of the image data, so the resulting component images can differ markedly from the original band images. PCA is highly effective for concentrating information, isolating noise, and reducing data dimensionality. The first four principal components obtained by PCA dimensionality reduction of the hyperspectral images are shown in Figure 1.
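As a minimal sketch of the transformation described above, assuming the hyperspectral cube is held as a NumPy array of shape (rows, cols, bands), the snippet below treats each pixel as an observation and each band as a feature, centers every band, and obtains the principal axes by singular value decomposition. The function name `pca_component_images` and the random cube are illustrative only; the paper does not state which software performed its PCA.

```python
import numpy as np

def pca_component_images(cube, n_components=4):
    """Project a hyperspectral cube onto its first principal components.

    cube: array of shape (rows, cols, bands); each band image is one feature
    and each pixel is one observation.
    Returns (component_images, explained_variance_ratio).
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)  # (pixels, bands), a copy
    X -= X.mean(axis=0)                        # center every band
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T           # pixel coordinates on the new axes
    ratio = s**2 / np.sum(s**2)                # share of total variance per axis
    images = scores.T.reshape(n_components, rows, cols)
    return images, ratio[:n_components]

# Synthetic stand-in for a real cube (not the paper's data).
rng = np.random.default_rng(0)
cube = rng.random((20, 30, 50))
pcs, ratio = pca_component_images(cube, n_components=4)
print(pcs.shape, ratio.round(3))
```

Each returned image can be displayed like an ordinary band image, which is how a component such as PC3 can be inspected for contrast between capsule types.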
After the PCA transformation, the image of the first principal component (PC1) contains the most information, followed by PC2, yet neither shows obvious contrast between the two kinds of capsules. In contrast, the third principal component (PC3) highlights the two kinds of capsules more clearly. However, this separation may stem from the different colors of the capsules, since only the capsule cap stands out in PC3. PCA can therefore serve only as a reference for distinguishing "toxic capsules" from normal capsules. A comprehensive analysis of hyperspectral data must also consider the spectral information, which is the advantage of hyperspectral analysis.
3. Spectral Analysis
The advantage of hyperspectral imaging is that it provides spectral information in addition to image information. To obtain the spectral information, regions of interest (ROIs) were first selected on each sample, and each ROI has its own spectral response curve. Because the capsule cap and capsule body differ in color, two ROIs were selected on each capsule, one on the cap and one on the body, to eliminate the influence of color on the results. The ROIs were placed at random positions on the hyperspectral image of each capsule, and each contained 2 to 6 pixels. The spectrum of an ROI is the average over all its pixels. The spectral curves of the four region types (the cap and body of normal capsules and of "toxic capsules") are shown in Figure 2.
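The ROI averaging step can be sketched as follows, assuming each ROI is given as a list of (row, col) pixel coordinates on a NumPy cube of shape (rows, cols, bands). The helper name `roi_mean_spectrum` and the pixel coordinates in the usage lines are hypothetical; in the experiment one such ROI is taken on the cap and one on the body of each capsule.

```python
import numpy as np

def roi_mean_spectrum(cube, pixels):
    """Mean spectrum of a region of interest.

    cube:   hyperspectral image of shape (rows, cols, bands)
    pixels: list of (row, col) coordinates belonging to the ROI
    """
    if not pixels:
        raise ValueError("ROI must contain at least one pixel")
    rows, cols = zip(*pixels)
    # Fancy indexing gathers one spectrum per ROI pixel: shape (n_pixels, bands).
    return cube[list(rows), list(cols), :].mean(axis=0)

rng = np.random.default_rng(1)
cube = rng.random((10, 10, 5))                 # synthetic cube
cap_roi = [(2, 3), (2, 4), (3, 3), (3, 4)]     # hypothetical cap pixels
body_roi = [(7, 5), (7, 6), (8, 5)]            # hypothetical body pixels
cap_spectrum = roi_mean_spectrum(cube, cap_roi)
body_spectrum = roi_mean_spectrum(cube, body_roi)
```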
Figure 2 shows that the spectral curves of the "toxic capsule" cap and of the normal capsule body are relatively irregular. The spectral curves of the caps of normal capsules and "toxic capsules" differ significantly and cross at around 620 nm, while the spectral curves of the bodies of the two types of capsules intersect at around 550 nm and 700 nm. Nevertheless, spectral curves alone cannot reliably distinguish "toxic capsules" from normal capsules. The full spectral range must therefore be analyzed to identify the spectral features that best separate the two classes, which are then used for discriminant analysis.
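Crossing points such as those near 620 nm, 550 nm, and 700 nm can be located numerically by looking for sign changes in the difference between two mean spectra. The sketch below is a hypothetical helper, not a step described in the paper; it linearly interpolates each zero of the difference curve, and the straight-line test spectra are synthetic.

```python
import numpy as np

def crossing_wavelengths(wl, a, b):
    """Wavelengths where spectra a and b intersect (linear interpolation)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    crossings = []
    for i in range(len(d) - 1):
        if d[i] == 0:
            crossings.append(float(wl[i]))      # exact hit on a sample point
        elif d[i] * d[i + 1] < 0:
            # Sign change: interpolate the zero of the difference curve.
            t = d[i] / (d[i] - d[i + 1])
            crossings.append(float(wl[i] + t * (wl[i + 1] - wl[i])))
    if d[-1] == 0:
        crossings.append(float(wl[-1]))
    return crossings

wl = np.arange(450, 901, 10)   # wavelengths in nm
a = wl.astype(float)           # synthetic rising spectrum
b = 1240.0 - wl                # synthetic falling spectrum
print(crossing_wavelengths(wl, a, b))  # [620.0]
```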
Because each sample's spectrum contains a large number of variables (equal to the number of band images in the hyperspectral cube), the effective spectral features must be extracted with a suitable feature-extraction algorithm. In this paper, partial least squares (PLS) is used to reduce the dimensionality of the data, which also yields the contribution rate of each PLS component; the components ranked from high to low are listed in Table 1. The simplest way to determine the number of PLS components, also known as latent variables (LVs), is by the root mean square error, mainly the root mean square error of cross-validation (RMSECV) and the root mean square error of calibration (RMSEC). Each LV is a combination of the original features obtained by PLS dimensionality reduction, and the contribution of successive LVs to the overall features usually decreases. When a given number of LVs used as input features achieves the required accuracy with adequate generalizability, that number is taken as optimal.
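The paper does not give formulas for the two criteria. For reference, a commonly used form is shown below, where y_i is the reference value of training sample i (for PLS-DA, assumed here to be a class code such as 0 for normal and 1 for "toxic"), n_c is the number of training samples, ŷ_i is the prediction of the model calibrated on all training samples, and ŷ_{i,-k(i)} is the prediction of a model built without the cross-validation fold k(i) that contains sample i. Some authors divide by n_c - A - 1 (with A the number of LVs) in RMSEC instead of n_c.

```latex
\mathrm{RMSEC} = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_i - y_i\right)^2},
\qquad
\mathrm{RMSECV} = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_{i,-k(i)} - y_i\right)^2}
```

In this form RMSEC never increases as LVs are added, whereas RMSECV reflects how well the model generalizes, which is why both are inspected together.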
For modeling, 60% of the samples were used as the training set and the remaining 40% as the test set. Figure 3 shows how the RMSECV and RMSEC values [15] of "toxic capsules" and normal capsules vary with the number of selected LVs. Both RMSECV and RMSEC decrease markedly as the number of LVs increases from 1 to 6, and change only slowly beyond 6 LVs. Judging from these curves alone, 6 LVs appear to be a suitable number of input features. Besides the LV contribution rates, RMSECV, and RMSEC, the correct classification rate and the squared correlation coefficient r² on the test set must also be considered; these are listed in Table 1.
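The LV-selection procedure can be sketched with a small PLS1 (NIPALS) implementation in NumPy. The following are assumptions, not details from the paper: class labels coded as 0 (normal) and 1 ("toxic"), 5-fold cross-validation inside the 60% training set for RMSECV, and synthetic spectra standing in for the measured data. All function names are hypothetical. Plotting `rmsec_curve` and `rmsecv_curve` against the number of LVs gives curves of the kind shown in Figure 3, from which the point where the decrease levels off can be read.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 with n_lv latent variables; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                       # weight vector of this LV
        norm = np.linalg.norm(w)
        if norm < 1e-12:                    # nothing left to explain
            break
        w /= norm
        t = Xr @ w                          # LV scores
        tt = t @ t
        p = Xr.T @ t / tt                   # X loadings
        q = (yr @ t) / tt                   # y loading
        Xr = Xr - np.outer(t, p)            # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression vector in original space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def rmse(y_hat, y):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

def rmsecv(X, y, n_lv, k=5, seed=0):
    """k-fold cross-validated RMSE within the training set (assumed scheme)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        y_cv[fold] = pls1_predict(pls1_fit(X[train], y[train], n_lv), X[fold])
    return rmse(y_cv, y)

# Synthetic spectra: class 1 gets a small sloped offset added to noise.
rng = np.random.default_rng(42)
n, bands = 50, 40
y = np.repeat([0.0, 1.0], n // 2)           # 0 = normal, 1 = "toxic" (assumed coding)
X = rng.normal(size=(n, bands)) + np.outer(y, np.linspace(0.0, 1.0, bands))
perm = rng.permutation(n)
n_train = round(0.6 * n)                    # 60% training, 40% test
tr, te = perm[:n_train], perm[n_train:]

rmsec_curve, rmsecv_curve = [], []
for n_lv in range(1, 9):
    model = pls1_fit(X[tr], y[tr], n_lv)
    rmsec_curve.append(rmse(pls1_predict(model, X[tr]), y[tr]))
    rmsecv_curve.append(rmsecv(X[tr], y[tr], n_lv))
```

Once the number of LVs is fixed, the held-out indices `te` would be predicted in the same way with `pls1_predict`.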
With 4 LVs as input features, r² for both cross-validation (CV) and prediction (Pr) reaches 0.9 or above, significantly higher than with 3 LVs, and the classification error rate is 0. Meanwhile, none of the indicator parameters increases significantly; therefore, 4 LVs are selected in this paper.
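The two test-set indicators can be computed from the continuous PLS-DA outputs as sketched below. Assumed here, since the paper specifies neither choice: r² is the squared Pearson correlation between the reference class codes (0/1) and the predicted values, and a sample is assigned to the "toxic" class when its output exceeds 0.5. The function names and example values are made up for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Squared Pearson correlation between reference codes and PLS-DA outputs."""
    r = np.corrcoef(np.asarray(y_true, float), np.asarray(y_pred, float))[0, 1]
    return float(r ** 2)

def classification_error_rate(y_true, y_pred, threshold=0.5):
    """Fraction misclassified when outputs above `threshold` are labeled 1 ("toxic")."""
    labels = (np.asarray(y_pred, float) > threshold).astype(int)
    return float(np.mean(labels != np.asarray(y_true)))

y_true = np.array([0, 0, 0, 1, 1, 1])                # made-up reference codes
y_out = np.array([0.1, -0.05, 0.2, 0.9, 1.1, 0.8])   # made-up PLS-DA outputs
r2 = r_squared(y_true, y_out)
err = classification_error_rate(y_true, y_out)
print(round(r2, 3), err)
```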
4. Conclusion
Spectral data of normal capsules and "toxic capsules" in the 450-900 nm range were extracted from the hyperspectral images by selecting regions of interest. The spectra were first normalized and then subjected to dimensionality reduction and discriminant analysis using PLS-DA. With four PLS components (LVs) as input features, the recognition rate of normal capsules and "toxic capsules" reached 100%, as did the specificity and sensitivity. This suggests that PLS-DA can be used to distinguish normal capsules from "toxic capsules". Compared with traditional methods, hyperspectral imaging can greatly reduce the complexity of detecting "toxic capsules".
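The normalization step and the three reported rates can be sketched as below. The paper does not say which normalization was used, so per-spectrum min-max scaling is an assumption, as is treating "toxic" (label 1) as the positive class for sensitivity and specificity. The function names and example labels are hypothetical.

```python
import numpy as np

def minmax_normalize(spectra):
    """Scale each spectrum (row) to [0, 1]; an assumed choice of normalization."""
    s = np.asarray(spectra, float)
    lo = s.min(axis=1, keepdims=True)
    hi = s.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero for flat rows
    return (s - lo) / span

def diagnostic_rates(y_true, y_label):
    """Recognition rate, sensitivity and specificity with "toxic" (1) as positive."""
    y_true, y_label = np.asarray(y_true), np.asarray(y_label)
    tp = np.sum((y_true == 1) & (y_label == 1))
    tn = np.sum((y_true == 0) & (y_label == 0))
    fp = np.sum((y_true == 0) & (y_label == 1))
    fn = np.sum((y_true == 1) & (y_label == 0))
    return {
        "recognition_rate": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),   # toxic capsules correctly flagged
        "specificity": float(tn / (tn + fp)),   # normal capsules correctly passed
    }

print(diagnostic_rates([0, 0, 1, 1], [0, 1, 1, 1]))
```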
In addition, to improve reliability, a wider spectral range, such as the fluorescence or ultraviolet bands, should be used for sample detection. Beyond qualitative analysis of "toxic capsules", quantitative research is also needed. For quantification, gelatin reference samples (templates) with different chromium contents could be prepared, a correlation model between their chromium content and spectral data established, and this model used to predict the chromium content of unknown "toxic capsules". Samples have become difficult to obtain since the "toxic capsule" incident; nevertheless, experiments on capsule samples covering a range of chromium contents are needed to improve detection effectiveness.