Abstract:
A random forest regression model was built on terahertz time-domain spectroscopy data from coal samples to predict volatile matter content efficiently and accurately. Principal component analysis (PCA) and three of its variants, kernel PCA (KPCA), sequential PCA (SPCA), and incremental PCA (IPCA), were used to reduce the dimensionality of the spectral data and select features. Four regression models were then developed with the random forest algorithm: RF-PCA, RF-KPCA, RF-SPCA, and RF-IPCA. The models were tuned by hyperparameter optimization and evaluated with ten-fold cross-validation. Among them, the RF-SPCA model achieved the best predictive accuracy, with an R² of 0.985, RMSE of 1.949, and MAE of 0.913. Learning curves indicated that the model remained stable as the number of training samples increased, and residual plots showed prediction errors distributed evenly around zero, supporting the model's generalization performance. The research provides an effective analytical approach for intelligent coal mine analysis.
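
The workflow summarized above (dimensionality reduction, random forest regression, ten-fold cross-validation, and hyperparameter search) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic placeholders for the terahertz spectra, the number of components and the hyperparameter grid are arbitrary, and SPCA is omitted because scikit-learn has no estimator by that name.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spectra (assumption): 60 samples x 200 spectral
# points driven by 3 latent factors, with a target that depends on them.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X = latent @ rng.normal(size=(3, 200)) + 0.05 * rng.normal(size=(60, 200))
y = 20.0 + 4.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=60)

# One reducer per model variant; n_components=5 is an illustrative choice.
reducers = {
    "RF-PCA": PCA(n_components=5),
    "RF-KPCA": KernelPCA(n_components=5, kernel="rbf"),
    "RF-IPCA": IncrementalPCA(n_components=5),
}

# Small illustrative grid (assumption); the paper's search space is not given.
param_grid = {
    "rf__n_estimators": [25, 50],
    "rf__max_depth": [None, 5],
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, reducer in reducers.items():
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("reduce", reducer),
        ("rf", RandomForestRegressor(random_state=0)),
    ])
    # Grid search scored by RMSE under ten-fold cross-validation.
    search = GridSearchCV(pipe, param_grid, cv=cv,
                          scoring="neg_root_mean_squared_error")
    search.fit(X, y)
    results[name] = (search.best_params_, -search.best_score_)

for name, (params, rmse) in results.items():
    print(f"{name}: CV RMSE = {rmse:.3f}, best params = {params}")
```

Placing the scaler and the reducer inside the pipeline means both are refit on each training fold, so the cross-validated error is not influenced by the held-out samples.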