Validity of data estimation methods in large-scale insurance datasets


  • Jinrong Zhao
  • Daibin Lan
  • Tengfei Meng
  • Wanting Kan



Large-scale datasets; Insurance industry; Risk assessment; Data pre-processing; VIKOR methodology.


In the insurance industry, accurately processing and analysing large-scale datasets is critical for risk assessment and decision-making. In this study, we built a comprehensive database by collecting and integrating weather-related data, insurance industry data, and attribute-specific data to support in-depth analysis of the impact of extreme weather events. In the data preprocessing stage, we filled missing values with the mean or the mode and applied the 3σ rule to handle outliers, ensuring the completeness and consistency of the dataset. We then applied a multi-criteria decision analysis method (VIKOR) and a hierarchical clustering model (the BIRCH algorithm). These techniques streamline data processing and improve the reliability and accuracy of the analysis results. With them, we can identify and classify different risk levels, which provides a scientific basis for insurance product pricing and risk management. The study shows that advanced data estimation methods can provide effective and accurate support for processing large-scale insurance datasets, which is of practical significance to the development of the modern insurance industry.
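As an illustration only, the preprocessing and ranking steps named in the abstract could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fill_missing`, `three_sigma_mask`, `vikor`), the use of `None` to mark missing values, the population standard deviation, and the default compromise weight `v=0.5` are all hypothetical choices made for the example. The clustering step is omitted.

```python
from statistics import mean, mode, pstdev


def fill_missing(values, strategy):
    """Replace None entries with the mean (numeric) or mode (categorical)."""
    observed = [x for x in values if x is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if x is None else x for x in values]


def three_sigma_mask(values):
    """Return True for each value lying within mean +/- 3 standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(x - mu) <= 3 * sigma for x in values]


def _ratio(num, den):
    """Safe division: a zero range contributes nothing to the score."""
    return num / den if den != 0 else 0.0


def vikor(matrix, weights, benefit, v=0.5):
    """Return the VIKOR Q score of each alternative (row); lower is better.

    benefit[j] is True when larger values of criterion j are preferred.
    """
    cols = list(zip(*matrix))
    best = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    S, R = [], []  # group utility and individual regret per alternative
    for row in matrix:
        terms = [w * _ratio(fb - x, fb - fw)
                 for x, w, fb, fw in zip(row, weights, best, worst)]
        S.append(sum(terms))
        R.append(max(terms))
    s_min, s_max, r_min, r_max = min(S), max(S), min(R), max(R)
    return [v * _ratio(s - s_min, s_max - s_min)
            + (1 - v) * _ratio(r - r_min, r_max - r_min)
            for s, r in zip(S, R)]
```

Note that with a population standard deviation a single extreme value can only exceed the 3σ bound when the sample has at least eleven observations, so the rule is meaningful mainly on large columns such as those described here.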








How to Cite

“Validity of data estimation methods in large-scale insurance datasets” (2024) Transactions on Computer Science and Intelligent Systems Research, 4, pp. 105–111. doi:10.62051/vdknqp32.
