Isolation Forest Anomaly Detection Algorithm Based On Multi-level Sub-subspace Partition

Authors

  • Shangfei Wang

DOI:

https://doi.org/10.62051/ijcsit.v4n2.20

Keywords:

Anomaly detection, Isolation forest, Random forest

Abstract

In credit loan fraud detection, the isolation forest algorithm has attracted much attention because it processes large-scale data sets efficiently. On high-dimensional data, however, its performance degrades easily, which biases the detection results. To address this problem, this paper proposes an isolation forest anomaly detection algorithm based on multi-level sub-subspace partition. First, the random forest algorithm is used to evaluate the importance of each feature; the data are then divided into subspaces according to feature importance, and a corresponding weight is assigned to each subspace. Next, the isolation forest algorithm is applied in each subspace to obtain a per-subspace anomaly score. Finally, the anomaly scores and weights of the subspaces are combined to obtain the final anomaly detection score. To evaluate its effectiveness, the proposed algorithm is compared with four other algorithms on a credit loan fraud data set. The results show that the AUC, accuracy, recall, and F1 score of the proposed algorithm are all higher than those of the comparison algorithms, which indicates that the approach is effective.
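
The three steps described in the abstract can be illustrated with a minimal Python sketch built on scikit-learn. It is not the author's implementation: the abstract does not say how the subspaces are formed or how the weights are computed, so the sketch assumes that features are ranked by random forest importance and split into near-equal groups, and that each subspace's weight is its share of the total importance. The function name `subspace_isolation_scores` and the parameter `n_subspaces` are hypothetical, and the sketch assumes that fraud labels `y` are available for the random forest step.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier


def subspace_isolation_scores(X, y, n_subspaces=3, random_state=0):
    """Weighted isolation forest scores over importance-ranked feature subspaces.

    Returns (scores, subspaces, weights); a higher score means more anomalous.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: estimate feature importance with a random forest (needs labels).
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    rf.fit(X, y)
    importances = rf.feature_importances_
    ranked = np.argsort(importances)[::-1]  # most important feature first

    # Assumption: split the ranked features into near-equal groups.
    subspaces = [g for g in np.array_split(ranked, n_subspaces) if g.size > 0]

    # Assumption: a subspace's weight is its share of the total importance.
    weights = np.array([importances[g].sum() for g in subspaces])
    weights = weights / weights.sum()

    # Steps 2 and 3: one isolation forest per subspace, then a weighted sum.
    scores = np.zeros(X.shape[0])
    for group, weight in zip(subspaces, weights):
        iso = IsolationForest(n_estimators=100, random_state=random_state)
        iso.fit(X[:, group])
        # score_samples is lower for anomalies, so negate it.
        scores += weight * (-iso.score_samples(X[:, group]))
    return scores, subspaces, weights
```

Calling `subspace_isolation_scores(X, y)` on a labelled feature matrix returns one combined score per row together with the feature groups and their weights. Under these assumptions, subspaces built from uninformative features contribute little to the final score, which is one plausible reading of how the weighting reduces the effect of high dimensionality.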



Published

10-10-2024

How to Cite

Wang, S. (2024). Isolation Forest Anomaly Detection Algorithm Based On Multi-level Sub-subspace Partition. International Journal of Computer Science and Information Technology, 4(2), 149-159. https://doi.org/10.62051/ijcsit.v4n2.20