Application of Machine Learning Methods in Predicting Chemical Solubility

Authors

  • Shengye Shi

DOI:

https://doi.org/10.62051/b1df7k17

Keywords:

Solubility; machine learning; artificial intelligence.

Abstract

Solubility is a key property in chemistry, particularly in drug development, because it affects how well a drug works in the human body. Traditional approaches predict solubility from molecular structure, but they are not always accurate. Machine learning has recently gained traction in many fields, and this study examines whether it can improve solubility prediction. The study uses a Kaggle dataset with four features (MolLogP, MolWt, NumRotatableBonds, and AromaticProportion) and compares three models: Random Forest, Decision Tree, and Linear Regression. The data were split into training and test sets. Random Forest performed best, with an accuracy of 0.90 and an AUC of 0.89, and it identified MolLogP as the most important factor affecting solubility. Heatmaps and boxplots were used to visualize and explain the results. The findings suggest that machine learning is a useful tool in chemistry and can support better drug design.
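The workflow summarized in the abstract (four descriptors, a train/test split, a fitted model, and a ranking of feature importance) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption for illustration: the data are synthetic with arbitrary coefficients rather than the Kaggle dataset, only the linear regression baseline is fitted (not the Random Forest or Decision Tree), the score is R² on a continuous target rather than the accuracy and AUC reported above, and importance is estimated by permutation rather than by whatever method the study used.

```python
import random

FEATURES = ["MolLogP", "MolWt", "NumRotatableBonds", "AromaticProportion"]

def make_synthetic_data(n, seed=0):
    """Placeholder rows with made-up coefficients; NOT real solubility data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(-2, 6), rng.uniform(50, 500),
               float(rng.randint(0, 10)), rng.uniform(0, 1)]
        target = (0.2 - 0.7 * row[0] - 0.005 * row[1]
                  + 0.05 * row[2] - 0.5 * row[3] + rng.gauss(0, 0.3))
        X.append(row)
        y.append(target)
    return X, y

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_linear(X, y):
    """Ordinary least squares via the normal equations, intercept first."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(w, X, y, column, seed=0):
    """Drop in R^2 when one feature column is shuffled across rows."""
    base = r2_score(y, predict(w, X))
    shuffled = [row[column] for row in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:]
              for row, v in zip(X, shuffled)]
    return base - r2_score(y, predict(w, X_perm))

X, y = make_synthetic_data(500)
X_train, y_train, X_test, y_test = split_train_test(X, y)
weights = fit_linear(X_train, y_train)
test_r2 = r2_score(y_test, predict(weights, X_test))
importances = {name: permutation_importance(weights, X_test, y_test, i)
               for i, name in enumerate(FEATURES)}

print(f"held-out R^2: {test_r2:.3f}")
for name, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: R^2 drop {drop:.3f}")
```

In this sketch MolLogP ranks first only because the synthetic target was constructed that way; it demonstrates the mechanics of the ranking and does not reproduce the study's finding.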

Published

10-07-2025

How to Cite

Shi, S. (2025) “Application of Machine Learning Methods in Predicting Chemical Solubility”, Transactions on Computer Science and Intelligent Systems Research, 9, pp. 639–643. doi:10.62051/b1df7k17.