Machine Learning-Assisted Discovery of Novel Anti-HIV Drug Candidates: An Analysis Using Molecular Datasets

Authors

  • Zeyu Gou

DOI:

https://doi.org/10.62051/cfngys46

Keywords:

Machine Learning, Anti-HIV Drug Prediction, Extended-Connectivity Fingerprints (ECFP), Simplified Molecular-Input Line-Entry System (SMILES), DTP Antiviral Screen Databases, Principal Component Analysis, Support Vector Machine, Ensemble Models.

Abstract

Human Immunodeficiency Virus (HIV) remains a global public health crisis, and the virus's rapid mutation and resulting drug resistance create a continuing need for new anti-HIV agents. While current Combination Antiretroviral Therapy (CART) has helped control infection and mortality rates, traditional drug development approaches are costly and inefficient. This study addresses the problem by applying machine learning algorithms to lead compound discovery using the extensive, quality-assured DTP Antiviral Screen Databases. Three molecular datasets, namely Extended-Connectivity Fingerprints (ECFP), Simplified Molecular-Input Line-Entry System (SMILES), and 2D molecular images, were processed using Principal Component Analysis (PCA), train-test splitting, and dataset balancing. Six machine learning algorithms, including linear and nonlinear models, were employed and optimized through 5-fold cross-validation. Model performance was evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC), together with macro-averaged precision, macro-averaged recall, macro-averaged F1 score, and balanced accuracy. Ensemble models were constructed from the top-performing individual models. The best individual model, an SVM trained on the ECFP dataset, achieved 0.78 macro-averaged precision, 0.68 macro-averaged recall, 0.72 macro-averaged F1 score, 0.71 balanced accuracy, and 0.75 AUROC on the testing data. The best ensemble model, which fuses SVM, kNN, and logistic regression models trained on the ECFP dataset, achieved 0.82 macro-averaged precision, 0.67 macro-averaged recall, 0.72 macro-averaged F1 score, and 0.70 balanced accuracy on the testing data.
The models were then applied to a PubMed-extracted drug dataset and identified several promising anti-HIV drug candidates, meeting the study's objective of improving the efficiency and success rate of new anti-HIV drug screening and discovery. In summary, this research demonstrates the potential of machine learning to accelerate and optimize the drug discovery process for HIV treatment.
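
The fusion and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes the ensemble combines its members by simple majority vote, which the abstract does not specify, and it uses made-up toy labels rather than any data from the study. The names `majority_vote`, `macro_metrics`, and the three prediction lists are hypothetical and serve only to show how macro-averaged precision, recall, and F1 are computed for a binary active/inactive task.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Fuse per-model label lists by taking the most common label per sample.

    Assumption: hard (majority) voting; the source does not state the fusion rule.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return macro-averaged precision, recall, and F1 over the given labels."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Macro averaging weights every class equally, regardless of class size.
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (1 = active, 0 = inactive); purely illustrative, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
svm_pred = [1, 1, 0, 0, 0, 0, 0, 1]
knn_pred = [1, 0, 0, 0, 0, 0, 1, 0]
logreg_pred = [1, 1, 1, 0, 0, 1, 0, 0]

fused = majority_vote(svm_pred, knn_pred, logreg_pred)
scores = macro_metrics(y_true, fused)
print(fused)
print({name: round(value, 3) for name, value in scores.items()})
```

Macro averaging is a natural choice when active compounds are much rarer than inactive ones, because each class contributes equally to the score no matter how many samples it contains.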

Published

13-11-2023

How to Cite

Gou, Z. (2023). Machine Learning-Assisted Discovery of Novel Anti-HIV Drug Candidates: An Analysis Using Molecular Datasets. Transactions on Materials, Biotechnology and Life Sciences, 1, 122-139. https://doi.org/10.62051/cfngys46