Research on Sequence Clustering and Alignment in DNA Storage Based on the K-Means Model
DOI: https://doi.org/10.62051/d1xfwh47
Keywords: K-Means clustering; DNA storage; photobiology.
Abstract
DNA storage uses artificially synthesized deoxynucleotide chains to store information and aims to ensure precise, error-free reading. Compared with traditional electronic storage, it offers advantages in capacity, density, and energy consumption. In this study, a K-Means clustering model is constructed to accurately cluster the DNA sequences produced by sequencing a DNA storage system. To evaluate the model objectively, the clustering results are compared in detail with the correct DNA sequences. Experimental data show that when processing 100,000 DNA storage sequencing reads, the accuracy of the model exceeds 90% and the entire clustering process takes only 10 seconds. This result demonstrates the important role the K-Means model can play in restoring the original information sequences in DNA storage and provides a solid theoretical and practical foundation for future research.
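The workflow summarized above (cluster the sequencing reads with K-Means, then score the clusters against the known correct sequences) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how reads are turned into numeric vectors, so the 3-mer frequency encoding, the farthest-point initialisation, the substitution-only noise model, the purity score, and all names (`kmer_vector`, `kmeans`, `mutate`, `purity`) are hypothetical choices made for the example. The data are synthetic.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_vector(seq, k=3):
    """Encode a DNA string as a normalized k-mer frequency vector (assumed encoding)."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    vec = [0.0] * len(index)
    total = len(seq) - k + 1
    for i in range(total):
        vec[index[seq[i:i + k]]] += 1.0 / total
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    # Farthest-point initialisation: a random first centre, then repeatedly
    # the point farthest from all centres chosen so far.
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mutate(seq, rate, rng):
    """Toy noise model (assumption): independent random base substitutions only."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def purity(labels, truth):
    """Fraction of reads that fall in a cluster dominated by their true source."""
    clusters = {}
    for lab, t in zip(labels, truth):
        clusters.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(labels)


# Synthetic demo: 4 random reference strands, 50 noisy reads of each.
rng = random.Random(42)
refs = ["".join(rng.choice(BASES) for _ in range(120)) for _ in range(4)]
truth = [i for i in range(len(refs)) for _ in range(50)]
reads = [mutate(refs[t], 0.05, rng) for t in truth]

labels = kmeans([kmer_vector(r) for r in reads], k=len(refs))
accuracy = purity(labels, truth)
print(f"clustered {len(reads)} reads, purity = {accuracy:.2f}")
```

Real sequencing reads also contain insertions and deletions, which this toy noise model omits. A length-independent encoding such as k-mer frequencies is one way to keep the vectors comparable in that case.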
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







