In the next three years, the total amount of digital data will triple to 175 billion terabytes. Storing that much information reliably on hard drives and magnetic tapes will be almost impossible, and building more data centers will not solve the problem either, so scientists are actively working on technology for storing data in DNA.
In this article, we explain what DNA storage is, what its advantages and disadvantages are, and how, step by step, information is written to a molecule and read back from it.
What is DNA storage
DNA data storage is a technology that uses the DNA molecule to store data. One gram of DNA can store up to 215 million gigabytes of data. In 2019, scientists from the American startup Catalog assessed the potential of the approach and encoded 16 GB of English-language Wikipedia text into synthetic DNA.
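To put the density figure in perspective, the short calculation below estimates how much DNA would be needed to hold the 175 billion terabytes mentioned in the introduction. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000 GB) and treats 215 million GB per gram as a theoretical upper bound, without any overhead for redundancy or addressing. The function name is illustrative.

```python
# Figures quoted in the article; decimal units are an assumption.
GB_PER_GRAM = 215e6   # up to 215 million gigabytes per gram of DNA
GB_PER_TB = 1_000     # decimal terabyte (assumption)


def grams_of_dna_needed(data_tb: float) -> float:
    """Return the mass of DNA, in grams, needed to hold data_tb terabytes."""
    return data_tb * GB_PER_TB / GB_PER_GRAM


world_data_tb = 175e9  # 175 billion terabytes
grams = grams_of_dna_needed(world_data_tb)
print(f"{grams / 1000:.0f} kg of DNA")  # prints "814 kg of DNA"
```

Under these assumptions, the projected global data volume would fit in well under a tonne of DNA.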
In the future, once the technology reaches mass adoption, DNA could replace flash drives, hard drives, magnetic tapes, and other storage devices that take up a lot of space, fail quickly, and cannot preserve information for several hundred years.
Technology Benefits
- High storage density. The storage density of DNA is 1009 times greater than that of a hard disk, and this is not yet the maximum potential capacity.
- Long-term storage. How long the data survives depends on the ambient temperature. At temperatures above freezing, a molecule will last about 2,000 years, and at -20°C, about 2,000 centuries (200,000 years).
- Stable structure. Technologies become obsolete over time, but the structure of the DNA molecule has not changed in 3 billion years.
- Environmental friendliness. Servers leave a carbon footprint, and DNA does not require electricity to function, so the technology has little to no environmental impact.
Technology Disadvantages
- High price. Writing one megabyte of data costs about $1. But by 2030, the cost of storing data in DNA could drop to $1 per terabyte.
- Low write speed. Data is currently written to DNA slowly. But scientists have already created a DNA chip prototype that can record up to 20 GB per day.
- Low search speed. Searching for information in DNA takes a lot of time. But with the help of a chemical method, scientists plan to speed up the process a thousand times.
- Bulky equipment. Catalog’s Shannon device occupies the space of a small room. Scientists are now working with Seagate to reduce its size and create a “laboratory on a chip.”
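The cost and speed figures in the list above are easier to judge when scaled up to a full terabyte. The sketch below does that arithmetic, assuming decimal units (1 TB = 1,000 GB = 1,000,000 MB) and taking the quoted figures at face value: about $1 per megabyte, and a prototype chip that records up to 20 GB per day. The function names are illustrative.

```python
MB_PER_TB = 1_000_000  # decimal units (assumption)
GB_PER_TB = 1_000


def write_cost_usd(size_mb: float, usd_per_mb: float = 1.0) -> float:
    """Cost of writing size_mb megabytes at the quoted price per megabyte."""
    return size_mb * usd_per_mb


def write_days(size_gb: float, gb_per_day: float = 20.0) -> float:
    """Days needed to write size_gb gigabytes at the quoted prototype rate."""
    return size_gb / gb_per_day


print(f"${write_cost_usd(MB_PER_TB):,.0f} per terabyte")       # $1,000,000 per terabyte
print(f"{write_days(GB_PER_TB):.0f} days to write a terabyte")  # 50 days to write a terabyte
```

On these numbers, reaching $1 per terabyte would mean roughly a millionfold drop in price.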
How information is stored in DNA
The digital data stored on a computer is encoded as sequences of zeros and ones. To write it to DNA, the data must be converted from binary to a quaternary (base-4) representation, arranged into a sequence of bases, and transferred to a molecule. The process is described in more detail below.
Data conversion
The first step is to convert the original data, which is usually represented in binary form (as 0s and 1s), into a format that can be represented in DNA. There are four bases in DNA: adenine (A), cytosine (C), guanine (G), and thymine (T). One common approach is to map every two bits of binary data to one of the four DNA bases. For example, you can use A for 00, C for 01, G for 10, and T for 11.
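The two-bits-per-base mapping can be sketched in a few lines of Python. This is a minimal illustration of the example mapping only (A = 00, C = 01, G = 10, T = 11), not a production encoding; practical schemes are generally more elaborate, for instance avoiding long runs of the same base. The function name is illustrative.

```python
# Example mapping: each pair of bits becomes one DNA base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bytes_to_dna(data: bytes) -> str:
    """Convert raw bytes to a string of DNA bases, two bits per base."""
    # Write every byte as 8 binary digits, then read them off in pairs.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bytes_to_dna(b"Hi"))  # prints "CAGACGGC"
```

Each byte becomes exactly four bases, so the two-byte string "Hi" yields an eight-base sequence.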
DNA synthesis
After the data has been converted into a DNA base sequence, the corresponding DNA molecules must be created. This is done through DNA synthesis, the chemical “building” of a DNA molecule one base at a time. The process is carried out by special machines called DNA synthesizers. The bases A, C, G, and T are added in the order determined in the data conversion step.
Data storage
Once DNA molecules have been synthesized, they can be stored. The DNA molecule is dense and stable, which allows it to hold a huge amount of data in a small space. It is also resistant to many of the problems that can destroy traditional forms of data storage such as magnetic or optical media. If DNA is kept in a controlled environment (for example, at a low temperature), data can be preserved for hundreds to thousands of years.
Data Extraction
Extracting data from DNA is essentially the reverse of the conversion step. The DNA is read using sequencing by synthesis (SBS), which determines the order of the A, C, G, and T bases in the molecule. This sequence is then converted back to binary to recreate the original digital data.
Current DNA sequencing technologies take a long time to read data, so DNA data storage is currently better suited to long-term archival storage than to data that needs to be accessed frequently.
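The conversion back to binary can be sketched as the inverse of a two-bits-per-base mapping. The example below assumes the mapping A = 00, C = 01, G = 10, T = 11 and an error-free read, which real sequencing does not guarantee. The function name is illustrative.

```python
# Inverse of the example mapping: each base becomes a pair of bits.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bytes(sequence: str) -> bytes:
    """Convert a string of DNA bases back to raw bytes, two bits per base."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    if len(bits) % 8 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    # Regroup the bits into bytes of 8 binary digits each.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(dna_to_bytes("CAGACGGC"))  # prints b'Hi'
```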
Error Correction
During synthesis and sequencing, errors such as missing or extra bases can occur. To solve this problem, researchers have developed error correction algorithms that detect and correct such errors. Two examples are redundancy (keeping multiple copies of the data) and parity (adding extra “check” bits that can be used to verify the accuracy of the data).
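Both ideas can be illustrated with a simplified sketch. The majority vote below assumes that all copies have the same length, so it only handles substituted bases; missing or extra bases shift every later position and need more sophisticated alignment, which is not shown here. The parity functions use a single even-parity bit, which detects one flipped bit but cannot correct it. All function names are illustrative.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Rebuild a sequence from several equal-length, possibly noisy copies."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("all reads must have the same length")
    # At each position, keep the base that appears most often across copies.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def parity_bit(bits: str) -> str:
    """Return the even-parity check bit for a string of 0s and 1s."""
    return str(bits.count("1") % 2)


def parity_ok(bits_with_parity: str) -> bool:
    """Check that data plus its parity bit contains an even number of 1s."""
    return bits_with_parity.count("1") % 2 == 0


reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGA"]  # two copies contain one error each
print(majority_vote(reads))                   # prints "CAGACGGC"

word = "0111" + parity_bit("0111")            # "01111"
print(parity_ok(word))                        # prints True
print(parity_ok("00111"))                     # one bit flipped: prints False
```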
Prospects for data storage technology in DNA
The market for DNA digital data storage reached $105.5 million last year and is projected to grow by 69.8% per year. Research in this area is carried out by technology companies such as GenomTech, HelixWorks, and Catalog Technologies, as well as by scientific institutions and even the US National Security Agency.
In 2023, Russian scientist Maxim Nikitin of the Moscow Institute of Physics and Technology (MIPT) made a discovery centered on molecular switching, an effect that makes it possible to regulate gene functions in an enormous variety of ways.
Molecular switching could revolutionize DNA storage by providing a new mechanism for storing and retrieving information. This could potentially increase the efficiency of DNA data storage, make it more commercially viable, and speed up its adoption.
Moreover, we can already say that the potential of DNA storage is enormous. If all the films ever made were encoded in DNA, they would fit in a volume smaller than a sugar cube. While the technology is still in its infancy, DNA data storage could be mainstream in as little as five years, around 2028.