Honey, I shrank the data. How DNA storage could reduce the data centre to just a few particles

Capacity, cost, and performance. Three of your top data storage priorities? No surprise: it’s the same for many organisations. The volume of data we produce is in overdrive. We’re churning out a staggering 2.5 quintillion bytes of data each day, and that affects the size of your data centre, how far your budget will stretch, and how quickly data can be moved.

This growth is a major challenge for many data storage vendors today: requirements for storage capacity and longevity continue to expand, with no sign of slowing down. Enter DNA storage, a technology that promises to store millions upon millions of gigabytes in just a few particles, for thousands of years.

Sounds too good to be true? We gathered a group of experts (vendors, researchers, and analysts) to unpick this new medium’s claims and help you stay ahead of this mind-blowing technology.

Futurama: not so far off?

If you think the data we are producing now is off the scale, consider where we’ll be in ten, 20 or even 30 years’ time. CIOs are likely to be juggling much larger volumes of data then than their counterparts are now. Thomas Ybert, CEO and co-founder of DNA Script, a world leader in Enzymatic DNA Synthesis (EDS), explains: “Emerging technologies, such as autonomous cars and artificial intelligence, will further increase the need for unprecedented data storage requirements. The anticipated growth in data storage needs cannot be addressed by current resource-intensive technologies.”

And what will happen to that data if there is nowhere to store it? Over to Curtis Anderson, Software Architect at Panasas, the data engine for innovation: “We are currently, mostly, able to keep up … by prudent pruning of low-value data and the natural growth in the density of individual drives, along with the growth in the number of those drives. We will almost certainly require progressively deeper and deeper pruning of low-value data to stay ahead of that explosion. If DNA-based storage lives up to its promise, we will be able to stop pruning and simply store everything.”

Sabine Sykora, PhD, Application Scientist and Business Developer at Kilobaser, a team committed to the democratisation of DNA and RNA synthesis, points out that we need to consider more than just capacity: “Unfortunately, all current archival storage media face fundamental limitations. Tape drives, for instance, suffer from media obsolescence. Data stored on tape must constantly be migrated to account for equipment failures and technology upgrades, thus making tape drives a highly inefficient storage medium that produces considerable amounts of electronic waste. Existing storage technology is becoming far too costly while at the same time leading to considerable negative impact on the environment.”

Our experts keep returning to the environmental cost, and to the likelihood that organisations will have to weigh this impact more carefully in the future. Thomas Heinis, Professor in Data Management at Imperial College London, confirms: “Data centres today are considerable contributors to CO2 emissions … DNA as a storage medium does not need to be cooled and requires no electricity.”

Surely he can’t really mean DNA, the building blocks of life? That is exactly what he means: digital data stored in synthetic DNA. And because that data is said to be retrievable for millennia, the medium is well suited to archiving large data sets for long periods of time.

Do not write this off as wishful thinking. DNA storage is already in use, according to Giorgio Regni, CTO of Scality, which unifies data management, from edge to core to cloud. “End users already benefiting from DNA storage include researchers who need to store large volumes of data, such as genome sequences,” he explains. “In the future, DNA storage may be used by businesses and individuals to store data for long periods of time.”

Kilobaser’s Sykora expands: “The application of DNA storage is already being realised for long-term storage… basically the backup of the backup. From this backup, everyone that stores information on DNA benefits as there is an alternative storage medium in use. However, there are currently only a few companies that can provide DNA storage with their own coding and infrastructure. This severely limits the number of end-users for the time being.”

Bernard Peultier is VP of Innovation at data protection software provider Atempo. He agrees with Sykora that DNA storage is for the long haul. “You don’t store data you want to keep for a few days or a few months in DNA. DNA is part of what I call alternative long-term storage. When you archive long-term, any archivist will tell you, you must store information in different types of storage to guarantee its durability. Don’t store copies on the same type of storage, quite simply because if there is an unforeseen deterioration in the time of the storage, well then, you are in trouble.”

Daniel Chadash sits on the board of the DNA Data Storage Alliance, which is helping to address the growing demand for archival storage by using DNA as a storage medium. He explains where this type of storage is likely to be adopted: “The first users that will benefit from it will probably be in the digital preservation industries, media & entertainment, healthcare, and advanced scientific research that needs to preserve data for decades, if not forever.” But any sector that is expected to store ever-growing data long-term could be in the frame.

Sergei Serdyuk, VP of Product Management at NAKIVO, a fast-growing software company for protecting physical, virtual, and cloud environments, adds: “As the need for high-density, low-maintenance storage keeps growing, more and more businesses from various sectors are investing in its development. Although energy-efficient, lasting data storage will have a positive impact on many industries, organisations in the data management sector will probably be the first to benefit once the technology moves from the lab to production.”

So, yes to archiving data, but DNA storage promises much more, according to Newcastle University’s Professor of Computing Science and Synthetic Biology, Natalio Krasnogor. “I strongly believe that the most exciting applications have not yet been uncovered and that those will start to emerge sometime in the next few years. My lab is focusing on DNA data structures rather than archival storage. We believe that what is needed is the ability to not only store large amounts of data, but also to uniformly manipulate, sort, search and, more generally, to process that data, and to do so where the data is produced, stored, and consumed.”

A word of caution, though, from research company Omdia’s Principal Analyst in Advanced Computing for AI, Alexander Harrowell: “The main benefits are that the storage is very efficient, a lot of information is packed into a tiny volume of DNA, and it is stable without electrical power for the long term. The downsides are that accessing it is extremely slow and requires equipment, chemicals, and skills that are far rarer than, say, those required to manage a tape drive.”

Here comes the science bit

That slow retrieval is, in part, what our experts say will limit adoption. Shrink data that would normally fill a data centre rivalling a small town in size and power consumption until it fits into a test tube, and there is bound to be a trade-off. For now, at least.

But why does it take so long to retrieve data stored in DNA? Scality’s Regni sets out the detail: “DNA storage works by encoding digital data into DNA sequences, which are then synthesised and stored. The data can be retrieved by sequencing the DNA and decoding the data. DNA encapsulated with salt remains stable for decades at room temperature and should last much longer in controlled environments like a data centre. DNA doesn’t require maintenance, and data stored in DNA can easily be copied at low cost.”
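A toy example makes Regni’s description of encoding data into DNA sequences and decoding it again more concrete. The sketch below assumes the simplest possible scheme: each pair of bits maps to one of the four bases (00 to A, 01 to C, 10 to G, 11 to T). The mapping and the function names are illustrative choices, not any vendor’s format. Real systems also add addressing, error correction and constraints on which sequences are allowed, and all of that is left out here.

```python
# Hypothetical 2-bit mapping: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Reverse encode(): read the bases back as bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

At two bits per base, every byte becomes four bases. This is why even modest files translate into very large amounts of synthesised DNA.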

DNA Script’s Ybert elaborates further: “DNA is composed of four bases which can be used to encode information. The encoded data would need to be synthesised as DNA using the available nucleic acid printing technologies. DNA Script uses Enzymatic DNA Synthesis (EDS), which prints nucleic acids without the toxic chemicals used by conventional phosphoramidite chemistry printing methods. EDS is a ‘clean’ chemistry process that doesn’t require the strict environmental conditions, hazardous waste disposal, or physical footprint like the conventional process.”

Another positive step towards greener data storage? Perhaps, but there is a way to go before we realise the potential benefits of this technology, according to the majority of our experts.

Science fiction – or fact?

Although work is underway to bring DNA storage into the mainstream – DNA Script’s collaboration with Harvard to manufacture DNA on a semiconductor chip is just one example – it could be some time before it is a staple in your data centre. The stumbling blocks? Time and cost.

Kilobaser’s Sykora: “To make DNA storage widely available, both processes – writing and reading – need to be moved out of the lab and into the office. To achieve this transition, both writing and reading need to become easier, cheaper, and faster.”

She continues: “The European Union has decided to fund several research projects aimed at making DNA synthesis cheaper and faster. The result of these projects will be new technologies that will bring DNA storage one step closer to broad application.”

Alessia Marelli is CTO at DNAalgo, a team of storage industry veterans developing DNA storage. She explains more: “There are a lot of companies working on DNA data storage right now. The DNA Data Storage Alliance is doing a great job trying to consolidate roadmaps and standards in this sector. Biotech companies are working on synthesis and sequencing to make them cheaper and faster. Other companies like DNAalgo are working on encoding and decoding to make DNA data storage reliable. I think that in the coming years a lot of progress will come, resulting in a real commercial product in five to ten years.”

ICL’s Heinis supports Marelli’s timeline: “We believe it will be available in around five years. DNA synthesis is too expensive and we will need to find ways to reduce the cost, by also adapting the technology to data storage, i.e., departing from the life science use case.”
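Some back-of-the-envelope arithmetic makes Heinis’s point about synthesis cost easier to grasp. The sketch below assumes two bits per base. The overhead factor (extra bases for addressing and error correction) and the price per base are hypothetical inputs, and no real market price is used.

```python
def bases_needed(num_bytes: int, bits_per_base: float = 2.0, overhead: float = 1.0) -> float:
    """Number of bases to synthesise for num_bytes of data.

    overhead is a hypothetical multiplier for extra bases spent on
    addressing, error correction and redundancy (1.0 means none).
    """
    return num_bytes * 8 / bits_per_base * overhead


def synthesis_cost(num_bytes: int, price_per_base: float,
                   bits_per_base: float = 2.0, overhead: float = 1.0) -> float:
    """Total write cost, assuming a flat, hypothetical price per base."""
    return bases_needed(num_bytes, bits_per_base, overhead) * price_per_base


if __name__ == "__main__":
    one_gb = 10**9
    print(f"{bases_needed(one_gb):,.0f} bases")  # 4,000,000,000 bases
    # Hypothetical price per base, for illustration only.
    print(f"{synthesis_cost(one_gb, price_per_base=0.0001):,.0f}")
```

At two bits per base, a gigabyte (10^9 bytes) needs about four billion bases before any overhead. The total bill therefore scales directly with the price per base, which is why bringing down synthesis cost matters so much.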

Chadash of the DNA Data Storage Alliance also agrees: “Based on public announcements, we assume that at least by the end of this decade (2030), if not earlier, DNA storage solutions will be widely available and at a competitive price. Right now, many companies, most of them alliance members, are working hard to make it more affordable and scalable. On writing DNA, the work is focused on improving current technologies and commercialising new ones, such as enzymatic synthesis. On reading DNA, new companies are emerging with new technologies, and existing companies are launching new products that reduce the price of reading.”

But not everyone takes this view. Scott Sinclair, Practice Director for Cloud, Infrastructure, and DevOps at Enterprise Strategy Group, is doubtful. “Count me as a sceptic,” he says. “Until the format becomes cheaper and faster to leverage, it will be difficult to leverage effectively. I’m not saying it won’t happen; DNA storage just needs multiple innovations to be in the realm of consideration.”

Omdia’s Harrowell is pragmatic: “There are demonstration projects that actually work, if you’re willing to spend a lot of money per write and wait a long time for your data.”

But every technology starts somewhere, Chadash of the DNA Data Storage Alliance reminds us: “Some claimed that the internet would not succeed and would remain a science project between universities, and some doubted our need for flat screens or flash-based hard drives.”

He continues: “We are already past the science fiction phase as a few companies, including Twist Bioscience and Microsoft, have already shown that the technology works. The challenge now is to scale and commercialise it, just like with any other new technology.”

What is next?

When asked whether DNA storage will move out of its archival comfort zone, Ybert of DNA Script is sanguine: “We will see… We do not anticipate data storage to change quickly or overnight. We predict a coexistence between ‘hot’ data that is frequently accessed and stored on traditional systems and ‘cold’ data, which is archival, rarely accessed, and doesn’t require immediate access. We’ve seen estimates that 70% of data stored is considered archival.”
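Ybert’s hot/cold split can be pictured as a simple tiering policy. The sketch below is hypothetical: the DataSet type, the 365-day cut-off and the sample figures are all assumptions for illustration. It marks data as cold once it has gone unread for long enough, which makes it a candidate for an archival medium such as DNA, and it reports what share of total capacity that represents.

```python
from dataclasses import dataclass
from datetime import date

COLD_AFTER_DAYS = 365  # hypothetical cut-off for "rarely accessed"


@dataclass
class DataSet:
    name: str
    size_tb: float
    last_accessed: date


def tier(ds: DataSet, today: date) -> str:
    """Return 'cold' if the data set has not been read for COLD_AFTER_DAYS, else 'hot'."""
    age_days = (today - ds.last_accessed).days
    return "cold" if age_days >= COLD_AFTER_DAYS else "hot"


def cold_share(datasets: list[DataSet], today: date) -> float:
    """Fraction of total capacity (by size) that the policy classes as cold."""
    total = sum(ds.size_tb for ds in datasets)
    cold = sum(ds.size_tb for ds in datasets if tier(ds, today) == "cold")
    return cold / total if total else 0.0


if __name__ == "__main__":
    today = date(2024, 1, 1)
    estate = [  # hypothetical sample estate
        DataSet("crm-db", 2.0, date(2023, 12, 30)),
        DataSet("2019-video-masters", 5.0, date(2020, 3, 1)),
        DataSet("old-sensor-logs", 3.0, date(2021, 6, 15)),
    ]
    for ds in estate:
        print(ds.name, tier(ds, today))
    print(f"cold share: {cold_share(estate, today):.0%}")  # cold share: 80%
```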

ICL’s Heinis agrees: “Archival is clearly the lowest hanging fruit as it builds on the longevity and durability of DNA. For it to come nearline, however, the process needs to be made substantially smaller and more efficient. It is a long road, but it is possible.”

Although the focus is on archival storage for now, there is potential for further applications, perhaps through parallelisation. Randy Kerns, Senior Strategist at Evaluator Group, explains: “The parallelism could be in sequencing which will speed the process, allowing it to perform at a level beyond the expectations for archival storage. Also, parallel operations against DNA, which would allow other uses, work across multiple elements of data at the same time, similar to the arguments for computational storage with SSDs. An example may be implementing a search function that could run in parallel.”
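Kerns’s example of a search that runs in parallel can be illustrated in conventional software terms. The sketch below is an analogy only. It treats a hypothetical pool of records as DNA strings, splits them into chunks, scans each chunk for a motif on a separate worker and merges the hits. In a DNA system the parallelism would come from chemistry acting on many molecules at once, not from threads. Also, in CPython, threads do not speed up CPU-bound work like this. The point is the divide-and-merge structure, not performance.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk: list[tuple[str, str]], motif: str) -> list[str]:
    """Return the IDs of records in this chunk whose sequence contains the motif."""
    return [record_id for record_id, seq in chunk if motif in seq]


def parallel_search(records: dict[str, str], motif: str, workers: int = 4) -> list[str]:
    """Split the records into chunks, search each chunk concurrently, merge the hits."""
    items = list(records.items())
    size = max(1, -(-len(items) // workers))  # ceiling division, at least one per chunk
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(search_chunk, chunks, [motif] * len(chunks)))
    return sorted(record_id for part in results for record_id in part)


if __name__ == "__main__":
    pool_of_strands = {"r1": "ACGTAC", "r2": "TTTT", "r3": "GGACGT"}  # hypothetical records
    print(parallel_search(pool_of_strands, "ACGT"))  # ['r1', 'r3']
```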

That’s a wrap

So, although it is no longer the exclusive preserve of sci-fi, DNA storage still has some way to go before it becomes a standard feature of most data centres. The technology is in its infancy, with high costs and long retrieval times, yet it shows real promise. And it might be the most futuristic storage we have access to right now. As for what comes next, let’s give the last word to Anderson of Panasas: “I’m not sure yet, I’ll have to ask my AI assistant.”

By A3 Communications