Tech Turns to Biology as Data Storage Needs Explode
Interest by Microsoft and others in DNA–based storage could deliver post-silicon electronic memory within a decade
By Prachi Patel
In the past few years researchers have decoded the genomes of mammoths and a 700,000-year-old horse using DNA fragments extracted from fossils. DNA clearly persists far longer than the bodies whose genetic code it carries.
Computer scientists and engineers have long dreamed of harnessing DNA’s tininess and resilience for storing digital data. The idea is to encode all those 0s and 1s into the molecules A, C, G, and T that form the twisted, ladder-shaped DNA polymer, and this decade’s advances in DNA synthesis and sequencing have brought the technology forward by leaps and bounds. Recent experiments indicate that we might one day be able to encode all the world’s digital information into a few liters of DNA and read it back after thousands of years.
Now interest from Microsoft and other tech companies is energizing the field. Microsoft Research announced last month that it would pay synthetic biology start-up Twist Bioscience an undisclosed amount to make 10 million DNA strands designed by Microsoft’s computer scientists to store data. Top memory manufacturer Micron Technology is also funding DNA digital storage research to determine whether a nucleic acid–based system can expand the limits of electronic memory. This influx of money and interest could lead to research and progress that eventually drive down today’s prohibitively high costs and make DNA data storage possible within the decade, researchers say.
Humans will generate more than 16 trillion gigabytes of digital data by 2017, and much of it will need to be archived. Think legal, financial and medical records as well as multimedia files. Data is stored today on hard drives, optical disks or tapes in energy-hogging, warehouse-size data centers. These media last anywhere from a few years to three decades at most. Plus, says Microsoft Research computer architect Karin Strauss, “we’re producing a lot more data than the storage industry is producing devices for, and projections show that this gap is expected to widen.”
Enter DNA. It lasts for centuries if kept cold and dry. And it could in theory pack billions of gigabytes of data into the volume of a sugar crystal. Magnetic tapes, today’s densest storage medium, hold 10 gigabytes in the same amount of space. “DNA is an unbelievably dense, durable, nonvolatile storage medium,” says Olgica Milenkovic, an electrical and computer engineering professor at the University of Illinois at Urbana–Champaign.
That is because each of its four building-block molecules—adenine (A), cytosine (C), guanine (G) and thymine (T)—is only a cubic nanometer in volume. Using a coding system—at its simplest, say A represents bits ‘00,’ C represents ‘01’ and so on—scientists can take the strings of 0s and 1s that form digital data files and design a DNA strand that maps an image or video. (Of course, the actual coding techniques scientists use are much more complex.) Synthesizing the designer DNA strand is the data-writing part. Scientists can then read the data by sequencing the strands.
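The simplest version of that coding system can be sketched in a few lines of Python. This is only an illustration of the two-bits-per-nucleotide idea, not any research group's actual scheme: the article specifies A for 00 and C for 01, and the assignments G for 10 and T for 11 are assumed here to complete the table. The function names are hypothetical.

```python
# Toy mapping of two bits per nucleotide.
# A=00 and C=01 come from the text; G=10 and T=11 are assumptions.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotides (the 'writing' design step)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a sequenced string of nucleotides back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four nucleotides, so the two-letter message "Hi" turns into an eight-letter strand. Real schemes add redundancy and avoid sequences that are hard to synthesize or read, which this sketch ignores.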
Harvard University geneticist George Church jump-started the field in 2012 by encoding 70 billion copies of a book—one million gigabits—in a cubic millimeter of DNA. A year later researchers at the European Bioinformatics Institute showed that they could read, without any errors, 739 kilobytes of data stored in DNA.
A few teams have demonstrated fully functioning systems in the past year. In August researchers at E.T.H. Zurich encapsulated synthetic DNA in glass, exposed it to conditions simulating 2,000 years and recovered its coded data accurately. In parallel, Milenkovic and her colleagues reported storing the Wikipedia pages of six U.S. universities in DNA and—by giving the sequences special “addresses”—selectively reading and editing parts of the written text. Such random access to data is critical to avoid having to “sequence a whole book to read just one paragraph,” she says.
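The idea behind such addresses can be illustrated in software. The sketch below is a toy model under stated assumptions, not the method Milenkovic's team used: each strand simply carries a fixed-length address prefix ahead of its payload, and a lookup keeps only the strands whose prefix matches. The address length and all function names are hypothetical.

```python
ADDRESS_LENGTH = 4  # hypothetical; four nucleotides give 4**4 = 256 addresses


def to_bases(value: int, length: int) -> str:
    """Write an integer as a fixed-length base-4 number over A, C, G, T."""
    bases = "ACGT"
    digits = []
    for _ in range(length):
        digits.append(bases[value % 4])
        value //= 4
    return "".join(reversed(digits))


def build_pool(payloads: list[str]) -> list[str]:
    """Prefix each payload strand with an address derived from its position."""
    return [to_bases(i, ADDRESS_LENGTH) + p for i, p in enumerate(payloads)]


def read_at(pool: list[str], index: int) -> list[str]:
    """Return only the payloads stored under one address."""
    address = to_bases(index, ADDRESS_LENGTH)
    return [s[ADDRESS_LENGTH:] for s in pool if s.startswith(address)]


pool = build_pool(["CAGA", "CGGC", "TTAC"])
print(read_at(pool, 1))  # ['CGGC']
```

In a test tube the selection step would be done chemically rather than with a string comparison, but the payoff is the same: one paragraph can be retrieved without sequencing the whole book.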
In April Microsoft’s Strauss and computer scientists Georg Seelig and Luis Ceze at the University of Washington reported writing three image files, each a few tens of kilobytes, in 40,000 strands of DNA using their own encoding scheme and then reading them back individually with no errors. They presented this work at an Association for Computing Machinery conference that month. With the 10 million strands Microsoft is buying from Twist Bioscience, the team plans to prove that DNA data storage can work on a much larger scale. “Our goal is to demonstrate an end-to-end system where we encode files to DNA, have the molecules synthesized, store them for a long time and then recover them by taking DNA out and sequencing it,” Strauss says. “Start with bits and go back to bits.”
Memory maker Micron is exploring DNA as a post-silicon technology. The company is funding work by Harvard’s Church and researchers at Boise State University to explore an error-free DNA storage system. “The rising cost of data storage will drive alternate solutions, and DNA storage is one of the more promising solutions,” says Gurtej Sandhu, director of Advanced Technology Development at Micron.
These researchers are still looking into cutting the error rates in encoding and decoding data. But the major pieces of the technology are in place. So what is keeping us from shoe box–size data vaults containing DNA-loaded glass capsules? Cost. “The writing process is about a million times too expensive,” Seelig says.
Here’s why: Making DNA involves stringing together its nanometer-size molecules one by one with high precision, which is not an easy task. And although the cost of sequencing has plummeted due to the booming demand for medical applications such as disease screening and diagnostics, DNA synthesis has not had a similar market driver. Milenkovic paid about $150 to get a string of 1,000 nucleotides synthesized. Sequencing a million nucleotides costs about a cent.
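Those two prices can be put on the same footing with a little arithmetic. The sketch below uses only the figures quoted above, plus one assumption: that each nucleotide stores two bits, as in the simplest coding scheme described earlier. The variable and function names are hypothetical, and the results are rough orders of magnitude rather than real price quotes.

```python
# Prices quoted in the article, converted to dollars per nucleotide.
SYNTHESIS_COST_PER_NT = 150 / 1_000        # about $150 per 1,000 nucleotides
SEQUENCING_COST_PER_NT = 0.01 / 1_000_000  # about one cent per million

BITS_PER_NT = 2  # assumption: simplest scheme, two bits per nucleotide


def nucleotides_for(num_bytes: int) -> int:
    """Number of nucleotides needed to hold num_bytes of data."""
    return num_bytes * 8 // BITS_PER_NT


# How many times more expensive writing is than reading, per nucleotide.
write_to_read_ratio = SYNTHESIS_COST_PER_NT / SEQUENCING_COST_PER_NT

# Cost of writing and reading one megabyte (one million bytes).
nt_per_megabyte = nucleotides_for(1_000_000)
write_cost_per_mb = nt_per_megabyte * SYNTHESIS_COST_PER_NT
read_cost_per_mb = nt_per_megabyte * SEQUENCING_COST_PER_NT

print(f"Writing costs about {write_to_read_ratio:,.0f} times more than reading")
print(f"Write 1 MB: about ${write_cost_per_mb:,.0f}")
print(f"Read 1 MB:  about ${read_cost_per_mb:.2f}")
```

On these figures, writing a nucleotide costs roughly 15 million times more than reading one, and a single megabyte would cost about $600,000 to write but only about 4 cents to read.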
Interest in data storage from Microsoft and Micron might be just the kind of impulse needed to start lowering costs, Seelig says. Clever engineering and new technologies such as microfluidics and nanopore DNA sequencing, which help miniaturize and speed things up, will also be key. Right now it takes several hours to sequence a few hundred nucleotide pairs, and days to synthesize them, using multiple instruments and manual preparation of DNA. “You’d want all of this in a pretty small box, otherwise you’d lose the advantage of DNA’s storage density,” Seelig explains.
If it all works out, Microsoft’s Strauss imagines companies offering archival DNA storage services within the next decade. “You could open your browser and upload files to their site or get your bytes back, like cloud storage,” she says. Or, with as yet unrealized breakthroughs in DNA synthesis and sequencing, “you could buy a DNA drive instead of a disk drive.”