Molecular Computation
Home DNA Computing - Links Molecular Biology - Links Computing Types

Abstract

Computing Science is in the middle of a major paradigm shift, driven by Molecular Biology. Adleman by his breath-taking paper announced the arrival of computers based on biochemical operations and has showed that a large class of difficult and computationally hard problems is best solved not by pushing electrons through wires in a computing laboratory, but by mixing solutions in test tubes in a molecular biology laboratory. As the computationally hard problems are the stumbling blocks for the contemporary Von Neumann computers, the DNA based computation is poised to play a greater role in computing. This article discussed about this novel idea of DNA based computation.

1. Introduction

Today's computers are millions of times more powerful than their crude ancestors in the 40's and 50's. Almost every two years, computers have become twice as fast whereas their components have assumed only half the space and however, it has also been realized that integrated circuit-technology is running against its limits and it has been a hotly debated question whether computers of an entirely new kind, quantum-mechanical computers or computers based on Molecular Biology is in the offing. One of the recently introduced unconventional paradigms, which promises to have a tremendous influence on the theoretical and practical progress of computer science is DNA computing, which under some circumstances might be an elegant alternative to the classical Turing/Von Neumann notion of computing. It is sure that the union of two of science's most fertile fields, molecular biology and computer science is to produce some remarkable offsprings.

In 1994, Adleman invented a method for solving a small instance of a Directed Hamiltonian Path (DHP) Problem by an in vitro DNA-recombination assay which he performed experimentally using hybridization, several agarose-gel separations, and PCR by handling DNA sequences in a test tube. Before discussing about this experiment, here is an overview about DNA molecules, which make the way for this sort of innovative computing model.

2. The Structure and manipulation of DNA

DNA (deoxyribonucleic acid) encodes the genetic information of cellular organisms. It consists of polymer chains, commonly referred to as DNA strands. Each strand may be viewed as a chain of nucleotides, or bases. An n-letter sequence of consecutive bases is known as an n-mer or an oligonucleotide of length n. The four DNA nucleotides are adenine, guanine, cytosine and thymine, commonly abbreviated to A,G,C and T respectively.

Each strand has, according to chemical convention, a 5' and a 3' end, thus any single strand has a natural orientation. The classical double helix of DNA is formed when two separate strands bond together. Bonding occurs by the pairwise attraction of bases; A bonds with T and G bonds with C. The pairs (A,T) and (G,C) are therefore known as Watson-Crick complementary base pairs.

Thus a hypothetical DNA molecule sequence is

AACGCGTACGTACAAGTGTCCGAATGGCCAATG
TTGCGCATGCATGTTCACAGGCTTACCGGTTAC

3. Operations on DNA sequences

The following operations can be done on DNA sequences in a test tube to program the DNA computer

Synthesis: synthesis of a desired strand
Separation: separation of strands by length
Merging: pour two test tubes into one to perform union
Extraction: extract those strands containing a given pattern
Melting/Annealing: break/bond two single strand DNA molecules with complementary sequences.
Amplification: use PCR to make copies of DNA strands
Cutting: cut DNA with restriction enzymes
Ligation: Ligate DNA strands with complementary sticky ends using ligase.
Detection: Confirm presence/absence of DNA in a given test tube.

The Hamiltonian Path Problem(HPP) may be phrased as follows: given a set of n cities connected by one-way and two-way roads, does there exist a path through this network starting at the first city and ending at the last city such that each city is visited once and only once?.

The problem is deceptively easy to describe, but in fact belongs to the notoriously intractable class of NP-complete problems, which signifies the class of problems solvable in Nondeterministic Polynomial(NP) time. Typically, these problems involve a search where at each point in the search there is an exponential increase in the number of possibilities to be searched through, but where each possibility can be searched through polynomial time.

Consider a map with five cities linked by one-way and two-way roads. Adleman's approach was to encode each city and each route between two cities in DNA strands, put into a test tube.

For example, the strand coding for cities 1 and 2 could be AATGCCGG, TTTAAGCC respectively. A road from city 1 to 2 is encoded in such a way that the first part is the complementary strand to the second half of strand for city 1, and the second part is the complementary strand to the first half of the strand for city 2, i.e. GGCCAAAT.

That is, GGCC is the complementary version of the last four bases of city 1, and AAAT is the complementary version of the first four bases of city 2. Thus the edge joining the cities 1 and 2 is being encoded as follows.

GGCCAAAT
AATGCCGGTTTAAGCC

Similarly the DNA molecules strands can be formed for all the nodes and edges representing all possible routes in the directed graph in the test tube. The first stage of Adleman's computation was to extract those long strands which start with city 1 and store these in a separate test tube.

The second stage was to extract those strands which corresponded to a certain length which signified exactly 5 cities being passed through. If each city is represented by 8 DNA bases, all strands of 40 bases would be extracted and stored in a separate test tube.

The third stage is to extract all those strands containing the DNA sequence for city 1, then those containing the DNA sequence for city 2, and so on. If there is a solution to this route problem, it will be found in the strands extracted for the last city 5.

4. The case for DNA computing

The possible advantages of DNA-based computer architecture became immediately apparent:

Computing with DNA offers the advantage of massive degrees of miniaturization and parallelism over conventional silicon-based machines. For example, a square centimeter of silicon can currently support around a million transistors, whereas current manipulation techniques can handle to the order of 1020 strands of DNA.

Size: the information density could go up to 1 bit/nm3.
High parallelism: every molecule could act as a small processor on nano-scale and the number of such processors per volume would be potentially enormous. In an in vitro assay we could handle easily with about 1018 processors working in parallel.
Speed: although the elementary operations(electrophoretic separation, ligation, PCR-amplifications) would be slow compared to electronic computers, their parallelism would strongly prevail, so that in certain models the number of operations per second could be of order 1018 operations per second, which is at least 100,000 times faster than the fastest supercomputers existing today.
Energy efficiency: 1019 operations per Joule. This is about a billion times more energy efficient than today's electronic devices.

The research in this field had both experimental and theoretical aspects. The experiments that have actually been carried out are not numerous so far. P.Kaplan replicated Adleman's experiment, a Wisconsin team of computer scientists and biochemists made a partial progress in solving a 5-variable instance of SAT problem, an another NP-complete problem, by using a surface-based approach. F.Guarnieri and his team have used a horizontal chain-reaction for DNA-based addition.

Also, various aspects of the implementability of DNA computing have been experimentally investigated: the effect of good encodings on the solutions of Adleman's problem were addressed; the complications raised by using the Polymerase Chain Reaction (PCR) were studied; the usability of self-assembly of DNA was studied; the experimental gap between the design and assembly of unusual DNA structures was pointed out; joining and rotating data with molecules was reported; concatenation with PCR was studied; evaluating simple Boolean formulas was started; ligation experiments in computing with DNA were conducted.

The theoretical work on DNA computing consists, on one side, of designing potential experiments for solving various problems by means of DNA manipulation. Descriptions of such experiments include the Satisfiability Problem, breaking the Data Encryption Standard (DES), expansions of symbolic determinants, matrix multiplication, graph connectivity and knapsack problem using dynamic programming, road coloring problem, exascale computer algebra problems, and simple Horn clause computation.

On the other side, the theoretical research on DNA computing comprises attempts to model the process in general, and to give it a mathematical foundation. To this aim, models of DNA computing have been proposed and studied from the point of view of their computational power and their in-vitro feasibility. Some of them are given below.

5. Models and Formats of DNA Computation

In the two years that followed, a lot of theoretical work has been done on generalizing Adleman's approach in order to define a general-purpose DNA-based molecular computer that could also be implemented by an in vitro system. Lipton generalized Adleman's model and showed how his model can encompass solutions to other NP-complete problems. The other model is by splicing operation proposed by Head and vigrously followed by many researchers using formal language theory. It is shown that the generative power of finite extended splicing systems is equal to that of Turing Machines. Afterwords, Paun and others introduced the so-called sticker model. Unlike previous models, the sticker mode has a memory that can both read and written to, and employs reusable DNA. Also there is a proposal about the tendency of DNA structures to self-assemble as a computational tool. They show that the self-assembly of complex branches known as double cross-overs into two-dimensional sheets or three-dimensional solids is a computationally powerful model.

However, there are some impediments to effective computation by these models. It is a common feature of all the proposed implementations that the biological operations to be used are assumed to be error-free. An operation central to and frequently employed in most models is the extraction of DNA strands containing a certain sequence (known as removal by DNA hybridization). The most important problem with this method is that extraction is not 100% efficient and may at times inadvertently remove strands that do not contain the specified sequence. Especially for a large problem, the number of extractions required may run into hundreds, or even thousands resulting a high probability of incorrect hybridization.

Thus, a novel error-resistant model of DNA computation has been proposed by Alan Gibbons and his team that obviates the need for hybridization extraction within the main body of the computation.

Like previous models, this model is particularly effective for algorithmic description. It is sufficiently strong to solve any of the problems in the class NC and the authors have given DNA algorithms for 3-vertex-colorability problem, Permutations Problem, Hamiltonian Path Problem, the Subgraph isomorphism problem, and the Maximum clique and maximum independent set problem.

There are two general formats in which complex combinatorial sets of DNA molecules may be manipulated.

  • in solution ( solution-phase format)
  • attached to a surface (solid-phase format)

The solid-phase format possesses many important advantages over the solution-phase format.

  • Facilitated sample handling.
  • With the DNA molecules attached to a support, the experimental manipulations are very simple. They are addition of a solution to the support and removal (washing) to a solution from the support. These steps are readily automated.
  • Decreased losses during sample handling
  • Reduction of interference between oligonucleotides
  • Solid-phase chemistry permits facile purification of the DNA molecules at every step of the process.
6. Pitfalls of DNA Computing

The idea of using DNA to solve computational problems is certainly intriguing and elegant, and DNA does provide a massive parallelism far beyond what is available on existing silicon-based computers. However, there are many technological hurdles to overcome. We give below one of the huge fundamental problem to be solved to attain the goal of designing universally programmable molecular computer.

The fundamental problem is that, the function of 2n is exponential whether it counts time or molecules. It has been estimated that Adleman's Hamiltonian path problem, if enhanced to 50 or 100 cities, would require tons of DNA. The minimum amount of required DNA for Lipton's SAT method needs a few grams of DNA molecules for 70 variables. If this is increased to 100 variables, the minimum DNA requirement of millions of kilograms.

Thus raw idea of brute-force enumeration is not going to work beyond modest problem sizes. Thus it is imperative to bring forth new revolutionary ideas to make this notion of DNA-based computing to work realistically. Only time and investment will tell where the initial ideas for DNA computing from those experts will lead. Many enhancive ideas have been published but all of them suffer under this fundamental problem. Hopefully the future molecular computation methods may bring forth new revolutionary ideas to overcome this very fundamental as well as significant hurdle

7. The future of DNA Computing

The significance of this research is two-fold: it is the first demonstrable use of DNA molecules for representing information, and also the first attempt to deal with an NP-complete problem. But still much more work needs to be done to develop error-resistant and scalable laboratory computations. Designing experiments that are likely to be successful in the laboratory and algorithms that proceed through polynomial-sized volumes of DNA is the need of the hour. It is unlikely that DNA computers will be used for tasks like word processing, but they may ultimately find a niche market for solving large-scale intractable combinatorial problems. The goal of automating, miniaturizing and integrating them into a general-purpose desktop DNA computer may take much longer time.