Computational Molecular Biology

Computation on molecular sequence data (strings) is at the heart of computational molecular biology. Existing and emerging algorithms for string computation provide a significant intersection between computer science and molecular biology. The subject began to flourish on the basis of the following two assumptions.

The first is that biologically meaningful results can be obtained by treating DNA as a one-dimensional character string. This abstracts away the reality of DNA as a flexible three-dimensional molecule that interacts in a dynamic environment with protein and RNA and repeats a life cycle in which even the classic linear chromosome exists for only a fraction of the time. The second assumption concerns proteins: it holds that all the information needed for correct three-dimensional folding is contained in the protein sequence itself, essentially independent of the biological environment in which the protein lives.

A variety of biologically important problems are defined primarily on sequences, or, in the computer science vernacular, on strings: reconstructing long strings of DNA from overlapping string fragments; determining physical and genetic maps from probe data under various experimental protocols; storing, retrieving, and comparing DNA strings; comparing two or more strings to find similarities; searching biological databases for homologies; defining and exploring different notions of string relationships; looking for new or ill-defined patterns, as well as conserved ones, that occur frequently in DNA; looking for structural patterns in DNA and protein sequences; determining the secondary structure of RNA; and more. Several statements express this sequence-centered view:

The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of A's, C's, G's and T's.

Molecular biology is all about sequences, and it tries to reduce complex biochemical phenomena to interactions between defined sequences.

The ultimate rationale behind all purposeful structures and behavior of living things is embodied in the sequence of residues of nascent polypeptide chains.
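
To make the string abstraction concrete, here is a minimal Python sketch of the first problem listed above, joining two DNA fragments that overlap. It is an illustration under simplifying assumptions, not a real assembly method: the fragments are taken to be error-free and on the same strand, and the function names `overlap` and `merge` and the `min_length` threshold are hypothetical choices made for this example.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_length` are ignored and reported as 0.
    """
    longest_possible = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_length: int = 3):
    """Join `a` and `b` on their suffix-prefix overlap, or return None if there is none."""
    k = overlap(a, b, min_length)
    if k == 0:
        return None
    # Keep all of `a`, then append the part of `b` that extends past the overlap.
    return a + b[k:]


if __name__ == "__main__":
    left, right = "ACGTTGCA", "TGCAGGT"
    print(overlap(left, right))  # length of the shared "TGCA" region
    print(merge(left, right))    # the two fragments joined into one string
```

This brute-force scan compares up to min(len(a), len(b)) candidate overlaps, which is fine for a short illustration. Work on real sequence data has to deal with sequencing errors, both strands, and very large numbers of fragments, none of which this sketch attempts.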
