Bioinformatics - An Overview

Biology is in the middle of a major paradigm shift driven by computing technology. Although it is already an informational science in many respects, the field has been rapidly becoming much more computational and analytical. Rapid progress in genetics and biochemistry research combined with the tools provided by modern biotechnology has generated massive volumes of genetic and protein sequence data.

Bioinformatics has been defined as a means of analyzing, comparing, graphically displaying, modeling, storing, systematizing, searching, and ultimately distributing biological information, which includes sequences, structures, function, and phylogeny. It may thus be described as a discipline that generates computational tools, databases, and methods to support genomic and postgenomic research. It comprises the study of DNA structure and function, gene and protein expression, protein production, structure and function, genetic regulatory systems, and clinical applications. Bioinformatics draws on expertise from computer science, mathematics, statistics, medicine, and biology.

Knowledge Base in Biology

Over the last ten years or so, numerous innovations have emerged, and the consequence is a new biological research paradigm, one that is information-heavy and computer-driven. As genetic information is stored in computerized databases whose sizes are steadily growing, molecular biologists need effective and efficient computational tools to store and retrieve the associated bibliographic and biological information, to analyze the sequence patterns the databases contain, and to extract the biological knowledge the sequences hold. There is also a strong need for mathematical methods and computational techniques for challenging tasks such as predicting the three-dimensional structure of the molecules the sequences represent and constructing evolutionary trees from sequence data. These tools will also be used to learn basic facts about biology, such as which DNA sequences code for proteins and which do not, and to gain a greater understanding of genes and how they influence disease.

Biology employs a digital language that represents its information with a four-letter alphabet (A, C, G, T). All the chromosomes in an organism's cells can be represented and identified using these letters. The demanding challenge is to determine how this digital language of the chromosomes is converted into the three-dimensional, and sometimes four-dimensional, languages of living and breathing organisms.
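
To make the "digital language" idea concrete, here is a minimal Python sketch that stores a DNA string at two bits per base. The particular bit assignment and the function names `pack_dna` and `unpack_dna` are illustrative assumptions, not a standard file format.

```python
# Illustrative 2-bit codes; this A/C/G/T -> 0..3 assignment is an arbitrary choice.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_dna(sequence: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    packed = 0
    for base in sequence.upper():
        if base not in BASE_TO_BITS:
            raise ValueError(f"unexpected symbol: {base!r}")
        packed = (packed << 2) | BASE_TO_BITS[base]
    return packed


def unpack_dna(packed: int, length: int) -> str:
    """Recover the DNA string. The length is required because leading A's pack to zero bits."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[packed & 0b11])
        packed >>= 2
    return "".join(reversed(bases))


print(pack_dna("ACGT"))      # 27, i.e. binary 00 01 10 11
print(unpack_dna(27, 4))     # ACGT
```

Real sequence data also contains ambiguity symbols such as N, which this sketch deliberately rejects.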

Information Technology in Biology

Because performing the tasks described above manually is nearly impossible, given the massive volumes of biological data and the precision the work demands, it became necessary to use computers for these purposes. Bioinformatics therefore deals with designing and deploying efficient software tools that accomplish these tasks quickly and precisely. Bridging the gap between the real world of biology and the precise, logical nature of computers requires an interdisciplinary perspective.

Software and Hardware Advancements in Biology

The tools of computer science, statistics, and mathematics are critical for studying biology as an informational science.

Recent advances include improved DNA sequencing methods, new approaches to identifying protein structure, and revolutionary methods for monitoring the expression of many genes in parallel. Designing techniques that can deal with different sources of incomplete and noisy data has become another crucial goal for the bioinformatics community. In addition, computational solutions based on theoretical frameworks are needed so that scientists can perform complex inferences about the phenomena under study.

Genomics in the recent past has triggered the development of high-throughput instrumentation for DNA sequencing, DNA arrays, genotyping, proteomics, etc. These instruments have catalyzed a new type of science for biology termed discovery science.

Human Genome Project - An Introduction

The Human Genome Project has encouraged a series of paradigm changes leading to the view that biology is an informational science. The draft of the human genome has given us a genetic parts list of what is necessary for building a human: approximately 35,000 genes by the draft-era estimate (later analyses put the number of protein-coding genes closer to 20,000), their regulatory regions, a lexicon of motifs that are the building blocks of proteins and genes, and access to the human variability that makes us each different from one another.

Genomes - Discovering Methodology and Study

Discovery science defines all of the elements in a biological system: for example, the sequence of the genome, or the identification and quantitation of all of the mRNAs or proteins in a particular cell type (respectively, the genome, the transcriptome, and the proteome). Discovery science creates databases of information, in contrast to the more classical hypothesis-driven science that formulates hypotheses and attempts to test them. The high-throughput tools both provide the means for discovery science and can assay how global information sets, for example transcriptomes or proteomes, change as systems are perturbed.
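
One common way to summarize how a transcriptome changes under perturbation is a per-gene log2 fold change. The sketch below assumes expression is given as simple count dictionaries; the gene names, the counts, and the pseudocount of 1 are made up for illustration.

```python
import math


def log2_fold_changes(control, perturbed, pseudocount=1.0):
    """Return log2((perturbed + p) / (control + p)) for genes measured in both samples."""
    shared_genes = control.keys() & perturbed.keys()
    return {
        gene: math.log2((perturbed[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in sorted(shared_genes)
    }


# Hypothetical expression counts, not real measurements.
control = {"geneA": 99, "geneB": 49, "geneC": 7}
perturbed = {"geneA": 399, "geneB": 49, "geneC": 1}

changes = log2_fold_changes(control, perturbed)
for gene, change in changes.items():
    print(gene, change)  # geneA 2.0, geneB 0.0, geneC -2.0
```

The pseudocount keeps the ratio defined when a gene has zero counts in one sample. Real analyses would also normalize for sequencing depth and test for statistical significance.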

The genomes of model organisms such as yeast, worm, and fly have demonstrated that the basic informational pathways are fundamentally conserved among all living organisms. Hence systems can be perturbed in model organisms to gain insight into their functioning, and these data will provide fundamental insights into human biology. From the genome, the information pathways and networks can be extracted to begin understanding their logic of life. Furthermore, different genomes can be compared to identify similarities and differences in the strategies for the logic of life, and these comparisons provide fundamental insights into development, physiology, and evolution. The first eukaryotic genome to be fully sequenced and annotated was that of Saccharomyces cerevisiae, which has greatly helped the development of biological and computational tools for genomic and postgenomic research.
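
As a rough illustration of comparing sequences, the sketch below scores two sequences by the overlap of their k-mers (short substrings of length k) using the Jaccard index. This is one simple alignment-free technique chosen for brevity, not the method used in any particular genome comparison; the function names and the default k are assumptions.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all length-k substrings of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Shared k-mers divided by total distinct k-mers; 0.0 if neither has any k-mers."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return len(kmers_a & kmers_b) / len(union)


print(jaccard_similarity("ACGT", "ACGA", k=2))  # 0.5: {AC, CG} shared out of {AC, CG, GT, GA}
```

A score of 1.0 means the two sequences contain exactly the same k-mers, and 0.0 means they share none.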

In the era of automated DNA sequencing and revolutionary advances in DNA sequence analysis, the attention of many researchers is shifting away from the study of single genes or small gene clusters to whole-genome analyses. Knowing the complete sequence of a genome is only the first step in understanding how the information contained within the genes is transcribed and ultimately translated into functional proteins. In the postgenomic era, functional genomic and proteomic studies help to obtain an image of the dynamic cell.

Systems Biology

Biology is a highly informational science. There are mainly two types of biological information.

  • The information of genes or proteins, which are the molecular machines of life.
  • The information of the regulatory networks that coordinate and specify the expression patterns of the genes and proteins.

All biological information is hierarchical. DNA is transcribed into mRNA, which in turn is translated into protein. Proteins take part in protein interactions, which create informational pathways. These pathways form informational networks, which in turn make up cells. Cells form networks of cells, and an individual is a collection of cells. A host of individuals forms a population, and a variety of populations makes up an ecology. This hierarchy poses a primary challenge for researchers: to create tools and mechanisms that capture these different levels of biological information and integrate them to gain insight into how they function.
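
The first two levels of this hierarchy, DNA to mRNA to protein, can be sketched in a few lines of Python. The example assumes the input is the coding (sense) strand, that the reading frame starts at the first base, and that the standard genetic code applies; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTTTTAA")
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```

Real gene expression involves steps this sketch ignores, such as locating the start codon, splicing, and regulation.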

All of these paradigm shifts lead to the view that the major challenge for biology and medicine in this new century will be the study of complex systems and the approach necessary for studying these biological complexities. One viable approach is as follows.

  • Identify all elements in the system, such as the sequences of genomes, with currently available discovery tools.
  • Use current knowledge of the system to formulate a model that predicts its behavior.
  • Perturb the system in a model organism using biological, genetic, or environmental perturbations; capture information at all relevant levels, such as DNA, mRNA, protein, and protein interactions; and integrate the collected information.
  • Compare theoretical predictions with experimental data, carry out additional perturbations to bring theory and experiment into closer apposition, and integrate the new data into the model.
  • Iterate the previous two steps (perturbation and comparison) until the mathematical model can predict the structure of the system and its systems or emergent properties given particular perturbations.
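
The perturb, compare, and refine cycle described above can be sketched as a small loop. Everything here is a toy assumption: the "system" is a hidden linear dose-response rule, `run_experiment` stands in for a real measurement, and the model is a single slope fitted by least squares.

```python
def run_experiment(dose: float) -> float:
    """Hypothetical stand-in for a laboratory measurement (hidden rule: response = 2.5 * dose)."""
    return 2.5 * dose


def fit_slope(observations) -> float:
    """Least-squares slope through the origin for (dose, response) pairs with non-zero doses."""
    numerator = sum(dose * response for dose, response in observations)
    denominator = sum(dose * dose for dose, _ in observations)
    return numerator / denominator


def refine_model(initial_slope: float, doses, tolerance: float = 1e-6):
    """Perturb, compare prediction with measurement, and refit until the model predicts well."""
    slope = initial_slope
    observations = []
    for dose in doses:
        predicted = slope * dose              # model prediction for this perturbation
        measured = run_experiment(dose)       # perturb the system and capture data
        observations.append((dose, measured))
        if abs(predicted - measured) <= tolerance:
            break                             # theory and experiment agree
        slope = fit_slope(observations)       # integrate the new data into the model
    return slope, len(observations)


slope, experiments_used = refine_model(1.0, [1.0, 2.0, 4.0])
print(slope, experiments_used)  # 2.5 2
```

Real biological models have many parameters and noisy data, so agreement on a single perturbation would not be a sufficient stopping rule.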

Systems Biology - Challenges Ahead

  • The integration of technology, biology, and computation.
  • The integration of the various levels of biological information and their modeling.
  • The proper annotation of biological information and its storage and integration in databases.
  • The inclusion of other molecules, large and small, in the systems approach.
  • The integration imperatives of systems biology, which present many challenges to industry and academia.

With the confluence of biology and computer science, computer applications in molecular biology are drawing greater attention from life science researchers. As it becomes imperative for biologists to seek the help of information technology professionals to meet the ever-growing computational requirements of a host of exciting and demanding biological problems, the synergy between modern biology and computer science is set to blossom in the days to come. The research scope for mathematical techniques and algorithms, coupled with programming languages and software development and deployment tools, is thus set to get a real boost. In addition, information technologies such as databases, middleware, graphical user interface (GUI) design, distributed object computing, storage area networks (SAN), data compression, networking and communication, and remote management are all set to play a critical role in advancing the goals for which the field of bioinformatics came into existence.