Research Report to JSPS

Pethuru Raj and Naohiro Ishii
Dept. of Intelligence and Computer Science
Nagoya Institute of Technology
Nagoya, Showa-ku, Japan

We have concentrated mainly on two topics: DNA-based computation and bioinformatics. This report gives a brief account of each topic and of the subtopics related to it. Finally, we give the details of the resulting research papers and book chapters for your kind perusal.

Splicing System - A Theoretical Model for Designing DNA-based Computers

Tom Head initiated a new way of relating formal language theory to the study of informational macromolecules. His work resulted in a new mathematical generative formalism called the splicing system (H system), which models the potential effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and reassociated to produce further molecules. A language is associated with each pair of sets, where the first set consists of double-stranded DNA molecules and the second set consists of the recombinational behaviors allowed by specified classes of enzymatic activities. Head showed that a significant subclass of these languages, the persistent splicing languages, coincides with a class of regular languages.
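The cut-and-recombine operation described above can be illustrated with a minimal sketch. It assumes the four-part rule notation (u1, u2, u3, u4) often used in later work on H systems, not Head's original formulation, and it models each molecule as a plain string for one strand, ignoring double-stranded structure and sticky ends. The function names splice and splicing_language and the max_len bound are hypothetical, introduced here only for illustration.

```python
from itertools import product


def splice(x, y, rule):
    """Apply one splicing rule (u1, u2, u3, u4) to the ordered pair (x, y).

    If x = x1+u1+u2+x2 and y = y1+u3+u4+y2, the result is x1+u1+u4+y2.
    Every pair of matching sites is tried, so a set of strings is returned.
    """
    u1, u2, u3, u4 = rule
    results = set()
    i = x.find(u1 + u2)
    while i != -1:
        prefix = x[: i + len(u1)]                    # x1 + u1
        j = y.find(u3 + u4)
        while j != -1:
            results.add(prefix + y[j + len(u3):])    # ... + u4 + y2
            j = y.find(u3 + u4, j + 1)
        i = x.find(u1 + u2, i + 1)
    return results


def splicing_language(axioms, rules, max_len):
    """Close the axioms under the rules, keeping strings of length <= max_len.

    The length bound is an assumption of this sketch; it guarantees
    termination even when the full language is infinite.
    """
    language = set(axioms)
    while True:
        new = set()
        for x, y in product(language, repeat=2):
            for rule in rules:
                for z in splice(x, y, rule):
                    if len(z) <= max_len and z not in language:
                        new.add(z)
        if not new:
            return language
        language |= new


# Illustrative rule: cut both strings inside a GAATTC site, between G and A.
site_rule = ("G", "AATTC", "G", "AATTC")
print(splice("CCGAATTCAA", "TTGAATTCGG", site_rule))  # {'CCGAATTCGG'}

# One axiom and one rule generate the strings a^n b, a regular language.
words = splicing_language({"ab"}, [("a", "", "", "a")], max_len=5)
print(sorted(words, key=len))  # ['ab', 'aab', 'aaab', 'aaaab']
```

The second example shows how a single initial string and a single rule can already generate an infinite language, here truncated by the length bound.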

We have analyzed splicing systems and established a number of novel algebraic properties, algorithms, and characterizations. We have also discussed the homomorphism, membership, and test set questions for splicing languages and formulated their relationships with other classes of formal languages, such as the regular, context-free, and recursively enumerable languages. Finally, we have indicated why the splicing system is a strong candidate as a formal theoretical model for designing a programmable universal molecular computer to solve computationally hard problems.

The Role of Component Technology for Bioinformatics Applications

Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and molecules. Because carrying out these vital and exacting tasks manually proved inadequate and too slow, molecular biologists turned to computer scientists to design better methodologies and to develop robust software, using techniques from mathematics and theoretical computer science, so that computers could carry out these daunting tasks efficiently, quickly, and accurately.

A number of software development technologies are available, such as object-oriented technology and, most recently, component technology. As component-based software development grows in popularity and supports high reusability, information technology (IT) professionals are jumping on the bandwagon of component technology, which has given rise to a new buzzword, "componentware", in the software arena. Because this technology offers numerous advantages over similar technologies, bioinformatics developers have also begun designing biological applications with it. Owing to the unprecedented growth of Java technology in the last couple of years, Java has become the implementation language for biological software components. Java also comes with core class libraries and other utilities, which help in developing attractive client-side user interfaces and in designing object-oriented databases.

As part of our research work, we have analyzed the suitability of component technology for designing bioinformatics software modules. We have also emphasized the use of the Enterprise JavaBeans (EJB) architecture for designing server-side components, in order to make use of the many unique and novel features specified in that architecture.

The Interoperability of Biological Databases

Biological data is filling databases worldwide at an exponential rate. A variety of biological data, such as sequences, maps, and probes, is being produced at an alarming speed by DNA sequencing projects carried out on different organisms (both macro and micro). This speed is due to advances in recombinant DNA technology and to the arrival of automatic DNA sequencing machines.

In addition, the designers of biological databases have adopted different data models, such as flat files, relational, and object-oriented, for storing and accessing data.

Because storing all the data in one big database has been found infeasible for various reasons, and because molecular biologists have come to recognize the advantages of database interoperability, the only option left is to make all these databases interoperate through some connection technology. Fortunately, a number of promising connection technologies are available, such as CORBA and Jini. We have done considerable research on this challenging topic and have identified the basic requirements: describing the functionality offered by a particular database server in terms of interfaces written in OMG's IDL; designing CORBA servers for the databases being connected for interoperation; implementing the client and server in programming languages such as Java and C++; choosing efficient, high-performing middleware; and mapping relational data to objects and back.
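The requirements listed above can be illustrated with a small sketch. It is not CORBA code: a Python abstract base class stands in for an IDL interface, and the in-process Federation object stands in for the middleware. All names (SequenceRecord, SequenceDatabase, FlatFileDatabase, RelationalDatabase, Federation), the FASTA-like flat-file layout, and the table schema are assumptions made for illustration only. The sketch shows one interface in front of two storage models, and the relation-to-object mapping and its reverse for the relational case.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceRecord:
    accession: str
    organism: str
    residues: str


class SequenceDatabase(ABC):
    """Common interface each wrapped database exposes (the role IDL plays)."""

    @abstractmethod
    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        ...


class FlatFileDatabase(SequenceDatabase):
    """Wraps FASTA-like text: '>ACCESSION organism' followed by sequence lines."""

    def __init__(self, text: str) -> None:
        self._records = {}
        for entry in text.split(">")[1:]:
            header, *lines = entry.strip().splitlines()
            accession, organism = header.split(maxsplit=1)
            residues = "".join(line.strip() for line in lines)
            self._records[accession] = SequenceRecord(accession, organism, residues)

    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        return self._records.get(accession)


class RelationalDatabase(SequenceDatabase):
    """Wraps a relational table and maps rows to objects and back."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sequences "
            "(accession TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
        )

    def store(self, record: SequenceRecord) -> None:
        # Object-to-relation mapping: one record becomes one row.
        self._conn.execute(
            "INSERT OR REPLACE INTO sequences VALUES (?, ?, ?)",
            (record.accession, record.organism, record.residues),
        )

    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        # Relation-to-object mapping: one row becomes one record.
        row = self._conn.execute(
            "SELECT accession, organism, residues FROM sequences WHERE accession = ?",
            (accession,),
        ).fetchone()
        return SequenceRecord(*row) if row else None


class Federation(SequenceDatabase):
    """Presents several member databases to the client as a single one."""

    def __init__(self, members) -> None:
        self._members = list(members)

    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        for member in self._members:
            record = member.fetch(accession)
            if record is not None:
                return record
        return None
```

A client holding a Federation calls fetch in the same way whether the record lives in the flat file or in the relational table, which is the property that interface-based interoperation is meant to provide.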

Thus we have emphasized the importance of distributed object computing technology as represented by CORBA, an open standard being promoted vigorously by a consortium of more than 700 software firms. This task is termed enterprise data-level integration, and it is facilitated by middleware technology.

Integration of Biological Applications

Toward the goal of computerizing the various tasks of molecular biologists, a number of robust application programs, such as BLAST, FASTA, and FLASH, were developed for the difficult and valuable work of searching entire databases to locate sequences similar to one or more query sequences. As time went by, the software development process advanced and the quality of the finished software improved dramatically.

For example, the applications VisualBLAST, PowerBLAST, and BEAUTY were released to enhance the functionality of BLAST and to make the pre-processing of queries and the post-processing of large result sets easy and efficient. But one vital gap remains today: there is no integrated view of these applications, which run on different server machines spread all over the world, and there is no coordination among them. Having realized the importance of integrating them, for ease of use among other purposes, we have focused on using Enterprise Application Integration (EAI) technologies to overcome this impediment.

We have discussed the different approaches and technologies, along with the benefits and drawbacks of each technology, and we have coded the interfaces for some of the applications in OMG's IDL. In this way, we have applied EAI technology to the integration of biological applications.
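The idea of an integrated view over separately hosted applications can be sketched as follows. This is only an illustration under stated assumptions: the two toy tools are stand-ins and do not implement BLAST, FASTA, or any real program, a Python Protocol takes the place of an IDL interface, and the names Hit, SearchTool, SubstringTool, KmerTool, and Integrator are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    tool: str
    sequence_id: str
    score: int


class SearchTool(Protocol):
    """Common interface each wrapped application is assumed to expose."""

    name: str

    def search(self, query: str) -> list[Hit]:
        ...


class SubstringTool:
    """Toy tool: reports sequences that contain the query exactly."""

    name = "substring"

    def __init__(self, database: dict[str, str]) -> None:
        self._db = database

    def search(self, query: str) -> list[Hit]:
        return [Hit(self.name, sid, len(query))
                for sid, seq in self._db.items() if query in seq]


class KmerTool:
    """Toy tool: scores sequences by the number of distinct shared k-mers."""

    name = "kmer"

    def __init__(self, database: dict[str, str], k: int = 3) -> None:
        self._db = database
        self._k = k

    def _kmers(self, text: str) -> set[str]:
        return {text[i:i + self._k] for i in range(len(text) - self._k + 1)}

    def search(self, query: str) -> list[Hit]:
        wanted = self._kmers(query)
        hits = []
        for sid, seq in self._db.items():
            shared = len(wanted & self._kmers(seq))
            if shared:
                hits.append(Hit(self.name, sid, shared))
        return hits


class Integrator:
    """Sends one query to every registered tool and groups hits by sequence."""

    def __init__(self) -> None:
        self._tools: dict[str, SearchTool] = {}

    def register(self, tool: SearchTool) -> None:
        self._tools[tool.name] = tool

    def search(self, query: str) -> dict[str, list[Hit]]:
        merged: dict[str, list[Hit]] = {}
        for tool in self._tools.values():
            for hit in tool.search(query):
                merged.setdefault(hit.sequence_id, []).append(hit)
        return merged
```

Scores from different tools are kept side by side here and are not combined, since scores produced by different programs are not generally comparable; how to merge them would be a separate design decision.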