The Value of CORBA in Enterprise Application Integration (EAI)

Pethuru Raj and Naohiro Ishii
Dept. of Intelligence and Computer Science
Nagoya Institute of Technology
Nagoya, Japan
peter@ics.nitech.ac.jp
Abstract

In today's enterprise, a large amount of valuable data and vital business logic is locked away in different applications, platforms and databases. The real challenge is to stay flexible and respond rapidly to change. Companies need to be able to build new systems quickly, integrate different and dispersed applications, and update applications to take advantage of emerging technologies and forthcoming opportunities. The answer in sight is Enterprise Application Integration (EAI), which is emerging as a new and active area of corporate computing. CORBA, a middleware architecture that aims to provide a versatile infrastructure for the cooperation of distributed and heterogeneous software components in a network, is set to play a significant role in successful EAI. In this chapter, we give a brief overview of EAI, its models and types, its motivation, the promising technologies, the approaches that make EAI a cost-effective reality, and the use of CORBA for EAI. Finally, we present a prototype that integrates a couple of bioinformatics applications and supply a generic interface, written in OMG's IDL, for their integration.
Key Words: CORBA, Java, XML, Enterprise Application Integration, Wrapping and Reusability, Bioinformatics.

1. Introduction

In the early days of computing, enterprise information systems were often poorly planned and designed using the technology of the day, without proper analysis of how these systems would scale and whether they would be flexible enough to face the future. As a result, there are organizations with many types of open and proprietary systems, each with its own development, database, networking and operating system environment, that are urgently looking for EAI technologies. Another important factor is the abundance of Enterprise Resource Planning (ERP) applications, such as those from SAP, PeopleSoft, and Baan. These packaged applications have revolutionized the way integrated information technology systems are built within most enterprises, and they contain valuable business logic and processes. To meet the advanced requirements of end users, modern enterprises are bound to use EAI technologies to leverage existing business rules and processes instead of creating new systems from scratch. Thus EAI technologies are poised to play a significant role in modernizing enterprise systems in a reliable, flexible and cost-effective manner. The ultimate goal is that every application, database, method and other element of enterprise computing is accessible anytime, anywhere.

EAI means the integration of enterprise applications so that they can freely share information and processes. The need for EAI arises because applications have to cooperate across the enterprise, and with customers' and suppliers' systems, so that information can be shared and so that there is an infrastructure for reusing complex business logic. In short, EAI lets us look at an entire enterprise as a single logical application and data store.

The business advantages of EAI are numerous. EAI makes it possible to update and integrate applications whenever needed, and to integrate real-time data from different parts of the enterprise to create new types of applications. It can also extend legacy or proprietary systems to new technologies, including the Web, and integrate the different versions of enterprise applications used across countries and functions, including customized versions, so that one can pursue a best-of-breed approach.

There are only a few enterprise middleware solutions that can provide the end-to-end reliability, manageability and scalability needed for enterprise applications while also providing flexibility and ease of use. The distributed object computing paradigm plays a very important role in accomplishing successful EAI. CORBA, the industry standard for distributed object computing, and the technology available for it today give EAI architects and developers the ability to join applications together, tightly or loosely, in a way that is cost-effective and reliable. CORBA is the only distributed object standard that works across heterogeneous platforms and programming languages, and it provides a rich set of tools and services. Hence CORBA is a main factor in binding disparate applications together, whereas other EAI-enabling technologies have fallen short because of their proprietary nature and other inherent deficiencies.

An enterprise has many different systems, applications, networks and information sources that could be integrated. EAI is concerned only with the applications that need integration.

2. Enterprise Application Integration (EAI)

We have inherited a variety of legacy systems that contain mission-critical applications, along with several packaged applications with both proprietary and open frameworks, running on a hodgepodge of hardware and operating system platforms and using several communication protocols. Applications and databases are also dispersed geographically. In addition, applications across the enterprise are typically loosely coupled and do not necessarily share a common information or data model, so the environment is messy and chaotic. The integration of such applications therefore becomes mandatory, and it can be accomplished with EAI technologies. EAI is the unrestricted sharing of data and business processes among any connected applications and data sources in the enterprise. EAI provides a set of integration-level application semantics; that is, it creates a common way for both business processes and data to speak to one another across applications. Hence EAI can answer the problem of integrating systems and applications.

2.1 Applications for EAI

EAI came as a response to decades of creating distributed, monolithic, single-purpose applications on a hodgepodge of platforms and development approaches, which has left enterprises with an accumulation of stovepipe systems. The demand of the enterprise is to share data and processes without making major changes to the applications or data structures. This implies that EAI technologies have to extract the vital business logic, components and data from these systems and make them available in the integrated system. The following kinds of systems and applications normally need integration.

Legacy systems are stovepipe applications that may exist with many other stovepipe applications in a centralized environment. While mainframes continue to make up the majority of traditional systems, minicomputers and even large UNIX systems may also correctly be called legacy systems.

Personal computing systems are applications that exist on thousands of desktops within an organization, each system containing valuable information and processes. The trouble here is that no two microcomputers are exactly alike, and hence the need for integration arises.

Distributed systems are any number of workstation servers and hosts tied together by a network that supports any number of applications. This definition covers a broad spectrum of computing trends such as client/server, Internet/Intranet and distributed object computing architectures. Distributed systems, though architecturally elegant, are difficult to implement. But this architecture lends itself well to integration when compared with the other computing architectures mentioned above.

Packaged applications are any type of application that is purchased rather than developed. Packaged applications come in all shapes and sizes and they tend to use one of three distinct architectures: centralized, two-tier, and three-tier, which is the most popular architecture as this provides a clean separation between the user interface, the business logic, and the data tiers. The interfaces exposed by packaged applications offer three types of services. They are business services, data services and objects. Also there are different types of interfaces, such as full-service, limited-service, and controlled. These applications contain reusable business processes that represent best-of-breed business models and do not require a full-scale development effort. Enterprise resource planning (ERP) applications top the list of packaged applications and hence become the focal point for EAI.

E-business is an efficient method of conducting commercial transactions over the Internet. E-business encompasses both business-to-business activity, including supply chain integration, and business-to-consumer activity, including Internet commerce. Both types of applications, especially supply chain integration, benefit from EAI, and the EAI technologies are applicable to most supply chain integration. EAI can thus extend its reach outside the enterprise to include both trading partners and customers within the enterprise integration architecture.

2.2 Types of EAI

The EAI architect in an organization that opts for EAI has to understand both the business processes and the data elements of the applications, and has to select carefully the business processes and data elements that need integration. EAI can usually be achieved at the following levels of the application.

Data level
Business model level

The business model level may be further subdivided into the following three levels.

Application interface level
Method level
User interface level

We describe each level briefly below.

Data-level EAI is the process, and the techniques and technology, of moving data between data stores: extracting information from one data store, processing that information as needed, and updating it in another data store. The numerous database-oriented middleware products that allow architects and developers to access and move information between databases simplify data-level EAI. These tools allow for the integration not only of different database systems, such as Oracle and Sybase, but also of different database models, such as relational and object-oriented models. The advent of EAI-specific technology, such as message brokers, database replication software, custom-built utilities, and simple data movement engines, gives the enterprise the ability to move data from anywhere to anywhere without altering the source code of the target application. More recently, XML has provided a way to form documents in a structured format and to transport them among applications over the wire. Finally, there are more complex problem domains, such as moving data between traditional mainframe, file-oriented databases and more modern relational databases, or from relational databases to object-oriented and multidimensional databases.
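
The extract, process and update cycle described above can be sketched in a few lines. The example below is only an illustration under our own assumptions: it uses two in-memory SQLite databases as stand-ins for the source and target data stores, and the table names, column names and the function move_customers are hypothetical.

```python
import sqlite3


def move_customers(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Extract rows from the source store, transform them, and load them into the target."""
    rows = source.execute(
        "SELECT id, first_name, last_name FROM customers"
    ).fetchall()
    # Transform: the (hypothetical) target schema keeps a single upper-cased name column.
    transformed = [(cid, f"{first} {last}".upper()) for cid, first, last in rows]
    with target:  # commits on success, rolls back if an exception is raised
        target.executemany(
            "INSERT OR REPLACE INTO clients (client_id, full_name) VALUES (?, ?)",
            transformed,
        )
    return len(transformed)


def demo() -> list:
    """Build two toy data stores, move the data, and return the target contents."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")],
    )
    source.commit()
    target.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, full_name TEXT)")
    move_customers(source, target)
    return target.execute(
        "SELECT client_id, full_name FROM clients ORDER BY client_id"
    ).fetchall()
```

Neither application that owns these stores has to change; only the data movement code knows both schemas, which is the point of data-level EAI.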

Application interface-level EAI refers to the leveraging of interfaces exposed by custom or packaged applications. In the early days, applications were built as true monolithic stovepipes. Today, applications are often little more than a set of services that are exposed to the outside world through interfaces and data. Packaged applications, such as SAP, PeopleSoft and Baan, provide some interfaces to allow outside access and, subsequently, integration. Similarly, for custom applications, it is possible to define a particular interface or to access them through standard interfaces, such as COM and CORBA. Apart from these, there are hundreds of different interfaces for various functions. These interfaces are integrated so as to bundle many applications together, allowing the business logic and information of the applications to be shared.

Method-level EAI is the sharing of the business process logic that may exist within the enterprise. A business process is any rule or piece of logic within the enterprise that has an effect on how information is processed. Hundreds of business processes reside in the applications of most organizations. In order to implement method-level EAI, one needs to understand all the processes in an organization comprehensively. Basically, there are four types of processes: rules, logic, data and objects. There are several mechanisms for accomplishing method-level EAI, such as distributed object computing, Transaction Processing (TP) monitors, application servers and frameworks. We discuss several features of distributed objects that help accomplish method-level EAI below. Finally, patterns are also useful in the context of method-level EAI. They enable the EAI architect to identify common business processes among the many business processes that already exist within the enterprise and to settle on them as targets for method-level integration.

Frameworks for Method-level EAI: The notion and value of frameworks can be beneficial in integrating an enterprise at the method level. A framework defines a specific method by which multiple objects are used in conjunction to perform one or more tasks that cannot be performed by any single object. An object-oriented framework provides a way of capturing a reusable relationship between objects so that those objects do not have to be reassembled within the same context each time they are needed. Frameworks support design reuse, while OO technology supports code reuse. Frameworks are fully debugged and tested software subsystems, centrally located and accessible by many applications. They fit in well with method-level EAI where, in many instances, shared objects, processes, and/or methods are being identified for the purpose of integration. There are a number of types of frameworks fulfilling different functions. Object-oriented frameworks and component-based enterprise frameworks stand out among them. The corresponding enabling technologies for method-level EAI include application or transaction servers, distributed objects and message brokers, which we discuss below.

User interface-level EAI allows EAI architects and developers to bundle applications using their user interfaces as a common point of integration. This is the most primitive of all EAI levels and the easiest to approach. Although it amounts to accessing and extracting screen information through a programmatic mechanism, many integration projects have no other option but to leverage user interfaces to access application data and processes. The task of extraction is also not as easy as it seems. There are two basic techniques for extracting information from screens: static and dynamic. Each has advantages and limitations. There are also two approaches for getting information from application interfaces: using screens as raw data and using screens as objects. The enabling technologies for user interface-level EAI are terminal emulation software, middleware and software development kits.
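
To make the distinction between "screens as raw data" and "screens as objects" concrete, the sketch below performs a static extraction, by which we mean here that fields are read from fixed row and column positions. The screen text, the field map and the names AccountScreen and read_account are all hypothetical; a real project would obtain the screen from terminal emulation software.

```python
from dataclasses import dataclass

# A hypothetical terminal screen captured as raw text lines.
SCREEN = [
    "ACCOUNT INQUIRY",
    "ACCT NO: 00012345   STATUS: OPEN",
    "BALANCE:    1520.75",
]

# Static extraction: each field lives at a fixed (row, start column, end column).
FIELD_MAP = {
    "account_no": (1, 9, 17),
    "status": (1, 28, 32),
    "balance": (2, 9, 19),
}


@dataclass
class AccountScreen:
    """The same screen exposed as an object with typed attributes."""
    account_no: str
    status: str
    balance: float


def extract_raw(screen: list) -> dict:
    """Screen as raw data: slice the text at the known positions."""
    return {
        name: screen[row][start:end].strip()
        for name, (row, start, end) in FIELD_MAP.items()
    }


def read_account(screen: list) -> AccountScreen:
    """Screen as object: wrap the raw fields in a typed object for callers."""
    raw = extract_raw(screen)
    return AccountScreen(raw["account_no"], raw["status"], float(raw["balance"]))
```

The fragility is visible in FIELD_MAP: if the application moves a field by one column, the extraction silently returns the wrong text, which is one reason the task is harder than it first appears.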

2.3 EAI Approaches

There are two distinct approaches for EAI. A non-invasive integration approach is one that does not require any modifications or additions to existing applications. The basic premise is simply to accept existing application interfaces (every application must have a public interface).

A non-invasive approach accepts that although this foundation is inherently sound for its designed purpose, it may be limiting, especially when an existing application needs to be used in ways, or in combination with other applications, for which it was not designed, for example when it was built without the ability to communicate with other systems.

The invasive approach requires the integrators to change the applications. The trouble here is that modifying the existing applications and/or their interfaces is expensive in terms of development time and skills.

Thus whether a solution is invasive or non-invasive will depend on what the existing application interfaces are. The applications include core legacy systems, enterprise resource planning (ERP) systems, and newer Web-based applications. CORBA as an invasive middleware is bound to play a key role in EAI.

2.4 Core Capabilities for EAI

EAI requires several core capabilities to facilitate these types of integration. Some of them are: an infrastructure that supports synchronous and asynchronous communication; a way to convert or transform data between applications; supporting services such as security and directory; ways to link in higher-level business processes and workflow; and mechanisms to connect or wrap existing applications and to form a gateway to other technologies. These are the prominent capabilities needed to accomplish EAI successfully, as they abstract away many of the internal manipulations at the network and system levels so that developers can concentrate their expertise and time on business logic and glue code.

2.5 EAI Technologies

Middleware is the underlying technology for EAI. It works as an enabling mechanism that allows one entity to communicate with one or more other entities. The advantage of middleware is that it takes the complexities of the underlying operating system and network protocols off the developer's shoulders, which eases the integration of the various systems in the enterprise. The same middleware API can also be used across many different types of applications and platforms.

There are several types of middleware, each capable of solving particular types of integration tasks. Most enterprises leverage many different EAI technologies as short-term and long-term solutions according to their requirements. These include RPC, MOM, distributed object computing, database-oriented middleware, and transactional middleware, which covers both Transaction Processing (TP) monitors and the more recent, highly popular EJB-compliant application servers. We discuss CORBA, a robust distributed object computing architecture standard, in more detail later in this chapter.

There are two types of middleware models: logical and physical. Middleware uses two types of communication mechanisms: synchronous and asynchronous. In addition to these main types, there are a number of communication types, such as connection-oriented and connectionless communications, direct communications, and queued communications.

Logical middleware encompasses both one-to-one middleware, which works in a point-to-point configuration, and many-to-many (including one-to-many) middleware. Each configuration has both merits and demerits.

Point-to-point middleware: This includes MOM products such as IBM MQSeries and Microsoft Message Queue (MSMQ) and RPCs such as DCE. This middleware technology ties applications together, but the solution is point-to-point, as it creates a single link between each pair of applications. The downside is that, although two applications can be integrated easily, integrating additional applications becomes a severe bottleneck, because every new application needs its own links to the applications it talks to. This leads to a chaotic enterprise. The point here is that enterprises are much more complex and difficult to contend with than one may think, which forces architects and developers to look for new technology to tackle EAI.

Many-to-many middleware: Message brokers, application servers, distributed objects and TP monitors are examples of many-to-many middleware products. This kind of middleware links many applications to many other applications, and it is the best fit for EAI.

The message broker is another EAI technology that offers real promise in breaking new ground for EAI. Message brokers are able to move messages from any type of system to any other type of system by changing the format of the messages so that the target system can understand them. Message brokers also ensure that messages are delivered in the correct sequence and in the correct context of the application. This solution brings order to the chaotic environment prevailing in enterprises that comprise legacy, packaged, two- and three-tier and electronic commerce applications. In addition to the above solutions, there are a number of niche technologies for solving the problem of integrating processes, such as application servers (for example, EJB servers), intelligent agents, and distributed object computing technologies such as CORBA, RMI and DCOM. These technologies allow developers to build and reuse business processes within an enterprise and even between enterprises.
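
The two broker responsibilities named above, reformatting a message for each target system and delivering messages in sequence, can be illustrated with a toy in-process broker. This is a minimal sketch under our own assumptions, not the design of any commercial product; the class MessageBroker, the topic name and the message fields are hypothetical.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Dict


class MessageBroker:
    """Toy broker: queues messages, transforms them per subscriber, delivers in FIFO order."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        # topic -> list of (transform, handler) pairs, one pair per target system
        self._subscribers: Dict[str, list] = defaultdict(list)

    def subscribe(self, topic: str,
                  transform: Callable[[dict], Any],
                  handler: Callable[[Any], None]) -> None:
        """Register a target system together with the format it understands."""
        self._subscribers[topic].append((transform, handler))

    def publish(self, topic: str, message: dict) -> None:
        """Accept a message from a source system without delivering it yet."""
        self._queue.append((topic, message))

    def dispatch(self) -> int:
        """Deliver all queued messages in publish order; return the delivery count."""
        delivered = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for transform, handler in self._subscribers[topic]:
                handler(transform(message))  # each target gets its own format
                delivered += 1
        return delivered


def to_billing(message: dict) -> dict:
    """Hypothetical billing system wants integer cents and its own field names."""
    return {"customer": message["cust"],
            "amount_cents": int(round(float(message["total"]) * 100))}


def to_warehouse(message: dict) -> str:
    """Hypothetical warehouse system wants a comma-separated record."""
    return f'{message["id"]},{message["cust"]}'
```

The source system publishes one format and never learns who consumes it; adding a third target system means one more subscribe call, not a new link from every source.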

Distributed Objects: The single greatest advantage of distributed objects is their adherence to an application development and interoperability standard. This is because CORBA and COM+ are specifications, not technologies. Vendors adhere to the established standard and therefore provide a set of technologies that can interoperate.

As distributed object technology matures, vendors are adding new features that address many previous shortcomings, namely scalability, interoperability, and communication mechanisms. The OMG is constantly adding new CORBA services and refining and enhancing the existing ones. The OMG is now endorsing CORBA's Object Transaction Service (OTS), while Microsoft is looking to Microsoft Transaction Server (MTS) to bring transactionality to COM. Thus distributed objects are now able to provide better scalability through their support for transactions, and they can therefore offer the same scaling tricks as application servers and TP monitors.

Interoperability continues to improve: the OMG has defined the Internet Inter-ORB Protocol (IIOP), which allows different ORBs to talk to each other, and COM-CORBA bridges are now available. Earlier there were only synchronous communication mechanisms; now both CORBA and COM are introducing asynchronous messaging mechanisms to cater to current and forthcoming needs. Thus distributed objects give EAI architects and developers the ability to create portable objects that can run on a variety of servers and can communicate using a predefined, standard messaging interface.

Database-Oriented Middleware: Database access is a key element of EAI. Database-oriented middleware provides access to any number of databases, regardless of the data model employed in the database or the platform upon which the databases exist. It provides a number of important benefits, as listed below.

An interface to an application
The ability to convert the application language, such as Java, into SQL that the database can understand
The ability to send a query to a remote database server for processing
The ability to get the result sets back from the remote server
The ability to convert a response set into a format understandable by the requesting application

Apart from these, database-oriented middleware can also process many simultaneous requests and provides scaling features, such as thread pooling and load balancing. There are mainly two types of database-oriented middleware: call-level interfaces (CLIs) and database gateways. CLIs, such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC), provide a single interface to several databases and fulfill the above-mentioned capabilities. Database gateways are able to provide access to data once locked inside larger systems, such as mainframes. They can integrate several databases for access from a single application interface. A number of gateways are currently on the market, such as Information Builders' Enterprise Data Access/SQL (EDA/SQL), in addition to standards such as IBM's Distributed Relational Database Architecture (DRDA) and ISO/SAG's Remote Database Access (RDA). Thus database-oriented middleware is all the software that connects some application to some database.

Java Middleware and EAI: Java, regarded a few years ago as a robust and versatile language for building Web applications, has matured enough to be of benefit to the enterprise and to EAI. Having realized the value of Java on the middleware side, JavaSoft, with the cooperation of various vendors, is now introducing many standards, including Java RMI, JDBC, JMS, JNDI and Java IDL for CORBA. These standards are all applicable to EAI in that they provide a Java-enabled infrastructure. The arrival of EJB-compliant application servers, another form of middleware, has also prompted many vendors to take note of what Java holds in store for them as a middleware platform. The kinds of middleware standards available from Java are: database-oriented (JDBC); interprocess (Java RMI); message-oriented (JMS); application-hosting (Java-enabled application servers, e.g. BEA WebLogic, Netscape iPlanet, and Inprise Application Server (IAS)); transaction-processing (TP monitors, e.g. BEA's Tuxedo and IBM's CICS, are embracing Java); and distributed object technology (Java RMI, Java IDL). Thus Java is being shaped into a fertile application development and processing environment on which full-blown enterprise-class software applications are being built.

In the next section, we talk about the importance of XML in integrating applications.

3. EAI and XML

In this section, we talk about XML, the latest buzzword in the software industry, and its value in EAI. The Hypertext Markup Language (HTML) transforms a plain-looking document into a well-organized, good-looking one. The key to this transformation is a set of tags inserted between the text items; the tags determine how the text items appear in the Web browser, and the document is displayed as dictated by them. But HTML cannot convey any information about the meaning of the text items. This brings us to the concept of information about information: data that describes a piece of information and gives meaning to it. If this meta-information can be given in such a way that it can be processed by a computer, all the better. This is the basic motivating factor behind the eXtensible Markup Language (XML).

XML is oriented toward giving meaning to information and does not deal with display aspects. One can use eXtensible Stylesheet Language (XSL) with an XML document to describe how documents should be displayed. The great advantage with this is that one can create multiple XSL documents that display the data contained in an XML in different ways to suit the requirements of various user agents.

XML is a subset of the Standard Generalized Markup Language (SGML). Unlike HTML, XML is a structured document format standard that businesses are using to exchange business data. Data interchange is one area in which XML can and will play a very prominent role. As opposed to a database, in which data may be organized and stored in a proprietary format, an XML file is a plain-text representation of the data, one that has the advantage of hierarchical organization and yet can be easily transmitted. Hence XML deserves to be regarded as a worthy standard integration mechanism.

XML has also been designed for publishing data through the Web without the originator knowing about the target system that receives the data. XML provides a common data exchange format, encapsulating both metadata and data. This allows different applications and databases to exchange information without knowing each other. In order to communicate, a source system simply reformats a message or a data record as XML-compliant text and moves that information to any other system that understands how to read XML. Having realized the value of XML, EAI architects and developers have embraced it to move information throughout an enterprise. XML also serves as a common text format for moving information between enterprises, supporting supply chain integration efforts.
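
The exchange described above, in which a source system reformats a record as XML text and any XML-aware target reads it back, can be sketched as follows. This is an illustrative example only; the element names (order, customer, total) and the functions to_xml and from_xml are hypothetical, and the code uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def to_xml(record: dict) -> str:
    """Source side: reformat a data record as XML text carrying data and metadata (tags)."""
    root = ET.Element("order", attrib={"id": record["id"]})
    for name in ("customer", "total"):
        ET.SubElement(root, name).text = record[name]
    return ET.tostring(root, encoding="unicode")


def from_xml(document: str) -> dict:
    """Target side: read the record back knowing only the XML vocabulary, not the sender."""
    root = ET.fromstring(document)
    record = {"id": root.get("id")}
    for child in root:
        record[child.tag] = child.text
    return record
```

The receiver depends only on the tag names in the document, not on the sender's database schema or programming language, which is what lets the two systems remain unaware of each other.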

4. CORBA for EAI

We have discussed above a number of promising middleware technologies for accomplishing EAI. The distributed object computing architecture is robust and has several distinct features that facilitate EAI. The main candidates are CORBA, the open standard from the OMG, RMI from JavaSoft and DCOM from Microsoft. Each comes with both merits and demerits. RMI is platform independent but language dependent, whereas DCOM is language independent but platform dependent. This points towards CORBA. Below we discuss the synergy between CORBA and EAI in detail.

The Common Object Request Broker Architecture (CORBA) is a set of industry standards for distributed object-based computing that is designed to facilitate reliable, platform-independent execution of object-oriented software in wide- and local area network environments. The key element of CORBA technology is the Object Request Broker (ORB), which acts as a software bus managing access to and from objects in an application, linking them to other objects, monitoring their function, tracking their location and managing communications with other ORBs.

The ORB is the main mechanism for simplifying the development of CORBA-compliant applications. The simplification is a result of three properties: location independence, platform interoperability and language interoperability. Location independence means that an ORB treats all objects it is aware of as local objects, even if they exist on remote systems. Platform interoperability means that objects created on one hardware/software computing platform (for example, a Pentium-based Windows NT system) can run on any other CORBA-equipped platform (for example, a SPARC-based Sun workstation).

Language interoperability means that objects written in one language can interact with applications written in another, thanks to CORBA's Interface Definition Language (IDL) and the availability of mappings between the programming languages and IDL. IDL is a contractual language that defines the interfaces to objects, but not their implementations. The objects themselves can be written in any common language (COBOL, C, C++, Smalltalk, Java) and still work on non-native systems.

Though CORBA-based systems are somewhat difficult to implement, CORBA comes at the top of the list of EAI technologies. In the following, we discuss some of the main advantages of using CORBA for integrating specific kinds of applications.

4.1 CORBA works across Heterogeneous Platforms

Many software and hardware vendors have embraced CORBA. CORBA includes mechanisms for communication among objects across a network, and the objects may be hosted on several different platforms. The General Inter-ORB Protocol (GIOP) specifies message formats and data representations that ensure object interoperability among ORBs. The Internet Inter-ORB Protocol (IIOP) defines the specific details for using GIOP over TCP/IP. The availability of ORBs on so many different platforms and operating systems makes CORBA unique in that it can fit almost every combination of systems within differing organizations. Thus CORBA can serve as the backbone for enterprise, interorganizational and interdepartmental communication.
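
To give a feel for what a GIOP "message format" looks like, the sketch below packs and unpacks the fixed 12-byte header that, to our understanding of the GIOP specification, precedes every message: the four bytes "GIOP", a major and minor version, a byte-order flag, a message type, and the size of the body that follows. The function names are our own and the sketch covers only the header; a real ORB handles the message bodies, fragmentation and the later protocol versions.

```python
import struct

GIOP_MAGIC = b"GIOP"
MSG_REQUEST = 0  # message type code for a Request
MSG_REPLY = 1    # message type code for a Reply


def pack_giop_header(message_type: int, body_size: int,
                     little_endian: bool = False,
                     version: tuple = (1, 0)) -> bytes:
    """Build the 12-byte header; the size field uses the byte order named in the flag."""
    order = "<" if little_endian else ">"
    return struct.pack(
        order + "4sBBBBI",
        GIOP_MAGIC, version[0], version[1],
        int(little_endian),  # 0 = big-endian sender, 1 = little-endian sender
        message_type, body_size,
    )


def unpack_giop_header(header: bytes) -> dict:
    """Parse a header, reading the flag first so the size is decoded correctly."""
    magic, major, minor, flags, message_type = struct.unpack(">4sBBBB", header[:8])
    if magic != GIOP_MAGIC:
        raise ValueError("not a GIOP message")
    little_endian = bool(flags & 1)
    (body_size,) = struct.unpack(("<" if little_endian else ">") + "I", header[8:12])
    return {
        "version": (major, minor),
        "little_endian": little_endian,
        "message_type": message_type,
        "body_size": body_size,
    }
```

Because the sender states its own byte order in the header, a big-endian and a little-endian machine can exchange messages without either converting in advance; the receiver adapts, which is one small example of how the protocol supports heterogeneous platforms.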

4.2 CORBA helps Enterprise Legacy Integration

As discussed above, IDL is one of the cornerstones of CORBA's success in the computing arena. When an IDL file is compiled with the IDL compiler that normally accompanies a particular ORB product, the compiler generates a client-side stub and a server-side skeleton. The client invokes methods on the stub, which in turn forwards the invocation to the server, marshaling the parameters as necessary. The skeleton is, in practice, the base class that developers extend to code the server implementation.
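
The division of labor between stub and skeleton can be imitated in a few lines without an ORB. The sketch below is an analogy only: the classes are written by hand rather than generated by an IDL compiler, JSON stands in for CORBA's marshaling format, and a direct function call stands in for the network. All names (Account, AccountStub, AccountSkeleton, AccountImpl) are hypothetical.

```python
import json
from abc import ABC, abstractmethod


class Account(ABC):
    """Plays the role of the IDL interface: operations only, no implementation."""

    @abstractmethod
    def deposit(self, amount: int) -> int:
        """Add amount to the balance and return the new balance."""


class AccountSkeleton(Account):
    """Server side: unmarshals a request and dispatches it to the implementation."""

    def handle(self, request: bytes) -> bytes:
        call = json.loads(request.decode("utf-8"))
        # Sketch only: a real skeleton dispatches on a fixed table of operations.
        result = getattr(self, call["operation"])(*call["args"])
        return json.dumps({"result": result}).encode("utf-8")


class AccountImpl(AccountSkeleton):
    """The developer's code: extends the skeleton and supplies the business logic."""

    def __init__(self) -> None:
        self._balance = 0

    def deposit(self, amount: int) -> int:
        self._balance += amount
        return self._balance


class AccountStub(Account):
    """Client side: same interface, but each call is marshaled and sent away."""

    def __init__(self, transport) -> None:
        self._transport = transport  # callable taking request bytes, returning reply bytes

    def deposit(self, amount: int) -> int:
        request = json.dumps({"operation": "deposit", "args": [amount]}).encode("utf-8")
        reply = self._transport(request)
        return json.loads(reply.decode("utf-8"))["result"]
```

A client written against Account cannot tell whether it holds an AccountImpl or an AccountStub, for example AccountStub(AccountImpl().handle); that indistinguishability is what gives CORBA its location independence.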

IDL plays a significant role in legacy application integration. Because IDL defines interfaces and not implementations, a new and more refined object containing the logic of a particular function can be implemented without altering the interface definitions, so that clients continue to be served without interruption. This implies that a legacy system may be wrapped to provide the desired behavior, and that method invocations look identical to the calling clients because they go through the common interface.

This approach may be used to add new behavior and features to a legacy application without modifying the existing system. It also provides a natural way to migrate away from an old application, by gradually replacing the logic that accesses the legacy system with new logic that performs the same function.
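The wrapping and gradual migration described above can be sketched as follows. This is an illustration in plain Python under stated assumptions: the abstract class plays the role of the IDL interface, `legacy_fetch_record` is a made-up stand-in for an untouched legacy routine, and the record format is invented for the example.

```python
from abc import ABC, abstractmethod

# Hypothetical legacy data and routine; the "ID|BASES" record format is invented.
_LEGACY_RECORDS = {"X00001": "X00001|ACGTACGT", "X00002": "X00002|TTGACA"}

def legacy_fetch_record(key):
    return _LEGACY_RECORDS[key]

class SequenceStore(ABC):
    """The agreed interface that clients depend on."""
    @abstractmethod
    def get_bases(self, seq_id):
        """Return the bases stored under seq_id."""

class LegacyWrapper(SequenceStore):
    """Adapts the legacy routine to the interface without modifying it."""
    def get_bases(self, seq_id):
        return legacy_fetch_record(seq_id).split("|", 1)[1]

class NewStore(SequenceStore):
    """A fresh implementation of the same interface."""
    def __init__(self, data):
        self._data = dict(data)

    def get_bases(self, seq_id):
        return self._data[seq_id]

class MigratingStore(SequenceStore):
    """Answers from the new store when it can, otherwise falls back to legacy."""
    def __init__(self, new_store, legacy_store):
        self._new = new_store
        self._legacy = legacy_store

    def get_bases(self, seq_id):
        try:
            return self._new.get_bases(seq_id)
        except KeyError:
            return self._legacy.get_bases(seq_id)

def client_lookup(store, seq_id):
    # The client sees only the interface, never the implementation behind it.
    return store.get_bases(seq_id)
```

As more data and logic move into `NewStore`, fewer calls reach the wrapper, and `client_lookup` never has to change.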

4.3 CORBA for Multi-tier Integration

The CORBA task forces draw up only the specifications, leaving the implementations open to the vendors. They define only the interfaces, using IDL, to the various features, for example the ORB and the Object Services. This allows several ways of implementing an ORB: regardless of the underlying representation, an ORB is CORBA compliant as long as it provides the standard interface. There are a few flavors of ORBs, such as client- and implementation-resident, server-based, system-based and library-based ORBs.

ORBs are available today on desktops with Java, on NT and Unix workstations, and even on mainframes. This availability makes it feasible to design two-tier, three-tier and multi-tier distributed systems. In a two-tier model, a client on a desktop can connect to and access a Unix server machine through an ORB; in a three-tier model, that server machine becomes the middle tier and a backend database server forms the third tier. Further, the middle tier can be a collection of connected server machines. Thus CORBA helps both in designing applications for multi-tier systems and in integrating those systems using ORBs.

4.4 Core Capabilities of CORBA for EAI

Apart from these features, there are a number of facilities that help immensely in designing and integrating multi-tier applications. We list a few here, excluding IIOP and IDL, which were covered above. The OMG has defined a number of valuable and essential services called CORBA Object Services (COS) and CORBA Facilities. Services such as naming, trading, location, property, timing, event, security, lifecycle and persistent object services fulfill the various needs of designing and deploying distributed objects. CORBA 2.4, the latest version, contains a few novel features such as the CORBA Component Model (CCM), Objects by Value (OBV), Java-to-IDL Mapping and Quality of Service (QoS), and a few interesting enhancements including the Interoperable Naming Service (INS). In addition, there are CORBA domain frameworks intended to meet the particular needs of industries such as banking, e-business and telecom. Thus CORBA is evolving and maturing into a solid foundation on which software development becomes easier and more productive, and it is set to capture the attention of EAI architects in the days to come.

4.5 CORBA for Integration of Components

Components are reusable entities that can be combined to create new and more complete applications. Components may be designed using object-oriented techniques and are sometimes described as coarse-grained objects. There are components for both client-side (front-end) and server-side (back-end) applications, and there are programming languages and container-providing architectures for designing and deploying them. Components have some distinctive capabilities such as introspection, properties, events and interoperability. We describe here a couple of key component technologies.

Java is evolving rapidly into a more complete platform for portable, reusable, flexible and robust software development. JavaBeans are front-end components written in Java. The bean Application Programming Interfaces (APIs) allow components to be combined into other components, applets, stand-alone Java applications, or servlets. InfoBus is a specification that allows JavaBeans to communicate with other JavaBeans and with Enterprise JavaBeans (EJBs), the bean model for server-side components.

The Enterprise JavaBeans specification targets the development and deployment of portable EJB components for enterprises. EJB adds mission-critical features and robustness to components, making them well suited to back-end server development. EJB takes care of many of the features essential to server-side computing, in particular transactions and security, as well as connection pooling, threading, naming, and persistence.

There are also component technologies from Microsoft such as COM, DCOM and the latest, COM+. These work well on Microsoft platforms only. As mentioned above, CORBA in its latest version defines the CORBA Component Model.

5. CORBA and Bioinformatics

It is now increasingly common for computational tasks within life sciences research to be carried out in a heterogeneous, distributed computing environment. Information must be moved from one machine to another, or disks cross-mounted, so that a variety of programs can be run on multiple systems. Often programs have to be recoded in another programming language in order to be compiled and executed on another architecture. In addition, molecular biology data, which is being produced at an exponential rate owing to the advances in DNA sequencing technology adopted by different genome projects, is putting severe strain on existing systems. Maintaining a working software system in this computational jungle is a laborious and time-consuming practice that drains valuable resources.

Bioinformatics researchers have also been advocating the use of componentry as a technique for constructing genome informatics systems. Components are independently developed programs (such as databases, user interfaces, analysis programs, and programs associated with laboratory instruments) that are designed to be used as modular, ``plug-and-play'' building blocks. To cope with the growing need for distributed, portable, fault-tolerant and reusable components, the OMG has defined CORBA as an architecture for distributed object computing. The use of CORBA is thus an essential consideration when developing the next generation of integrated biological applications. Here, we present a viable proposal for integrating BLAST-related software tools, in order to improve the quality of remote access to, and use of, the biological knowledge available in public databases by molecular biologists.

The task of managing the massive biological data is critical, yet the task of presenting this data smartly to researchers is of even greater importance. The Internet has been playing an indispensable role in the development of the discipline of bioinformatics both as a means of disseminating results of sequencing and in fostering international collaborations. The Web is now used as the main mechanism by which the vast amount of biological data can be delivered to the researchers' desktop. Every major bioinformatics database now has a CGI interface that allows users to query the database using web forms.

One of the reasons for this popularity of the web is that it is platform neutral and as long as the client machine has a web browser like Netscape Navigator, the client can read web pages on any server that is connected irrespective of the type of computer or operating system at either end.

The Web offers a single user interface for data sharing across heterogeneous and autonomous databases, but it was not designed to handle the rigid DBMS protocols and data formats used by relational and object-oriented databases. Hypertext links between web pages do provide a kind of index for interactive browsing, but it is more efficient to send selection conditions across to a remote part of a distributed database and to send back just the items required. It is also becoming accepted practice for bioinformatics researchers to construct libraries of sharable components, and the proliferation of programming languages and operating systems points to the need for a dramatic change in the computing paradigm. Distributed Object Computing (DOC) addresses the need for software integration in heterogeneous environments in a direct way, using the power of object-based technology coupled with packaging and distribution capabilities.

The Common Object Request Broker Architecture (CORBA) provides a cross-language, cross-operating-system, cross-platform, cross-application method for describing services and making use of those services. Modeling applications as objects interacting in a network, with CORBA hiding all of the details of the network and of service access, is a considerable benefit for distributed applications in bioinformatics.

6. Integration of Bioinformatics Applications

GenBank contains all known nucleotide and protein sequences with supporting bibliographic and biological annotations. Apart from GenBank, a number of research laboratories, government organizations, and academic institutions all over the world run DNA sequencing projects that produce biological data in plenty. As it is inadequate and inefficient to handle this volume of work manually, several software tools have been designed for gathering and retrieving biological data, for accessing and managing the databases, and for extracting biological knowledge from the data in a neat and precise manner. Currently, these software products are available on different server machines located in different parts of the world, and unfortunately there is no coordination among them.

Here is a list of software tools used by database designers and molecular biologists. For building GenBank, NCBI has designed Web-based data submission tools such as BankIt and a platform-independent submission program called Sequin. For retrieving GenBank data, NCBI provides an integrated database retrieval system that accesses DNA and protein sequence data, MEDLINE references (PubMed), genome data, the NCBI taxonomy, and protein structures from the Molecular Modeling Database (MMDB). For sequence similarity searching, NCBI offers the BLAST family of search programs. There are also scores of software tools for processing before and after a BLAST search on the database. BEAUTY (BLAST Enhanced Alignment UtiliTY) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences.

The detailed analysis of database search results made with BLAST becomes exceedingly time consuming and tedious due to the resultant file containing a list of hundreds of potential homologies. There is a program called Visual BLAST which can facilitate and accelerate the interactive analysis of full BLAST output files containing sequence alignments. This tool also includes a pairwise sequence alignment viewer, a Hydrophobic Cluster Analysis plot alignment viewer and a tool displaying a graphical map of all database sequences aligned with the query sequence.

PowerBLAST includes a number of options for masking repetitive elements and low-complexity subsequences. It can also restrict the search to any level of NCBI's taxonomy index, thus supporting ``comparative genomics'' applications. Post-processing of the BLAST output using the SIM series of algorithms produces optimal, gapped alignments, and multiple alignments when a region of the query sequence matches multiple database sequences. The two programs MSPcrunch and Blixem also help greatly in post-processing the output files of a BLAST search. There is also a tool for discriminating between orthologues and paralogues within BLAST output lists of homologous sequences. In their present form, these tools are not very user-friendly, and sharing precise data and processes among them is not possible in this setup. This is where EAI and CORBA are needed. We have discussed several models and levels of integration; in our view, two of them offer workable solutions here.

One solution is to integrate these tools at the user-interface level, as all these software tools have their own graphical user interfaces.

The other solution is to redesign all these software tools as CORBA objects exposing their functionality through IDL interfaces, in order to facilitate sharing of vital data and processes in a cost-effective, reliable, interoperable and easy-to-use manner.

Every CORBA object has its own IDL interface. The OMG's IDL is the language used to describe the interfaces that client objects call and that object implementations provide. The separation of client and server, communicating through an agreed interface, is the cornerstone of CORBA distributed software design. It allows different languages and operating systems to be used concurrently, and it allows both clients and servers to improve their implementations and add new features independently of each other according to the users' requirements. If the original specification is eventually found to be restrictive, new interfaces can be written and implemented while still supporting the old. IDL also supports inheritance, so simple IDL specifications can be extended to create more complex derived interfaces.

Specifically for this unified application, the above-mentioned software tools will have to be redesigned as CORBA objects. This can be accomplished relatively easily by writing wrappers over the functionality provided by these tools. Once the IDL file is provided, any client that knows the functionality offered by the CORBA objects, as expressed through the IDL file, can obtain object references for those objects and use them as if they were local objects. In a way, this amounts to extracting and reusing the vital components of the legacy applications that have long been fulfilling various needs of molecular biologists. Thus preparing an IDL file describing all the methods of each software tool is the first and foremost job.
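A wrapper over an existing command-line tool might look roughly like the following Python sketch. The wrapper runs the tool as a child process and exposes its result through an ordinary method, much as an IDL operation would. Everything named here is an assumption made for illustration: the class names are hypothetical, and because no real BLAST command line is assumed, a one-line child Python process that merely reverses its input plays the part of the legacy tool.

```python
import subprocess
import sys

class CommandLineToolWrapper:
    """Runs an existing program unchanged and returns its standard output."""
    def __init__(self, command):
        self._command = list(command)

    def run(self, input_text):
        completed = subprocess.run(self._command, input=input_text,
                                   capture_output=True, text=True, check=True)
        return completed.stdout

# Stand-in for a legacy tool (NOT a real BLAST invocation): reverses its input.
STAND_IN_TOOL = [sys.executable, "-c",
                 "import sys; print(sys.stdin.read().strip()[::-1])"]

class SearchService:
    """Hypothetical service object whose method hides the tool behind it."""
    def __init__(self, tool):
        self._tool = tool

    def do_search(self, query_seq):
        # Clients pass and receive plain values; the process call stays internal.
        return self._tool.run(query_seq).strip()

service = SearchService(CommandLineToolWrapper(STAND_IN_TOOL))
```

Replacing `STAND_IN_TOOL` with the command line of an actual tool would leave `SearchService` and its callers unchanged, which is the point of wrapping rather than rewriting.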

In this way, all the major applications need to be made CORBA-compliant, and IDL definitions should be consistent between similar information sources. The client can then query the biological databases through these applications for a variety of information, send a query sequence to extract similar sequences from the databases, and manipulate the resulting sets of sequences.

Finally, the object implementations for all the services declared in the IDL interfaces should be developed using an object-oriented programming language such as Java. The CORBA objects should be registered with a naming service. There are a number of Object Request Brokers (ORBs) with a Portable Object Adapter (POA) that satisfy the CORBA 2.3 specification. The implementation languages and operating platforms can be chosen accordingly.
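Registration with a naming service can be pictured with the following in-memory Python sketch. The operation names (`bind`, `rebind`, `resolve`, `unbind`) and the two exceptions are modeled on the CORBA Naming Service, but this is a deliberate simplification under our own assumptions: names here are plain tuples of strings rather than sequences of name components, nothing is distributed, and the bound object is a placeholder.

```python
class NotFound(Exception):
    """Raised when a name has no binding."""

class AlreadyBound(Exception):
    """Raised when bind() is called for a name that is already in use."""

class NamingContext:
    """Maps hierarchical names to object references (here, any Python object)."""
    def __init__(self):
        self._bindings = {}

    def bind(self, name, obj):
        key = tuple(name)
        if key in self._bindings:
            raise AlreadyBound("/".join(key))
        self._bindings[key] = obj

    def rebind(self, name, obj):
        # Unlike bind(), silently replaces any existing binding.
        self._bindings[tuple(name)] = obj

    def resolve(self, name):
        key = tuple(name)
        if key not in self._bindings:
            raise NotFound("/".join(key))
        return self._bindings[key]

    def unbind(self, name):
        key = tuple(name)
        if key not in self._bindings:
            raise NotFound("/".join(key))
        del self._bindings[key]

# A server registers its object once; clients look it up by name afterwards.
root = NamingContext()
blast_object = object()  # placeholder for a real object reference
root.bind(("GenBank", "HomologySearching", "BLAST"), blast_object)
```

A client that knows only the name can then call `root.resolve(("GenBank", "HomologySearching", "BLAST"))` to obtain the reference, without knowing on which machine the object lives.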

7. Conclusion

Considering the value of EAI and of EAI-enabling technologies such as CORBA, enterprises must clearly understand both EAI's opportunities and its risks. Being able to share data and business logic that is common to many applications, and thus to integrate those applications, is a tremendous advantage. However, to be successful, one must be armed with a thorough knowledge of the correct approach and of the value of the enabling technology. In this chapter, we have set out the advantages of CORBA over competing technologies for integrating enterprise applications. Finally, we have applied these ideas to the integration of a couple of bioinformatics applications using CORBA.

References:

David S. Linthicum, Enterprise Application Integration, Addison-Wesley, Reading, Massachusetts, 2000.
P. Durand et al., ``Visual BLAST and Visual FASTA: graphic workbenches for interactive analysis of full BLAST and FASTA outputs under Microsoft Windows 95/NT'', CABIOS, 13, 407-413, 1997.
Stephen Misener and Stephen A. Krawetz, ``Bioinformatics Methods and Protocols'', Humana Press, Totowa, New Jersey, 2000.
Ron Zahavi, Enterprise Application Integration with CORBA Component and Web-Based Solutions, Wiley Computer Publishing, John Wiley & Sons Inc., New York, 1999.
C. Pethuru Raj and Naohiro Ishii, ``Interoperability of Biological Sequence Databases using CORBA'', presented at the IEEE International Conference on Information, Intelligence and Systems, Washington, 1999.
Dirk Slama et al., ``Enterprise CORBA'', Prentice Hall PTR, Upper Saddle River, NJ, 1999.
Kim C. Worley et al., ``BEAUTY: An Enhanced BLAST-based Search Tool that Integrates Multiple Biological Information Resources into Sequence Similarity Search Results'', Genome Research, 5, 173-184, 1995.
Jinghui Zhang and Thomas L. Madden, ``PowerBLAST: A New Network BLAST Application for Interactive or Automated Sequence Analysis and Annotation'', Genome Research, 7, 649-656, 1997.


Appendix

Interfaces coded using OMG's IDL for BLAST software

Here we give an IDL file for the BLAST software tool. The IDL files for the other tools, such as BEAUTY and Visual BLAST, can be coded similarly.

module GenBank
{
module HomologySearching
{
typedef sequence<octet> fileFlow; //File Bytes
interface BLAST
{
//attributes
attribute string program;
attribute string accessNumber;
attribute string database;
attribute boolean complexity;
attribute float expected;
readonly attribute string description;
struct BLAST_Result
{
string accession;
string the_sequence;
float probability;
};
typedef sequence<string> accessionNumbers; //element type assumed to be string
typedef sequence<string> biosequences;
typedef sequence<BLAST_Result> BLAST_Results;
//exceptions
exception InvalidID {string reason;};
//methods
void submit(in string newsequence);
boolean exists(in string ID);
accessionNumbers getaccessionNumbers(in string query_seq);
string getBases(in string ID) raises (InvalidID);
fileFlow getGZIPFile(in string ID) raises (InvalidID);
BLAST_Results do_BLAST_Search(in string query_seq);
};