Distributed Object Computing with Java

Abstract
Distributed Computing Architecture
Distributed Computing - An Introduction
Motivations for Distributed Computing
Parts of Distributed Computing Applications
Requirements for developing Distributed Applications
Languages and Protocols for Distributed Systems
Distributed Object Computing Technologies
The Role of Java in Distributed Object Computing
Conclusion

As information systems play an increasingly important role in helping organizations meet their goals in a changing environment, the need has emerged for a more flexible, efficient and cost-effective application architecture: one that integrates seamlessly with new systems and enables the myriad databases, operating systems and computer networks to interoperate. This article discusses one such software architecture, distributed computing. We give a brief introduction to distributed computing, its origin, necessity and motivations. We also give a short overview of each of the main parts of a distributed software application and of the basic requirements for developing and deploying distributed applications, such as programming languages, communication protocols, security and multithreading. Most importantly, we explain how object technology benefits the development of distributed computing applications. Lastly, we describe the emergence of Java as a programming language and platform, and its varied support for developing and deploying highly portable, scalable, enterprise-level distributed computing systems.

Distributed Computing Architecture

Software architecture is the art and science of designing and constructing programs. It also refers to the structure of a system and the style used in its construction. The terms "component-based architecture", "object-oriented architecture", "pipeline architecture" and so on refer to ideas and ways of constructing systems that have evolved during the short history of software engineering. Software architecture occupies a vital position in software engineering because computing systems have to adapt, be flexible, and take into account constant change and new demands from the user community. A good architecture should facilitate the construction of systems that support abstraction, extensibility, scalability, interoperability, and components.

The age of information processing started with the mainframe application architecture. Its main drawback is the accumulation of redundant data, since mainframe computers do not share data. This architecture was found to be inefficient, inflexible and costly, and so the arrival of relational database technology and the client/server architecture made a strong impact in the computing arena for quite a long time. Ultimately client/server met the same fate as its predecessor: it was inherently inflexible, and maintaining its fat clients turned out to be costly and time-consuming. Code reuse, long a mantra of software engineering, largely failed to materialize because of the weaknesses of this two-tier architectural model.

Distributed computing architecture sets out to address these deficiencies and to improve software quality and reusability. Coupled with a powerful communication infrastructure, distributed applications can interoperate across disparate networks and operating systems. They can incorporate applications coded in multiple programming languages and integrate legacy computing systems. The major benefit of this architectural paradigm is that the vital business logic found in legacy and other applications, such as packaged applications, can be reused and integrated when developing new applications, which helps to reduce the cost, time and scale of the development work needed. A number of software tools, methods, technologies, packages, protocols and, above all, programming languages such as Java support this field.

Distributed Computing - An Introduction

Distributed computing has been one of the biggest buzzwords and a hot topic among software designers and enterprise application integration (EAI) architects for the last 10 years or so. In the meantime, network and data communication technology flourished with the arrival of the Internet and made tremendous strides in wired and wireless technologies, from satellite and cellular communications and Metropolitan Area Networks (MANs) to the standardization of protocols like Ethernet, TCP/IP, and Asynchronous Transfer Mode (ATM). At the application level, there are large networks connecting thousands of workstations and personal computers to accomplish huge business and engineering tasks. It would be prudent, beneficial and cost-effective if we could use our networks of smaller computers to work together on larger tasks.

Even reading a web page requires the cooperation of two computers (a client and a server), plus other computers that make sure the data gets from one location to the other. However, simple browsing, that is, a largely one-way data exchange, is not what we usually mean when we talk about the distributed computing model. We usually mean something where there is more explicit interaction between the systems involved. One can think about distributed computing in terms of breaking down an application into individual computing agents that can be distributed on a network of computers, yet still work together to do cooperative, enterprise-level tasks.

At the system level, computing systems with multiple CPUs that are physically close together are generally said to have a parallel architecture and to be parallel systems, while systems that are geographically distributed are generally said to be distributed systems. Parallel computing systems are usually designed from the ground up to provide the best cost-performance, and they are likely to be quite uniform in machine architecture. Distributed systems, on the other hand, often arise out of a need to tie together preexisting systems in different locations. As a result, these machines are quite likely to be heterogeneous, with entirely different individual platforms and operating systems.

A distributed application is built upon several layers. At the lowest level, a network connects a group of host computers together so that they can talk to each other. Network protocols like TCP/IP let the computers send data to each other over the network by providing the ability to package and address data for delivery to another machine. Higher-level services, such as directory services and security protocols, can be defined on top of the network protocol. Finally, the distributed application itself runs on top of these layers, using the mid-level services and network protocols as well as the computer operating systems to perform coordinated tasks across the network.

Motivations for Distributed Computing

Here are a few of the more common motivating factors for distributed computing.

Computing things in parallel by breaking a problem into smaller pieces enables one to solve larger problems without resorting to larger computers. Instead, one can use smaller, cheaper, easier-to-find computers.
Large data sets are typically difficult to store and manage. Management tasks are easier when databases are small and located locally. Thus the databases can be stored in different locations and accessed remotely, and interoperability between databases can also be achieved.
Redundant processing agents on multiple networked computers can be used by systems that need fault tolerance. If a machine or agent process goes down, the job can still carry on.

Parts of Distributed Computing Applications

At the application level, a distributed application can be broken down into the following parts:

Processes: A typical computer operating system on a computer host can run several processes simultaneously. A process is created by describing a sequence of steps in a programming language, compiling the program into an executable form, and running the executable in the operating system. While it is running, a process has access to the resources of the computer (such as CPU time and I/O devices) through the operating system. A process can be completely devoted to a particular application, or several applications can use a single process to perform tasks.

Threads: Every process has at least one thread of control. Some operating systems support the creation of multiple threads of control within a single process. Each thread in a process can run independently from the other threads, although there is usually some synchronization between them. One thread might monitor input from a socket connection, while another might listen for user events (keystrokes, mouse movements, etc.). At some point, input from the input stream may require feedback from the user. At this point, the two threads will need to coordinate the transfer of input data to the user's attention. Java supports multithreading directly.

Objects: Programs written in object-oriented languages are made up of cooperating objects. One simple definition of an object is a group of related data, with methods available for querying or altering the data, or for taking some action based on the data. A process can be made up of one or more objects, and these objects can be accessed by one or more threads within the process. An object can also be logically spread across multiple processes, on multiple computers located in different locations.

Agents: The term agent refers to a significant functional element of a distributed application. That is, an agent is a higher-level system component, defined around a particular function, utility, or role in the overall system. A remote banking application, for example, might be broken down into a customer agent, a transaction agent and an information brokerage agent. Agents can be distributed across multiple processes, and can be made up of multiple objects and threads in these processes. Agents can also belong to more than one application at the same time. For example, consider an automated teller machine (ATM) application that consists of an account database server, with customer request agents distributed across the network submitting requests. The account server agent and the customer request agents are agents within the ATM application, but they might also serve agents residing at the financial institution's headquarters, as part of an administrative application.

Thus a distributed application can be thought of as a coordinated group of agents working to accomplish some goal. Each of these agents can be distributed across multiple processes on remote hosts, and can consist of multiple objects or threads of control.
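
The coordination between an input-monitoring thread and a user-facing thread, described under Threads above, can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: the class name ThreadHandoff, the relay method and the END marker are hypothetical, and a plain list stands in for the data that would arrive on a socket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one thread gathers input, another presents it.
class ThreadHandoff {
    // Unique marker object telling the consumer that no more input will come.
    // It is compared by identity so it can never collide with a real message.
    private static final String END = new String("END");

    static List<String> relay(List<String> incoming) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Stands in for the thread that monitors a socket connection.
        Thread inputMonitor = new Thread(() -> {
            for (String message : incoming) {
                queue.add(message);
            }
            queue.add(END);
        });
        inputMonitor.start();

        // The calling thread plays the user-facing role: it blocks until
        // the monitor hands something over, then "shows" it to the user.
        List<String> shown = new ArrayList<>();
        try {
            for (String m = queue.take(); m != END; m = queue.take()) {
                shown.add(m);
            }
            inputMonitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shown;
    }

    public static void main(String[] args) {
        System.out.println(relay(Arrays.asList("balance updated", "confirm transfer?")));
    }
}
```

The blocking queue is one possible synchronization point between the two threads; other mechanisms, such as explicit locks and conditions, would serve equally well.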

Next, we discuss the basic requirements and techniques for meeting the challenges of developing distributed applications.

Requirements for developing distributed applications

Partitioning and Distributing data and functions

If one thinks of the computer hosts and network connections available for a distributed application to use as a virtual machine, then one of the primary tasks is to engineer an optimal mapping of processes, objects, threads and agents to the various parts of this virtual machine. In some cases, a straightforward client/server partitioning based on data requirements can be used. Computational tasks can be distributed based on the data needs of the application: maximize local data needed for processing, and minimize data transfers over the network. In other, more compute-intensive applications, one can partition the system based upon the functional requirements of the system, with data mapped to the most logical compute host. This method of partitioning is especially useful when the overhead associated with data transfers is negligible compared to the computing time spent at the various hosts.

Thus, in the best of all possible worlds, one could develop modules based upon either data-driven or functionally driven partitioning. One could then distribute these modules as needed throughout a virtual machine comprised of computers and communication links, and easily connect the modules to establish the data flow required by the application. These module interconnections should be as flexible and transparent as possible, since they may need to be adjusted at any point during development or deployment of the distributed system.

Flexible, Extensible Communication Protocols

The type and format of the information that is sent between agents in a distributed system is subject to many varied and changing requirements. The allocation of tasks and data to agents in the distributed system has a direct influence on what type of data will need to be communicated between agents, how much data will be transferred, and how complicated the communication protocol between agents needs to be. If most of the data is sitting on the host where it is needed, then communications will be mostly short, simple messages to report status, instruct other agents to start processing, etc. If central data servers are providing lots of data to remote agents, then the communication protocol will be more complex and connections between nodes in the system will stay open longer.

Multithreading Requirements

Agents often have to execute several threads of control concurrently, either to service requests from multiple remote agents, or to block on I/O while processing data, or for any number of other reasons. Multithreading is often an effective way to optimize the use of various resources, such as CPU time, local storage devices, or network bandwidth. The ability to create and control multiple threads of control is especially important in developing distributed applications, since distributed agents are typically more asynchronous than agents within a single process on a single host. The environments in which agents are running can be very heterogeneous too, and we do not want every agent in a distributed application to be held up by the slowest, most heavily loaded agent in the system. We do not want our multiprocessor compute server to be sitting idle while it waits for a slow client desktop to read and render the results of an analysis. We would want a single thread on the compute server to be servicing the slow client, and while the client is crawling along trying to read data and draw graphs on its display, other threads on the compute server can be doing useful work, like analyzing the data from other clients.
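
The slow-client scenario above can be sketched with a small thread pool. The following is a hypothetical illustration rather than a real server: the class name ComputeServer and the task labels are invented, and a latch stands in for the slow client so that the ordering is deterministic instead of depending on timing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical compute server with two worker threads.
class ComputeServer {
    static List<String> serve() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Released only when the simulated slow client has finally caught up.
        CountDownLatch slowClientReady = new CountDownLatch(1);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        try {
            // One thread is tied up waiting on the slow client...
            Future<?> slow = pool.submit(() -> {
                slowClientReady.await();
                completed.add("slow-client");
                return null;
            });
            // ...while the other thread keeps doing useful work.
            Future<?> fast = pool.submit(() -> {
                completed.add("analysis");
                return null;
            });
            fast.get();                    // the analysis finishes first
            slowClientReady.countDown();   // now the slow client is ready
            slow.get();
            return new ArrayList<>(completed);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve());
    }
}
```

With a single-threaded pool the analysis task would queue behind the blocked one and never run, which is the situation the paragraph above warns against.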

Security

The information exchanged between computing agents often needs to be protected from outside observation, particularly when it is of a sensitive nature. In situations where an outside agent, which is not under the direct control of the host, is allowed to interact with local agents, it is also wise to have reasonable security measures available to authenticate the source of the outside agent, and to prevent the agent from wreaking havoc once it gains access to local processing resources. So, at a minimum, a secure distributed application needs a way to authenticate the identity of agents, define resource access levels for agents, and encrypt data for transmission between agents.
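
As a rough sketch of the encryption requirement, two agents that already share a secret key could protect their messages with AES in GCM mode from the standard javax.crypto package. The class name SecureChannel and its seal and open methods are hypothetical, and the sketch assumes the key has been exchanged safely beforehand. It does not cover identity management or resource access levels.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts messages between agents sharing one key.
class SecureChannel {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // length of the authentication tag
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    SecureChannel(SecretKey key) {
        this.key = key;
    }

    // Assumption: in a real system the key would be distributed securely.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a fresh random nonce followed by the ciphertext and tag.
    byte[] seal(String message) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(message.getBytes(StandardCharsets.UTF_8));
            byte[] sealed = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, sealed, 0, iv.length);
            System.arraycopy(body, 0, sealed, iv.length, body.length);
            return sealed;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts a sealed message; fails if it was altered or used another key.
    String open(byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("message rejected", e);
        }
    }

    public static void main(String[] args) {
        SecureChannel channel = new SecureChannel(newKey());
        System.out.println(channel.open(channel.seal("transfer 10 to savings")));
    }
}
```

GCM is chosen here because it both hides the content and detects tampering, so a receiver holding the key can reject a message that was modified in transit.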

Languages and Protocols for Distributed Systems

There are several ways to build distributed systems. The main techniques are Remote Procedure Call (RPC), socket-level programming and message queuing. RPC is an early technique for designing distributed systems that lets a program call procedures residing on remote systems.

Distributed applications can be coded in various programming languages such as COBOL, C, C++, Smalltalk, Delphi and Java. These languages cover different paradigms, from procedural and structured programming to the current object-oriented style of software development. Software products developed with an object-oriented methodology are widely regarded as cost-effective, elegant, scalable, secure, flexible and reusable. Some advantages of OO technology follow.

Because object-oriented (OO) technology holds much promise, software development using OO languages receives significant attention from developers. One of the benefits of OO design is that objects tend to model artifacts that exist in the real world. Another expected benefit is that, as reuse increases, software costs will decrease and reliability will increase. With a strong collection of business objects in place, creating a new application will no longer mean building from the ground up; it will be more akin to linking objects together in a new and useful way.

Thus the next logical step is an application made up of objects that are distributed over the network and that interact through the interface fixed for each object (an interface describes the signatures of the functions implemented by that object) to provide the various services of the application. The following sections discuss this idea, which evolved from the concept of distributed computing.

Distributed Object Computing Technologies

Distributed objects facilitate the construction of multi-tiered architectures. The OO concepts of encapsulation and polymorphism translate well to the world of network-based components. As noted above, each distributed object comes with a public interface. A user of the object cannot see the implementation. A distributed object can also be accessed as if it were a local object, although its actual location may be local or remote.

Java RMI, CORBA and EJB

The main distributed object technologies considered here are Java Remote Method Invocation (RMI), the Common Object Request Broker Architecture (CORBA) and Enterprise JavaBeans (EJB).

The next section describes some of the advantages of Java that support the development of reliable distributed applications.

The Role of Java in Distributed Object Computing

The original design motivations behind Java were concerned mainly with reliability, simplicity and architecture neutrality. Subsequently, the potential of Java as an Internet programming language was realized, and support for networking, security and multithreaded operations was incorporated into Java. All of these features make Java a very powerful distributed application development environment. Here we list some of the features of Java that are of particular interest in distributed applications.

Java is an object-oriented language whose smallest programming building block is the class. A data structure or function cannot exist or be accessed at run time except as an element of a class definition. This results in a well-defined, structured programming environment in which all domain concepts and operations are mapped into class representations and transactions between them. This is advantageous for systems development in general and for distributed system development in particular. An object, an instance of a class, can be thought of as a computing agent. Its level of sophistication as an autonomous agent is determined by the complexity of its methods and data representations, as well as its role within the object model of the system and the runtime object community defining the distributed system. Distributing a system implemented in Java, therefore, can be thought of as simply distributing its objects in a reasonable way, and establishing networked communication links between them using Java's built-in network support.

Interfaces

Java's support for abstract object interfaces is another valuable tool for developing distributed systems. An interface describes the operations, messages, and queries a class of objects is capable of servicing, without providing any information about how these abilities are implemented. A class has to implement the methods specified in an interface according to the Java language specification. The advantage of implementation-neutral interfaces is that other agents in the system can be implemented to talk to the specified interface without knowing how the interface is actually implemented in a class. By separating the class implementation from the interface, it is possible to plug in a more advanced implementation as needed. If a class has to be moved to a remote host, then the local implementation of the interface can act as a stub, forwarding calls to the interface over the network to the remote class.

Certain key packages in the core Java API, such as the Java security package, also make use of interfaces to allow for specialized implementations by third-party vendors. The Java Remote Method Invocation package uses abstract interfaces to define local stubs for remote objects.
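
The stub idea described above can be sketched with the standard java.lang.reflect.Proxy class. This is not how RMI itself is implemented. It is a hypothetical illustration in which Account, LocalAccount and StubDemo are invented names, and a direct method call stands in for the network hop a real stub would make.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: callers depend only on this contract.
interface Account {
    long balance();

    void deposit(long amount);
}

// The real implementation, imagined here as living on a remote host.
class LocalAccount implements Account {
    private long balance;

    public long balance() {
        return balance;
    }

    public void deposit(long amount) {
        balance += amount;
    }
}

class StubDemo {
    // Builds a stub that implements Account and forwards every call.
    // A real stub would marshal the call and send it over the network;
    // here the forwarding is a direct invocation, logged for visibility.
    static Account stubFor(Account remoteSide, List<String> callLog) {
        InvocationHandler forwarder = (proxy, method, args) -> {
            callLog.add(method.getName());
            return method.invoke(remoteSide, args);
        };
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] {Account.class},
                forwarder);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Account account = stubFor(new LocalAccount(), log);
        account.deposit(40);
        System.out.println(account.balance() + " via " + log);
    }
}
```

Because the caller holds only an Account reference, swapping the stub for a direct LocalAccount (or the reverse) requires no change to the calling code.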

Platform Independence

Code written in Java can be compiled into platform-independent bytecodes using a Java compiler. These bytecodes run on the Java Virtual Machine (VM) and therefore on any platform with a Java VM. This is a boon, since it allows virtually any available computing system to be home to an agent in a distributed system. Once the elements of the system have been specified using Java classes and compiled into Java bytecodes, they can migrate without recompilation to any of the hosts available. This makes for easy data and load balancing across the network. There is even support in the Java API for downloading a class definition (its bytecodes) through a network connection, creating an instance of the class, and incorporating the new object into the running process. This is possible because Java bytecodes are runnable on the Java VM, which is guaranteed to be underneath any Java application or applet.

Network Support and Security

The Java programming language API includes multilevel support for network communications. Low-level sockets can be established between agents and data communication protocols can be layered on top of the socket connection. The java.io package contains several stream classes intended for filtering and preprocessing various input and output data streams. APIs built on top of the basic networking support in Java provide higher-level networking capabilities, such as distributed objects, remote connections to database servers, directory services, etc.
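
A minimal sketch of this layering, using only java.net and java.io, might look like the following. The class name LoopbackEcho and the one-line "protocol" (send a line, receive it back in upper case) are assumptions made for illustration, and both agents run in one process over the loopback interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical example: two agents exchanging one line over a TCP socket.
class LoopbackEcho {
    static String roundTrip(String message) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverAgent = new Thread(() -> {
                // Character streams are layered on top of the raw socket streams.
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             peer.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine().toUpperCase(Locale.ROOT));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverAgent.start();

            try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(message);          // auto-flush sends the line at once
                String reply = in.readLine();
                serverAgent.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("status?"));
    }
}
```

A richer protocol would replace the single line with structured messages, but the layering of readers and writers over the socket streams stays the same.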

Runtime environment and Remote transactions

In addition to facilitating the distribution of system elements across the network, Java makes it easy for the recipient of these system elements to verify that they cannot compromise the security of the local environment. If Java code is run in the context of an applet, then the Java VM places rather severe restrictions on its operation and capabilities. Also, any class definitions loaded over the network, whether from a Java applet or application, have to go through a stringent bytecode verification process.

Java makes it easy to create, manipulate, and extend network communication sockets. This capability makes it easy to add user authentication and data encryption to establish secure network links. The java.security package provides a framework for implementing authentication and encryption algorithms.

Multithreading Support

Multithreading has long been an essential feature of operating systems. Java brought the idea to the language level, so the ability to create multithreaded agents is a fundamental feature of Java. Any class that one creates can extend the java.lang.Thread class and provide its own implementation of the run() method. When the thread is started, this run() method is called and the class can do its work within a separate thread of control. Alternatively, a class can implement the java.lang.Runnable interface and be passed to a Thread.
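
Both ways of creating a thread can be shown side by side. In this small sketch the names CountingThread and ThreadStyles are hypothetical, and the "work" is just incrementing a shared counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// First style: subclass java.lang.Thread and override run().
class CountingThread extends Thread {
    private final AtomicInteger counter;

    CountingThread(AtomicInteger counter) {
        this.counter = counter;
    }

    @Override
    public void run() {
        counter.incrementAndGet();
    }
}

class ThreadStyles {
    // Runs one thread of each style and returns how many did their work.
    static int runBoth() {
        AtomicInteger counter = new AtomicInteger();
        Thread viaSubclass = new CountingThread(counter);

        // Second style: hand a Runnable to a plain Thread.
        Runnable task = () -> counter.incrementAndGet();
        Thread viaRunnable = new Thread(task);

        viaSubclass.start();
        viaRunnable.start();
        try {
            // Wait for both threads to finish before reading the counter.
            viaSubclass.join();
            viaRunnable.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```

The Runnable style is often preferred because the class remains free to extend something other than Thread.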

With these advantages, Java can make it easier to design enterprise-level, reusable and portable distributed applications than with other comparable languages.

Conclusion

Architecture encompasses the organization and composition of a system, the assignment of functionality, the development and use of frameworks, the selection of designs and the decisions made for deployment. The architecture of software systems has evolved from monolithic to multi-tiered and distributed components. As the level of abstraction embodied in these technologies increased, so did the sophistication of the systems designed with them. As a result, there is now a variety of computing systems running different operating systems. Because of this unavoidable heterogeneity, distributed computing architecture is poised to play a vital role in developing and deploying software applications. As tools and software packages that improve ease of use for end users continue to appear, and as many interesting and challenging applications in different domains call for a distributed object computing methodology, it is bound to acquire a very significant role in the world of computing.
