Extensible Markup Language

Extensible Markup Language (XML)

XML - Abstract

XML - An Introduction

XML - The Origin

Component Interoperability

New Component Technology

XML and Transcoding Technologies

Future of XML

Abstract

The Web is widely expected to change in two ways: first, to become more collaborative and, second, to move from presentation to multipurpose processing. A more collaborative Web requires browsers to morph into full-scale multimedia editors, while the second change requires re-engineering the lingua franca of the Web. That is, where HTML is a presentation language, the next-generation Web requires a richer means of communication that can be processed automatically by computers, traveling software agents, and end-user browsers. The key idea is that the next-generation Web will be as easily traversable by computer programs as it is by people using browsers.

XML - The Origin
The first age of the Web coincided with raw connectivity, mostly achieved through TCP/IP and the higher-level HTTP protocol. A proliferation of HTML pages became available for download, giving people the capability to access and provide information across the globe, wherever an HTTP Web server was up and running.
During the second age (1997 - 1999), HTML was standardized as the universal Web language. It gained robust new features, and a number of complementary technologies blossomed around it. Multi-tier architectures such as Windows DNA and CORBA, along with modern distributed applications, also arrived. The more Web applications evolve and extend their basic functionality, the greater the need for interactivity, better performance, and programmability. In a single word, we need interoperability: any software program, running on any platform and written in any programming language, should be able to access the information contained in any Web page, exposed by any Web server, running on any hardware/software platform.
HTML is only a presentation language: it is good at describing page layout, but it cannot deliver the underlying information as a separate component, because in HTML data and graphical elements are mixed together. There are cases, however, where we want only the data and none of the formatting that HTML provides. Say, for example, that we want up-to-date stock values: we navigate to a URL, type in the stock symbols, submit the request, and get back a slightly modified version of the same page with the information we need. If we need only the data, so that we can use it in another application, HTML is not adequate. Finding a solution to this problem moved us towards the third age of the Web: programmability. A programmable Web site is built with server-side technologies such as Microsoft ASP, Sun JSP, or Java servlets. By "programming the Web" we mean being able to interrogate existing pages or URLs. This immediately raises at least two classes of problems: platform incompatibilities and component interoperability. There is also a more subtle problem: what format should the caller and the data provider use for the data they exchange? The global software community has found a valid answer to these problems, one that relies heavily on HTTP and XML.
Platform incompatibilities are described separately.
Component Interoperability
Component technology aims to let developers build distributed, cross-platform applications as easily as they build Internet applications. HTML cannot meet the requirements of designing Web components. HTML was fine and effective as long as the Web was a self-contained world, as it was in the second age. The Web was then a sort of meta-application, with HTML as its presentation language and its user interface. Inside that world we can have read-only data, updateable data, client-server interaction, made-to-measure pages, and more. The third age of the Web, however, demands programmability, and here HTML turns out to be the wrong language. HTML is good at presenting data, but not at describing it. If we want to manipulate data as-is from the client, HTML is not the proper tool. We now want our data to be available as an entity separate from the page layout, and this is exactly what XML allows us to do.
An Introduction to XML
XML was developed by the World Wide Web Consortium's (W3C's) XML Working Group with the aim of making it application- and vendor-neutral, which ensures maximum portability. XML is a simplified subset of the Standard Generalized Markup Language (SGML), whereas HTML is an application of SGML. XML is a widely supported, open technology for data exchange, and it brings several distinct advantages to Web programming. HTML describes how content is rendered, while XML describes structured data, so content is separated from presentation. Because an XML document contains only data, applications decide how to display the data according to the type of client machine. For example, the data would be displayed differently on a personal digital assistant (PDA) than on a laptop computer.
HTML has a fixed set of tags for displaying content, whereas XML allows document authors to create their own markup tags to describe the data of any application; presentation is handled separately, for example with the Extensible Stylesheet Language (XSL). This extensibility lets document authors create entirely new markup languages to describe specific types of data, including mathematical formulas, chemical molecular structures, music, and recipes. Markup languages developed using XML include MathML for mathematics, CML for chemistry, VoiceXML for speech applications, the Synchronized Multimedia Integration Language (SMIL) for multimedia presentations, and the Extensible Business Reporting Language (XBRL) for financial data exchange.
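As a rough illustration of author-defined tags, the following Python sketch builds a small document in a made-up "recipe" vocabulary using the standard library's xml.etree.ElementTree module. The element and attribute names (recipe, ingredient, quantity) and the helper build_recipe are assumptions invented for this example; they do not belong to any published markup language.

```python
import xml.etree.ElementTree as ET


def build_recipe(name, ingredients):
    """Build a document in a hypothetical 'recipe' vocabulary.

    The tag names are chosen by the author of the data, not fixed by XML.
    """
    recipe = ET.Element("recipe", {"name": name})
    for item, quantity in ingredients:
        ingredient = ET.SubElement(recipe, "ingredient", {"quantity": quantity})
        ingredient.text = item
    return recipe


recipe = build_recipe("Pancakes", [("flour", "200g"), ("milk", "300ml")])

# Serialize the tree to text; the result describes data only, with no layout.
xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)
```

The printed text carries only the data and its structure. How a recipe should look on screen is left to a stylesheet or to the application that reads it.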
Because XML documents describe the data they contain, an XML document can be searched, sorted, manipulated, and rendered using technologies such as XML parsers and XSL. XML documents are highly portable because XML is a text-based language. No special software tool is needed to open and read an XML document. An XML document is written in plain characters (ASCII or, more generally, Unicode), so it is both human-readable and machine-readable.
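Searching and sorting can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The stock-quote document below, including its tag names, symbols, and prices, is invented sample data, and find_price and symbols_by_price are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Invented sample data: the vocabulary and the values are assumptions.
QUOTES_XML = """
<quotes>
  <quote symbol="MSFT"><price>27.50</price></quote>
  <quote symbol="IBM"><price>84.10</price></quote>
  <quote symbol="SUNW"><price>4.25</price></quote>
</quotes>
""".strip()

root = ET.fromstring(QUOTES_XML)


def find_price(root, symbol):
    """Search: return the price for one symbol, or None if it is absent."""
    quote = root.find(f"quote[@symbol='{symbol}']")
    return float(quote.findtext("price")) if quote is not None else None


def symbols_by_price(root):
    """Sort: return the symbols ordered from cheapest to most expensive."""
    quotes = sorted(root.findall("quote"), key=lambda q: float(q.findtext("price")))
    return [q.get("symbol") for q in quotes]


print(find_price(root, "IBM"))
print(symbols_by_price(root))
```

Because the markup names each piece of data, the program can ask for a price directly instead of scraping it out of page layout.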
To process an XML document, a software program called an XML parser (or XML processor) is required. Many such parsers exist, written in different programming languages such as Java, Python, and C. A parser checks an XML document's syntax and can support the Document Object Model (DOM), the Simple API for XML (SAX), or both. DOM-based parsers build a tree structure in memory containing the XML document's data, which allows the data to be manipulated programmatically. SAX-based parsers read through the document and generate events as tags, text, comments, and so on are encountered; these events deliver the data from the XML document to the application.
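The difference between the two styles can be sketched with Python's standard library, which offers a DOM implementation (xml.dom.minidom) and a SAX interface (xml.sax). The order document and the ItemHandler class are assumptions made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

# Invented sample document.
DOC = "<order id='7'><item>pen</item><item>ink</item></order>"

# DOM style: the whole document is loaded into an in-memory tree,
# which can then be queried and modified.
dom = minidom.parseString(DOC)
items_dom = [node.firstChild.data for node in dom.getElementsByTagName("item")]
dom.documentElement.setAttribute("status", "shipped")


# SAX style: the parser calls back into a handler as it meets each
# start tag, run of text, and end tag; no tree is kept.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(items_dom, handler.items)
```

Both approaches recover the same items. The DOM version can also change the tree after parsing, while the SAX version keeps only what the handler chooses to store, which tends to matter for large documents.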
An XML document can reference an optional Document Type Definition (DTD) file, which defines how the XML document is structured. When a DTD is provided, some parsers, called validating parsers, are able to read the DTD and check the XML document's structure against it. If the XML document conforms to the DTD, then the XML document is valid. Parsers that do not check a document against its DTD are called non-validating parsers. If an XML parser can successfully process an XML document, whether or not it has a DTD, the document is considered well formed, that is, syntactically correct. By definition, a valid XML document is also a well-formed XML document. To be usable by an application, an XML document must, as a minimum, be well formed.
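A well-formedness check can be sketched with Python's standard xml.etree.ElementTree module, whose parser is non-validating: it reports syntax errors but does not check a document against its DTD. The helper name is_well_formed and the sample documents are assumptions for this example; checking validity would need a validating parser, which is outside what this sketch covers.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested tags: well formed.
print(is_well_formed("<note><to>Ann</to></note>"))

# The <to> element is never closed: not well formed.
print(is_well_formed("<note><to>Ann</note>"))

# This document breaks its own DTD (a <from> where a <to> is required),
# yet a non-validating parser still accepts it as well formed.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE note [<!ELEMENT note (to)> <!ELEMENT to (#PCDATA)>]>"
    "<note><from>Bob</from></note>"
)
print(is_well_formed(INVALID_BUT_WELL_FORMED))
```

The last case shows why the two notions are kept apart: syntax alone is checked here, so only a validating parser would reject that document.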
New Component Technology
XML is a key element in building a new Web-oriented component technology. XML is much more than a powerful language for data description. Largely because of the all-encompassing meaning of the word "data", XML is about to become a sort of component technology for distributed data, one that is set to change the way we deal with data. XML is text-based and, as such, human-friendly: text-based documents are easier to read. As text, it is portable to any platform, provided we have encoded it properly. Furthermore, XML is about data description, and we can model any data through XML. That means we can even define an XML data schema to describe a remote function invocation. Once we do this, we are close to the ultimate goal of a distributed, platform- and language-independent component technology. No special run-time or API is needed. All of the infrastructure needed to process the method invocation could be buried in the Web server, the operating system, the ORB, or elsewhere.
Thus, in this new millennium, it is hard to do much on the Web without XML. XML is successful because it is simple, easy to author, and more general than HTML. It is a data description language that can describe almost anything, including method invocation on remote objects. XML is also supported on almost any platform. Finally, XML is suited to transporting a method call from client to server, regardless of the platform and the component technology used to design the server classes. Once the server has understood and processed the request, the return data is packed in XML and transported back to the client through the same mechanism.
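One existing way to express a method call in XML is the XML-RPC protocol, which Python supports through the standard xmlrpc.client module. The text above does not name XML-RPC; it is used here only as a convenient illustration of the idea. The method name getQuote, the function get_quote, and the returned values are all invented, and the sketch runs in one process instead of sending anything over HTTP.

```python
import xmlrpc.client

# Client side: describe the call "getQuote('MSFT')" as an XML document.
request_xml = xmlrpc.client.dumps(("MSFT",), methodname="getQuote")
print(request_xml)

# Server side: recover the method name and parameters from the XML.
params, method = xmlrpc.client.loads(request_xml)


def get_quote(symbol):
    """Hypothetical server function; the price is made-up sample data."""
    return {"symbol": symbol, "price": 27.5}


# Pack the return value in XML for the trip back to the client.
response_xml = xmlrpc.client.dumps((get_quote(*params),), methodresponse=True)

# Client side again: unpack the XML response into ordinary values.
result, _ = xmlrpc.client.loads(response_xml)
print(result[0])
```

Nothing in the request or the response depends on the language or platform at either end: both sides only need to read and write the agreed XML.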
XML and Transcoding Technologies
The current trend on the client side is to have a wide range of clients oriented toward specific purposes, such as smart cards, personal digital assistants, smart telephones, set-top boxes, and smart automobiles. The usefulness of these new clients increases greatly when they have a wide range of content available to them and this content is often provided by Internet servers. Since these clients are mobile, wireless connectivity and the ability to support disconnected operation are important to them. These diverse clients have differing requirements for communicating and presenting data. When connected with Web servers, the best approach for working with these clients is to provide an easy means of translation and tailoring of data to meet specific client needs, a job that is easily handled by XML and transcoding technologies.
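A very small transcoding step might look like the Python sketch below, which turns one XML source into HTML for a desktop browser and into short plain text for a phone. The headlines vocabulary, the client names, and the functions to_html, to_plain_text, and transcode are assumptions for illustration; a production system would more likely use XSL stylesheets or a dedicated transcoding product.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample content shared by every client.
HEADLINES_XML = """<headlines>
  <story><title>Markets rally</title><summary>Stocks rose sharply.</summary></story>
  <story><title>New phone ships</title><summary>Stores get it this week.</summary></story>
</headlines>"""


def to_html(root):
    """Rich rendering for a desktop browser: titles plus summaries."""
    items = "".join(
        "<li><b>{}</b>: {}</li>".format(
            html.escape(story.findtext("title")),
            html.escape(story.findtext("summary")),
        )
        for story in root.iter("story")
    )
    return "<ul>{}</ul>".format(items)


def to_plain_text(root, width=20):
    """Compact rendering for a small screen: truncated titles only."""
    return "\n".join(story.findtext("title")[:width] for story in root.iter("story"))


# Map each (hypothetical) client type to the rendering it needs.
TRANSCODERS = {"browser": to_html, "phone": to_plain_text}


def transcode(xml_text, client):
    """Parse the shared XML once and render it for the given client."""
    return TRANSCODERS[client](ET.fromstring(xml_text))


print(transcode(HEADLINES_XML, "browser"))
print(transcode(HEADLINES_XML, "phone"))
```

Supporting a new kind of client then means adding one more rendering function, while the XML content on the server stays unchanged.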
Future of XML
Much of the future work on XML will be driven by the ebXML initiative and by the World Wide Web Consortium. ebXML is an international initiative established by UN/CEFACT and the Organization for the Advancement of Structured Information Standards (OASIS). It is developing a comprehensive set of XML middleware standards for business process modeling, core components, registry and repository, and transport. A number of industry-specific domain standards for XML are also expected.
