Sunday, June 05, 2005

What is XML?

So this three-letter word has been taking the IT world by storm since its inception in the late 90's. Now what is it all about? I hear a lot of folks talking about it, defining it, trying to understand it, and trying to explain to others what XML is. Even I myself have indulged in such discussions. I kept hearing a lot of definitions floating in the air, some saying "It's the standard to encode data", "it's an enhanced version of HTML", "it's extensible HTML, you can create your own tags". But why on earth would I need to create these tags? What for?

The understanding I built up over the course of these discussions and my reading was that "XML is a standard way to encode data pertaining to anything: transaction details, a list of entities, a message for some application, configuration, metadata, etc. It differs from a simple text file in that the data is stored in a hierarchical fashion, and an XML document can be bound to a schema or DTD which specifies the structure and content of this hierarchy." XML is a way to package data. This packaged data could sit in a file, a network packet, a message, a database table, or anything else.

I haven't worked with XML per se. As such, there is no such thing as working with XML. XML is not a programming language that one can use to create an application, nor is it meant for presentation like HTML. As someone has said, one will encounter XML everywhere. Even when your car breaks down, it will send a message in XML to the nearest service center for the necessary help.

XML is meant for nothing in specific but for everything. I have kept seeing XML everywhere in the last couple of years:
- Configuration of various applications/servers
- Web services send their requests and responses in XML
- The reports I create using some tool get stored in an XML file. It's not just reports: any metadata generated using any wizard tool gets stored in XML.
- I write my EJB descriptor file in XML, and my Struts config is in XML
- WML is again an XML-based language
- Process flows are stored in XML
- Presentation information is stored in XML and is transformed for a particular rendering device using some translation
- I export data from a database in XML and import it into any other database.

These and many more. I wonder, why use XML everywhere if bare simple text files can do the same? Okay, what could a bare text file that stores a list of books and their details look like?

Option 1 (attribute value):

Book: Abc
Author: Xyz1
Price: 100
Pages: 252
Book: Abc1
Author: Xyz2
Price: 150
Pages: 531

Option 2 (comma separated):

Abc, Xyz1, 100, 252
Abc1, Xyz2, 150, 531



And maybe there could be some more.
For both of the options above, the application has to make certain assumptions when consuming or generating the text format. In the first, all the attributes for one book must be placed together vertically; in the second, all the attributes for a book must be placed together horizontally, in a particular order. And what if this file needs to be extended to store some more attributes for a book, for example publisher information? What needs to be changed? The application? The file? We all know the answer: both, and it will be a heck of a lot of work.
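
To make that assumption concrete, here is a minimal Python sketch of a consumer for the comma separated option. The function name `parse_books`, the `FIELDS` list and the publisher value are my own inventions for illustration; the point is only that the field order lives in the code, so the file and the application have to change together.

```python
# Hypothetical consumer for the comma separated format (Option 2).
# The field order is an assumption hard-coded into the application.
FIELDS = ["name", "author", "price", "pages"]

def parse_books(text):
    """Parse one book per line, relying purely on field position."""
    books = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}"
            )
        books.append(dict(zip(FIELDS, values)))
    return books

original = "Abc, Xyz1, 100, 252\nAbc1, Xyz2, 150,531"
print(parse_books(original)[0]["price"])  # prints 100

# Adding a publisher column to the file breaks the consumer
# until the code is updated as well.
extended = "Abc, Xyz1, SomePublisher, 100, 252"
try:
    parse_books(extended)
except ValueError as err:
    print("consumer broke:", err)
```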

On the other hand, XML is also bare text, with some structure and some syntax to follow. That’s it. Here is what I have found so far that is somewhat convincing to me and that makes XML better than simple text encoding:
1. The structure of an XML document is extensible without affecting the application much. As long as the application looks things up by tag name rather than by position, you can extend the XML document to store more information without affecting the application that uses it.
2. The data is stored in a hierarchical fashion, something like:
<?xml version="1.0" encoding="utf-8"?>
<Books xmlns="http://tempuri.org/XMLFile1.xsd">
<Book>
<Name>Abc</Name>
<Author>Xyz1</Author>
<Price>100</Price>
<Pages>252</Pages>
</Book>
<Book>
<Name>Abc1</Name>
<Author>Xyz2</Author>
<Price>150</Price>
<Pages>531</Pages>
</Book>
</Books>

This could very well be extended to store new attributes without really bothering the application.

3. The availability of lots of parsers and DOM (Document Object Model, an API for accessing and processing XML documents) implementations for various programming languages. So generating and consuming XML documents is easy.
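
Points 1 and 3 above can be sketched together in Python using the standard library's `xml.etree.ElementTree` parser. This is only an illustration under my own assumptions: the helper names `make_doc` and `book_names_and_prices` and the `Publisher` element are hypothetical, and the document is a one-book version of the example above.

```python
import xml.etree.ElementTree as ET

# The default namespace declared on <Books> in the example document.
NS = {"b": "http://tempuri.org/XMLFile1.xsd"}

def make_doc(extra=""):
    """Build a one-book document; `extra` is optional additional markup."""
    return f"""<?xml version="1.0" encoding="utf-8"?>
<Books xmlns="http://tempuri.org/XMLFile1.xsd">
  <Book>
    <Name>Abc</Name>
    <Author>Xyz1</Author>{extra}
    <Price>100</Price>
    <Pages>252</Pages>
  </Book>
</Books>""".encode("utf-8")

def book_names_and_prices(xml_bytes):
    """Look elements up by tag name, so their position does not matter."""
    root = ET.fromstring(xml_bytes)
    result = []
    for book in root.findall("b:Book", NS):
        name = book.findtext("b:Name", namespaces=NS)
        price = int(book.findtext("b:Price", namespaces=NS))
        result.append((name, price))
    return result

print(book_names_and_prices(make_doc()))
# The same consumer keeps working after the document gains a new element.
print(book_names_and_prices(make_doc("<Publisher>SomePublisher</Publisher>")))
```

Both calls print `[('Abc', 100)]`: the consumer simply ignores the element it does not know about, which is the kind of extensibility the positional text formats lack.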

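There are two guidelines for XML documents. The first applies to every document; the second applies when a DTD or schema is associated with it:
1. The XML document should be well-formed. This means every opening tag should have a matching closing tag and the elements should be properly nested. The tags are also case sensitive.
2. The XML document should be valid. This means that the arrangement of tags, their attributes and their values has to follow a certain scheme. This scheme is specified in the DTD or XML schema that is associated with the XML document. In the above example it is http://tempuri.org/XMLFile1.xsd that refers to the schema of the XML document. Strictly speaking, the xmlns attribute only declares a namespace; the link to an actual schema file is usually made with an xsi:schemaLocation attribute or by the tool doing the validation.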
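
The first guideline (matching, properly nested, case sensitive tags) is something any parser checks for you. Here is a minimal Python sketch using the standard library; the helper name `is_well_formed` is my own. As far as I know, the Python standard library does not validate against a DTD or XML schema, so the second guideline would need a third-party library and is left out here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<Book><Name>Abc</Name></Book>"))  # True: tags match
print(is_well_formed("<Book><Name>Abc</name></Book>"))  # False: case mismatch
print(is_well_formed("<Book><Name>Abc</Book>"))         # False: <Name> not closed
```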

One can Google and find tons of commentary on XML, XML tutorials, applications of XML, current happenings, etc. www.w3c.org is the place to get the latest on what’s happening in the XML world.
