An XML primer (August 1999)

XML ‘Xspert’ Antoine Rizk has kindly supplied this introduction to the new language. You can check out the fruits of his labors on the Euroclid website http://www.euroclid.fr (you may like to have a French dictionary to hand).

What is XML?

XML, the Extensible Markup Language, defines a universal standard for electronic data exchange. Described as “the ASCII of the year 2000”, XML may be the solution to the problem of heterogeneous databases and data structures. XML specifies a rigorous, text-based way of representing the structure inherent in data, so that it can be authored and interpreted unambiguously. Its simple, tag-based approach builds on developers' familiarity with HTML. It also provides a flexible, extensible mechanism that can handle "digital assets" of every kind, from highly structured database records to unstructured documents and everything in between.
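The XML 1.0 specification makes "rigorous" concrete. A document that is not well formed (for example, one with overlapping or unclosed tags) is a fatal error, and a conforming parser must report it instead of guessing what was meant. The minimal sketch below shows this using Python's standard `xml.etree.ElementTree` module. That tool is an illustrative choice, since the article names none. The helper `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser refuses malformed markup outright.
        return False
    return True

good = "<Course><Name>Document Engineering</Name></Course>"
# Overlapping tags: <Name> is opened inside <Course> but closed after it.
bad = "<Course><Name>Document Engineering</Course></Name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Browsers of the period would usually render equally broken HTML without complaint. That strictness is the difference that lets a program interpret XML data unambiguously.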

W3C

XML is an Internet-standard way of tagging data. As a web-oriented subset of SGML, the ISO standard for generalized markup, XML builds on a proven technology with a good track record. The World Wide Web Consortium (W3C) issued XML 1.0 as a Recommendation in February 1998. It is being widely and rapidly adopted as a standard for document and data exchange in many markets.

Support

XML is also gaining wide industry support from vendors including Oracle, IBM, Sun, Microsoft, Netscape, SAP and others. They see it as a platform- and application-neutral format for exchanging information. XML is ‘extensible’, and it has spawned several derivative standards for defining schemas, presentation style sheets, hypertext links, an API for manipulating documents, and an XML query language.
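The API standard in that list is most likely the W3C Document Object Model (DOM), which lets a program read and modify an XML document as a tree of nodes. The sketch below is an assumption-laden illustration rather than anything from the article. It uses Python's `xml.dom.minidom`, a minimal DOM implementation in the standard library. The `<Year>` element it adds is hypothetical.

```python
from xml.dom import minidom

source = (
    "<Course><Name>Document Engineering</Name>"
    "<Department>DESS IDM</Department></Course>"
)

doc = minidom.parseString(source)
course = doc.documentElement  # the root <Course> element

# Read: the text of the first <Name> element, found through the DOM tree.
name = course.getElementsByTagName("Name")[0].firstChild.data
print(name)  # Document Engineering

# Write: build a hypothetical <Year> element node by node and append it.
year = doc.createElement("Year")
year.appendChild(doc.createTextNode("1999"))
course.appendChild(year)

# Serialize the modified element back to XML text.
print(course.toxml())
```

The same DOM calls (`createElement`, `appendChild`, `getElementsByTagName`) exist in other languages' DOM implementations. Making manipulation code portable across platforms was the purpose of standardizing the API.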

XML vs. HTML

The following example shows the difference between XML and HTML and demonstrates the advantages of using XML for archiving, transferring and querying data. The HTML snippet below uses tags to present data in a row of a table, but it leaves many ambiguities. Is "Document Engineering" the name of a book? A university course? A job skill? One cannot tell by looking at the data and tags on the HTML page, and neither can a computer program.

HTML code snippet

<HTML>
    <BODY>
        <TABLE>
            <TR>
                <TD>Document Engineering</TD>
                <TD>DESS IDM</TD>
                <TD>Antoine Rizk</TD>
            </TR>
        </TABLE>
    </BODY>
</HTML>

Now look at the analogous XML example below. It contains exactly the same data, but the tags indicate what information the data represents, not how it should be displayed. Both a human reader and a computer can see that "Document Engineering" is the Name of a Course. The markup, however, says nothing about how it should be displayed.

Simple XML Page

<?xml version="1.0"?>
    <Course>
        <Name>Document Engineering</Name>
        <Department>DESS IDM</Department>
        <Teacher>
            <Name>Antoine Rizk</Name>
        </Teacher>
    </Course>

So XML represents information content, while HTML represents the presentation of that content.
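The claim that a program can interpret the XML but not the HTML can be checked directly. The sketch below reads both snippets with Python's standard `xml.etree.ElementTree`, an illustrative tool choice that the article does not make. The HTML snippet happens to be well formed, so an XML parser accepts it here, although that is not true of HTML in general. From the HTML, a program gets only three anonymous cells in order. From the XML, it can ask for the course name directly, and the path `Teacher/Name` separates the teacher's name from the course's.

```python
import xml.etree.ElementTree as ET

# The article's two snippets, with the course name capitalized the same way in both.
html_text = """<HTML><BODY><TABLE><TR>
<TD>Document Engineering</TD><TD>DESS IDM</TD><TD>Antoine Rizk</TD>
</TR></TABLE></BODY></HTML>"""

xml_text = """<?xml version="1.0"?>
<Course>
    <Name>Document Engineering</Name>
    <Department>DESS IDM</Department>
    <Teacher>
        <Name>Antoine Rizk</Name>
    </Teacher>
</Course>"""

# HTML: cells can only be addressed by position; nothing says which one is the course.
cells = [td.text for td in ET.fromstring(html_text).iter("TD")]
print(cells)  # ['Document Engineering', 'DESS IDM', 'Antoine Rizk']

# XML: the tags carry the meaning, so the program can ask for fields by name.
course = ET.fromstring(xml_text)
print(course.findtext("Name"))          # Document Engineering (direct child only)
print(course.findtext("Teacher/Name"))  # Antoine Rizk
```

If a fourth column were inserted into the HTML table, positional code like `cells[2]` would silently return the wrong field. The XML query by name would keep working.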

Style

To display the XML example above on screen, one can transform it into equivalent HTML. This can be done either with a custom program or, more simply, with XSL, a standard style-sheet language designed for that purpose. The advantage of XSL is that the style sheet can be sent to the client and interpreted there. Many different XSL style sheets can be defined for a single XML document and sent to different users according to their profiles and platform configurations.
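Python's standard library has no XSL processor, so this sketch takes the paragraph's other route and uses a program for the transformation. It also illustrates the one-document, many-presentations idea. The two rendering functions play the role of two style sheets for different user profiles. Both "profiles" are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

COURSE = """<Course>
    <Name>Document Engineering</Name>
    <Department>DESS IDM</Department>
    <Teacher><Name>Antoine Rizk</Name></Teacher>
</Course>"""

def as_table_row(course: ET.Element) -> str:
    """Hypothetical 'desktop' profile: reproduce the article's HTML table row."""
    cells = [course.findtext("Name"), course.findtext("Department"),
             course.findtext("Teacher/Name")]
    # Escape the data so characters such as '<' or '&' cannot break the HTML.
    tds = "".join(f"<TD>{escape(c)}</TD>" for c in cells)
    return f"<HTML><BODY><TABLE><TR>{tds}</TR></TABLE></BODY></HTML>"

def as_text_line(course: ET.Element) -> str:
    """Hypothetical 'small screen' profile: a single plain-text line."""
    return f"{course.findtext('Name')} ({course.findtext('Teacher/Name')})"

course = ET.fromstring(COURSE)
for render in (as_table_row, as_text_line):
    print(render(course))
```

The XML data is parsed once and never changes. Only the presentation layer differs, which is the separation the article attributes to XSL.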

XML architecture

The figure above illustrates what future information system architectures will look like. Here, the user has an HTML or XML terminal. The terminal is used in client mode to browse heterogeneous databases connected to the Internet in a three-tier architecture.

Document database

One database could be a document base, another a GIS, and a third a relational or object database, with each tool used where it fits best. Currently, queries are dispatched to databases in SQL, OQL or some proprietary form. In the future, queries will be sent in a unified manner in XQL, the XML Query Language. Results will be generated, joined and assembled back into XML. They will then be sent to a presentation server, which uses the forthcoming XSL style-sheet language to transform the XML data into HTML. More information on XML is available from www.gca.org and www.oasis-open.org/cover.
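The "joined and assembled back into XML" step can be sketched without XQL, which this example does not attempt. The sketch makes two assumptions. An in-memory SQLite table queried in SQL stands in for the relational database, and a Python dictionary stands in for the document base. Both sources, their contents, and the `assemble` function are hypothetical. The only point shown is that results from different back ends end up in one XML tree, ready for a presentation server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational source: course staffing in an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE course (name TEXT, department TEXT, teacher TEXT)")
db.execute("INSERT INTO course VALUES (?, ?, ?)",
           ("Document Engineering", "DESS IDM", "Antoine Rizk"))

# Hypothetical document base: free-text descriptions keyed by course name.
documents = {"Document Engineering": "Course description goes here."}

def assemble(db: sqlite3.Connection, documents: dict) -> ET.Element:
    """Join rows from both sources on the course name and wrap them in one XML tree."""
    results = ET.Element("Results")
    for name, department, teacher in db.execute(
            "SELECT name, department, teacher FROM course"):
        course = ET.SubElement(results, "Course")
        ET.SubElement(course, "Name").text = name
        ET.SubElement(course, "Department").text = department
        ET.SubElement(ET.SubElement(course, "Teacher"), "Name").text = teacher
        # The 'join': attach the matching document, if the document base has one.
        if name in documents:
            ET.SubElement(course, "Description").text = documents[name]
    return results

print(ET.tostring(assemble(db, documents), encoding="unicode"))
```

The resulting `<Results>` tree could then be handed to any of the rendering approaches shown earlier, regardless of which database each field came from.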

To comment on this article, send mail to pdm@oilit.com with PDM_V_3.3_9908_4 as the subject.

© Oil IT Journal - all rights reserved.