Sept. 4 2008 12:00 AM

Even if you are not a programmer, you have likely heard some mention of XML within your organization and throughout the industry. Certainly, your technologists are using it when explaining how they are going to implement a new solution. Often, it is spoken of in almost reverential terms, as the cure-all for a variety of problems.


XML stands for "eXtensible Markup Language," a structured data interchange format. While most often associated with Internet messaging, it can be used in many other ways, such as storing content or interacting with a database. But what is a markup language? A markup language is a system of tags (strings of characters) used to name parts of a document and to indicate how they should be handled. Sometimes a tag will have extra information (attributes) included to modify the tag's interpretation. Markup languages originally were used in typesetting and word processing but now have found prominence in electronic systems. In fact, the word markup originated from the notations that a typesetter would place on a manuscript to indicate the fonts and layout that should be applied. If you've ever worked with galley proofs of manuscripts, you have seen another form of markup in the standard handwritten markings used on them.
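For readers who want to see tags and attributes in practice, here is a minimal sketch in Python, using only its standard library. The invoice document, its tag names and its attribute names are invented for illustration and do not come from any real standard.

```python
import xml.etree.ElementTree as ET

# A made-up document: <invoice>, <customer> and <total> are tags;
# number="1042" and currency="USD" are attributes.
DOCUMENT = """
<invoice number="1042">
  <customer>Acme Corp</customer>
  <total currency="USD">250.00</total>
</invoice>
"""

def describe(xml_text):
    """Return (tag, attributes, text) for every element in the document."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, dict(el.attrib), (el.text or "").strip())
        for el in root.iter()
    ]

if __name__ == "__main__":
    for tag, attributes, text in describe(DOCUMENT):
        print(tag, attributes, text)
```

Note that the program lists every part of the document without knowing anything about invoices; the tags themselves carry the structure.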


There are two classes of markup languages:


1      The original procedural type indicates how parts of the document should be processed. Microsoft's Rich Text Format (RTF) is a common example of this class, as well as Adobe's Portable Document Format (PDF). Early examples were for computer-based publication, such as TROFF and TeX.

2      The descriptive type, which includes XML, indicates only structure and keeps the procedural details as independent information that may change based upon the targeted use.


Markup languages are useful because they permit information to be exchanged between different organizations and systems, independent of the technologies used to create and utilize the document. Today, XML is the most prominent markup language and is being used to describe information stored in, and sent between, hosts of computer applications.


Why XML? Firstly, it is vendor and platform neutral. Since XML text is coded using a universal character set (Unicode, typically encoded as UTF-8 or UTF-16), it can encode any language and be read on any computer system. In fact, sender and receiver need to know nothing about the technology being used to create and to utilize the XML message. They only need to use an agreed-upon document definition that defines the tags and attributes that may be used.
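As a rough illustration of this neutrality, the sketch below (in Python, standard library only) plays both roles: a "sender" turns a message into raw bytes, and a "receiver" rebuilds the text from those bytes alone. The greetings document and the send and receive function names are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical message containing Japanese and Greek text.
MESSAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<greetings>'
    '<hello lang="ja">こんにちは</hello>'
    '<hello lang="el">Καλημέρα</hello>'
    '</greetings>'
)

def send(text):
    """Sender side: turn the message into raw bytes for transmission."""
    return text.encode("utf-8")

def receive(raw_bytes):
    """Receiver side: parse the bytes and map each language code to its greeting."""
    root = ET.fromstring(raw_bytes)
    return {el.get("lang"): el.text for el in root.findall("hello")}

if __name__ == "__main__":
    print(receive(send(MESSAGE)))
```

The receiver relies only on the encoding named in the message's first line and on the agreed tag names, not on whatever system produced the bytes.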


Secondly, XML permits the use of a small set of available and proven tools, which provide the mechanisms for extracting information from an XML stream or transforming it into another format. For example, modern web browsers can receive an XML message and, at a minimum, display its structure and most types of content without needing to know the message's usage or meaning. The same message may be transformed using XML-related tools to be read on a mobile telephone display or in a web browser, or to serve as input to a proprietary ERP system.


Lastly, XML-based industry standards are being introduced to standardize how organizations can exchange information and interact with one another:


  • Health Level 7 (HL7) is the most frequently used communication standard in the health care industry worldwide. It has published an ANSI-accredited document standard, the Clinical Document Architecture (CDA), which is based on XML. HL7 is also working on a proposal to apply XML as the interchange format in its version 2.x messages, and it has already agreed to use XML as the standard interchange format of its new model-based version 3 messages.

  • The Association for Cooperative Operations Research and Development (ACORD) is a global, non-profit insurance association, whose mission is to facilitate the development and use of standards for the insurance, reinsurance and related financial services industries. ACORD has developed a number of XML-based standards for the insurance industry to exchange messages in a technology-independent manner.

  • Automobile makers and dealers, working under the Standards for Technology in Automotive Retail (STAR) group, have created a set of voluntary standards for communication over the Internet. This scheme seeks to simplify dealer operations and reduce the effort required for tasks such as locating parts, scheduling service or retrofit appointments and applying for leases and loans.

  • Process automation systems are converging on XML-based standards, such as Process Specification Language (PSL) and Business Process Execution Language (BPEL), which will permit design and description systems to communicate with multiple engines and permit these engines to communicate with each other.



XML's Supporting Players

•     Document Type Definitions and Schemas are used to define a specific XML document type. 

•     Namespaces allow multiple document definitions to be used in one XML document.

•     XPath permits accessing pieces within an XML document. 

•     XQuery is used to query, access and manipulate XML databases.
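
Two of these supporting players, namespaces and XPath, can be sketched briefly in Python. The standard library used here supports only a limited subset of XPath, and the order and shipping vocabularies, along with their example.com namespace addresses, are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace identifiers for two separate document definitions.
NS = {
    "ord": "http://example.com/order",
    "ship": "http://example.com/shipping",
}

DOCUMENT = """
<ord:order xmlns:ord="http://example.com/order"
           xmlns:ship="http://example.com/shipping">
  <ord:item sku="A1"><ord:name>Widget</ord:name></ord:item>
  <ord:item sku="B2"><ord:name>Gadget</ord:name></ord:item>
  <ship:name>Express</ship:name>
</ord:order>
"""

root = ET.fromstring(DOCUMENT)

# Both vocabularies define a <name> tag; the namespace keeps them apart.
item_names = [el.text for el in root.findall("./ord:item/ord:name", NS)]
shipping = root.findtext("./ship:name", namespaces=NS)

# An XPath-style predicate selects one item by its attribute value.
gadget = root.find("./ord:item[@sku='B2']/ord:name", NS).text

if __name__ == "__main__":
    print(item_names, shipping, gadget)
```

Each quoted path is an XPath-style expression: it describes where a piece of the document lives rather than how to walk there step by step.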




XML did not spring forth suddenly; it was developed based upon earlier international standards. The immediate "father" was Standard Generalized Markup Language (SGML), which was itself based upon several proprietary solutions. SGML was targeted at the development and publication of large technical documents and solved problems arising from incompatibility between text editing, formatting and database applications. One example of its use was in the production and revision of aircraft maintenance manuals, which are generally unique to each plane. For example, an entertainment subsystem that is not installed in a particular plane, but is often included in that specific model, might be in the manual's SGML yet not printed in the hard copy form or included on the customer CDs.


But SGML was often viewed as being overly complex and difficult to employ for simpler projects. When Tim Berners-Lee invented HTML and the World Wide Web, he based HTML upon SGML. As we have observed, HTML and the web have become universally prominent, with millions of pages being created with text editors, with graphic design tools and by programs. However, as Berners-Lee simplified SGML to make web page creation easy, and as later contributors expanded on his concept, there were many "tags" that were implied or did not require matching closing pairs (an SGML requirement to maintain structure).


As a result, automatic processing of web pages can be difficult or impossible, since values are hard to identify and extract reliably. XML was created to solve that problem. After XML was developed, a restricted usage set for HTML that conformed to XML structural rules, called XHTML, was adopted to help bring the two worlds into alignment. In XHTML, all tags must be closed and all attribute values in tags must be quoted. In addition, all tag and attribute names must be lowercase in order to be valid. HTML, on the other hand, is case insensitive. Since January 2000, all new standards relating to HTML have been based upon XHTML.
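
The structural rules can be checked mechanically, as in this small sketch in Python, using only its standard library. The three sample fragments and the is_well_formed helper are made up for this example. The parser enforces only XML's general rules (closed tags, quoted attribute values); the lowercase requirement is specific to XHTML and is not tested here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Loose HTML style: the <BR> tag is never closed.
HTML_STYLE = "<P>First line<BR>Second line</P>"
# XHTML style: every tag is closed, including the empty <br/>.
XHTML_STYLE = "<p>First line<br/>Second line</p>"
# Attribute value without quotes, which XML does not allow.
UNQUOTED = "<p class=intro>Hello</p>"

if __name__ == "__main__":
    for sample in (HTML_STYLE, XHTML_STYLE, UNQUOTED):
        print(is_well_formed(sample), sample)
```

Only the XHTML-style fragment is accepted, which is why a program can process it without guessing where each element ends.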


Organizations are quickly realizing that they can use XML and XML tools to solve a number of technical problems. If you are connecting two applications that may run on different machines or in different organizations, you can use the existing remote call tools, or you can define your own XML messages and process them with standard XML tools. Such application-specific markup languages based upon XML include:


•       Resource Description Framework (RDF) is a set of specifications that can be used to model information. It is a tool for the development of Semantic Web efforts to permit automated use of web-based data, and it is finding increasing use in knowledge management.

•       XForms is a specification for web forms and form data. XForms definitions are often packaged in an HTML page, but they may be used with other tools. Most forms packages support XForms as a storage format, permitting forms designed with one tool to be modified or filled with another.

•       DocBook is a specification for describing the logical structure of technical documentation without dictating its display.

•       SOAP is a protocol for exchanging XML messages, usually over HTTP, that is used for such tasks as remote procedure calls. Sponsored originally by Microsoft, it has now become an official standard and a component of many service-oriented architecture solutions.
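
To give a feel for the last of these, here is a minimal sketch in Python that builds a SOAP-style request and reads it back, using only the standard library. The envelope namespace is the one defined for SOAP 1.1; the GetQuote operation, its Symbol parameter, the example.com namespace and both function names are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "http://example.com/quotes"  # hypothetical application namespace

def build_request(symbol):
    """Wrap a made-up GetQuote call in a SOAP envelope and return it as text."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")

def read_symbol(message):
    """Receiving side: pull the requested symbol back out of the envelope."""
    root = ET.fromstring(message)
    ns = {"soap": SOAP_NS, "app": APP_NS}
    return root.findtext("./soap:Body/app:GetQuote/app:Symbol", namespaces=ns)

if __name__ == "__main__":
    request = build_request("IBM")
    print(request)
    print(read_symbol(request))
```

In a real deployment the request text would typically travel over HTTP to a service, which would answer with an envelope of its own.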


As the business landscape continues to evolve, the demand for interconnectivity of corporate information, no matter what technologies created it or are receiving it, keeps growing. Organizational dependence on XML, its tools and other markup languages is therefore inevitable. With the ever-increasing demands of our customers for instantaneous service, the ability to receive, share and utilize all information, no matter the source, will set companies that have this capability apart from those that don't.


Bernard Chester is an authority on designing and implementing document management-based solutions, including integrating EDMS with other technologies and designing and implementing Internet interfaces and custom tools for systems.