On March 19, I am presenting an overview of XML as part of a “digital masterclass” offered by the Independent Publishers Guild. A talk can dedicate only a passing moment to defining terms, and for better or worse XML is term-heavy.
Solution: here’s a post that provides a publishing point of view on the most frequently used XML terms. This isn’t an attempt to provide the definitive list, just a hopefully accessible (albeit somewhat long) one.
Extensible markup language, abbreviated XML, gives publishers tools to mark up text documents, including books and articles. XML and its predecessor, Standard Generalized Markup Language (SGML), have been in use for more than 25 years.
While XML is a (relatively) new, structured way to mark up text documents, markup itself is something that publishers and their supply chain partners (as well as their readers!) have been doing for decades. By using XML as part of their editorial and production processes, publishers can streamline subsequent conversion and reuse of their content.
Elements are the basic unit of content and markup. In publishing, a book is the “root element”, made up of other elements that would typically include a title page, a table of contents, chapters, paragraphs, footnotes and other structural components.
XML uses tags to mark the start and end of the text blocks for each element. The tags follow a standard format, but the tag sets (the collection of terms used to define the elements) can be customized to a particular publisher, type of book or project.
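To make elements and tags concrete, here is a minimal sketch in Python using the standard library's XML parser. The tag set (book, title, chapter, para) is a hypothetical one chosen for illustration; a publisher would define its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set; a publisher would define its own element names.
BOOK_XML = """<book>
  <title>A Sample Book</title>
  <chapter>
    <title>Chapter One</title>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </chapter>
</book>"""

root = ET.fromstring(BOOK_XML)


def outline(element, depth=0):
    """Return one indented line per element, showing how elements nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print(root.tag)                   # the root element
print("\n".join(outline(root)))   # every element, indented by nesting depth
```

Each start tag (such as <para>) is matched by an end tag (</para>), and the indentation in the printed outline mirrors how the elements sit inside one another.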
A document type definition, abbreviated DTD, provides a summary of the rules that govern the structure of an XML document (for example, that each paragraph must be assigned to a chapter, and that no paragraph can be part of more than one chapter). DTDs are useful in validating XML documents before they are used to publish content.
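The kind of rule a DTD expresses can be illustrated with a small hand-written check. This is only a stand-in: Python's standard library does not validate against a DTD, so the sketch below hard-codes a single rule (every paragraph must sit directly inside a chapter) using the same hypothetical tag names as before. A real workflow would declare the rule in a DTD and run a validating parser.

```python
import xml.etree.ElementTree as ET


def misplaced_paragraphs(xml_text):
    """Return the text of every <para> whose parent is not a <chapter>."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        if parent.tag == "chapter":
            continue  # paragraphs directly inside a chapter are allowed
        for child in parent:
            if child.tag == "para":
                problems.append(child.text)
    return problems


# Illustrative documents, not taken from a real book file.
GOOD = "<book><chapter><para>Inside a chapter.</para></chapter></book>"
BAD = "<book><para>Stray paragraph.</para><chapter><para>Fine.</para></chapter></book>"

print(misplaced_paragraphs(GOOD))
print(misplaced_paragraphs(BAD))
```

An empty list means the document passes this one rule; anything else names the paragraphs that would need fixing before publication.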
The tags used to identify elements in an XML document do not contain formatting instructions. Formatting is applied using style sheets, which (loosely) fall into a handful of categories, including cascading style sheets (CSS), transforms (XSLT) and formatting objects (XSL-FO).
Cascading style sheets are the simplest option. They are made up of a list of the elements in a book file, accompanied by the styles that should be applied to each element. These style sheets can be applied quickly and are typically used to format content for display in a web browser.
XSLT files are themselves XML documents that describe how to convert one XML document into another. As an example, an XSLT style sheet can be used to transform an original book file into the EPUB format, which is emerging as the de facto standard for e-book files. When the original book file is properly structured, the transformation is simple and can take minutes.
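The idea behind a transform can be sketched in a few lines of Python. This is not XSLT (the standard library has no XSLT processor), and the mapping from the hypothetical book tag set to XHTML element names is an assumption made for illustration. It simply shows one tree of elements being rewritten as another.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the source tag set to XHTML element names.
TAG_MAP = {"book": "body", "chapter": "div", "title": "h1", "para": "p"}


def transform(source):
    """Build a new tree by renaming each source element via TAG_MAP."""
    # Unknown tags fall back to <span> so no content is dropped.
    target = ET.Element(TAG_MAP.get(source.tag, "span"))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target


book = ET.fromstring(
    "<book><chapter><title>Chapter One</title>"
    "<para>First paragraph.</para></chapter></book>"
)
html = ET.tostring(transform(book), encoding="unicode")
print(html)
# <body><div><h1>Chapter One</h1><p>First paragraph.</p></div></body>
```

A real EPUB conversion involves more than renaming elements, but the principle is the same: because the source is consistently structured, the rules for converting it can be short and mechanical.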
XSL-FO files are XML documents that comprehensively describe the layout of text on a page. Publishers use XSL-FO to convert an XML file into a readable, printable document, such as a PDF.
Hypertext markup language (HTML) was developed in the 1990s as an SGML application. The successor to HTML is XHTML, an XML application that separates structure from presentation. Styles are typically applied using CSS. A minor change to a CSS file can quickly change the appearance of any XML document, even an entire book.
And finally, ONIX is “an XML-based family of international standards intended to support computer-to-computer communication between parties involved in creating, distributing, licensing or otherwise making available intellectual property in published form, whether physical or digital.” That explanation comes courtesy of EDItEUR, custodian of the ONIX standard.
Next up: a comparable summary of content-specific terms related to the introduction of XML and its features.