Data and Documents – Once There and Back
Modern document logistics is much more than switching from strictly physical to electronic delivery. Ultimately, it is the direct exchange of raw data via a central output instance.
What does traditional document processing look like nowadays? The raw data for a process is first converted from specialized applications into human-readable content (composition), then formatted, e.g., as a letter-sized document, printed, and sent to the recipient. From there, it follows the entire route back, i.e., scanning, analysis/text recognition via optical character recognition (OCR), "de-formatting" the document, and finally extracting the raw data.
Alternatively, digital documents that could be read and processed by machine are first converted into analog form, i.e., print, and then scanned into TIFF or JPEG images, leaving content that consists of "pixel clouds." The actual content is at first locked inside raster images and becomes "readable" again only through OCR. Not only is this cumbersome, it also loses the semantic structure needed for later reuse.
The problem is that this approach is oriented to the letter-sized page format, which is fine for print, fax, or archive files, but not for mobile end devices and the Web. It would be much better to transfer the raw data only. In other words, document creation and delivery must occur outside of the specialist application. The page size and output channel are therefore not selected in the application, but much later in the process than is common practice today.
Is PDF delivery still in keeping with the times?
Of course, the adoption of the now ubiquitous PDF is an important step toward shortening the cycle described above. But it is just a beginning. After all, what good is a PDF document if it carries no metadata for multi-channel processing? Technologies like XMP were, in fact, developed specifically for storing metadata in an electronic document so that the recipient side can read it out automatically and transfer it into the given application (ERP, CRM, etc.). This certainly advances automation in document processing, but it is by no means the end of the road. For one thing, PDF is also based on a fixed page size, which means tedious "de-formatting" for delivery on mobile end devices. The gain is marginal, considering that processes like de-formatting and decomposition are complex and usually require expensive tools.
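To illustrate the read-out on the recipient side, here is a minimal sketch that parses an XMP packet (RDF/XML) with the Python standard library. The "inv" namespace, its field names, and the sample values are invented for illustration and are not part of any standard; extracting the packet from the PDF file itself is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMP packet. The "inv" namespace and its fields are
# assumptions made for this example, not a published schema.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:inv="http://example.com/ns/invoice/1.0/">
      <inv:InvoiceNumber>2024-0815</inv:InvoiceNumber>
      <inv:TotalAmount>119.00</inv:TotalAmount>
      <inv:Currency>EUR</inv:Currency>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
INV_NS = "http://example.com/ns/invoice/1.0/"  # hypothetical namespace


def read_invoice_metadata(packet: str) -> dict:
    """Collect all properties from the invoice namespace as a flat dict."""
    root = ET.fromstring(packet)
    fields = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        for child in description:
            if child.tag.startswith(f"{{{INV_NS}}}"):
                name = child.tag.split("}", 1)[1]  # strip the namespace part
                fields[name] = (child.text or "").strip()
    return fields


metadata = read_invoice_metadata(XMP_PACKET)
print(metadata)
```

The resulting dictionary could be handed straight to an ERP or CRM import routine, with no OCR step in between.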
So what does document processing look like in the future? Without a doubt, the most elegant method is to create an interface for the pure data, independent of page format, layout and channel. That is really the only way to efficiently prepare documents of all types and formats for digital and physical communication routes. For companies, this means separating document creation from delivery and setting up a central document and output management instance. This hub uses defined rules and criteria from the different departments (e.g., sales, marketing, service) to determine the data, layout, format and output channel, always tuned, of course, to the recipient.
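The separation of pure data from layout and channel can be sketched as follows. This is only an illustration under simple assumptions: the DebitNotice record, the two renderers, and the channel names "web" and "sms" are hypothetical, and a real output management instance would apply far richer rules.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class DebitNotice:
    """Raw business data, free of any page format or layout."""
    recipient: str
    amount: str
    due_date: str


def render_html(notice: DebitNotice) -> str:
    # Layout-neutral HTML5 fragment; the display size is decided by the client.
    return (
        "<article>"
        f"<h1>Debit notification for {escape(notice.recipient)}</h1>"
        f"<p>Amount: {escape(notice.amount)}, due {escape(notice.due_date)}</p>"
        "</article>"
    )


def render_text(notice: DebitNotice) -> str:
    # Plain-text variant for channels without markup support.
    return (
        f"Debit notification for {notice.recipient}: "
        f"{notice.amount}, due {notice.due_date}"
    )


# The hub maps each output channel to a renderer (hypothetical channel names).
RENDERERS = {"web": render_html, "sms": render_text}


def output(notice: DebitNotice, channel: str) -> str:
    """Format the same raw data for whichever channel is chosen at delivery time."""
    try:
        renderer = RENDERERS[channel]
    except KeyError:
        raise ValueError(f"no renderer registered for channel {channel!r}") from None
    return renderer(notice)


notice = DebitNotice("Mr. X", "42.50 EUR", "2024-07-01")
print(output(notice, "sms"))
print(output(notice, "web"))
```

Because the specialist application only supplies the DebitNotice data, supporting another channel means registering one more renderer in the hub rather than changing the application.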
Centralization benefits not only the employees handling a case, who are free to concentrate on their core business. It also provides a reliable overview of which documents left the company in a given time period. Other criteria can be monitored as well, an advantage not to be underestimated, since many firms lack an accurate picture of just how much they print, fax, and send electronically. What document management lacks today is this 360-degree view.
Recipient and process determine the channel
Strictly speaking, multi-channel communication means breaking away from a specific page format so that every document can be output on any channel without expensive workarounds such as de-formatting. After all, customers today communicate with companies via a number of channels. Mr. X, for example, still wants his insurance policy in hard copy, but would prefer his monthly debit notification as an e-mail attachment, or better yet, sent directly to his smartphone. In other words, a delivery medium is chosen for each and every business process. But that is possible only through central processing where all document-related communication pathways converge, particularly if adding new channels is to remain straightforward.
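The Mr. X example can be expressed as a small rule lookup. This is a sketch under stated assumptions: the recipient IDs, process names, channel names, and the precedence order (recipient preference first, then process default, then a fallback) are all hypothetical.

```python
# Default channel per business process (hypothetical values).
PROCESS_DEFAULTS = {
    "insurance_policy": "print",
    "debit_notification": "email",
}

# Recipient-specific overrides, keyed by (recipient, process).
RECIPIENT_PREFERENCES = {
    ("mr_x", "debit_notification"): "smartphone",
}

FALLBACK_CHANNEL = "print"


def choose_channel(recipient_id: str, process: str) -> str:
    """Pick the delivery channel for one document of one business process."""
    preferred = RECIPIENT_PREFERENCES.get((recipient_id, process))
    if preferred is not None:
        return preferred
    return PROCESS_DEFAULTS.get(process, FALLBACK_CHANNEL)


print(choose_channel("mr_x", "insurance_policy"))
print(choose_channel("mr_x", "debit_notification"))
```

Keeping these tables in one central place is what makes a new channel cheap to add: it becomes one more permitted value rather than a change to every document-generating application.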
In this context, HTML5 has certainly paved the way toward modern document processing. The text-based markup language is already setting the tone with mobile platforms such as the iPhone, iPad and Android devices. And it’s no wonder: HTML5 content can be easily processed for any electronic output channel, be it a smartphone or a Web site. And if print is your preference, it's still an option.
Conversion to PDF files is also possible. HTML5 is currently the most intelligent format for the creation and display of documents, regardless of size or output channel. It allows dynamic, size-dependent display, e.g., from letter-sized to smartphone, conversion from any layout to text-oriented formats, extraction of individual data (including retrieval of invoice items) and building tables of contents and index lists.
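As an illustration of extracting individual data such as invoice items, the following sketch reads item rows back out of an HTML5 document using only the Python standard library. The markup convention used here (a data-item attribute on each row, class names "desc" and "amount") is an assumption made for this example, not an established standard.

```python
from decimal import Decimal
from html.parser import HTMLParser

# Hypothetical invoice markup; the data-item attribute and the
# class names are conventions invented for this sketch.
INVOICE_HTML = """<!DOCTYPE html>
<html><body>
<table>
  <tr data-item="1"><td class="desc">Liability premium</td><td class="amount">89.00</td></tr>
  <tr data-item="2"><td class="desc">Legal protection</td><td class="amount">30.00</td></tr>
</table>
</body></html>"""


class ItemExtractor(HTMLParser):
    """Collect one dict per table row that is marked as an invoice item."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._row = None    # dict for the row being read, if any
        self._field = None  # class name of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "tr" and "data-item" in attributes:
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._field = attributes.get("class")

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.items.append(self._row)
            self._row = None


extractor = ItemExtractor()
extractor.feed(INVOICE_HTML)
total = sum(Decimal(item["amount"]) for item in extractor.items)
print(extractor.items, total)
```

Because the structure survives in the markup, the items can be recovered without OCR or de-formatting.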
What is data, what is a document?
The fact is that in these multi-channel times, "painting" a letter-sized page using page composition tools is the wrong approach, because the target display can measure anything from 2 to 24 inches. Instead, companies need to invest in document logistics capable of taking data from a given application and preparing it specifically for the recipient and output channel.
What is needed is information technology that maps the entire document management cycle in a central system, and does so for all applications that generate documents. Clearly defined rules for corporate design, output formats, and handling of metadata are stored there based on business logic. This makes the question "what is data and what is a document?" even more important. The boundary is not always clear, but one thing is certain: the further downstream in the document logistics process the output channel is chosen, and the more strictly the business process remains separate from document creation, the more flexible the company is.