How Your Doc Comp Tool Is Smashing Your Data
How much money are you willing to pay to destroy your most precious data? Thousands of dollars? Millions? The very idea seems ridiculous, but I’ll bet you pay a big price for it all the time and don’t even know it.
Organizations in the high-volume transactional document output industry are awakening to the fact that their document composition tools wreck their data, that they lose content every time they print, and that there is virtually nothing they can do about it.
I’ll get to how this happens in a moment. But first, a quick story to illustrate my point. A friend of mine was doing a major renovation on a mansion he bought. The house had been built as a fortress by an infamous NYC gangster, complete with a machine gun nest on the roof, metal shuttered windows, a panic room in the basement and an escape tunnel that led to a shed in an empty lot about a quarter mile away. The gangster was in prison and would not be coming back; the government auctioned off the house, and my friend claimed he got it for a very good price. But it needed work.
During the renovation there might be 25-30 workers in the house at any particular time. It was extremely chaotic and inefficient. In fact, one time my friend actually witnessed a painter crab-walking down a hallway with a roller, painting the wall a beautiful eggshell white. Behind him was another workman with a sledgehammer, knocking down the freshly painted wall.
Something similar happens in the content and document output industry. Once upon a time, the most important function in this industry was putting print on paper and getting the paper in the mail. The document was composed with print in mind, and once the final product had passed its final QA it was printed, inserted and mailed. No thought was given to reusing the content after it was printed. The job was done.
So the process back then was to: 1) take data, 2) add a little formatting, 3) put it all together and send it to the printer. All of this became more complicated as the document composition vendors added complexity along with new functionality to their tools. The documents looked great. But an unfortunate by-product of the process was that data was damaged in order to achieve the goal of print aesthetics. Let me explain.
Let’s say one Monday morning someone from your creative department decides they just have to use a new font they saw over the weekend in a magazine advertisement. They do the research and find it is a custom font. Additional research and $200 later, they’ve secured the font and loaded it on their design system. The new font, called “Bohemian Celtic Classic,” seems to be just the ticket for grabbing consumers’ attention. Tests bear out the premise: the font looks great, is unusual and catches the eye. The sales team tells the customers about this font and they agree to use it in their next print run.
The new font is imported into the document composition software, but the team finds they can’t use it as a font, so the text set in it is turned into a raster graphic. They generate a print stream, for example an Advanced Function Presentation (AFP) stream that drives the production printer. The document is printed, looks great, and works wonders for the customer. Everyone is happy.
Except then the decision is made to archive the document to help customer service do their job more efficiently. And a certain segment of the market wants the document presented in a web browser. And another segment wants to receive it on their phone. So you go back and try to retrieve the content so it can be repurposed. But it’s gone. The text printed in Bohemian Celtic Classic has been destroyed. The text that was in that field is gone. It’s like the home renovation story I told above: the paint job looks great, but the wall was destroyed. Worth the money? I don’t think so.
And it’s not about fancy fonts. It’s not about someone adding color. It’s not about someone thinking it’s a good idea to insert a message into some white space on the document. It’s about the state of the documents that come to you in IT and Operations—and it’s about how they come in, the lack of control you have over them, and the limited time you have to make it work for both the operation and the customer. None of this stuff is your fault but it’s all your problem. Maybe you would like to know how we got here?
What happened is that a long time ago most of the document composition software vendors made the decision to simply turn font characters into raster graphics (not all did this, but many did and still do). In doing so, they destroy some field data. What’s left is the image of the data. It’s like Plato’s cave: we think we are seeing the real thing, but it’s merely a shadow of the real thing. To the human eye it is fine, but there is a consequence, and that’s what people in the industry are beginning to understand.
One of these consequences can’t be seen: the loss of the metadata, the data that accompanies the data through the entire process. IT and Operations managers are finding that they need that metadata, and that if they want to deliver the document in a form other than print, its absence is a huge problem.
They are coming to Compart with these problems, and we are working with them to find solutions. The customer wants that field data and the metadata. Again, if you are the end user it really doesn’t matter, because these doc comp tools are so good at what they do that you can’t see the gyrations the composition engine went through to produce something you would accept. Document composition software is amazing at keeping the visual aesthetics so perfect that the human eye is unable to tell the difference. As long as it goes to the printer, all is well.
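To make the point concrete, here is a toy sketch in Python of what rasterizing a field does. It is not AFP and not any vendor’s actual behavior; the `TextObject` and `ImageObject` classes, the 3x5 `GLYPHS` table and the `extract_text` helper are all hypothetical names invented for this illustration. The only claim is the general one: once characters are drawn as pixels, a downstream tool that looks for text finds nothing.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical 3x5 bitmap glyphs for a toy font (illustration only).
GLYPHS = {
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "2": ["###", "..#", "###", "#..", "###"],
}


@dataclass
class TextObject:
    """A page element that keeps the character codes and the font name."""
    text: str
    font: str


@dataclass
class ImageObject:
    """A page element that keeps only rows of pixels."""
    rows: List[str]


def rasterize(obj: TextObject) -> ImageObject:
    """Draw each character as a bitmap; the character codes are not kept."""
    rows = [" ".join(GLYPHS[ch][r] for ch in obj.text) for r in range(5)]
    return ImageObject(rows)


def extract_text(page: list) -> str:
    """Collect text the way a simple converter might: from text objects only."""
    return "".join(el.text for el in page if isinstance(el, TextObject))


field_value = TextObject(text="42", font="Bohemian Celtic Classic")
page_with_text = [field_value]
page_with_image = [rasterize(field_value)]

# Both pages would look the same on paper...
print("\n".join(page_with_image[0].rows))
# ...but only one of them still contains the field data.
print(repr(extract_text(page_with_text)))
print(repr(extract_text(page_with_image)))
```

In this sketch the only way to get the value back out of the rasterized page would be to recognize the pixels again, which is exactly the kind of after-the-fact repair work described later in this article.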
But that’s not where we are right now. Right now the content matters more than the print aesthetics. The content matters to the mobile device, to the web browser, to the archive. And the data being destroyed as part of the process is increasingly problematic.
It’s especially problematic when big service bureaus acquire other companies and fail to architect the applications as they onboard them. If you don’t have your procedures and architecture set as you onboard, you can end up in real trouble, because fixing this issue after the fact is labor intensive, technologically complex and expensive.
So to get back to my original scenario, the IT Director asks, “Remember that job with the funky font you ran that the customer loved?” He wants to be able to use the data as part of a project to create a mobile application so he instructs you to go back into the print stream and extract the data you used.
But you don’t have the original data. You only have the print stream. As a service provider this is often your only data source. As a result you have only a print file to send to the mobile device. We can convert the file to PDF or HTML5, but de-formatting the page is a laborious process. It’s actually much easier the other way around.
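Here is a rough sketch, again in Python, of what “the other way around” could look like: keep the content as fields with their metadata, and derive each output from that. Everything in it is an assumption made for illustration, including the `ContentField` class, the three `render_*` functions, and the sample field and account values. It is not a description of any Compart product or of a real print stream format.

```python
import json
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List


@dataclass
class ContentField:
    """One piece of content plus the metadata that travels with it."""
    name: str
    value: str
    metadata: Dict[str, str] = field(default_factory=dict)


def render_print(fields: List[ContentField]) -> str:
    """Fixed-width text lines, standing in for a print layout."""
    return "\n".join(f"{f.name.upper():<12}{f.value}" for f in fields)


def render_html(fields: List[ContentField]) -> str:
    """Tagged spans a browser can display and a script can still read."""
    return "\n".join(
        f'<span data-field="{escape(f.name)}">{escape(f.value)}</span>'
        for f in fields
    )


def render_mobile(fields: List[ContentField]) -> str:
    """JSON payload for an app, with the metadata still attached."""
    return json.dumps(
        [{"name": f.name, "value": f.value, "metadata": f.metadata} for f in fields]
    )


# Hypothetical sample content; the values are made up.
statement = [
    ContentField("balance", "1,204.50", {"currency": "USD", "account": "example-001"}),
    ContentField("due_date", "2024-07-01", {"format": "ISO-8601"}),
]

print(render_print(statement))
print(render_html(statement))
print(render_mobile(statement))
```

Each renderer is a few lines because the fields and metadata are still there to work with. Going in the opposite direction, from a finished page back to fields, has no such shortcut.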
In the new world we live in, the content matters, not the page. So we need to understand that if the document was created for a specific channel (e.g., print), then the system that controls the channel will likewise control the document. We need to get past our Gutenberg mindset and begin with fresh eyes to create content without worrying about pages. We need to stop painting pretty pictures of documents and begin putting intelligence into the process. A channel-neutral content hub is inevitable. But that’s a story for another time.
Author: John Lynch, VP of Technology and Manager, North American Region, Compart