Compart - Document- and Output-Management

Automated Document Checking

Analysis and Review of Documents

Every company that runs personalized advertising campaigns is familiar with the phenomenon: the number of variants in direct mail is increasing rapidly. Several thousand different versions within a campaign are not uncommon. It would be a futile endeavor to manually compare each document with the template. The risk of error is too high, especially as the complexity of mailing campaigns increases.

One hundred percent reliable document checking is essential. Tools for automated document testing provide the technological basis for this.

Summary

  • Automated document analysis and review
  • Regulatory compliance and quality assurance
  • arvato: Nearly 100% document checking reliability
  • Methods of document testing

The Goal of arvato Bertelsmann

In the past, document checking at arvato was mainly done manually. But with the increasing volume of invoices, the company quickly reached its limits: The purely visual comparison was not only time-consuming, but also did not offer the one hundred percent certainty on which the service provider depends.

Even when checks were run against various test scenarios, a residual risk always remained. arvato therefore urgently needed a tool that would automate document comparison and take all changes into account, including those not visible to the naked eye.

Solution

Since 2012, arvato has been working with the DocBridge® Delta document checking software - and is thus able to compare documents both visually and at the text level. The solution detects even the smallest differences, even in complex and extensive documents. Thanks to the high degree of automation, document analysis at arvato is now not only more secure, but also more efficient: Employees can concentrate better on their core business.

Benefit

  • Nearly 100 percent process reliability
  • Processing of all common data formats
  • Concentration on core business - higher productivity
  • High throughput (approx. 7,000 documents daily)

Other aspects

Advantages of testing methods with DocBridge® Delta software:

  • Integrated management of templates, images, text modules and other resources for consistent and coherent documents
  • High flexibility and scalability
  • Impress Designer (creation of templates) and Impress Engine (production of documents) are also available as API services (including for cloud environments)

"Data analysis today is not only more secure, but also more efficient; after all, thanks to automation, employees can better focus on the core business."


Roger Fuchs
arvato Bertelsmann

Digital document comparison
Learn more about the complete success story of Europe's leading provider of business process outsourcing.

Quality assurance of complex documents with variable data
Immediately after the introduction of the powerful tool, the full-service provider for direct marketing achieved a significantly higher processing speed for around 80 percent of the checks.

Direct mail solution
By using verification software, Naehas has achieved higher quality with less effort in high-volume personalized advertising campaigns.

Basic Principle and Architecture of DocBridge® Delta

DocBridge® Delta offers three basic technologies for verification:

  • An interactive interface for ad-hoc document testing
  • A command-line call for automation
  • Web services

The solution developed by Compart finds and analyzes differences between individual documents electronically and also visualizes them. In the process, the objects encoded in the files are examined down to the finest structural detail, and the differences are displayed both visually and textually. Certain areas, such as the variable address field, can be excluded from the comparison.

Visually Analyze and Compare Documents

During visual comparison, both documents are rasterized into a pixel image of the same resolution and the converted pixel images are compared with each other - similar to a light table where both documents are superimposed to detect differences between them. In the generated file, the pixels that match in both documents are displayed in gray. If, on the other hand, individual pixels occur in only one of the two documents, the pixels present only in the reference document are displayed in green and the corresponding pixels in the document to be matched are displayed in red.
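
The light-table principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way DocBridge® Delta is implemented: the `Bitmap` type, the `overlay` function and the color names as strings are hypothetical, and both pages are assumed to be already rasterized into 1-bit bitmaps of identical size.

```python
from typing import List

# Assumption: 1 = ink, 0 = blank; both pages rasterized at the same resolution.
Bitmap = List[List[int]]

GRAY, GREEN, RED, WHITE = "gray", "green", "red", "white"


def overlay(reference: Bitmap, candidate: Bitmap) -> List[List[str]]:
    """Classify every pixel the way the difference image colors it."""
    same_height = len(reference) == len(candidate)
    if not same_height or any(len(r) != len(c) for r, c in zip(reference, candidate)):
        raise ValueError("bitmaps must have identical dimensions")

    result = []
    for ref_row, cand_row in zip(reference, candidate):
        row = []
        for ref_px, cand_px in zip(ref_row, cand_row):
            if ref_px and cand_px:
                row.append(GRAY)    # present in both documents
            elif ref_px:
                row.append(GREEN)   # only in the reference document
            elif cand_px:
                row.append(RED)     # only in the document to be matched
            else:
                row.append(WHITE)   # blank in both
        result.append(row)
    return result
```

Any green or red pixel in the result marks a difference; a page consisting only of gray and white pixels is visually identical to its reference.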

In this way, even the smallest differences become immediately recognizable, such as edges that differ by only one pixel width because the characters of a font are cut slightly differently. If the entire content of the document to be compared has been shifted, an additional setting can compensate for the shift in order to check whether there are any other differences between the documents apart from the shift.
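
How such a shift compensation could work in principle is sketched below. The brute-force search over small offsets is an assumption chosen for clarity, not the product's actual method; `count_mismatches`, `best_shift` and the `max_shift` parameter are hypothetical names, and both bitmaps are assumed to have the same dimensions.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # assumption: 1 = ink, 0 = blank


def count_mismatches(reference: Bitmap, candidate: Bitmap, dx: int, dy: int) -> int:
    """Count differing pixels after moving the candidate by (dx, dy)."""
    height, width = len(reference), len(reference[0])
    mismatches = 0
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            inside = 0 <= src_y < height and 0 <= src_x < width
            # Pixels shifted in from outside the page count as blank.
            shifted = candidate[src_y][src_x] if inside else 0
            if reference[y][x] != shifted:
                mismatches += 1
    return mismatches


def best_shift(reference: Bitmap, candidate: Bitmap, max_shift: int = 2) -> Tuple[int, int]:
    """Return the (dx, dy) offset that leaves the fewest differing pixels."""
    offsets = [
        (dx, dy)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    # Prefer the smallest offset when several offsets are equally good.
    return min(
        offsets,
        key=lambda o: (count_mismatches(reference, candidate, *o), abs(o[0]) + abs(o[1])),
    )
```

If `count_mismatches` returns zero for the best offset, the shift was the only difference; any remaining mismatches are differences in their own right.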

If the document to be checked contains individual pages that are not present in the reference document, these can be suppressed during the comparison, for example to exclude additionally created blank pages. The difference document, with the differences highlighted in color, is output as a multipage TIFF document or as a regular PDF.

Structural Comparison at Object Level

DocBridge® Delta writes the detected differences of the individual objects on each page in detail to a log file. The log entries reveal, for example, the following types of differences:

  • Different positioning of objects
  • Differences in the content of objects such as text differences or different barcode content
  • Different attributes of the same object types such as different fonts or font attributes such as font sizes with the same text content

Before checking, the documents to be compared are converted to the Unicode-based metaformat, so that even if a different code page is used, only real text differences are registered.
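
A comparison at object level might look roughly like the following sketch. The `PageObject` model, the log wording and both function names are assumptions made for illustration; the real metaformat and log layout are not described in this text. Decoding both inputs to Unicode first is what keeps a mere code page difference out of the log.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageObject:
    """Hypothetical, simplified view of one object on a page."""
    kind: str          # e.g. "text" or "barcode"
    x: float
    y: float
    content: str       # already decoded to Unicode
    font: str = ""
    size: float = 0.0


def decode_text(raw: bytes, code_page: str) -> str:
    """Bring text from any code page into Unicode before comparing."""
    return raw.decode(code_page)


def compare_objects(reference: List[PageObject], candidate: List[PageObject]) -> List[str]:
    """Return one log line per detected difference."""
    log = []
    if len(reference) != len(candidate):
        log.append(f"object count: {len(reference)} != {len(candidate)}")
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        if (ref.x, ref.y) != (cand.x, cand.y):
            log.append(f"object {index}: position ({ref.x}, {ref.y}) != ({cand.x}, {cand.y})")
        if ref.content != cand.content:
            log.append(f"object {index}: content {ref.content!r} != {cand.content!r}")
        if (ref.font, ref.size) != (cand.font, cand.size):
            log.append(f"object {index}: font {ref.font} {ref.size} != {cand.font} {cand.size}")
    return log
```

Two text objects whose bytes differ only because one file uses cp1252 and the other UTF-8 decode to the same string and therefore produce no content entry, while a changed font size with identical text still does.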

Other document testing options

  • Binary comparison between two document files
  • Check for size differences with optional specification of a minimum value in bits or pixels (only size differences above a certain value are considered)
  • Comparison based on number and arrangement of pages
  • Comparison based on positioning tolerances (only positioning differences above a certain value are considered)
  • Exclusion of definable page ranges for comparison in the case of changing content such as address fields or dates
  • Differences in meta information such as different indices within TLEs in AFP documents or XMP information in PDF documents
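
Two of these options, positioning tolerances and excluded areas, can be illustrated with a short sketch. The coordinate model, the `position_deviations` function and its parameters are hypothetical and only show the principle of ignoring differences below a threshold or inside a masked region.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]               # assumed (x, y) position of an object
Box = Tuple[float, float, float, float]   # assumed (left, top, right, bottom)


def inside(point: Point, box: Box) -> bool:
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def position_deviations(
    reference: Dict[str, Point],
    candidate: Dict[str, Point],
    tolerance: float,
    excluded: Sequence[Box] = (),
) -> List[str]:
    """Names of objects that moved by more than `tolerance` outside excluded areas."""
    deviations = []
    for name in sorted(reference.keys() & candidate.keys()):
        ref, cand = reference[name], candidate[name]
        if any(inside(ref, box) for box in excluded):
            continue  # e.g. a variable address field
        shift = max(abs(ref[0] - cand[0]), abs(ref[1] - cand[1]))
        if shift > tolerance:
            deviations.append(name)
    return deviations
```

With a tolerance of 0.5 units, a logo that moved by 0.2 units is not reported, whereas a footer that moved by 3 units is.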

Background Knowledge
Methods of Analysis and Testing:

Template-based document testing

What do you do when you need to reliably test hundreds of different documents against a specific template, but the files to be checked are not identical in length and structure? Conventional tools allow automated document comparison only if the reference document and the candidate document have exactly the same number of pages.

Better, because more flexible, are 1:n comparisons. The principle: An output file of any length and with different page types is compared page by page with a given template. The advantage of this "one-to-many" method is that the template (reference document) and the test file (candidate document) do not have to be identical in terms of page length and type. Thus, a document to be checked with hundreds of individual pages can be checked against a template of only a few pages with absolute certainty and accuracy.
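
The one-to-many principle can be sketched as follows. The data model is an assumption made for the example: a template maps each page type to the static elements that must appear on pages of that type, and every candidate page is assumed to carry its page type. How page types are recognized in practice is not covered here.

```python
from typing import Dict, List, Set, Tuple

Template = Dict[str, Set[str]]   # hypothetical: page type -> required static elements
Page = Tuple[str, Set[str]]      # hypothetical: (page type, elements found on the page)


def check_against_template(template: Template, pages: List[Page]) -> List[str]:
    """Check a candidate document of any length page by page against a short template."""
    findings = []
    for number, (page_type, elements) in enumerate(pages, start=1):
        required = template.get(page_type)
        if required is None:
            findings.append(f"page {number}: unknown page type {page_type!r}")
            continue
        missing = required - elements
        if missing:
            findings.append(f"page {number}: missing {', '.join(sorted(missing))}")
    return findings
```

Because each candidate page is looked up by type, a two-page template can cover a candidate with hundreds of pages, and additional variable content on a page does not count as a deviation.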

Important: It is not enough to compare documents only on a visual level. After all, analysis at the pixel level is of little use if only five of 1,000 deviations found are actually relevant. Rather, it is also a matter of object and text comparison, i.e., the comparison must be made at bit level, because quite a few deviations cannot be detected with the naked eye.

What is needed, therefore, are solutions that allow the greatest possible tolerance during the check (a fuzzy methodology) without neglecting absolute correctness in content, corporate identity (fonts, layout, etc.) and compliance (legal requirements).

Rule-based tests

Documents must comply with various rules, both legal and industry- and company-specific. Support is provided by high-performance software solutions that can be used to automatically and reliably check documents of any type and format against stored sets of rules. These can be formal criteria such as compliance with corporate design (wording, layout, etc.) or the correctness of addresses as well as compliance with legal regulations (archiving/verifiability, reporting, data protection, etc.).

1. Testing against factors relevant to production

  • Is there enough space in the document to apply various control marks for enveloping, franking, mailing, etc. (e.g. OMR = Optical Mark Recognition)?

2. Testing against postal rules

  • Is the address field designed appropriately for mailing?
  • Do the fonts and font sizes used comply with the prescribed guidelines of the postal service providers?
  • Is all the information required for outputting and mailing the document correct and complete?
  • Are there possibly images and overlays that hinder the reading of the address field?

3. Testing against corporate design (CD) rules

  • Fonts, color, font size
  • Logos
  • Footer/Header
  • Text modules
  • Imprint

4. Testing against legal requirements

  • Electronic archiving according to GDPdU
  • Sarbanes Oxley Act (SOX)
  • KonTraG (Law on Control and Transparency in Business)
  • GoBD
  • Code of Federal Regulations (CFR)
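
Checking documents against stored sets of rules can be pictured as a list of named predicates, as in the sketch below. All thresholds, field names and the approved font list are hypothetical placeholders, not actual postal or corporate design requirements; legal requirements are left out because they cannot be reduced to a one-line check.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]                      # hypothetical extracted document properties
Rule = Tuple[str, Callable[[Document], bool]]  # (rule name, check that returns True if passed)

ALLOWED_FONTS = {"Helvetica", "Arial"}   # assumed corporate design fonts
MIN_ADDRESS_FONT_SIZE = 10.0             # assumed postal minimum, in points
MIN_FREE_MARGIN_MM = 10.0                # assumed space needed for OMR marks

RULES: List[Rule] = [
    ("production: space for control marks",
     lambda d: d["free_margin_mm"] >= MIN_FREE_MARGIN_MM),
    ("postal: address font size",
     lambda d: d["address_font_size"] >= MIN_ADDRESS_FONT_SIZE),
    ("postal: address field free of overlays",
     lambda d: not d["address_overlays"]),
    ("corporate design: approved fonts only",
     lambda d: set(d["fonts"]) <= ALLOWED_FONTS),
]


def check(document: Document, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules the document violates."""
    return [name for name, passes in rules if not passes(document)]
```

Keeping the rules as data rather than code paths means a new postal or corporate design requirement becomes one more entry in the list.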

Address management

Practice shows: Anyone who sends out six- or seven-figure shipments every year is constantly busy checking recipient addresses. This is time-consuming and expensive. Despite the various address qualification services offered by postal service providers (e.g. Premiumadress from DPAG), there is still a high risk of returns.

Qualified address management is not something you do on the side. Without professional support from IT solutions for automated and reliable address validation, nothing works here. The challenge: to extract the address from any data stream, check it for completeness and correctness and, if necessary, sort out the relevant item - BEFORE the mail is handed over to the service provider.
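
The check-and-sort-out step could be sketched like this. The example assumes that the address lines have already been extracted from the data stream and that addresses follow a German-style layout with a five-digit postal code; both patterns and all function names are hypothetical and far simpler than a real address validation service.

```python
import re
from typing import Dict, List, Tuple

# Assumption: last line is "<five-digit postal code> <city>".
POSTAL_LINE = re.compile(r"^(\d{5})\s+(\S.*)$")
# Assumption: street line ends with a house number, optionally followed by a letter.
STREET_LINE = re.compile(r"^\S.*\s\d+\s?[a-zA-Z]?$")


def validate_address(lines: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the address looks complete."""
    lines = [line.strip() for line in lines if line.strip()]
    if len(lines) < 3:
        return ["address needs at least name, street and city lines"]
    problems = []
    if not STREET_LINE.match(lines[-2]):
        problems.append("street line lacks a house number")
    if not POSTAL_LINE.match(lines[-1]):
        problems.append("last line lacks a five-digit postal code and city")
    return problems


def sort_out(items: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split mail items into deliverable IDs and rejected IDs with their problems."""
    deliverable, rejected = [], {}
    for item_id, lines in items.items():
        problems = validate_address(lines)
        if problems:
            rejected[item_id] = problems
        else:
            deliverable.append(item_id)
    return deliverable, rejected
```

Only the items in the first list would be handed over to the service provider; the rejected ones can be corrected beforehand.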

Document analysis

Detailed information about the structure of a print file is the be-all and end-all for high quality in production printing. This involves questions such as:

  • Can the document be printed at all on the available equipment?
  • Does the document even need to be recreated?
  • Are adjustments and changes necessary?
  • Does the file contain all the information required for printing (including control characters for simplex/duplex, fonts)?
  • What is the color distribution within the file?

The aim of a document analysis is to avoid misprints and production disruptions

With professional document analysis software, misprinting doesn't stand a chance

Companies that print complex documents in large quantities and in many different formats (AFP, PDF, PCL, Metacode, PostScript, etc.) often do not have sufficient resources to analyze the files comprehensively. This makes it all the more important to have software tools that automatically determine all the data that is important for high-speed printing. These include:

  • Expected ink consumption
  • Number of pages/size
  • Embedded fonts
  • Number and structure of included images and graphics
  • Simplex / duplex printing

One of the most important features: determining the color distribution within a print file. These software tools calculate exactly the proportion of CMYK colors required for full-color printing - both for the individual page and for the document as a whole.
The advantage? These results can be used to estimate printing costs even more accurately. What's more, software tools offer the option of storing freely definable maximum values for the CMYK analysis. On this basis, it is then easier to assess whether a print job is at all cost-effective. In short, this provides you with a reliable calculation tool and simplifies your document checking.
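
The arithmetic behind such a color distribution analysis is simple averaging, as the following sketch shows. It assumes that every page is available as a list of CMYK pixel values between 0.0 and 1.0; the function names and the `limits` dictionary are hypothetical, and real tools work on the print data stream rather than on pixel lists.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float, float]   # assumed C, M, Y, K values in 0.0..1.0
CHANNELS = ("C", "M", "Y", "K")


def coverage(pixels: List[Pixel]) -> Dict[str, float]:
    """Average coverage per channel for one page (or any list of pixels)."""
    if not pixels:
        return {channel: 0.0 for channel in CHANNELS}
    return {
        channel: sum(pixel[i] for pixel in pixels) / len(pixels)
        for i, channel in enumerate(CHANNELS)
    }


def document_coverage(pages: List[List[Pixel]]) -> Dict[str, float]:
    """Coverage over the whole document, weighted by the pixel count of each page."""
    return coverage([pixel for page in pages for pixel in page])


def exceeds_limits(cov: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Channels whose coverage is above the stored maximum value."""
    return [channel for channel in CHANNELS if cov[channel] > limits.get(channel, 1.0)]
```

Multiplying the per-channel coverage by page area and an ink price per unit would turn these figures into a cost estimate; those factors depend on the printer and are not modeled here.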

Regression test

Even the smallest changes in the creation and formatting of documents affect their layout and content. Whether it's a new font, additional pages or a new release of the document-generating software - who always knows exactly how the modifications will affect the quality of the documents? Most importantly, is every change really relevant to production? A new logo or a new font size does not necessarily have consequences for content. But that is precisely what is at stake. The problem is that conventional inspection programs only offer a visual comparison. But analysis at pixel level is of little use if only five of 1,000 deviations found are actually relevant.

Regression testing: qualified testing, but not at any price

Fact: Automated and reliable quality checking is essential in high-volume document processing - after all, software updates and enhancements are commonplace in companies. Hardly any company has the resources to manually check the new documents (candidate document) against the templates (reference document) reliably for every modification. Especially not with tens of thousands of different document types! Better, because safer and faster, are automated regression tests. They check all changes and upgrades against the existing applications (legacy) without the employee having to intervene.

In demand are:

  • Intelligent analysis tools for document comparison that detect and list every production-relevant deviation
  • Inspection programs that examine documents at the object level, i.e. in terms of content, and not just visually
  • Software solutions with freely selectable tolerance limits and the option of excluding areas from the comparison
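
An automated regression run at text level could be sketched with Python's standard `difflib` module. The sketch assumes that the text of each document type has already been extracted into a list of lines for both the reference run and the candidate run, and that dates in the form DD.MM.YYYY are the only volatile content to be excluded; `normalize` and `regression_report` are hypothetical names.

```python
import difflib
import re
from typing import Dict, List

# Assumption: volatile content consists of dates such as 31.12.2024.
DATE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def normalize(lines: List[str]) -> List[str]:
    """Mask content that is expected to change between runs."""
    return [DATE.sub("<DATE>", line.rstrip()) for line in lines]


def regression_report(
    reference: Dict[str, List[str]],
    candidate: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each deviating document type to its removed (-) and added (+) lines."""
    report = {}
    for name in sorted(reference):
        if name not in candidate:
            report[name] = ["missing in candidate run"]
            continue
        diff = [
            line
            for line in difflib.ndiff(normalize(reference[name]), normalize(candidate[name]))
            if line.startswith(("- ", "+ "))
        ]
        if diff:
            report[name] = diff
    return report
```

An empty report means the new software release reproduced every reference document apart from the masked dates, so no employee has to look at the output.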

Conclusion

With its solutions, Compart automates document testing for companies, authorities and organizations, thereby helping them to achieve higher productivity in their core business and ensuring one hundred percent compliance of their documents.