One of the best-known applications for this is Wikidata, the knowledge base of the online encyclopedia Wikipedia, in which tens of millions of facts are now stored. If, for example, you want to know how many Chicago Bears football players were born in Illinois and attended college at a Big Ten school, you will certainly find what you are looking for here. Admittedly a very unusual example, but one that makes the significance of the subject clear. The aim is to derive new connections and insights from structured data by means of algorithms and ontologies. This is where artificial intelligence (AI) comes into play, which can then be used to formulate complex queries (see the "Glossary").
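To illustrate, a query of this kind can be sent to the public Wikidata SPARQL endpoint. The following is a minimal sketch in Python; the property IDs (member of sports team, place of birth, educated at, located in, member of) are standard Wikidata properties, while the item IDs for the team and the conference are placeholders that would need to be looked up in Wikidata.

```python
# Minimal sketch: counting players that match several structured criteria
# via the public Wikidata SPARQL endpoint. Item IDs below are placeholders
# to verify in Wikidata; the property IDs are standard.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

CHICAGO_BEARS = "Q00000"   # placeholder: look up the team item in Wikidata
ILLINOIS = "Q1204"         # the US state of Illinois
BIG_TEN = "Q00001"         # placeholder: look up the athletic conference item

query = f"""
SELECT (COUNT(DISTINCT ?player) AS ?players) WHERE {{
  ?player wdt:P54 wd:{CHICAGO_BEARS} .     # member of sports team
  ?player wdt:P19 ?birthplace .            # place of birth
  ?birthplace wdt:P131* wd:{ILLINOIS} .    # birthplace located in Illinois
  ?player wdt:P69 ?college .               # educated at
  ?college wdt:P463 wd:{BIG_TEN} .         # college is a Big Ten member
}}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "ccm-article-demo/0.1"},
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["players"]["value"])
```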
Another important topic in this context is that stored, structured data must be validated, something that is often not done today. XML Schema, for example, is a proven method for guaranteeing the correctness and completeness of an XML file; errors caused by unchecked data can be very costly.
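As a minimal sketch, such a check can look like this with the lxml library in Python; the schema and file names are illustrative assumptions.

```python
# Sketch: validating an XML file against an XML Schema (XSD) with lxml.
# The file names are placeholders for illustration.
from lxml import etree

schema_doc = etree.parse("invoice.xsd")          # the schema definition
schema = etree.XMLSchema(schema_doc)

document = etree.parse("incoming_invoice.xml")   # the data to be checked

if schema.validate(document):
    print("Document is valid against the schema.")
else:
    # Reject or quarantine invalid data instead of passing it downstream.
    for error in schema.error_log:
        print(f"Line {error.line}: {error.message}")
```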
Consistent data verification is therefore essential. Last but not least, the data must also be transformed according to defined rules. There are many options for this today; one of the best known is the transformation language XSLT (see also the "Glossary"), but other rule sets exist as well.
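A correspondingly minimal sketch of such a rule-based transformation, again using lxml; the stylesheet and input file are placeholders.

```python
# Sketch: rule-based conversion of XML data with XSLT via lxml.
# Stylesheet and input file are illustrative placeholders.
from lxml import etree

xslt_doc = etree.parse("to_letter.xsl")      # the transformation rules
transform = etree.XSLT(xslt_doc)

source = etree.parse("claim_data.xml")
result = transform(source)                   # apply the rules

# The result can be serialized and handed to the next processing step.
print(str(result))
```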
Instead of Destroying Content...
Anyone who wants to increase the degree of automated communication processing in CCM, as the next stage of digitization, must ensure structured, consistent and centrally available data. For automated communication processing and output management, this means preserving the content of documents as completely as possible right from the start instead of destroying it, as is typical in the electronic inboxes of organizations.
The problem: in many organizations, incoming emails still go through classic data capture, i.e. they are converted into an image format, only so that parts of the document content can subsequently be made interpretable again by means of OCR technology. This wastes resources unnecessarily, especially when you consider that email attachments today can be quite complex, multi-page documents.
This media discontinuity amounts to a data disaster: electronic documents (emails) that could in principle be read and processed directly by IT systems are first converted into TIFF, PNG or JPG files. Content thus turns into "pixel clouds". In other words, the actual content is first encoded as raster images and then laboriously made "readable" again using Optical Character Recognition (OCR). This is accompanied by the loss of the semantic structural information needed for later reuse.
How nice would it be, for example, if you could convert email attachments of any type into structured PDF files immediately after receipt? This would lay the foundation for long-term, audit-proof archiving; after all, the conversion from PDF to PDF/A is only a small step.
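One possible sketch of that last step, assuming Ghostscript is installed; a fully conformant PDF/A conversion additionally requires a PDFA_def.ps file with an ICC output intent, which is omitted here.

```python
# Sketch: the PDF -> PDF/A step using Ghostscript from Python.
# Assumes the "gs" binary is available on the system.
import subprocess

def pdf_to_pdfa(src: str, dst: str) -> None:
    subprocess.run(
        [
            "gs",
            "-dPDFA=2",                      # target PDF/A-2
            "-dBATCH", "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sColorConversionStrategy=RGB",
            "-dPDFACompatibilityPolicy=1",
            f"-sOutputFile={dst}",
            src,
        ],
        check=True,
    )

pdf_to_pdfa("attachment.pdf", "attachment_pdfa.pdf")
```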
Basis for Further Automation
Consider the following example: a leading German insurance group receives tens of thousands of emails daily via a central electronic mailbox, both from end customers and from external and internal sales partners. Immediately after receipt, the system automatically triggers the following processes (a simplified code sketch follows the list):
- Conversion of the actual email ("body") to PDF/A
- Individual conversion of the email attachments (e.g. various Office formats, image files such as TIFF, JPG, etc.) to PDF/A
- Merging of the email body with the corresponding attachments and generation of a single PDF/A file per business transaction
- At the same time, all important information is extracted from the file and stored centrally for downstream processes (e.g. AI-based generation of reply letters, case-closing processing, archiving).
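A simplified sketch of such a pipeline in Python, using the standard email module and the pypdf library for merging; the convert_to_pdfa() helper is a hypothetical placeholder for whatever format-specific converters (Office, TIFF, JPG, ...) are actually in use.

```python
# Simplified sketch of an inbox pipeline: convert body and attachments,
# merge into one PDF/A per business transaction, extract key metadata.
import email
from email import policy
from pypdf import PdfWriter

def convert_to_pdfa(name: str, payload: bytes) -> str:
    """Hypothetical placeholder: converts one part to PDF/A and returns its path."""
    raise NotImplementedError

def process_message(raw_bytes: bytes, case_id: str) -> str:
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)

    # 1) Convert the email body itself to PDF/A.
    body = msg.get_body(preferencelist=("html", "plain"))
    parts = [convert_to_pdfa("body", body.get_content().encode())]

    # 2) Convert each attachment individually to PDF/A.
    for att in msg.iter_attachments():
        parts.append(convert_to_pdfa(att.get_filename() or "attachment",
                                     att.get_payload(decode=True)))

    # 3) Merge body and attachments into a single PDF/A per transaction.
    writer = PdfWriter()
    for pdf_path in parts:
        writer.append(pdf_path)
    out_path = f"{case_id}.pdf"
    with open(out_path, "wb") as fh:
        writer.write(fh)

    # 4) Extract key information for downstream processes
    #    (reply generation, case-closing handling, archiving).
    metadata = {"subject": msg["subject"], "from": msg["from"],
                "attachments": len(parts) - 1}
    print(metadata)
    return out_path
```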
Everything runs automatically and without any media discontinuity. The clerk receives the document in a standardized format without having to worry about its preparation (classification, making it readable).