Preparing an XSL-FO Document for PDF/UA
The PDF/UA standard for accessible PDF documents is steadily gaining momentum: PDF documents that comply with it are not only easier to display and use on different output devices, including handhelds, but can also be read aloud. Yet how can I create a valid PDF/UA document?
The internet provides plenty of information on how documents should be prepared in standard text processing software such as Word so that they can be converted to PDF or PDF/UA. Yet how can documents in other formats be prepared for PDF/UA?
This article describes step by step which settings are relevant for PDF/UA and how an existing XSL-FO document can be edited so that Compart DocBridge products can convert it to PDF/UA.
This does not mean, however, that all these settings have to be applied directly to the XSL-FO. Rather, the goal of this article is to provide an overview, so that the document generation process can be adapted and the XSL-FO, which is frequently used as an intermediate format, is generated with all the information required for PDF/UA. Creating the XSL-FO document directly remains possible, of course.
Specific Features of PDF/UA Documents
The following features are characteristic of PDF/UA documents:
- PDF/UA identification (version information)
- embedded fonts
- dc:title entry in the metadata
- specification of the semantic structure of the document (tagged PDF)
- The natural language of the document must be specified as well as any language variations within the document.
- Alternative texts are required for all non-text content, e.g. images.
- Alternative texts should be provided for abbreviations and acronyms.
- marked artifacts
Preparing the XSL-FO document
The following steps describe how the information relevant for PDF/UA can be added to an existing XSL-FO document and how the document can then be converted to PDF/UA using Compart software. An input document in XSL-FO format must therefore already be available.
- Insertion of the document title
<cpfo:document-info xmlns:cpfo="http://www.compart.net/xmlns/cpfo"
document-info-name="title" document-info-value="Telefun Anschreiben"/>
- Specification of document language and whitespace treatment
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" language="de"
white-space-treatment="ignore-if-after-linefeed">
- Specification of alternative texts for abbreviations and acronyms
Tagging Example:
<fo:block role="P">
<fo:inline role="Span" cpfo:abbr="im Auftrag">i.A.</fo:inline>
Sebastian Turner
</fo:block>
- Creation of the semantic structure / addition of roles to the text blocks (tagging)
The entire content of the document must be semantically tagged using the role attribute. The following tags are the standard tags suggested by Adobe for the creation of tagged PDF:
- Container elements: Document, Part, Div, Art, Sect
- Headings: either numbered tags (H1, H2, H3) or unnumbered tags (H) can be used, but not both in the same document
- Paragraphs: P
- Inlines: Span
- Lists:
  - L identifies a list.
  - LI identifies each list item.
  - Each list item consists of a label (Lbl) and the list item content (LBody).
- Tables:
  - Table identifies a table.
  - THead identifies the table header and TBody the rest of the table content.
  - Each row is tagged TR.
  - Cells within the table header are tagged TH and cells within the table body TD.
- Images: Images are tagged as Figure.
- Table of contents: The entire table of contents is tagged as TOC and each entry within it as TOCI.
- Index: The Index tag identifies an index.
- Links: Links are tagged as Link.
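The heading, list and table tags listed above might be applied as in the following sketch. It assumes that the role attribute is set on the corresponding XSL-FO list and table elements and that the converter maps each role to the structure tag of the same name. The pairing of FO elements and tags, the layout attributes and the sample texts are illustrative assumptions and should be checked against the product documentation.

```xml
<fo:flow flow-name="xsl-region-body"
         xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Heading: numbered tags (H1, H2, ...) are used consistently -->
  <fo:block role="H1">Tariff overview</fo:block>

  <!-- List: L contains LI, each LI consists of Lbl and LBody -->
  <fo:list-block role="L" provisional-distance-between-starts="8mm"
                 provisional-label-separation="2mm">
    <fo:list-item role="LI">
      <fo:list-item-label role="Lbl" end-indent="label-end()">
        <fo:block>1.</fo:block>
      </fo:list-item-label>
      <fo:list-item-body role="LBody" start-indent="body-start()">
        <!-- Assumption: text inside LBody is tagged as a paragraph -->
        <fo:block role="P">Flat rate for landline calls</fo:block>
      </fo:list-item-body>
    </fo:list-item>
  </fo:list-block>

  <!-- Table: Table contains THead and TBody, rows are TR, cells TH or TD -->
  <fo:table role="Table" table-layout="fixed">
    <fo:table-column column-width="60mm"/>
    <fo:table-column column-width="40mm"/>
    <fo:table-header role="THead">
      <fo:table-row role="TR">
        <fo:table-cell role="TH"><fo:block>Tariff</fo:block></fo:table-cell>
        <fo:table-cell role="TH"><fo:block>Monthly fee</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-header>
    <fo:table-body role="TBody">
      <fo:table-row role="TR">
        <fo:table-cell role="TD"><fo:block>Basic</fo:block></fo:table-cell>
        <fo:table-cell role="TD"><fo:block>9.99 EUR</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:flow>
```

Because numbered and unnumbered heading tags must not be mixed, a document that uses H1 as above would use H2, H3 and so on for its lower-level headings rather than H.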
Tagging Examples:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:cpfo="http://www.compart.net/xmlns/cpfo" language="de"
white-space-treatment="ignore-if-after-linefeed" role="Document">
<fo:block role="P">Sehr geehrte Frau Silvia Obermaier,</fo:block>
<fo:block role="P">
<fo:inline role="Figure" cpfo:alt="gescannte Unterschrift">
<fo:external-graphic content-height="50px" content-width="100px" scaling="uniform" src="Unterschrift_Turner.jpg" content-type="content-type:image/jpeg"/>
</fo:inline>
</fo:block>
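The feature list also requires language changes within the document to be identified, which the steps above do not illustrate. A minimal sketch, assuming the converter evaluates the XSL-FO language property on nested elements in the same way as on fo:root (an assumption to verify against the product documentation); the sample text is invented for illustration:

```xml
<fo:block role="P" language="de"
          xmlns:fo="http://www.w3.org/1999/XSL/Format">
  Unser neuer Tarif heißt
  <!-- Assumption: a language value on a nested element overrides the
       document language for this span only -->
  <fo:inline role="Span" language="en">Call and Surf</fo:inline>.
</fo:block>
```

Since language is an inherited XSL-FO property, only the passages that deviate from the document language set on fo:root need their own value.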
Configuring PDF/UA settings
General settings for the PDF generation in the MFFPDF filter profile
- Version information (pdf/ua)
<output>
<!-- PDF version for output -->
<version value="pdf/ua" strict="FALSE"/>
...
</output>
- Embedded fonts (embed="always")
<font family="COURIER" serifstyle="SERIF" spacing="MONOSPACED" reftype="TrueType" >
<face weight="MEDIUM" width="NORMAL" style="UPRIGHT" fontfile="cour" fontfiletype="TrueType" embed="always"/>
<face weight="BOLD" width="NORMAL" style="UPRIGHT" fontfile="courbd" fontfiletype="TrueType" embed="always"/>
<face weight="MEDIUM" width="NORMAL" style="ITALIC" fontfile="couri" fontfiletype="TrueType" embed="always"/>
<face weight="BOLD" width="NORMAL" style="ITALIC" fontfile="courbi" fontfiletype="TrueType" embed="always"/>
</font>
Creating the PDF/UA document
Run the DocBridge Mill tool cpmcopy with the parameter -gendocumentstructure to generate the tag structure:
cpmcopy.exe ${fo} -gendocumentstructure -o ${out} -type pdf
Validating the PDF/UA document
The generated document can be checked for PDF/UA conformance with, for example, the free PDF/UA accessibility checker PAC2.