DocBridge Mill, AFPDS and "Encoding Lists"
When DocBridge Mill converts AFPDS documents created by third-party software, text is sometimes not searchable, or special characters are not converted to the desired characters in the output file. This can happen when the creator of the document did not use standard IBM GCGIDs.
To understand this behavior, it helps to know the basics of how AFP presents text.
In AFP the font used to present a text item is defined by setting both a Code Page and a font Character Set:
- AFP Code Pages are the tables that associate Graphic Character Global Identifiers (GCGIDs) with code points (hexadecimal values).
- AFP font Character Sets contain the descriptive and metric information of the whole font, and the metric and shape information for each character. Each character in the font Character Set has a unique GCGID.
When the text item is formatted and presented, the code point of each character is mapped to its GCGID through the associated AFP Code Page. Once the GCGID has been determined, the glyph characteristics are read from the font Character Set, and the text is presented to the user on paper or on screen.
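The two-step lookup can be modeled as a pair of dictionaries: one for the Code Page (code point to GCGID) and one for the font Character Set (GCGID to glyph data). This is a simplified sketch; the `Glyph` class, the width values and the GCGID "CPT00066" are hypothetical, and only "CPT00065" at code point 0x41 is taken from the example later in this document.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    # Hypothetical stand-in for the metric and shape data in a font Character Set
    gcgid: str
    width: int

# Code Page: code point -> GCGID
code_page = {0x41: "CPT00065", 0x42: "CPT00066"}

# Font Character Set: GCGID -> glyph data (widths are made up)
character_set = {
    "CPT00065": Glyph("CPT00065", 667),
    "CPT00066": Glyph("CPT00066", 667),
}

def resolve_glyph(code_point: int) -> Glyph:
    """Map a code point to its GCGID via the Code Page, then fetch the glyph."""
    gcgid = code_page[code_point]   # first lookup: Code Page
    return character_set[gcgid]     # second lookup: font Character Set

print(resolve_glyph(0x41))
```

In this sketch, a code point without a Code Page entry, or a GCGID missing from the Character Set, simply raises a `KeyError`.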
DocBridge Mill internally converts all text items to Unicode, so it needs to know the Unicode value of each code point used in the text items. If the GCGIDs used to present the text items do not conform to the IBM Expanded Core font GCGID naming convention, DocBridge Mill needs additional information about this coding to process the file correctly. In that case, the user has to define a GCGID-to-Unicode encoding list in the DocBridge Mill AFP Filter profile for each Code Page used.
If IBM Expanded Core fonts are used, no custom GCGID-to-Unicode encoding is required.
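Why a missing encoding list breaks conversion can be sketched as a lookup that first tries a table of standard GCGIDs and then falls back to a per-Code-Page custom list. This is a simplified model, not DocBridge Mill's actual implementation; the table and function names are hypothetical, and the two standard entries are only an illustrative excerpt of IBM GCGID naming.

```python
from typing import Optional

# Illustrative excerpt of standard IBM GCGIDs and their Unicode values
STANDARD_GCGIDS = {"LA020000": 0x0041, "SP010000": 0x0020}

# Hypothetical custom encoding list for one Code Page (GCGID -> Unicode)
custom_encoding = {"CPT00065": 0x0041}

def gcgid_to_unicode(gcgid: str, custom: Optional[dict] = None) -> Optional[str]:
    """Return the Unicode character for a GCGID, or None if it is unknown."""
    if gcgid in STANDARD_GCGIDS:
        return chr(STANDARD_GCGIDS[gcgid])
    if custom and gcgid in custom:
        return chr(custom[gcgid])
    return None  # non-standard GCGID without an encoding list entry

print(gcgid_to_unicode("CPT00065"))                   # None
print(gcgid_to_unicode("CPT00065", custom_encoding))  # A
```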
Steps to create a custom encoding list:
1. Dump the AFPDS file with your preferred AFPDS dump tool, or use the Compart AFP Utility afpdump:
afpdump -v <filename> >> <dumpoutput.txt>
2. Locate the text item in the AFPDS dump output, e.g.:
0xDB: TRN Transparent Data [7]
Text: "25.03 A"
(hex:) F2 F5 4B F0 F3 40 41
3. Note the hex value of the character that is not presented as desired (0x41 in this example).
4. Determine the active font ID by searching backwards from the text item for the preceding "Set Coded Font Local" string, e.g.:
0xF1: SCFL Set Coded Font Local [1]
CFLid: 0x02 (2)
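Finding the active font amounts to taking the last SCFL entry before the text item. A minimal sketch, assuming the afpdump text layout shown above; the second SCFL in the sample (CFLid 0x03) is made up to show that font changes after the text item are ignored.

```python
import re
from typing import Optional

# An SCFL line followed by its CFLid line, as in the dump excerpt above
SCFL = re.compile(r"SCFL Set Coded Font Local[^\n]*\n\s*CFLid:\s*0x([0-9A-Fa-f]+)")

def active_font_id(dump_text: str, text_pos: int) -> Optional[int]:
    """Return the font ID set by the last SCFL before text_pos, or None."""
    font_id = None
    for match in SCFL.finditer(dump_text, 0, text_pos):
        font_id = int(match.group(1), 16)
    return font_id

sample = (
    "0xF1: SCFL Set Coded Font Local [1]\n"
    "  CFLid: 0x02 (2)\n"
    "0xDB: TRN Transparent Data [7]\n"
    '  Text: "25.03 A"\n'
    "0xF1: SCFL Set Coded Font Local [1]\n"
    "  CFLid: 0x03 (3)\n"
)
print(active_font_id(sample, sample.index("TRN")))  # 2
```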
5. Determine the Code Page and font Character Set names by searching for the preceding "Map Coded Font" string and locating the "Resource Local ID" that matches the font ID from step 4, e.g.:
T02: Fully Qualified Name [10]
FQNType: 0x85 (Code Page Name Reference)
FQNFmt: 0x00 (Character string)
FQName: "T1001000"
T02: Fully Qualified Name [10]
FQNType: 0x86 (Font Character Set Name Reference)
FQNFmt: 0x00 (Character string)
FQName: "C0040TPN"
T24: Resource Local ID [2]
ResType: 0x05 (Coded Font)
ResLID: 0x02 (2)
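The Map Coded Font entries can be turned into a lookup from Resource Local ID to Code Page and Character Set names. This sketch assumes the afpdump layout shown above, with the Fully Qualified Name triplets of each group preceding its Resource Local ID; FQNType 0x85 and 0x86 are interpreted as labeled in the dump.

```python
def map_coded_fonts(mcf_text: str) -> dict:
    """Map each Resource Local ID to its (Code Page, Character Set) names."""
    fonts = {}
    fqn_type = code_page = char_set = None
    for line in mcf_text.splitlines():
        line = line.strip()
        if line.startswith("FQNType:"):
            fqn_type = line.split()[1]
        elif line.startswith("FQName:"):
            name = line.split('"')[1]
            if fqn_type == "0x85":    # Code Page Name Reference
                code_page = name
            elif fqn_type == "0x86":  # Font Character Set Name Reference
                char_set = name
        elif line.startswith("ResLID:"):
            fonts[int(line.split()[1], 16)] = (code_page, char_set)
            code_page = char_set = None  # next repeating group starts fresh
    return fonts

sample = """T02: Fully Qualified Name [10]
  FQNType: 0x85 (Code Page Name Reference)
  FQNFmt: 0x00 (Character string)
  FQName: "T1001000"
T02: Fully Qualified Name [10]
  FQNType: 0x86 (Font Character Set Name Reference)
  FQNFmt: 0x00 (Character string)
  FQName: "C0040TPN"
T24: Resource Local ID [2]
  ResType: 0x05 (Coded Font)
  ResLID: 0x02 (2)
"""
print(map_coded_fonts(sample))  # {2: ('T1001000', 'C0040TPN')}
```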
6. Locate the Code Page in the AFPDS dump output. If it is not in the dump, locate the external resource file with the matching file name and dump it, e.g.:
0xD3A8CE: BR Begin Resource [20] (Offset 279706)
"T1001000"
(hex:) E3 F1 F0 F0 F1 F0 F0 F0
Reserved: 00 00
T21: Resource Object Type (R) [8]
ObjType: 0x41 (Code Page object (retired))
ConData: 00 00 00 00 00 00 00
{
0xD3A887: BCP Begin Code Page [0] (Offset 279735)
7. Locate the hex value noted in step 3 in the Code Page Index (CPI), e.g.:
065 "CPT00065" CodePoint: 0x41 (65)
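The CPI lines can be collected into a code point to GCGID table, assuming every entry has the `"<GCGID>" CodePoint: 0x<hex>` form shown above.

```python
import re

# One CPI entry: quoted GCGID followed by its code point in hex
CPI_ENTRY = re.compile(r'"(?P<gcgid>[^"]+)"\s+CodePoint:\s*0x(?P<cp>[0-9A-Fa-f]{2})')

def code_page_index(cpi_text: str) -> dict:
    """Build a code point -> GCGID table from the CPI lines of a dump."""
    return {int(m["cp"], 16): m["gcgid"] for m in CPI_ENTRY.finditer(cpi_text)}

cpi = code_page_index('065 "CPT00065" CodePoint: 0x41 (65)\n')
print(cpi[0x41])  # CPT00065
```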
8. Note the GCGID assigned to that code point (here, "CPT00065").
9. Locate the font Character Set in the AFPDS dump output. If it is not in the dump, locate the external resource file with the matching file name and dump it, e.g.:
0xD3A8CE: BR Begin Resource [20] (Offset 364352)
"C0TMPOST"
(hex:) C3 F0 E3 D4 D7 D6 E2 E3
Reserved: 00 00
T21: Resource Object Type (R) [8]
ObjType: 0x40 (Font Character Set object (retired))
ConData: 00 00 00 00 00 00 00
{
0xD3A889: BFN Begin Font [0] (Offset 364381)
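To check whether a Code Page or Character Set is inline, the Begin Resource (BR) entries can be listed with their object types. A sketch assuming the afpdump layout shown in steps 6 and 9, where the resource name is on the line after "BR Begin Resource" and ObjType 0x41 and 0x40 mark Code Page and Font Character Set objects; a name missing from the result has to be dumped from its external resource file.

```python
RESOURCE_TYPES = {0x40: "Font Character Set", 0x41: "Code Page"}

def inline_resources(dump_text: str) -> dict:
    """Return {resource name: object type} for Begin Resource (BR) entries."""
    found = {}
    lines = dump_text.splitlines()
    name = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if "BR Begin Resource" in stripped and i + 1 < len(lines):
            name = lines[i + 1].strip().strip('"')  # quoted name on next line
        elif stripped.startswith("ObjType:") and name:
            obj_type = int(stripped.split()[1], 16)
            found[name] = RESOURCE_TYPES.get(obj_type, f"0x{obj_type:02X}")
            name = None
    return found

sample = """0xD3A8CE: BR Begin Resource [20] (Offset 279706)
  "T1001000"
  ObjType: 0x41 (Code Page object (retired))
0xD3A8CE: BR Begin Resource [20] (Offset 364352)
  "C0TMPOST"
  ObjType: 0x40 (Font Character Set object (retired))
"""
print(inline_resources(sample))
```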
10. Locate the GCGID in the "Character Set", e.g.:
GCGID: CHAR0065 14 17 842
......XXX......
......XXX......
.....XX.XX.....
.....XX.XX.....
.....XX.XX.....
....XX...XX....
....XX...XX....
...XXX...XXX...
...XX.....XX...
...XX.....XX...
..XXXXXXXXXXX..
..XXXXXXXXXXX..
..XX.......XX..
.XX.........XX.
.XX.........XX.
.XX.........XX.
XX...........XX
XX...........XX
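When a Character Set contains many characters, its GCGIDs can be collected from the dump instead of being read one by one. A sketch assuming each character begins with a "GCGID: <name>" line as shown above; the numbers after the name are ignored.

```python
import re

def gcgids_in_character_set(fcs_text: str) -> set:
    """Collect the GCGIDs listed in a font Character Set dump."""
    return set(re.findall(r"^\s*GCGID:\s*(\S+)", fcs_text, re.MULTILINE))

sample = "GCGID: CHAR0065 14 17 842\n......XXX......\n......XXX......\n"
print(gcgids_in_character_set(sample))  # {'CHAR0065'}
```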
11. Determine the Unicode value for the character. If the Code Page is based on ASCII encoding, the hex value of an ASCII code point (0x00 to 0x7F) is normally equal to its Unicode value:
"CPT00065" CodePoint: 0x41 = Unicode 0041 = "A"
Otherwise, look up the Unicode value for the character, for example in the Unicode character name index at www.unicode.org/charts/charindex.html.
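Python's standard `unicodedata` module can confirm both cases: the name behind an ASCII-based value, and the code point of a character looked up by its Unicode name. The helper name is hypothetical, and the euro sign is only an example of a non-ASCII character.

```python
import unicodedata

def ascii_unicode_value(code_point: int) -> str:
    """Return the 'unc' value for an ASCII code point (identical in Unicode)."""
    if not 0x00 <= code_point <= 0x7F:
        raise ValueError("not an ASCII code point; look the value up instead")
    return f"{code_point:04X}"

unc = ascii_unicode_value(0x41)
print(unc, unicodedata.name(chr(int(unc, 16))))  # 0041 LATIN CAPITAL LETTER A

# For other characters, find the code point from the Unicode character name
euro = unicodedata.lookup("EURO SIGN")
print(f"{ord(euro):04X}")  # 20AC
```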
12. Create the encoding for the Code Page in the <encodinglist> section of the mffafp.pro profile:
- "encoding name" = Code Page name
- "devname" = character name (GCGID) in the Code Page
- "unc" = Unicode value of the character (hex)
<encoding name="T1001000">
<entry devname="CPT00000" unc="0000"/>
<entry devname="CPT00001" unc="0001"/>
<entry devname="CPT00002" unc="0002"/>
<entry devname="CPT00003" unc="0003"/>
<entry devname="CPT00004" unc="0004"/>
...
<entry devname="CHAR0065" unc="0041"/>
<entry devname="CHAR0066" unc="0042"/>
<entry devname="CHAR0067" unc="0043"/>
<entry devname="CHAR0068" unc="0044"/>
</encoding>
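An `<encoding>` element in the format shown above can also be generated from the collected GCGID-to-Unicode pairs. A sketch using Python's `xml.etree.ElementTree` (`ET.indent` needs Python 3.9 or later); it produces only the `<encoding>` element, which still has to be placed in the `<encodinglist>` section of mffafp.pro, and writes Unicode values as uppercase hex with at least four digits, like the example.

```python
import xml.etree.ElementTree as ET

def build_encoding(code_page: str, pairs: dict) -> str:
    """Render an <encoding> element from {GCGID: Unicode code point} pairs."""
    encoding = ET.Element("encoding", name=code_page)
    for gcgid, code_point in sorted(pairs.items()):
        ET.SubElement(encoding, "entry", devname=gcgid, unc=f"{code_point:04X}")
    ET.indent(encoding)  # one entry per line
    return ET.tostring(encoding, encoding="unicode")

print(build_encoding("T1001000", {"CHAR0065": 0x41, "CHAR0066": 0x42}))
```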
Hints:
- Repeat the steps for each Code Page found in the file.
- Only add entries for characters that exist in the Character Sets.
- The file can contain more than one Character Set using the same character names, and some may contain more characters than others. Check all Character Sets to determine the complete set of GCGID–Unicode pairs.
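The hints can be combined into one filtering step: merge the GCGIDs of all Character Sets, keep only the pairs whose GCGID occurs in at least one of them, and report the rest. A sketch; the function name and the Character Set contents in the usage example are hypothetical.

```python
def pairs_for_code_page(pairs: dict, character_sets: dict) -> tuple:
    """Keep only GCGID -> Unicode pairs whose GCGID occurs in some Character Set.

    pairs:          {GCGID: Unicode code point} collected for one Code Page
    character_sets: {Character Set name: set of GCGIDs it contains}
    Returns (kept pairs, sorted GCGIDs found in no Character Set).
    """
    available = set().union(*character_sets.values())
    kept = {g: u for g, u in pairs.items() if g in available}
    missing = sorted(set(pairs) - available)
    return kept, missing

character_sets = {
    "C0040TPN": {"CHAR0065", "CHAR0066"},
    "C0TMPOST": {"CHAR0065", "CHAR0067"},
}
pairs = {"CHAR0065": 0x41, "CHAR0066": 0x42, "CHAR0067": 0x43, "CHAR0068": 0x44}
kept, missing = pairs_for_code_page(pairs, character_sets)
print(missing)  # ['CHAR0068']
```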


