Chapter 9. Utility Programs

9.1.  General Remarks for all Utilities

PDFUnit provides utility programs that extract various parts of a PDF document into separate files, mostly XML, which can then be used in tests. The following list gives an overview of the available programs:

// Utility programs belonging to PDFUnit:

ConvertUnicodeToHex           9.2: “Convert Unicode Text into Hex Code” 
ExtractBookmarks              9.5: “Extract Bookmarks to XML” 
ExtractEmbeddedFiles          9.4: “Extract Attachments” 
ExtractFieldInfo              9.3: “Extract Field Information to XML” 
ExtractFontInfo               9.6: “Extract Font Information to XML” 
ExtractImages                 9.7: “Extract Images from PDF” 
ExtractJavaScript             9.8: “Extract JavaScript to a Text File” 
ExtractNamedDestinations      9.9: “Extract Named Destinations to XML” 
ExtractSignatureInfo          9.10: “Extract Signature Information to XML” 
ExtractXFAData                9.11: “Extract XFA Data to XML” 
ExtractXMPData                9.12: “Extract XMP Data to XML” 
ExtractZugferdData            9.15: “Extract ZUGFeRD Data” 
RenderPdfPageRegionToImage    9.13: “Render Page Sections to PNG” 
RenderPdfToImages             9.14: “Render Pages to PNG” 

The utility programs generate files. Their names are derived from those of the input files. The following rules are used to avoid naming conflicts with existing files:

  • Generated file names start with an underscore.

  • The names have two suffixes. The penultimate suffix is .out and the last one is the usual suffix for the file type.

For example, when you extract bookmarks from foo.pdf, the file _bookmarks_foo.out.xml is created. Rename the file before using it in a test: once renamed, it is a test resource rather than a generated output file.
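The naming rules can be illustrated with a small Java sketch. It is a hypothetical helper, not part of the PDFUnit API: it reproduces the pattern from the example above (_bookmarks_foo.out.xml) and suggests one possible test-ready name. Dropping both the leading underscore and the .out marker when renaming is an assumption, not a PDFUnit convention.

```java
import java.nio.file.Paths;

// HYPOTHETICAL helper: mirrors the documented naming rules, not PDFUnit code.
public class OutputFileNames {

    // Derives "_<kind>_<base>.out.<extension>" from the input PDF path,
    // e.g. ("bookmarks", "docs/foo.pdf", "xml") -> "_bookmarks_foo.out.xml".
    static String outputName(String kind, String pdfPath, String extension) {
        String fileName = Paths.get(pdfPath).getFileName().toString();
        String base = fileName.toLowerCase().endsWith(".pdf")
                ? fileName.substring(0, fileName.length() - ".pdf".length())
                : fileName;
        return "_" + kind + "_" + base + ".out." + extension;
    }

    // Suggests a name for use in tests by dropping the leading underscore
    // and the last ".out" marker: "_bookmarks_foo.out.xml" -> "bookmarks_foo.xml".
    static String testInputName(String outputName) {
        String name = outputName.startsWith("_") ? outputName.substring(1) : outputName;
        int marker = name.lastIndexOf(".out.");
        return marker < 0
                ? name
                : name.substring(0, marker) + name.substring(marker + ".out".length());
    }

    public static void main(String[] args) {
        String out = outputName("bookmarks", "foo.pdf", "xml");
        System.out.println(out);                // _bookmarks_foo.out.xml
        System.out.println(testInputName(out)); // bookmarks_foo.xml
    }
}
```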

The Windows batch scripts in the following sections demonstrate how to start the programs. These scripts are part of the PDFUnit release, but you will need to adapt most of their content to your environment: set the classpath, the input file and the output directory.

When you start a program without parameters or with incorrect parameters, it displays a message describing the correct command-line parameters.

The utilities also run on Unix. Unix developers should find it easy to translate the Windows scripts into shell scripts. If you need assistance, please contact us at info[at]pdfunit.com.
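As a platform-independent alternative to translating the batch scripts by hand, a utility can also be started from Java. The following is a minimal sketch, not part of PDFUnit: the package of the utility class, the jar names and the argument order (input file, then output directory) are assumptions, so take the real values from the batch script in your PDFUnit release. The portable part is File.pathSeparator, which is ";" on Windows and ":" on Unix.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// HYPOTHETICAL launcher sketch; not shipped with PDFUnit.
public class UtilityLauncher {

    // HYPOTHETICAL: replace with the fully qualified class name from the batch script.
    static final String MAIN_CLASS = "PACKAGE.ExtractBookmarks";

    // Builds "java -cp <classpath> <mainClass> <args...>" as a list,
    // so no shell quoting or escaping is needed.
    static List<String> command(String classpath, String mainClass, String... args) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", classpath, mainClass));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Starts the utility, shares this console for its output and waits for the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // File.pathSeparator yields a valid classpath on Windows and on Unix.
        // The jar locations are HYPOTHETICAL.
        String classpath = String.join(File.pathSeparator, "lib/pdfunit.jar", "lib/*");
        List<String> cmd = command(classpath, MAIN_CLASS, "foo.pdf", "out");
        System.out.println(String.join(" ", cmd));
        // To actually start the utility: int exitCode = run(cmd);
    }
}
```

Because ProcessBuilder passes "lib/*" to the java launcher unexpanded, the launcher's own classpath wildcard handling applies, just as in a quoted shell argument.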