Java “understands” Unicode, as does XML, so PDFUnit “understands” Unicode too. Section 7: “Unicode” deals with Unicode in detail.
This section describes a utility program that converts a Unicode string into its ASCII hex code. You can use the hex code in many of your tests. If you only need a small number of Unicode characters, using the ASCII hex code is easier than installing a new font on your computer, and you may not have permission to install anything.
The utility ConvertUnicodeToHex
converts any string into ASCII and escapes all non-ASCII characters
into their corresponding Unicode hex code. For example, the Euro character
is converted into \u20AC.
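The conversion described above can be illustrated with a minimal Java sketch. It is not the source code of ConvertUnicodeToHex; the class name UnicodeEscapeSketch and the method toAsciiHex are hypothetical and only show the idea: ASCII characters are copied unchanged, every other character is written as a backslash-u escape with four hex digits. The trimming mirrors the behavior documented for the utility.

```java
public class UnicodeEscapeSketch {

    /**
     * Hypothetical helper, not part of PDFUnit: returns the trimmed input
     * with every non-ASCII char replaced by its backslash-u hex escape.
     */
    public static String toAsciiHex(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.trim().toCharArray()) {
            if (c < 128) {
                sb.append(c);                                  // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04X", (int) c));  // e.g. the euro sign becomes 20AC
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The source file stays pure ASCII: the compiler resolves the escape to the euro sign.
        System.out.println(toAsciiHex("\u20AC"));
    }
}
```

Because the loop works on Java char values, a character outside the Basic Multilingual Plane would appear as two escapes (a surrogate pair), which is also how Java string literals represent such characters.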
The input file can have any encoding, but you have to declare the right encoding before running the program.
To do so, start the Java program with the -D parameter that sets file.encoding:
    ::
    :: Converting Unicode content of the input file to hex code.
    ::
    @echo off
    setlocal
    set CLASSPATH=./lib/pdfunit-2015.10/*;%CLASSPATH%
    set TOOL=com.pdfunit.tools.ConvertUnicodeToHex
    set OUT_DIR=./tmp
    set IN_FILE=convert-unicode-to-hex.in.txt
    java -Dfile.encoding=UTF-8 %TOOL% %IN_FILE% %OUT_DIR%
    endlocal
The name of the output file is derived from the name of the input file.
In this example, the file _convert-unicode-to-hex.out.txt is generated
with the following content:
    #Unicode created by com.pdfunit.tools.ConvertUnicodeToHex
    #Wed Jan 16 21:50:04 CET 2013
    convert-unicode-to-hex.in_as-ascii=\u00E4\u00F6\u00FC \u20AC @
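The generated content resembles the layout of a Java properties file (comment lines starting with #, followed by a key=value line). Assuming that this holds, the value can be read back with java.util.Properties, which decodes the hex escapes into the original characters. The sketch below is only an illustration: the class ReadHexOutputSketch and its decode method are hypothetical, and the file content is embedded as a string so that the example runs without the real output file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ReadHexOutputSketch {

    /** Hypothetical helper: parses properties-style text and returns the decoded value for a key. */
    public static String decode(String fileContent, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(fileContent)); // load() resolves the hex escapes
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // Content copied from the example output; assumed to be properties-compatible.
        String content =
              "#Unicode created by com.pdfunit.tools.ConvertUnicodeToHex\n"
            + "#Wed Jan 16 21:50:04 CET 2013\n"
            + "convert-unicode-to-hex.in_as-ascii=\\u00E4\\u00F6\\u00FC \\u20AC @\n";

        String decoded = decode(content, "convert-unicode-to-hex.in_as-ascii");

        // The literal on the right is written with escapes so this source stays ASCII.
        System.out.println(decoded.equals("\u00E4\u00F6\u00FC \u20AC @")); // prints true
    }
}
```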
The output file is written in the encoding of the Java runtime,
which is derived from the system property file.encoding.
Leading and trailing whitespace in the input string is trimmed! If you need it for your test, add it by hand afterwards.
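Restoring trimmed whitespace by hand could look like the following sketch. The class RestoreWhitespaceSketch and the method pad are hypothetical names used for illustration only; the point is that the converted value is an ordinary string once it sits in Java source, so the missing blanks can simply be concatenated.

```java
public class RestoreWhitespaceSketch {

    /** Hypothetical helper: re-adds whitespace that the converter trimmed from the input. */
    public static String pad(String leading, String converted, String trailing) {
        return leading + converted + trailing;
    }

    public static void main(String[] args) {
        // Value copied from the converter output; the compiler resolves it to the euro sign.
        String converted = "\u20AC";

        // Two leading blanks and one trailing blank are added by hand.
        String expected = pad("  ", converted, " ");

        System.out.println(expected.length()); // prints 4
    }
}
```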