3.10.  Form Fields

Overview

When PDF documents are part of a workflow, it is often the content of form fields that is processed. To avoid problems, the fields should be created properly; for example, field names should be unique.

You can extract all information about form fields with the utility ExtractFieldsInfo into an XML file, which can then be used for XML and XPath based tests.

The following sections describe many tests for field properties, size and content. Depending on the application context, one or more of the following tags and attributes may be useful to you:

<!-- Tags for tests on fields: -->
  
<hasField   withName=".."            (required)
            
            width=".."               (optional, ...
            height=".."              ... but used together)
            unit=".."                (optional, default = MILLIMETER)

            hasMultipleLines=".."    (optional)
            hasSingleLine=".."       (optional)
            isEditable=".."          (optional)
            isExportable=".."        (optional)
            isHidden=".."            (optional)
            isMultiSelectable=".."   (optional)
            isPasswordProtected=".." (optional)
            isReadOnly=".."          (optional)
            isPrintable=".."         (optional)
            isRequired=".."          (optional)
            isSigned=".."            (optional)
            isVisible=".."           (optional)
    
            withType=".."            (optional)
            
/>


<!-- Nested tags of <hasField /> are described later in this chapter -->
<!-- The constants for the attribute 'withType' are described later in this chapter -->

<hasFields                  />
<hasNumberOfFields          />
<hasSignedSignatureFields   />
<hasUnsignedSignatureFields />

<!-- Nested tags of <hasFields /> are: -->

<allWithoutDuplicateNames />
<allWithoutTextOverflow   />
<matchingXPath            />
<matchingXML              />

The test <allWithoutTextOverflow /> is described separately in chapter 3.11: “Form Fields - Text Overflow”.

Existence of Fields

The following test verifies whether or not fields exist:

<testcase name="hasFields_NoFieldsAvailable"
          errorExpected="YES"
>
  <assertThat testDocument="acrofields/noAcrofieldDemo.pdf">
    <hasFields />
  </assertThat>
</testcase>
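
The positive counterpart can be sketched as follows. It reuses acrofields/simpleRegistrationForm.pdf, which the examples later in this chapter treat as containing four fields; the test case name is made up for illustration.

```xml
<!-- Sketch: expected to pass, assuming the document contains at least one field. -->
<testcase name="hasFields_FieldsAvailable">
  <assertThat testDocument="acrofields/simpleRegistrationForm.pdf">
    <hasFields />
  </assertThat>
</testcase>
```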

Number of Fields

If you only need to verify the number of fields, you can use the tag <hasNumberOfFields />:

<testcase name="hasNumberOfFields">
  <assertThat testDocument="acrofields/simpleRegistrationForm.pdf">
    <hasNumberOfFields>4</hasNumberOfFields>
  </assertThat>
</testcase>

It can also be useful to verify that a PDF document has no fields:

<testcase name="hasNumberOfFields_NoFieldsAvailable">
  <assertThat testDocument="acrofields/noAcrofieldDemo.pdf">
    <hasNumberOfFields>0</hasNumberOfFields>
  </assertThat>
</testcase>

Name of Fields

Because fields are accessed by their names to get their content, you could check that the names exist:

<testcase name="hasField_MultipleInvocation">
  <assertThat testDocument="acrofields/simpleRegistrationForm.pdf">
    <hasField withName="name" />
    <hasField withName="address" />
    <hasField withName="postal_code" />
    <hasField withName="email" />
  </assertThat>
</testcase>

Duplicate field names are allowed by the PDF specification, but they are likely to cause surprises later in the workflow. PDFUnit therefore provides a test to check for the absence of duplicate names:

<testcase name="hasFields_AllWithoutDuplicateNames">
  <assertThat testDocument="acrofields/javaScriptForFields.pdf">
    <hasFields>
      <allWithoutDuplicateNames />
    </hasFields>
  </assertThat>
</testcase>

Content of Fields

It is very simple to verify that a given field contains data:

<testcase name="hasField_WithAnyValue">
  <assertThat testDocument="acrofields/javaScriptForFields.pdf">
    <hasField withName="ageField">
      <withAnyValue />
    </hasField>
  </assertThat>
</testcase>

To verify the actual content of fields against an expected string, the following tags are available:

<!-- Tags to check content in fields -->

<containing             />
<endingWith             />
<havingJavaScriptAction />
<matchingComplete       />
<matchingRegex          />
<notContaining          />
<notMatchingRegex       />  (useful, because regular expressions are 
                             not designed to find 'Not-Matches')
<startingWith           />
<withAnyValue           />
<withoutTextOverflow    />

The following examples should give you some ideas about how to use these tags:

<testcase name="hasField_MatchingComplete">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Text 1">
      <matchingComplete>
        Single Line Text
      </matchingComplete>
    </hasField>
  </assertThat>
</testcase>

<!-- This is a small test to protect fields against SQL injection. -->

<testcase name="hasField_NotContaining_SQLComment">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Text 1">
      <notContaining>--</notContaining>
      <notContaining>/></notContaining>
    </hasField>
  </assertThat>
</testcase>
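
The remaining content tags follow the same pattern. The sketch below assumes, as in the first example above, that the field "Text 1" contains "Single Line Text". The test case name and the regular expressions are illustrative only, and it is an assumption that different content tags may be combined inside one <hasField /> tag.

```xml
<!-- Sketch: assumes the field "Text 1" contains "Single Line Text". -->
<testcase name="hasField_ContentTags_Sketch">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Text 1">
      <startingWith>Single</startingWith>
      <containing>Line</containing>
      <endingWith>Text</endingWith>
      <matchingRegex>Single.*Text</matchingRegex>
      <!-- The expected content has no digits, so this should hold. -->
      <notMatchingRegex>[0-9]+</notMatchingRegex>
    </hasField>
  </assertThat>
</testcase>
```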

Type of Fields

Each field has a type. Although a field type is not as important as the name, it can be tested with the attribute withType="..":

<testcase name="hasFieldWithType_MultipleInvocation">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Text 25"        withType="TEXT"        />
    <hasField withName="Check Box 7"    withType="CHECKBOX"    />
    <hasField withName="Radio Button 4" withType="RADIOBUTTON" />
    <hasField withName="Button 19"      withType="PUSHBUTTON"  />
    <hasField withName="List Box 1"     withType="LIST"        />
    <hasField withName="List Box 1"     withType="CHOICE"      />
    <hasField withName="Combo Box 5"    withType="CHOICE"      />
    <hasField withName="Combo Box 5"    withType="COMBO"       />
  </assertThat>
</testcase>

The available field types are defined as constants for the attribute withType. The names of the constants correspond to the typical names of visible elements of a graphical user interface, whereas the PDF standard uses other names for the types. The following list shows the association between PDFUnit constants and PDF-internal constants, which may appear in error messages:

<!-- Mapping between PDFUnit constants and PDF-internal types. -->

PDFUnit-Constant    PDF-internal
-------------------------------
CHOICE          ->  "choice"
COMBO           ->  "choice"
LIST            ->  "choice"
CHECKBOX        ->  "button"
PUSHBUTTON      ->  "button"
RADIOBUTTON     ->  "button"
SIGNATURE       ->  "sig"
TEXT            ->  "text"

The test listing above covers all testable field types except signature fields, because that document has no signature field. The document in the next listing has a signature field, which can be tested:

<testcase name="hasField_WithType_Signature">
  <assertThat testDocument="signed/sampleSignedPDFDocument.pdf">
    <hasField withName="Signature2" isSigned="YES" />
  </assertThat>
</testcase>
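
The type of that field can presumably be checked in the same test with the constant SIGNATURE from the mapping above. This is a sketch that combines two attributes documented in this chapter; the test case name is hypothetical.

```xml
<!-- Sketch: checks the field type and the signed state together. -->
<testcase name="hasField_WithType_Signature_Sketch">
  <assertThat testDocument="signed/sampleSignedPDFDocument.pdf">
    <hasField withName="Signature2" withType="SIGNATURE" isSigned="YES" />
  </assertThat>
</testcase>
```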

Detailed tests for signatures and certificates are described in chapter 3.23: “Signatures and Certificates”.

Field Size

If the size of form fields is important, check it using the attributes width=".." and height="..":

<testcase name="hasField_WidthAndHeight">
  <assertThat testDocument="acrofields/notExportableAcrofield.pdf">
    <hasField withName="Title of 'someField'"
              width="159"
              height="11"
    />
  </assertThat>
</testcase>

<!-- 
  When @unit is omitted, the values of width and height are taken as "MILLIMETER".
 -->

<testcase name="hasField_Width">
  <assertThat testDocument="acrofields/notExportableAcrofield.pdf">
    <hasField withName="Title of 'someField'" width="159"                    />
    <hasField withName="Title of 'someField'" width="159"  unit="MILLIMETER" />
    <hasField withName="Title of 'someField'" width="15.9" unit="CENTIMETER" />
    <hasField withName="Title of 'someField'" width="450"  unit="DPI72"      />
    <hasField withName="Title of 'someField'" width="450"  unit="POINTS"     />
    <hasField withName="Title of 'someField'" width="6.26" unit="INCH"       />
  </assertThat>
</testcase>

The units POINTS and DPI72 are identical.
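
Width and height can be checked together in a unit other than millimeters. The following sketch converts the values from the first example (159 mm and 11 mm) to centimeters. The test case name is illustrative, and it is an assumption that the attribute unit applies to both width and height.

```xml
<!-- Sketch: 159 mm = 15.9 cm, 11 mm = 1.1 cm.
     Assumes that unit applies to width and height alike. -->
<testcase name="hasField_WidthAndHeight_Centimeter_Sketch">
  <assertThat testDocument="acrofields/notExportableAcrofield.pdf">
    <hasField withName="Title of 'someField'"
              width="15.9"
              height="1.1"
              unit="CENTIMETER"
    />
  </assertThat>
</testcase>
```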

When you are creating a test, you probably do not know the dimensions of a field. That is not a problem: use any value for width and height and run the test. The resulting error message reports the actual field size.

Whether a text fits into a field cannot be predicted by a calculation from font size and field size alone. In addition to the font size, the words at the end of each line determine the required number of rows and therefore the required height, and the calculation also has to consider hyphenation. Chapter 3.11: “Form Fields - Text Overflow” deals with this subject in detail.

Field Properties

Fields have more properties than just their size, for example whether they are editable or printable. Since most of these properties cannot be tested manually, appropriate tests have to be part of every PDF testing tool. The following example shows the principle:

<testcase name="hasField_Editable">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Combo Box 4" isEditable="YES" />
  </assertThat>
</testcase>

These are the available attributes for verifying properties of form fields:

<!-- Attributes to check field properties: -->

hasMultipleLines="YES"   
hasSingleLine="YES"

isEditable="YES/NO"
isExportable="YES/NO"
isHidden="YES/NO"
isMultiSelectable="YES/NO"
isPasswordProtected="YES"
isPrintable="YES/NO"
isReadOnly="YES/NO"
isRequired="YES/NO"
isSigned="YES"
isVisible="YES/NO"

Whitespace is ignored when comparing expected and actual field content:

<testcase name="hasField_MultiLineField_MultipleInvocations">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasField withName="Text multi" 
              hasMultipleLines="YES"
              isExportable="YES"
    >
      <matchingComplete>
         Multiple Line Support:
         First Line;
         Second Line;
      </matchingComplete>
    </hasField>
  </assertThat>
</testcase>

JavaScript Actions for Fields

Assuming that PDF documents are processed in a workflow, the input into fields is typically validated with constraints implemented in JavaScript. That prevents incorrect input.

PDFUnit can verify that a field is linked with an action:

<testcase name="hasField_HavingJavaScriptAction_MultipleInvocation">
  <assertThat testDocument="acrofields/javaScriptForFields.pdf">
    <hasField withName="ageField" >
      <havingJavaScriptAction>Validate</havingJavaScriptAction>
    </hasField>
    <hasField withName="nameField" >
      <havingJavaScriptAction>Keystroke</havingJavaScriptAction>
    </hasField>
    <hasField withName="commentField" >
      <havingJavaScriptAction>Keystroke</havingJavaScriptAction>
    </hasField>
  </assertThat>
</testcase>

Tags to validate the JavaScript itself are described in chapter 3.14: “JavaScript”.

Unicode

When tools for creating PDF do not handle Unicode sequences properly, it is difficult to test those sequences. But difficult does not mean impossible. The following picture shows the name of a field in the encoding UTF-16BE with a Byte Order Mark (BOM) at the beginning:

Although it is tricky, the name of this field can be tested as a Java Unicode sequence:

<!--
  The name of the field consists of UTF-16BE code represented as ASCII.
  Use a Unicode sequence for the field name to test it.
 -->

<testcase name="hasField_NameContainingUnicode_UTF16">
  <assertThat testDocument="unicode/unicode_inFieldnames.pdf">
    <hasField withName="\u00fe\u00ff\u0000F\u0000o\u0000r\u0000m\...\u0000]" />
  </assertThat>
</testcase>

<!-- The Unicode sequence in this example is abbreviated. -->

More information about Unicode and Byte-Order-Mark can be found in Wikipedia.

Validate Field Information against XML

Chapter 9.3: “Extract Field Information to XML” describes how to extract information about all fields of a PDF document. You can compare the field properties in a PDF document with the extracted XML file:

<testcase name="hasField_MatchingXML">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasFields>
      <matchingXML file="acrofields/plugin-pdf_form_maker.xml"/>
    </hasFields>
  </assertThat>
</testcase>

Validate Field Information with XPath

The extracted XML data can also be used for XPath based tests. That allows you to test dependencies between multiple fields (cross-constraints). The next example gives you an idea of the possibilities:

<testcase name="hasField_MatchingXPath_NumberOfTextFields">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasFields>
      <matchingXPath expr="count(//field[./@type='text']) = 43"/>
    </hasFields>
  </assertThat>
</testcase>

The tag <matchingXPath /> can be used multiple times in one test.

<testcase name="hasField_MatchingXPath_MultipleInvocation">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasFields>
      <matchingXPath expr="count(//field[./@type='text'])     = 43"/>
      <matchingXPath expr="count(//field[./@type='button'])   = 54"/>
      <matchingXPath expr="count(//field[./@type='choice'])   =  5"/>
      <matchingXPath expr="count(//field[./@type='sig'])      =  0"/>
    </hasFields>
  </assertThat>
</testcase>
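
A cross-constraint relates several fields to each other in one expression. As a sketch based on the counts above (54 button fields, 43 text fields), the following test compares two counts with each other instead of checking each against a constant. The test case name is made up for illustration.

```xml
<!-- Sketch: passes if the document has more button fields than text fields. -->
<testcase name="hasField_MatchingXPath_CrossConstraint_Sketch">
  <assertThat testDocument="acrofields/plugin-pdf_form_maker.pdf">
    <hasFields>
      <matchingXPath expr="count(//field[./@type='button']) > count(//field[./@type='text'])"/>
    </hasFields>
  </assertThat>
</testcase>
```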

The following two examples check whether unsigned signature fields exist:

<testcase name="hasField_MatchingXPath_HavingUnSignedSignatureFields_1">
  <assertThat testDocument="acrofields/certificateform.pdf">
    <hasUnsignedSignatureFields />
  </assertThat>
</testcase>

<testcase name="hasField_MatchingXPath_HavingUnSignedSignatureFields_2">
  <assertThat testDocument="acrofields/certificateform.pdf">
    <hasFields>
      <matchingXPath expr="count(//field[./@type='sig'][./@isSigned='false']) > 0"/>
    </hasFields>
  </assertThat>
</testcase>
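
The counterpart for signed fields can be sketched the same way. The document name is taken from the signature example earlier in this chapter. The attribute value 'true' for isSigned is an assumption inferred from the 'false' used above, and the test case names are illustrative.

```xml
<!-- Sketch 1: uses the dedicated tag listed in the overview. -->
<testcase name="hasSignedSignatureFields_Sketch">
  <assertThat testDocument="signed/sampleSignedPDFDocument.pdf">
    <hasSignedSignatureFields />
  </assertThat>
</testcase>

<!-- Sketch 2: the same check via XPath; isSigned='true' is an assumed value. -->
<testcase name="hasField_MatchingXPath_HavingSignedSignatureFields_Sketch">
  <assertThat testDocument="signed/sampleSignedPDFDocument.pdf">
    <hasFields>
      <matchingXPath expr="count(//field[./@type='sig'][./@isSigned='true']) > 0"/>
    </hasFields>
  </assertThat>
</testcase>
```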

PDFUnit uses the XSLT processor of your Java runtime. Please read the documentation for your JRE or JDK to find out which XPath 2.0 syntax elements and functions are supported. PDFUnit itself imposes no restrictions.