3.30.  XFA Data

Overview

The XML Forms Architecture (XFA) extends the PDF structure with XML-based information. Its goal is to integrate PDF forms more effectively into workflow processes.

XFA forms are not compatible with AcroForms. Therefore, the tests for AcroForms cannot be used for XFA data. Tests for XFA data are mainly based on XPath.

<!-- Tags to test XFA data: -->

<hasXFAData   />
<hasNoXFAData />

<!-- Inner tags of hasXFAData: -->

<hasXFAData>
  <matchingXPath />  (optional)
  <matchingXML   />  (optional)
  <withNode      />  (optional)
</hasXFAData>

Existence and Absence of XFA

The first test focuses on the existence of XFA data:

<testcase name="hasXFAData">
  <assertThat testDocument="xfa/xfa-movie.pdf">
    <hasXFAData />
  </assertThat>
</testcase>

You can also check that a PDF document does not contain XFA data:

<testcase name="hasNoXFAData">
  <assertThat testDocument="xfa/no-xfa.pdf">
    <hasNoXFAData />
  </assertThat>
</testcase>

Comparing XFA with an XML File

The XFA data of a document can be extracted into an XML file with the utility ExtractXFAData. A later test can then compare that file with the XFA data of another PDF document:

<testcase name="hasXFAData_MatchingXML">
  <assertThat testDocument="xfa/xfa-movie.pdf">
    <hasXFAData>
      <matchingXML file="xfa/xfa-movie.xml" />  <!-- (1) -->
    </hasXFAData>
  </assertThat>
</testcase>

(1) Whitespace is ignored when comparing XML data.
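
To illustrate what a whitespace-insensitive comparison can look like, the following Java sketch parses two XML strings, removes text nodes that consist only of whitespace, and compares the resulting DOM trees. It is not PDFUnit's implementation. The class and method names are hypothetical, and the sketch assumes that "ignoring whitespace" refers to indentation and line breaks between elements, not to whitespace inside text values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of PDFUnit.
final class WhitespaceInsensitiveXmlSketch {

  // Parses an XML string into a namespace-aware DOM document.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Recursively removes text nodes that contain only whitespace.
  static void stripWhitespaceNodes(Node node) {
    Node child = node.getFirstChild();
    while (child != null) {
      Node next = child.getNextSibling();
      if (child.getNodeType() == Node.TEXT_NODE
          && child.getTextContent().trim().isEmpty()) {
        node.removeChild(child);
      } else {
        stripWhitespaceNodes(child);
      }
      child = next;
    }
  }

  // Compares two XML strings after stripping whitespace-only text nodes.
  static boolean equalIgnoringWhitespace(String xmlA, String xmlB) {
    Document a = parse(xmlA);
    Document b = parse(xmlB);
    stripWhitespaceNodes(a);
    stripWhitespaceNodes(b);
    return a.getDocumentElement().isEqualNode(b.getDocumentElement());
  }

  public static void main(String[] args) {
    String compact = "<log><to>memory</to><mode>overwrite</mode></log>";
    String indented = "<log>\n  <to>memory</to>\n  <mode>overwrite</mode>\n</log>";
    System.out.println(equalIgnoringWhitespace(compact, indented)); // prints "true"
  }
}
```

In this sketch, two documents that differ only in indentation compare as equal, while a changed element value still makes the comparison fail.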

Often it makes little sense to compare the complete XFA data of a PDF document with a file. Instead, you can test individual XML nodes or use XPath expressions for more targeted tests. Both options are described in the following sections.

Validating Single XML Tags

The next examples use the following XFA data (excerpt):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  ...
  <x:xmpmeta xmlns:x="adobe:ns:meta/"
             x:xmptk="Adobe XMP Core 4.2.1-c041 52.337767, 2008/04/13-15:41:00"
  >
    <config xmlns="http://www.xfa.org/schema/xci/2.6/">
      ...
      <log xmlns="http://www.xfa.org/schema/xci/2.6/">
        <to>memory</to>
        <mode>overwrite</mode>
      </log>
      ...
    </config>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      ...
      <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" >
        <xmp:MetadataDate>2009-12-03T17:50:52Z</xmp:MetadataDate>
      </rdf:Description>
      ...
    </rdf:RDF>
    ...
  </x:xmpmeta>
</xdp:xdp>

The value of a specific XML node can be tested with the tag <withNode />:

<testcase name="hasXFAData_WithNode">
  <assertThat testDocument="xfa/xfa-enabled.pdf">
    <hasXFAData>
      <withNode tag="xmp:MetadataDate"
                value="2009-12-03T17:50:52Z"
                defaultNamespace="http://www.xfa.org/schema/xci/2.6/" />  <!-- (1) -->
    </hasXFAData>
  </assertThat>
</testcase>

(1) PDFUnit analyzes the XFA data from the current PDF document and determines the namespaces automatically. Only the default namespace has to be specified.

If the XPath expression evaluates to a node set, the first node is used.

When processing the XPath expression, PDFUnit internally prefixes it with the path element "//". For this reason, the expression does not need to start at the document root "/".
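
The two rules above (the "//" prefix and the use of the first node) can be reproduced with the JAXP XPath API of the JDK. The following Java sketch is only an approximation of the described behavior, not PDFUnit code. The class name, the method names, and the sample XML are hypothetical, and the namespace prefix is passed in explicitly instead of being detected automatically.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not PDFUnit code.
final class FirstNodeLookupSketch {

  // Binds exactly one prefix to one namespace URI.
  static NamespaceContext singlePrefix(final String prefix, final String uri) {
    return new NamespaceContext() {
      @Override public String getNamespaceURI(String p) {
        return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
      }
      @Override public String getPrefix(String u) {
        return uri.equals(u) ? prefix : null;
      }
      @Override public Iterator<String> getPrefixes(String u) {
        return uri.equals(u)
            ? Collections.singletonList(prefix).iterator()
            : Collections.<String>emptyIterator();
      }
    };
  }

  // Evaluates "//" + tag and returns the string value of the first match.
  static String firstNodeValue(String xml, String tag, String prefix, String uri) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(singlePrefix(prefix, uri));
      // Converting a node set to a string in XPath 1.0 yields the string
      // value of the first node in document order.
      return xpath.evaluate("//" + tag, doc);
    } catch (Exception e) {
      throw new IllegalStateException("XPath evaluation failed", e);
    }
  }

  public static void main(String[] args) {
    // Made-up sample data with two matching nodes.
    String xml = "<root xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\">"
        + "<xmp:MetadataDate>2009-12-03T17:50:52Z</xmp:MetadataDate>"
        + "<xmp:MetadataDate>2010-01-01T00:00:00Z</xmp:MetadataDate>"
        + "</root>";
    System.out.println(firstNodeValue(
        xml, "xmp:MetadataDate", "xmp", "http://ns.adobe.com/xap/1.0/"));
  }
}
```

With the sample data, the value of the first xmp:MetadataDate element is returned, even though the expression matches two nodes.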

Tests on attribute nodes are of course also possible:

<testcase name="hasXFAData_WithNode_NamespaceDD">
  <assertThat testDocument="xfa/xfa-enabled.pdf">
    <hasXFAData>
      <withNode tag="dd:dataDescription/@dd:name" 
                value="movie" 
      />
    </hasXFAData>
  </assertThat>
</testcase>

XPath-Based XFA Tests

XPath can do more than just identify individual nodes. With the tag <matchingXPath /> you can use all the power of XPath.

The following two examples give you an idea of what is possible:

<testcase name="hasXFAData_FunctionStartsWith">
  <assertThat testDocument="xfa/xfa-enabled.pdf">
    <hasXFAData>
      <!-- complete value: 'movie' -->
      <matchingXPath expr="starts-with(//dd:dataDescription/@dd:name, 'mov')" />
    </hasXFAData>
  </assertThat>
</testcase>

<testcase name="hasXFAData_MatchingXPath_FunctionCount_MultipleInvocation">
  <assertThat testDocument="xfa/xfa-movie.pdf">
    <hasXFAData>
      <matchingXPath expr="//pdf:Producer[. ='Adobe LiveCycle Designer ES 8.2']" />
      <matchingXPath expr="count(//processing-instruction()) = 30" />
    </hasXFAData>
  </assertThat>
</testcase>

One limitation should be noted: how an XPath expression is evaluated depends on the features implemented by the XPath engine in use. By default, PDFUnit uses the JAXP implementation of your JDK, so XPath compatibility also depends on the version of your JDK.
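
To check in advance whether an expression is supported by the XPath engine of your JDK, it can be evaluated directly with JAXP. The following Java sketch evaluates expressions as booleans, similar in spirit to the examples above. The class name, the method name, and the sample XML are hypothetical, and how PDFUnit itself calls JAXP is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not PDFUnit code.
final class JaxpBooleanXPathSketch {

  // Evaluates an XPath expression as a boolean with the JDK's XPath engine.
  static boolean matches(String xml, String expr,
                         final Map<String, String> prefixToUri) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
          String uri = prefixToUri.get(prefix);
          return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
          Iterator<String> it = getPrefixes(uri);
          return it.hasNext() ? it.next() : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
          List<String> prefixes = new ArrayList<>();
          for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(uri)) {
              prefixes.add(e.getKey());
            }
          }
          return prefixes.iterator();
        }
      });
      // A node set converts to true if it is not empty (XPath 1.0 rule).
      return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
    } catch (Exception e) {
      throw new IllegalStateException("XPath evaluation failed: " + expr, e);
    }
  }

  public static void main(String[] args) {
    // Made-up sample data with two processing instructions.
    String ddUri = "http://ns.adobe.com/data-description/";
    String xml = "<root><?first x?><?second y?>"
        + "<dd:dataDescription xmlns:dd=\"" + ddUri + "\" dd:name=\"movie\"/>"
        + "</root>";
    Map<String, String> ns = Collections.singletonMap("dd", ddUri);
    System.out.println(
        matches(xml, "starts-with(//dd:dataDescription/@dd:name, 'mov')", ns));
    System.out.println(matches(xml, "count(//processing-instruction()) = 2", ns));
  }
}
```

An expression that relies on a function the engine does not implement fails during evaluation, which in this sketch surfaces as an IllegalStateException.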

Default Namespaces in XPath

XML namespaces are detected automatically, but the default namespace has to be declared explicitly. Because XML allows a namespace to be declared more than once in a document, it is not clear which default namespace should be used when several declarations exist. Therefore, specify the default namespace in the test:

<testcase name="hasXFAData_DefaultNamespace_matchingXPath">
  <assertThat testDocument="xfa/xfa-movie.pdf">
    <hasXFAData>
      <matchingXPath expr="count(//default:subform[@name ='movie']//default:field) = 5" 
                     defaultNamespace="http://www.xfa.org/schema/xfa-template/2.6/"
      />
    </hasXFAData>
  </assertThat>
</testcase>

For the same reason, the default namespace must be given when using the tag <withNode />:

<!-- 
  The default namespace has to be declared, 
  but any alias can be used for it.
-->
<testcase name="hasXFAData_DefaultNamespace_WithNode_AnyAlias">
  <assertThat testDocument="xfa/xfa-enabled.pdf">
    <hasXFAData>
      <withNode tag="foo:log/foo:to"
                value="memory"
                defaultNamespace="http://www.xfa.org/schema/xci/2.6/" />  <!-- (1) -->
    </hasXFAData>
  </assertThat>
</testcase>

(1) It is not important which alias you choose for the default namespace.
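
The reason an alias is needed at all is a rule of XPath 1.0: an unprefixed name in an expression refers to an element in no namespace, so elements in a default namespace can only be reached through a prefix bound to that namespace. The following Java sketch demonstrates this with JAXP. It is an illustration under assumptions, not PDFUnit code; the class name, the method name, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not PDFUnit code.
final class DefaultNamespaceAliasSketch {

  // Evaluates expr with 'alias' bound to the given default namespace URI.
  static String evaluate(String xml, String expr,
                         final String alias, final String defaultNamespaceUri) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
          return alias.equals(prefix)
              ? defaultNamespaceUri : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
          return defaultNamespaceUri.equals(uri) ? alias : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
          return defaultNamespaceUri.equals(uri)
              ? Collections.singletonList(alias).iterator()
              : Collections.<String>emptyIterator();
        }
      });
      return xpath.evaluate(expr, doc);
    } catch (Exception e) {
      throw new IllegalStateException("XPath evaluation failed: " + expr, e);
    }
  }

  public static void main(String[] args) {
    String xci = "http://www.xfa.org/schema/xci/2.6/";
    // Made-up sample data modeled on the <log> element shown earlier.
    String xml = "<config xmlns=\"" + xci + "\">"
        + "<log><to>memory</to><mode>overwrite</mode></log></config>";
    System.out.println(evaluate(xml, "//foo:log/foo:to", "foo", xci)); // memory
    System.out.println(evaluate(xml, "//bar:log/bar:to", "bar", xci)); // memory
    // Unprefixed names match nothing here, so the result is an empty string.
    System.out.println(evaluate(xml, "//log/to", "foo", xci).isEmpty()); // true
  }
}
```

Both aliases resolve to the same namespace URI and therefore select the same node; only the URI matters for the match.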