3.38.  XMP Data

Overview

XMP stands for Extensible Metadata Platform, an open standard initiated by Adobe for embedding metadata in files. Images can carry embedded XMP data as well as PDF documents. Examples of such metadata are location and time.

The metadata in a PDF file can be important when processing a document, so it should be correct. PDFUnit provides the same test methods for XMP data as for XFA data:

// Methods to test XMP data:
.hasXMPData()
.hasXMPData().matchingXPath(..) 
.hasXMPData().withNode(..)

.hasNoXMPData()

Existence and Absence of XMP

The following examples show how to verify the existence and absence of XMP data:

@Test
public void hasXMPData() throws Exception {
  String filename = "documentUnderTest.pdf";

  AssertThat.document(filename)
            .hasXMPData()
  ;
}

@Test
public void hasNoXMPData() throws Exception {
  String filename = "documentUnderTest.pdf";
  
  AssertThat.document(filename)
            .hasNoXMPData()
  ;
}

Validate Single XML-Tags

Tests can check a single node of the XMP data and its value. The next example is based on the following XML snippet:

<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    ...
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>
      <xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>
      <xmp:CreatorTool>My program using iText</xmp:CreatorTool>
    </rdf:Description>
    ...
  </rdf:RDF>
</x:xmpmeta>

With the utility ExtractXMPData you can extract the XMP data from a PDF document into an XML file. Chapter 9.12: “Extract XMP Data to XML” describes how to use the utility.

The following example validates the existence of XML nodes. The method withNode(..) needs an instance of com.pdfunit.XMLNode as a parameter:

@Test
public void hasXMPData_WithNode_ValidateExistence() throws Exception {
  String filename = "documentUnderTest.pdf";
  XMLNode nodeCreateDate = new XMLNode("xmp:CreateDate");
  XMLNode nodeModifyDate = new XMLNode("xmp:ModifyDate");
  
  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeCreateDate)
            .withNode(nodeModifyDate)
  ;
}

When you want to verify the value of a node, you also have to pass the expected value to the constructor of XMLNode:

@Test
public void hasXMPData_WithNodeAndValue() throws Exception {
  String filename = "documentUnderTest.pdf";
  XMLNode nodeCreateDate = new XMLNode("xmp:CreateDate", "2011-02-08T15:04:19+01:00");
  XMLNode nodeModifyDate = new XMLNode("xmp:ModifyDate", "2011-02-08T15:04:19+01:00");

  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeCreateDate)
            .withNode(nodeModifyDate)
  ;
}

The node name passed to XMLNode is evaluated as an XPath expression. It must not start with the document root, because PDFUnit adds // internally.

If an expected node exists multiple times within the XMP data, the first match is used.

The node may also be an attribute node.
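
The three notes above can be tried out with the XPath engine of the JDK. The following is a minimal sketch and not PDFUnit's implementation: the class FirstMatchSketch, its method firstValue(..) and the two-entry XMP data are hypothetical and exist only for illustration. The sketch assumes that a node name is turned into an XPath expression by prepending // and that the first node in document order wins. The @ notation for the attribute is plain XPath syntax; whether XMLNode accepts it in exactly this form is an assumption.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; it is not part of PDFUnit.
class FirstMatchSketch {

  // Illustrative data: two descriptions, each with its own xmp:CreateDate.
  static final String XMP =
      "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
    + " xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\">"
    + "<rdf:Description rdf:about=\"first\">"
    + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
    + "</rdf:Description>"
    + "<rdf:Description rdf:about=\"second\">"
    + "<xmp:CreateDate>2012-01-01T00:00:00+01:00</xmp:CreateDate>"
    + "</rdf:Description>"
    + "</rdf:RDF>";

  // Returns the text of the first node matching "//" + nodeName,
  // or null if no node matches.
  static String firstValue(String nodeName) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed for prefixed names in XPath
      Document doc = factory.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(XMP)));

      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
          if ("xmp".equals(prefix)) return "http://ns.adobe.com/xap/1.0/";
          if ("rdf".equals(prefix)) return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
          return XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
      });

      // Assumption: the leading "//" is added here, so callers pass a relative name.
      NodeList matches =
          (NodeList) xpath.evaluate("//" + nodeName, doc, XPathConstants.NODESET);
      return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch, firstValue("xmp:CreateDate") yields the date of the first description, and firstValue("@rdf:about") reads an attribute node in the same way.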

XPath based XMP Tests

To take advantage of the full power of XPath, the method matchingXPath(..) is provided. The following two examples give an idea of what is possible:

@Test
public void hasXMPData_MatchingXPath() throws Exception {
  String filename = "documentUnderTest.pdf";
  String xpathString = "//xmp:CreateDate[node() = '2011-02-08T15:04:19+01:00']";
  XPathExpression expression = new XPathExpression(xpathString);
  
  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(expression)
  ;
}

@Test
public void hasXMPData_MatchingXPath_MultipleInvocation() throws Exception {
  String filename = "documentUnderTest.pdf";

  String xpathDateExists = "count(//xmp:CreateDate) = 1";
  String xpathDateValue = "//xmp:CreateDate[node()='2011-02-08T15:04:19+01:00']";
  
  XPathExpression exprDateExists = new XPathExpression(xpathDateExists);
  XPathExpression exprDateValue = new XPathExpression(xpathDateValue);
  
  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(exprDateValue)
            .matchingXPath(exprDateExists)
  ;
  
  // The same test in a different style:
  AssertThat.document(filename)
            .hasXMPData().matchingXPath(exprDateValue)
            .hasXMPData().matchingXPath(exprDateExists)
  ;
}

The capability to evaluate XPath expressions depends on the XML parser or, more precisely, on the XPath engine. By default, PDFUnit uses the parser in the JDK/JRE.

Chapter 13.12: “JAXP-Configuration” explains how to use a different XML parser.
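
Since the default engine is the one in the JDK, an XPath expression can be tried out with the javax.xml.xpath API before it goes into a test. The following is a minimal sketch under that assumption. The class JdkXPathSketch and its method matches(..) are hypothetical helpers, not part of PDFUnit, and the namespace prefix is registered by hand here, whereas PDFUnit detects it automatically.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; it is not part of PDFUnit.
class JdkXPathSketch {

  static final String XMP_NS = "http://ns.adobe.com/xap/1.0/";

  // The XMP snippet from this chapter, without the elided parts.
  static final String XMP =
      "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
    + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
    + "<rdf:Description rdf:about=\"\" xmlns:xmp=\"" + XMP_NS + "\">"
    + "<xmp:CreateDate>2011-02-08T15:04:19+01:00</xmp:CreateDate>"
    + "<xmp:ModifyDate>2011-02-08T15:04:19+01:00</xmp:ModifyDate>"
    + "<xmp:CreatorTool>My program using iText</xmp:CreatorTool>"
    + "</rdf:Description>"
    + "</rdf:RDF>"
    + "</x:xmpmeta>";

  // Evaluates the expression as an XPath boolean against the snippet.
  // A node set counts as true when it contains at least one node.
  static boolean matches(String xpathString) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed for prefixed names in XPath
      Document doc = factory.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(XMP)));

      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
          return "xmp".equals(prefix) ? XMP_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
      });

      return (Boolean) xpath.evaluate(xpathString, doc, XPathConstants.BOOLEAN);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Both expressions from the examples above, count(//xmp:CreateDate) = 1 and //xmp:CreateDate[node()='2011-02-08T15:04:19+01:00'], evaluate to true against this snippet. A different XPath engine may support other functions, so an expression that works here is not guaranteed to work after the parser has been exchanged.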

Default Namespaces in XPath

XML namespaces are detected automatically, but the default namespace has to be declared explicitly using an instance of DefaultNamespace. This instance must have a prefix. Any value can be chosen for the prefix:

@Test
public void hasXMPData_MatchingXPath_WithDefaultNamespace() throws Exception {
  String filename = "documentUnderTest.pdf";

  String xpathAsString = "//foo:format = 'application/pdf'";
  String stringDefaultNS = "http://purl.org/dc/elements/1.1/";
  DefaultNamespace defaultNS = new DefaultNamespace(stringDefaultNS);        
  XPathExpression expression = new XPathExpression(xpathAsString, defaultNS); 

  AssertThat.document(filename)
            .hasXMPData()
            .matchingXPath(expression)
  ;
}

The default namespace must be used not only with the class XPathExpression, but also with the class XMLNode:

@Test
public void hasXMPData_WithDefaultNamespace_SpecialNode() throws Exception {
  String filename = "documentUnderTest.pdf";

  String stringDefaultNS = "http://ns.adobe.com/xap/1.0/";
  DefaultNamespace defaultNS = new DefaultNamespace(stringDefaultNS);
  String nodeName  = "foo:ModifyDate";
  String nodeValue = "2011-02-08T15:04:19+01:00";
  XMLNode nodeModifyDate = new XMLNode(nodeName, nodeValue, defaultNS);

  AssertThat.document(filename)
            .hasXMPData()
            .withNode(nodeModifyDate)
  ;
}
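
Why the prefix can be chosen freely becomes clearer when the same situation is reproduced with the JDK's XPath API alone. The following is a minimal sketch, not PDFUnit code: the class DefaultNamespaceSketch, its method matches(..) and the small XML document with a default namespace are hypothetical and serve only as an illustration. In XPath 1.0 an unprefixed name refers to no namespace at all, so elements in a default namespace can only be reached through some prefix that is bound to the namespace URI.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; it is not part of PDFUnit.
class DefaultNamespaceSketch {

  static final String DC_NS = "http://purl.org/dc/elements/1.1/";

  // Illustrative document: both elements live in a default namespace,
  // so they carry no prefix in the XML itself.
  static final String XML =
      "<metadata xmlns=\"" + DC_NS + "\">"
    + "<format>application/pdf</format>"
    + "</metadata>";

  // Evaluates the expression as an XPath boolean, with exactly one
  // prefix bound to the given namespace URI.
  static boolean matches(String xpathString, final String prefix, final String namespaceUri) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed for namespace-qualified names
      Document doc = factory.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(XML)));

      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override public String getNamespaceURI(String p) {
          return prefix.equals(p) ? namespaceUri : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
      });

      return (Boolean) xpath.evaluate(xpathString, doc, XPathConstants.BOOLEAN);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With the prefix foo bound to the namespace, //foo:format = 'application/pdf' is true, and the same holds for any other prefix such as bar. The unprefixed expression //format = 'application/pdf' is false, because it looks for an element without a namespace.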