3.7.  Document Properties

Overview

PDF documents contain information about title, author, keywords and other properties. These standard properties can be extended with custom key-value pairs. Such metadata play an increasingly important role for search engines and archive systems, so PDF document properties should be set carefully. PDFUnit provides several tests to verify them.

An example of very poor document properties is a PDF document entitled jfqd231.tmp (that really is its title). Nobody will ever search for that title, so the document will never be found. It was typed on a typewriter by a U.S. government organization and scanned in 1993. Not only is the title useless; the file name is equally meaningless. The benefit of this document is therefore only marginally greater than if it did not exist at all.

The following tags are available:

<!-- Tags to test document properties: -->

<hasAuthor   />
<hasCreator  />
<hasKeywords />
<hasProducer />
<hasProperty />
<hasSubject  />
<hasTitle    />

<hasNoAuthor   />
<hasNoCreator  />
<hasNoKeywords />
<hasNoProducer />
<hasNoProperty />
<hasNoSubject  />
<hasNoTitle    />

<hasCreationDate           />
<hasCreationDateAfter      />
<hasCreationDateBefore     />
<hasModificationDate       />
<hasModificationDateAfter  />
<hasModificationDateBefore />
<hasNoCreationDate         />
<hasNoModificationDate     />

Document properties of a test document can also be compared with the properties of a master document. Such tests are described in chapter 4.7: “Comparing Document Properties”.

Testing the Author ...

You can verify the author of a document manually with any PDF reader, but an automated test is quicker.

It is very simple to check whether a document has any value for the property author:

<testcase name="hasAuthor">
  <assertThat testDocument="documentInfo/documentInfo_allInfo.pdf">
    <hasAuthor />
  </assertThat>
</testcase>

Use the tag <hasNoAuthor /> to verify that the property author does not exist:

<testcase name="hasNoAuthor">
  <assertThat testDocument="documentInfo_noAuthorTitleSubjectKeywordsApplication.pdf">
    <hasNoAuthor /> 
  </assertThat>
</testcase>

The next test verifies the value of the property author:

<testcase name="hasAuthor_matchingComplete">
  <assertThat testDocument="documentInfo/documentInfo_allInfo.pdf">
    <hasAuthor>
      <matchingComplete>PDFUnit.com</matchingComplete>
    </hasAuthor>
  </assertThat>
</testcase>

There are several tags to compare an expected property value with the actual one. The names are self-explanatory:

<!-- Comparing text for author, creator, keywords, producer, subject, title: -->

<containing       />
<endingWith       />
<matchingComplete />
<matchingRegex    />
<notContaining    />
<notMatchingRegex />
<startingWith     />

Whitespace is not changed when these tags are executed. Because property values are typically short, the test developer has to write every whitespace character in the expected value exactly as it appears in the document.

Each comparison is case-sensitive.
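
The effect of unchanged whitespace and case sensitivity can be sketched with plain Java string methods. This is only an illustration under stated assumptions: the class ExactComparisonSketch and its helper sameText are hypothetical names, the value PDFUnit.com is taken from the author example above, and the code is not PDFUnit's implementation.

```java
// Illustration only: plain java.lang.String methods, not PDFUnit's own code.
class ExactComparisonSketch {

    // Hypothetical helper: true only if both strings are identical,
    // including upper/lower case and every whitespace character.
    static boolean sameText(String expected, String actual) {
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        String actual = "PDFUnit.com"; // author value from the example above

        System.out.println(sameText("PDFUnit.com", actual));   // true
        System.out.println(sameText("PDFUnit.com ", actual));  // false: trailing blank
        System.out.println(sameText("pdfunit.com", actual));   // false: different case
        System.out.println(actual.startsWith("PDFUnit"));      // true
        System.out.println(actual.startsWith("pdfunit"));      // false: different case
    }
}
```

An expected value copied from a PDF reader may carry a leading or trailing blank, so it is worth checking the expected string character by character when such a test fails unexpectedly.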

The implementation of the tag <matchingRegex /> follows the rules of java.util.regex.Pattern.
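
The following sketch shows how java.util.regex.Pattern treats a few expressions. It is not taken from PDFUnit: the class RegexSketch, its two helper methods and the keywords value are hypothetical. Whether <matchingRegex /> requires the expression to cover the whole property value or only a part of it is not stated here, so the sketch shows both behaviors side by side.

```java
import java.util.regex.Pattern;

// Illustration only: demonstrates java.util.regex.Pattern, not PDFUnit internals.
class RegexSketch {

    // True if the regular expression covers the entire value.
    static boolean matchesWhole(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    // True if the regular expression matches somewhere inside the value.
    static boolean foundAnywhere(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String keywords = "PDFUnit, keywords, sample"; // assumed property value

        System.out.println(matchesWhole(".*key.*", keywords)); // true
        System.out.println(matchesWhole("key", keywords));     // false: does not cover the whole value
        System.out.println(foundAnywhere("key", keywords));    // true
        System.out.println(foundAnywhere("KEY", keywords));    // false: patterns are case-sensitive by default
        System.out.println(foundAnywhere("(?i)KEY", keywords)); // true: embedded flag (?i) ignores case
    }
}
```

An expression of the form .*key.* gives the same result under both behaviors as long as the value contains no line breaks, which makes it a safe way to write a "contains" check as a regular expression.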

... and Creator, Keywords, Producer, Subject and Title

Tests on the content of creator, keywords, producer, subject and title work just like those for Author above.

Each property has its own tags <hasXXX /> and <hasNoXXX />.

You can combine multiple tags in one test:

<!-- Multiple string comparisons are possible -->
<testcase name="hasKeywords_allTextComparingTags">
  <assertThat testDocument="documentInfo/documentInfo_allInfo.pdf">
    <hasKeywords>
      <notContaining>--</notContaining>
    </hasKeywords>
    <hasKeywords>
      <matchingRegex>.*key.*</matchingRegex>
    </hasKeywords>
    <hasKeywords>
      <startingWith>PDFUnit</startingWith>
    </hasKeywords>
  </assertThat>
</testcase>

However, such a test is not recommended, because a single test name cannot describe all of the checks it contains.

Common Validation as a Key-Value Pair

All tests for document properties shown in the previous sections can also be implemented with the general tag <hasProperty />:

<testcase name="hasProperty_StandardProperties">
  <assertThat testDocument="customproperties/Leitfaden_Elektronische_Signatur.pdf">
    <hasProperty name="Title">
      <matchingComplete>PDFUnit sample - Demo for Document Infos</matchingComplete>
    </hasProperty>
    <hasProperty name="Subject">
      <matchingComplete>Demo for Document Infos</matchingComplete>
    </hasProperty>
    <hasProperty name="CreationDate">
      <matchingComplete>D:20131027172417+01'00'</matchingComplete>
    </hasProperty>
    <hasProperty name="ModDate">
      <matchingComplete>D:20131027172417+01'00'</matchingComplete>
    </hasProperty>
  </assertThat>
</testcase>
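
The date values in this test use the PDF date format, D:YYYYMMDDHHmmSS followed by an optional time zone offset such as +01'00'. Because <matchingComplete /> compares the raw string, it can help to see what such a value means. The following sketch converts it into a java.time.OffsetDateTime. The class PdfDateSketch and its method parse are hypothetical and not part of PDFUnit. The sketch handles only the full-length form, and treating a value without an offset as UTC is an assumption made for illustration.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a small reader for full-length PDF date strings.
class PdfDateSketch {

    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    //         7 literal Z, 8 offset sign, 9 offset hours, 10 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})"
        + "(?:(Z)|([+-])(\\d{2})'(\\d{2})'?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a full-length PDF date: " + raw);
        }
        LocalDateTime local = LocalDateTime.of(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)));

        // Assumption: a missing offset (or a literal Z) is read as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(8) != null) {
            int sign = m.group(8).equals("-") ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                sign * Integer.parseInt(m.group(9)),
                sign * Integer.parseInt(m.group(10)));
        }
        return OffsetDateTime.of(local, offset);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20131027172417+01'00'")); // 2013-10-27T17:24:17+01:00
    }
}
```

The value in the test above therefore stands for 27 October 2013, 17:24:17, one hour ahead of UTC.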

<hasProperty /> validates any document property as a key-value pair, including custom properties.

The PDF document in the following example has two custom properties, which can be inspected with Adobe Reader®.

The following test verifies both custom properties:

<testcase name="hasProperty_CustomProperties">
  <assertThat testDocument="customproperties/Leitfaden_Elektronische_Signatur.pdf">
    <hasProperty name="Company">
      <matchingComplete>Signature Perfect KG</matchingComplete>
    </hasProperty>
    <hasProperty name="SourceModified">
      <matchingComplete>D:20081204045205</matchingComplete>
    </hasProperty>
  </assertThat>
</testcase>

The following test ensures that a property does not exist:

<testcase name="hasNoProperty">
  <assertThat testDocument="customproperties/Leitfaden_Elektronische_Signatur.pdf">
    <hasNoProperty name="OldProperty_ShouldNotExist" />
  </assertThat>
</testcase>

PDF documents of version PDF-1.4 or higher can have metadata as XML (Extensible Metadata Platform, XMP). Chapter 3.31: “XMP Data” explains that in detail.