4.14.  Comparing Text

PDFUnit can compare the text on any page of a test document with the text on the corresponding page of a reference document. The following simple example compares first the first page and then the last page of two documents. Please note that whitespace is normalized:

@Test
public void haveSameText_OnSinglePage() throws Exception {
  String filenameTest = "documentUnderTest.pdf";
  String filenameReference = "reference.pdf";

  AssertThat.document(filenameTest)
            .and(filenameReference)
            .restrictedTo(FIRST_PAGE)
            .haveSameText()
  ;

  AssertThat.document(filenameTest)
            .and(filenameReference)
            .restrictedTo(LAST_PAGE)
            .haveSameText()
  ;
}

You can restrict the test to selected pages, as explained in chapter 13.2: “Page Selection”.

You can also restrict the comparison to a region of a page:

@Test
public void haveSameText_InPageRegion() throws Exception {
  String filenameTest = "documentUnderTest.pdf";
  String filenameReference = "reference.pdf";

  int leftX   =  17;
  int upperY  = 254;
  int width   =  53;
  int height  =  11;
  PageRegion region = new PageRegion(leftX, upperY, width, height);

  AssertThat.document(filenameTest)
            .and(filenameReference)
            .restrictedTo(EVERY_PAGE)
            .restrictedTo(region)
            .haveSameText()
  ;
}

The treatment of whitespace can be controlled in the same way as in other tests:

@Test
public void haveSameText_IgnoreWhitespace() throws Exception {
  String filenameTest = "documentUnderTest.pdf";
  String filenameReference = "reference.pdf";
  
  int leftX   =  17;
  int upperY  = 254;
  int width   =  53;
  int height  =  11;
  PageRegion region = new PageRegion(leftX, upperY, width, height);

  AssertThat.document(filenameTest)
            .and(filenameReference)
            .restrictedTo(FIRST_PAGE)
            .restrictedTo(region)
            .haveSameText(WhitespaceProcessing.IGNORE)
  ;
}
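
To make the difference between normalizing and ignoring whitespace more concrete, the following sketch models both treatments with plain Java string operations. It does not use PDFUnit and is not PDFUnit's implementation; the class `WhitespaceSketch` and the methods `normalize` and `ignore` are hypothetical names chosen for this illustration, and the exact rules PDFUnit applies may differ.

```java
// Illustration only: a simplified model of two whitespace treatments.
// Assumption: "normalize" collapses runs of whitespace to one space and trims
// the ends; "ignore" removes all whitespace. PDFUnit's real rules may differ.
class WhitespaceSketch {

    // Collapse every run of whitespace into a single space, trim both ends.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Remove all whitespace, so only the non-whitespace characters count.
    static String ignore(String text) {
        return text.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String test      = "Total  amount:\n 42 EUR";  // extra space and a line break
        String reference = "Total amount: 42 EUR";
        String compact   = "Totalamount:42EUR";        // no whitespace at all

        // Equal after normalization: only the amount of whitespace differs.
        System.out.println(normalize(test).equals(normalize(reference)));

        // Not equal after normalization: word boundaries are missing in 'compact'.
        System.out.println(normalize(compact).equals(normalize(reference)));

        // Equal when whitespace is ignored completely.
        System.out.println(ignore(compact).equals(ignore(reference)));
    }
}
```

In this model, a comparison that normalizes whitespace tolerates different line breaks and spacing but still requires whitespace at the same positions, whereas a comparison that ignores whitespace, as requested with WhitespaceProcessing.IGNORE in the test above, only looks at the remaining characters.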