Compare PDF reports

Praveen597
Posts: 27
Joined: Sat Aug 27, 2011 9:41 am

Post by Praveen597 » Sun Aug 16, 2015 2:28 pm

Hi All,

Can Ranorex compare two PDF files? Is there a built-in API? (Paid or free, anything is fine.)

Thanks,
Praveen

odklizec
Ranorex Guru
Posts: 3933
Joined: Mon Aug 13, 2012 9:54 am
Location: Zilina, Slovakia

Re: Compare PDF reports

Post by odklizec » Sun Aug 16, 2015 3:58 pm

Hi,

There is no built-in support for directly comparing two PDF files. By default, Ranorex can track and validate the elements in a PDF file that has accessibility support enabled (search this forum for PDF validation topics). Any more advanced validation, such as comparing entire PDF files, could be done via user code, possibly with a 3rd-party PDF comparison library.
Pavel Kudrys
Ranorex explorer at Descartes Systems

jasoncleo
Posts: 37
Joined: Mon Jun 08, 2015 7:37 am

Re: Compare PDF reports

Post by jasoncleo » Tue Oct 13, 2015 3:51 am

PDF comparison is tricky, and it can be complicated by how the new and baseline PDFs are created. I don't know the technical explanation for it, but PDF allows for "layers", and each layer can have differing properties and attributes based on the content it holds.

This matters because two PDFs can look the same and still fail a comparison because their layers differ. For example, a PDF may have been "flattened", so that all the layers are merged and converted into a single-layer image.

There are a few commercial PDF tools out there that support comparison and can be integrated into a .Net environment. Many of them are limited, though: they compare the basic layer structure and the text content of layers that contain text, but they don't handle graphics, tables, or charts well, and some skip those in the comparison altogether.

We ended up using an image comparison approach. We used Spire .Net as the library to convert our PDF documents into page-by-page images, then used a simple algorithm to build a 2-D array of pixel brightness/colour for each page and compared the arrays with a set tolerance, one page at a time.
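
A minimal sketch of that kind of tolerance-based pixel comparison. It is written in Python rather than .Net, and it is not the code from this post. It assumes each page has already been rendered to a grid of (R, G, B) tuples; the function names, the brightness weights, and both thresholds are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]       # (R, G, B), each 0..255
Page = Sequence[Sequence[Pixel]]   # rows of pixels for one rendered page


def brightness(pixel: Pixel) -> float:
    """Approximate brightness using the common Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def brightness_grid(page: Page) -> List[List[float]]:
    """Build the 2-D brightness array for one page."""
    return [[brightness(p) for p in row] for row in page]


def pages_match(baseline: Page, candidate: Page,
                pixel_tolerance: float = 10.0,
                max_diff_ratio: float = 0.001) -> bool:
    """True if the share of differing pixels stays within max_diff_ratio.

    Both thresholds are hypothetical defaults and need tuning per document.
    """
    a = brightness_grid(baseline)
    b = brightness_grid(candidate)
    # Pages with different dimensions are treated as a mismatch.
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return False
    total = sum(len(row) for row in a)
    if total == 0:
        return True
    differing = sum(
        1
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
        if abs(x - y) > pixel_tolerance
    )
    return differing / total <= max_diff_ratio
```

A document would then be compared by calling `pages_match` once per page pair and collecting the page numbers that fail.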

It is a crude mechanism, but it works well enough for what we need. The benefit of the pixel array is that it allowed us to output additional images that circle the spots where a page differs from the baseline, for easy analysis.
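
One way to find the spots worth highlighting, sketched in Python under stated assumptions: the inputs are two equally sized 2-D brightness grids, and the page is divided into fixed-size tiles. The tile size, the tolerance, and the function name are hypothetical, and drawing the markers onto an output image is left to whatever imaging library is in use.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]   # per-pixel brightness, rows of equal length
Box = Tuple[int, int, int, int]    # (left, top, right, bottom), inclusive


def differing_cells(baseline: Grid, candidate: Grid,
                    tolerance: float = 10.0, cell: int = 16) -> List[Box]:
    """Return one box per cell-by-cell tile that contains a differing pixel."""
    if len(baseline) != len(candidate) or any(
        len(ra) != len(rb) for ra, rb in zip(baseline, candidate)
    ):
        raise ValueError("pages must have identical dimensions")

    # Record which tiles contain at least one pixel outside the tolerance.
    hit = set()
    for y, (row_a, row_b) in enumerate(zip(baseline, candidate)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > tolerance:
                hit.add((x // cell, y // cell))

    height = len(baseline)
    width = len(baseline[0]) if height else 0

    # Convert tile indices to pixel boxes, clipped to the page edges,
    # ordered top-to-bottom and then left-to-right.
    boxes = []
    for cx, cy in sorted(hit, key=lambda c: (c[1], c[0])):
        boxes.append((
            cx * cell,
            cy * cell,
            min((cx + 1) * cell, width) - 1,
            min((cy + 1) * cell, height) - 1,
        ))
    return boxes
```

An empty result means the page is within tolerance; otherwise each box can be outlined on a copy of the candidate page.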

Spire .Net isn't free (unless you want to use their hobbled 3-page version). It is possible to use Ghostscript, which is open source, but you'd need to do a bit of work, and I didn't have the time for that.
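
For anyone trying the Ghostscript route, a rough sketch of rendering a PDF to one PNG per page by calling the Ghostscript command-line tool from Python. This was not tested as part of the approach described above. It assumes Ghostscript is installed and on the PATH (`gs` on Linux/macOS, typically `gswin64c` on 64-bit Windows); the function names, the default resolution, and the output file pattern are illustrative.

```python
import subprocess
from typing import List


def ghostscript_command(pdf_path: str, output_pattern: str,
                        dpi: int = 150, executable: str = "gs") -> List[str]:
    """Build the Ghostscript call that renders each page to a 24-bit PNG.

    output_pattern should contain a page counter such as "page-%03d.png".
    """
    return [
        executable,
        "-dBATCH",            # exit after processing the file
        "-dNOPAUSE",          # do not prompt between pages
        "-dSAFER",            # restrict file access from the PDF
        "-sDEVICE=png16m",    # 24-bit colour PNG output
        f"-r{dpi}",           # render resolution in DPI
        f"-sOutputFile={output_pattern}",
        pdf_path,
    ]


def render_pages(pdf_path: str, output_pattern: str, **kwargs) -> None:
    """Run Ghostscript; raises CalledProcessError if rendering fails."""
    subprocess.run(ghostscript_command(pdf_path, output_pattern, **kwargs),
                   check=True)
```

Rendering the baseline and the new PDF at the same resolution keeps the page images the same size, which a pixel-by-pixel comparison needs.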