Category Archives: pdfium

Making Chromium’s PDFium greater

One of the coolest things about working at Igalia, is that we have the chance to work on real world problems from day 1, either by improving an existing solution or creating new ones, based on open source technologies.

During the past month, I worked on a specific aspect of Chromium’s PDF engine, PDFium: extending PDFium’s capability so that Annotations can be manipulated by Acrobat JS scripting.

In practice, the goal was to make the specific scenario described on bug 492 to behave similarly in Chromium/PDFium, when compared to IE/Acrobat.

“Mission given, is mission accomplished”

The following Acrobat JS APIs were implemented:

Document::URL
Document::gotoNamedDest
Document::syncAnnotScan (only stubbed out)
Document::getAnnot
Document::getAnnots

… and Acrobat JS scripts like the snippet below can now be successfully executed by PDFium, making 200k+ PDF files to behave in Chromium/PDFium similarly to IE/Acrobat –
with regards to the way Annotations are manipulated.

1 this.disclosed = true;
2 this.syncAnnotScan();
3 var u = this.URL;
4 var args = u.split('calcrtid=');
5 var tags;
6 if (args.length > 1) {
7   tags = args[1].split('&');
8   this.gotoNamedDest(tags[0]);
9   var a = this.getAnnot(this.pageNum, tags[0]);
10   if (a != null) {
11     a.hidden = false;
12     var rect = a.rect;
13     if (rect != null) {
14       this.scroll(rect[2] + 100, rect[3] + 100);
15     }
16   }
17 }

Also as part of this project, it was possible to fix two discrepancies in PDFium, again if compared to Acrobat: 1) Hidden annotations were shown; 2) PDFium’s positioning Text Markup annotations.

About (2) specifically, on Acrobat if an “/AP” clause is present as part of an “/Annot” definition, the coordinates used to draw annotations come from “/Rect” values. On the other hand, when “/AP” is not defined, “/QuadPoints” values are used to grab the annotation coordinates from. The divergence was coming from the fact that PDFium uses “/Rect” regardless the presence or absence of “/AP”. This CL fixed it.

To wrap up the work, I was also able to extend PDFium’s main testing tool (pdfium_test) to actually run Acrobat JS tests (follow up), as well as make lots of driven-by clean ups, and change PDFium’s main client (Chromium) to deal better when its async nature.

The PDFium community

Throughout the project, interactions with the PDFium community were smooth:

  • PDFium contribution flow follows pretty much the Chromium’s:
    https://bugs.chromium.org to report issues and Codereview to get the work reviewed.
  • Once approved, a CL is committed through a “Commit Queue” bot – which ensures a CLs correctness by running PDFium regression suite.
  • Owners are really responsive.

Acknowledgments

My work in PDFium is sponsored by Bloomberg. Thank you all!

igalia-bloomberg