Using PDFpen AppleScript to Copy Comments from PDFs

Last week I posted a script for extracting the contents of note annotations from PDFs. An alternative approach uses PDFpen's comprehensive AppleScript support.[1] One advantage of this method is that notes, which inherit from PDFpen's imprint class, belong to individual pages, so the page number of each note is available. The AppleScript is as follows:

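-- Collect the text of every note in the front PDFpen document, grouped
-- under the page it appears on, then copy the result to the clipboard.
tell application "PDFpen"
    set finalResult to ""
    set pageNumber to 0
    repeat with aPage in pages of document 1
        set pageResult to ""
        set pageNumber to pageNumber + 1
        -- Notes are imprints, so filter each page's imprints by class.
        repeat with theImprint in imprints of aPage
            if the class of theImprint is equal to note then
                set theResult to rich text of theImprint
                set pageResult to pageResult & "* " & theResult & linefeed & linefeed
            end if
        end repeat -- imprints
        -- Only add a page heading when the page actually has notes.
        if pageResult is not equal to "" then
            set finalResult to finalResult & "page " & (pageNumber as string) & ":" & linefeed & linefeed & pageResult
        end if
    end repeat -- pages
end tell
set the clipboard to finalResult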
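The text the script builds has a simple shape: a "page N:" heading, then one "* " bullet per note, with blank lines between entries, and pages without notes are skipped. Purely to illustrate that layout (this is not part of the PDFpen workflow), here is the same formatting step sketched in Ruby, assuming the notes have already been collected into a hash keyed by page number; format_notes is a hypothetical name.

```ruby
# Hypothetical helper mirroring the AppleScript's output layout.
# notes_by_page: Hash of page number => Array of note strings (assumed input).
def format_notes(notes_by_page)
  parts = []
  notes_by_page.keys.sort.each do |page|
    notes = notes_by_page[page]
    # Skip pages with no notes, as the AppleScript does.
    next if notes.nil? || notes.empty?
    parts << "page #{page}:\n\n"
    notes.each { |text| parts << "* #{text}\n\n" }
  end
  parts.join
end

# Prints a "page 2:" block and a "page 5:" block; page 3 has no notes and is skipped.
puts format_notes({ 2 => ["Check this figure"], 3 => [], 5 => ["Typo", "Cite Smith"] })
```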

The AppleScript can be run from inside PDFpen by saving it to /Users/name/Library/Application Support/PDFpen/Scripts, or combined with an Open Finder Items action in Automator to build a Service. When run as a Service, however, it fails if the PDF has not finished opening. Adding the following loop before the main AppleScript makes it wait until PDFpen has a document open:

-- Poll every half second until PDFpen reports an open document.
repeat
    if document 1 of application "PDFpen" exists then exit repeat
    delay 0.5
end repeat
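A loop like this waits indefinitely, so if PDFpen never opens a document the Service will simply hang. A common refinement is to give up after a fixed time. The sketch below shows that poll-with-timeout pattern in Ruby; wait_until and its default timeout and interval values are hypothetical choices, and in AppleScript the same idea would be a counter checked on each pass through the repeat loop.

```ruby
# Hypothetical helper: poll a condition until it holds or a timeout expires.
# Returns true as soon as the block is truthy, false once `timeout` seconds pass.
def wait_until(timeout = 10.0, interval = 0.5)
  # A monotonic clock is unaffected by changes to the system time.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Example: a stand-in condition that becomes true on the third check.
checks = 0
ready = wait_until(2.0, 0.01) { (checks += 1) >= 3 }
puts "ready=#{ready} after #{checks} checks"
```

Returning false rather than raising leaves the caller to decide whether to show an error or just quit quietly.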

  1. I was a little surprised to learn that Preview offers zero AppleScript support out of the box.

OS X Service for Copying Comments from PDFs

Today I was trying out ReaddleDocs on the iPad for reading PDFs of papers. This app is not as full-featured as GoodReader, but, importantly, it supports syncing entire folders with Dropbox, and I prefer its cleaner interface.[1]

Apps like this make it very easy to annotate PDFs with comments, which Dropbox then syncs across all your devices. Getting those comments back out once you are back at your Mac, however, can be a pain.

This Ruby script uses the pdf-reader gem to find every comment annotation in a PDF (a dictionary whose /Name entry is /Comment) and copy the comment text to the clipboard:

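require 'rubygems'   # needed to load gems under the system Ruby 1.8.7
require 'pdf-reader'

comments = []

ARGV.each do |file|
  pdf = PDF::Reader.new(file)

  # Walk every object in the PDF; comment annotations are dictionaries
  # (Ruby Hashes) whose /Name entry is /Comment.
  pdf.objects.each_pair do |_ref, value|
    next unless value.is_a?(Hash) && value[:Name] == :Comment
    comments << "* #{value[:Contents]}\n\n"
  end
end

# Copy the comments from all files at once; the block form closes the
# pipe, which flushes the text to pbcopy.
IO.popen('pbcopy', 'w') { |io| io.puts comments }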
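Since the script only relies on each_pair yielding reference/value pairs, the selection step can be tried without a real PDF. The sketch below pulls the filter into a hypothetical comment_texts method and runs it on a made-up hash standing in for pdf.objects. It also skips comments that have no /Contents entry, which would otherwise produce empty bullets.

```ruby
# Hypothetical extraction of the comment filter so it can run without a PDF.
# objects: anything whose each_pair yields (reference, value), like pdf.objects.
def comment_texts(objects)
  texts = []
  objects.each_pair do |_ref, value|
    # Comment notes are dictionaries (Ruby Hashes) whose /Name is /Comment.
    next unless value.is_a?(Hash) && value[:Name] == :Comment
    next if value[:Contents].nil?   # ignore comments with no text
    texts << value[:Contents]
  end
  texts
end

# Made-up stand-in for pdf.objects: integer keys play the role of references.
sample = {
  1 => { :Type => :Annot, :Subtype => :Text, :Name => :Comment, :Contents => "Check this" },
  2 => { :Type => :Annot, :Subtype => :Text, :Name => :Note, :Contents => "Not a comment" },
  3 => "a string object",
  4 => { :Type => :Annot, :Subtype => :Text, :Name => :Comment }
}
p comment_texts(sample)   # prints ["Check this"]
```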

The pdf-reader script can be run as a Service with the following setup in Automator:

The only complication was Automator's insistence on using the system version of Ruby (1.8.7) instead of the 1.9.3 I have installed elsewhere in my path, so the Service initially could not find the pdf-reader gem. Reinstalling pdf-reader into the system gem path with /usr/bin/gem install pdf-reader fixed this.
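When a Service cannot find a gem, it helps to confirm which interpreter and gem path it is actually using. A small diagnostic along these lines could be run temporarily from the Service's shell-script step; this is a sketch, gem_loadable? is a hypothetical helper, and the output format is arbitrary.

```ruby
require 'rubygems'   # harmless on 1.9+, required for gems on 1.8.7

# Hypothetical helper: true if `name` can be required by this interpreter.
def gem_loadable?(name)
  require name
  true
rescue LoadError
  false
end

# Report the interpreter, its gem search path, and whether pdf-reader loads.
puts "ruby #{RUBY_VERSION} at #{Gem.ruby}"
puts "gem path: #{Gem.path.join(':')}"
puts "pdf-reader loadable: #{gem_loadable?('pdf-reader')}"
```

If the reported Ruby is /usr/bin/ruby and pdf-reader is not loadable, the gem was installed for a different interpreter.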

I would also like to be able to pull out highlighted text, and to add a page number to each comment in the copied output, but these appear to be advanced topics.[2]
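For the page-number half of that wish, one possible approach, offered as an untested assumption rather than a worked solution, is to walk pdf.pages instead of pdf.objects: each PDF::Reader::Page exposes its number and its page dictionary through attributes, whose /Annots entry lists that page's annotations, often as indirect references that pdf.objects.deref can resolve. The sketch keeps the logic independent of the gem so it can be tried on stand-in data; comments_by_page and the fake page objects are hypothetical.

```ruby
# Hypothetical sketch: collect comment text grouped by page number.
# pages: objects responding to #number and #attributes (as PDF::Reader::Page
#        is assumed to); deref: callable that resolves indirect references.
def comments_by_page(pages, deref)
  result = {}
  pages.each do |page|
    annots = deref.call(page.attributes[:Annots]) || []
    texts = annots.map { |a| deref.call(a) }
                  .select { |a| a.is_a?(Hash) && a[:Name] == :Comment }
                  .map { |a| a[:Contents].to_s }
    result[page.number] = texts unless texts.empty?
  end
  result
end

# Stand-in data: symbols play the role of indirect references.
FakePage = Struct.new(:number, :attributes)
store = { :ref1 => { :Subtype => :Text, :Name => :Comment, :Contents => "Check this" } }
deref = lambda { |x| store.fetch(x, x) }
pages = [
  FakePage.new(1, { :Annots => [:ref1] }),
  FakePage.new(2, {}),
  FakePage.new(3, { :Annots => [{ :Name => :Note, :Contents => "skip" },
                                { :Name => :Comment, :Contents => "Second" }] })
]
# result maps page 1 to ["Check this"] and page 3 to ["Second"]; page 2 is absent.
result = comments_by_page(pages, deref)
```

With the real gem, the call would presumably be comments_by_page(pdf.pages, lambda { |x| pdf.objects.deref(x) }), assuming those methods behave as described above.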

Update: An alternative workflow using AppleScript and PDFpen is described in "Using PDFpen AppleScript to Copy Comments from PDFs".


  1. The only feature that I miss from GoodReader is the ability to crop all the pages in a document to remove margins.

  2. Or left as an exercise for the reader.