OS X Service for Copying Comments from PDFs

Today I was trying out ReaddleDocs on the iPad for reading PDFs of papers. This app is not as full-featured as GoodReader, but, importantly, it supports syncing entire folders with Dropbox, and I like the cleaner interface.[1]

While apps like this make it very easy to annotate PDFs with comments, which can then be synced across all your devices through the power of Dropbox, getting these comments back out again once you get back to your Mac can be a pain.

This is a Ruby script that uses the pdf-reader gem to find all the comments in a PDF document and copy them to the clipboard:

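require 'rubygems'
require 'pdf-reader'

ARGV.each do |file|

  pdf = PDF::Reader.new( file )

  # collect the text of every annotation object named :Comment
  comments = []
  pdf.objects.each_pair do |key, value|
    if value.class == Hash
      comments << "* #{value[:Contents]}\n\n" if value[:Name] == :Comment
    end
  end

  # pipe the list to the OS X clipboard; the block closes the pipe when done
  IO.popen('pbcopy', 'w') { |io| io.puts comments }
end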
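
The filtering step can be tried without a PDF or the pdf-reader gem. The sketch below is only an illustration: SAMPLE_OBJECTS is made-up data standing in for pdf.objects, and comment_bullets is a hypothetical helper name, not part of the gem.

```ruby
# Made-up stand-in for pdf.objects: object id => parsed PDF object.
SAMPLE_OBJECTS = {
  1 => { :Type => :Catalog },
  2 => { :Type => :Annot, :Name => :Comment, :Contents => "Check equation 3" },
  3 => "a stream, not a Hash",
  4 => { :Type => :Annot, :Subtype => :Highlight },
  5 => { :Type => :Annot, :Name => :Comment, :Contents => "Compare with Fig. 2" }
}

# Keep only Hash objects whose :Name is :Comment and format each as a bullet.
def comment_bullets(objects)
  comments = objects.values.select do |value|
    value.is_a?(Hash) && value[:Name] == :Comment
  end
  comments.map { |value| "* #{value[:Contents]}" }
end

puts comment_bullets(SAMPLE_OBJECTS)
```

Anything that is not a Hash, or is a Hash without the :Comment name (such as the highlight annotation in the sample), is skipped, which is also why the script above does not pick up highlighted text.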


This script can be accessed as a Service with the following setup in Automator:

The only complication here is Automator’s insistence on using the system version of Ruby (1.8.7) instead of version 1.9.3, which I have installed elsewhere in my path. This meant that the service initially could not find the pdf-reader gem. I fixed this by installing pdf-reader into the system gem path with /usr/bin/gem install pdf-reader.
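
To check which interpreter a Service is really running, and whether that interpreter can see the gem, a small diagnostic along these lines may help. It is a sketch rather than part of the original setup; ruby_path and gem_status are names made up for this example.

```ruby
require 'rbconfig'

# Full path of the interpreter executing this script.
def ruby_path
  File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])
end

# Try to load a gem and report whether this interpreter can find it.
def gem_status(name)
  require 'rubygems'
  require name
  "#{name}: found"
rescue LoadError
  "#{name}: NOT found for this interpreter"
end

puts "Ruby #{RUBY_VERSION} at #{ruby_path}"
puts gem_status('pdf-reader')
```

Running this once from the terminal and once from inside the Automator action should show whether the two environments resolve to different Ruby installations.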

I would also like to be able to pull out highlighted text, and to add a page number to each comment in the copied output, but these appear to be advanced topics.[2]

Update: An alternative workflow using AppleScript and PDFpen is described here.

  1. The only feature that I miss from GoodReader is the ability to crop all the pages in a document to remove margins.  ↩

  2. Or left as an exercise for the reader.  ↩

Custom Cite Keys in Papers using TextExpander

I use Papers for cataloguing PDFs, which has made it a lot easier to keep track of numerous resources when researching a field or writing a paper.

Papers will automatically generate cite keys: identifier strings that can be used to reference a publication in a manuscript. For example, in LaTeX, publications can be referenced using \cite{CITE_KEY}, with the bibliographic information either included in a bibliography section at the end of the document or extracted from a linked BibTeX database. While Papers has the noble goal of generating “universal cite keys” to allow collaborating authors to keep compatible library databases, I came to Papers with a large BibTeX database and my own cite key scheme. This scheme defines cite keys as follows:

  • Single author: [Surname]_[Journal][Year], for example Morgan_JImprobRes2012.

  • Two authors: [First_Surname]And[Second_Surname]_[Journal][Year], for example MorganAndCoauthor_JImprobRes2012.

  • More than two authors: [First_Surname]EtAl_[Journal][Year], for example MorganEtAl_JImprobRes2012.

This is usually sufficient to uniquely identify any paper in my library. In the case of duplicate entries from particularly prodigious authors, I append letters, e.g. MorganEtAl_JImprobRes2012a.
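
The scheme above can be sketched as a small standalone function. This is an illustration only: cite_key and its existing argument (a list of keys already in the library) are hypothetical, the list of surnames is assumed to be non-empty, and the rule for picking the duplicate letter (first free letter from a onwards) is my assumption about how the suffixes are assigned.

```ruby
# Build a cite key from author surnames, a journal abbreviation and a year.
# `existing` is a hypothetical list of keys already used in the library.
def cite_key(surnames, journal, year, existing = [])
  names = surnames.map { |surname| surname.delete(" ") }
  authors = case names.length
            when 1 then names[0]
            when 2 then names[0] + "And" + names[1]
            else names[0] + "EtAl"
            end
  key = "#{authors}_#{journal}#{year}"
  return key unless existing.include?(key)

  # Duplicate entry: append the first letter that is still free.
  ("a".."z").each do |letter|
    candidate = key + letter
    return candidate unless existing.include?(candidate)
  end
  raise ArgumentError, "no free suffix left for #{key}"
end

puts cite_key(["Morgan"], "JImprobRes", 2012)
# prints Morgan_JImprobRes2012
puts cite_key(["Morgan", "Coauthor"], "JImprobRes", 2012)
# prints MorganAndCoauthor_JImprobRes2012
puts cite_key(["Morgan", "Coauthor", "Other"], "JImprobRes", 2012,
              ["MorganEtAl_JImprobRes2012"])
# prints MorganEtAl_JImprobRes2012a
```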

I find these cite keys easier to read than those generated by Papers, which are of the form Morgan:2012sj, or even the auto-generated cite keys available in BibDesk. When reading a LaTeX file I find it easier to remember which paper I referred to, since I can see the journal title and have some idea of the number of authors. The second advantage is that these cite keys are easy to generate by hand. If I want to reference a paper that is not yet in my database, I can type in an appropriate cite key and carry on writing without having to worry about editing a BibTeX entry until later.

Unfortunately, Papers does not support user-defined cite keys, and editing the metadata by hand is one of those tasks that only takes a few seconds but becomes tedious for a large number of papers. As a workaround I now use a TextExpander snippet that builds a cite key from the metadata present in Papers.[1] This runs the following Ruby script:

require 'appscript'
require 'osax'

def records # still hoping that Papers2 will become AppleScriptable
  papers = Appscript.app("/Applications/Papers2.app")
  system_events = Appscript.app("System Events.app")

  # GUI scripting: click Edit > Copy As > BibTeX Record for the selected paper
  system_events.processes["Papers2"].menu_bars[1].menu_bar_items['Edit'].menus['Edit'].menu_items['Copy As'].menus['Copy As'].menu_items['BibTeX Record'].click
  clipboard = OSAX.osax.the_clipboard
  # drop the leading "@type{key" chunk and keep the "field = value" chunks
  return Paper.new(clipboard.split(",\r")[1..-1])
end

class Paper < Hash

  def initialize( strings )
    strings.each do |string|
      # limit of 2 keeps any " = " inside the value intact
      key, value = string.split( " = ", 2 )
      self[ key ] = clean( value )
    end
  end

  def clean( string )
    to_substitute = [ [ /\\\"([a-z])/, "\\1" ], # ö
                      [ /\\\'([a-z])/, "\\1" ], # é
                      [ /\\c /, "" ],           # ç
                      [ /\\v /, "" ],           # š
                      [ /\\([a-z])/, "\\1"] ]   # others, e.g. \\e remaining after string.delete( to_remove )
    to_remove = "\{\}\r\'"
    string.delete( to_remove ).gsub_from_array( to_substitute )
  end

  def author_list
    authors = self["author"].split(" and ").collect{ |string| Author.new( string ) }

    case authors.length
      when 1 then authors[0].cite_name
      when 2 then authors[0].cite_name + "And" + authors[1].cite_name
      else authors[0].cite_name + "EtAl"
    end
  end

  def journal
    return self["journal"].delete(" :.\-")
  end

  def year
    return self["year"]
  end

  def cite_string
    self.author_list + '_' + self.journal + self.year
  end

end

class Author
  def initialize( bibtex_string ) # with format e.g "Morgan, B. J."
    @surname = bibtex_string.split(",")[0]
    @forenames = bibtex_string.split(",")[1].split( )
  end

  def cite_name
    @surname.gsub(" ","")
  end
end

class String
  def gsub_from_array( gsub_array )
    gsub_array.inject(self) { | string, sub | string.gsub( sub[0], sub[1] ) }
  end
end

print records.cite_string

This uses appscript [2] and some GUI scripting to access the Papers command “Copy as BibTeX Record” to extract the metadata for the currently selected paper. The rest of the script is straight Ruby, and prints the appropriate cite key.
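
The parsing step can be tried without Papers by feeding it a hand-written record. This is a simplified sketch under stated assumptions: SAMPLE_RECORD is my guess at the shape of the clipboard text (fields separated by a comma and carriage return), and parse_fields is a hypothetical helper that condenses what Paper#initialize and clean do.

```ruby
# Assumed clipboard contents: BibTeX fields separated by ",\r".
SAMPLE_RECORD = [
  '@article{Morgan:2012sj',
  'author = {M{\"o}rgan, B. J. and Coauthor, A.}',
  'journal = {J. Improb. Res.}',
  'year = {2012}'
].join(",\r") + "\r}"

# Strip braces, carriage returns, quotes and simple LaTeX accent commands.
def clean(string)
  plain = string.delete("{}\r'")
  plain = plain.gsub(/\\"([a-z])/, '\1')  # \"o becomes o
  plain = plain.gsub(/\\[cv] /, "")       # "\c c" becomes c, "\v s" becomes s
  plain.gsub(/\\([a-z])/, '\1')           # leftovers such as \e (from \'e)
end

# Turn the record into a Hash of field name => cleaned value.
def parse_fields(record)
  fields = {}
  record.split(",\r")[1..-1].each do |chunk|
    key, value = chunk.split(" = ", 2)
    fields[key] = clean(value)
  end
  fields
end

fields = parse_fields(SAMPLE_RECORD)
puts fields["author"]                  # prints Morgan, B. J. and Coauthor, A.
puts fields["journal"].delete(" :.-")  # prints JImprobRes
puts fields["year"]                    # prints 2012
```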

The end result is that I can edit the auto-generated cite key for the selected paper and type ;cite to change the entry to my own format.[3] This is one of those little reductions in friction that make life better.[4]

This TextExpander snippet is available on GitHub.

  1. The ability of Papers to extract metadata from a repository such as Web of Knowledge is a huge timesaver here.  ↩

  2. I know appscript is deprecated but I am resisting rewriting in AppleScript. I am sure this policy will come back to bite me when appscript eventually stops working.  ↩

  3. Now if only the Papers developers would add AppleScript support then this could be rolled into a script where all my “incorrect” cite keys could be fixed in one go.  ↩

  4. Especially since it removes the opportunity for typing mistakes which are much harder to find when they show up later as an “undefined citations” error from LaTeX.  ↩