Parsing literature searches

Rob MacLeod


It is nice to be able to stick the results of a literature search into a database. Here I describe a program (a script, actually) called parseref that converts the output of such a search into something BibTeX can understand. The whole package consists of a Unix shell script and a set of awk programs. To run it, you need an awk program, which comes with Unix and Mac OS X and is available as a separate utility for Windows. In fact, you need the GNU version of awk, called gawk on many systems, because the programs use some functions specific to it.

1 The Code

Right-click (Control-click on the Mac) on the links below to download each of the files:

To work properly, all the files must be in the same directory as the files you wish to convert, or at least be available on your PATH (or whatever the equivalent search path is called in your operating system).
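A quick way to confirm the setup before converting anything is to check which commands can be found. The sketch below is a hypothetical Python helper, not part of the parseref package. It assumes the only commands you need to locate are gawk and parseref.sh; the awk files themselves are read by gawk through -f and are not checked here. A copy of parseref.sh that lacks execute permission is also reported as missing.

```python
import os
import shutil

# Commands named in this document. The list is an assumption: the download
# links (and the name of the CPX awk file) are not reproduced here.
REQUIRED_COMMANDS = ["gawk", "parseref.sh"]


def missing_commands(names, search_dir="."):
    """Return the names found neither on PATH nor as executables in search_dir."""
    missing = []
    for name in names:
        local = os.path.join(search_dir, name)
        on_path = shutil.which(name) is not None
        in_dir = os.path.isfile(local) and os.access(local, os.X_OK)
        if not (on_path or in_dir):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_commands(REQUIRED_COMMANDS)
    if absent:
        print("Not found on PATH or in this directory:", ", ".join(absent))
    else:
        print("All required commands are available.")
```

Run it from the directory that holds your saved search results; anything it lists needs to be copied there or put on the PATH.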

2 Using parseref

At present, the Unix script I have written, called parseref.sh, works only for searches of the PubMed version of MEDLINE and of the CPX database for engineering papers and conference proceedings. If there is demand for other search engines, please let me know and I can whip something up. CPX seems to have disappeared, so if you find a new link to it, please let me know as well.

Here are the steps required to do the parsing.

2.1 The Search

  1. Go to one of the search engines listed above and find the reference(s) you want to keep for your BibTeX database.
  2. In PubMed, look for the menu that usually says "SUMMARY"; it is next to the "Display" button. From this menu, select "MEDLINE" and then click "Display". The result is a detailed listing of all the elements of each citation, not meant for human reading but ideal for computer programs.
  3. In CPX, the routine is similar, but this time there is a button labeled "Download format" near the top of the window. Push it, and all the selected citations will come out in a format that the programs can read.
  4. Highlight the part of this listing that includes the reference(s) you want to keep, then copy and paste it into a plain-text editor window (e.g., Emacs).
  5. When you have collected all the references you want to keep, save the file.
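The MEDLINE display is a tagged format: each line starts with a short tag (PMID, TI, AU, TA, and so on) padded to four characters and followed by a hyphen, and long values continue on lines indented by six spaces. The minimal Python sketch below, which is not the parseref awk code, shows how a pasted listing of this kind can be split into records. It assumes records are separated by blank lines, and the sample records are made up for illustration.

```python
# Made-up records in MEDLINE display format, for illustration only.
SAMPLE = """\
PMID- 1
TI  - A placeholder title that wraps
      onto a second line.
AU  - Doe J
AU  - Roe AB
TA  - Circulation

PMID- 2
TI  - Another placeholder title.
"""


def parse_medline(text):
    """Split MEDLINE-format text into a list of {tag: [values]} dicts."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation line: extend the most recent value of the last tag.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) > 4 and line[4] == "-":
            # Tag line: tag in columns 1-4, hyphen in column 5, value after it.
            tag = line[:4].strip()
            current.setdefault(tag, []).append(line[5:].strip())
            last_tag = tag
        # Anything else (stray page text picked up while copying) is ignored.
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    for record in parse_medline(SAMPLE):
        print(record["PMID"][0], record.get("TI", [""])[0])
```

Keeping every value in a list matters because tags such as AU repeat once per author.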

2.2 The Parsing

  1. Run parseref.sh on the file you have saved as follows:
               parseref.sh -p/-c [-l] infilename
    
    where you must select -p for a file in PubMed (MEDLINE) format or -c for one in CPX format. The -l option controls the formatting of authors' names: the default is Initials Last-name, e.g., "R.S. MacLeod", but with -l it is Last-name, Initials, e.g., "MacLeod, R.S.". If you forget the arguments, just type parseref.sh by itself and you will get some help.
  2. The result is a file with a .bib extension and the same base filename as your input file. It should be in BibTeX format, so you can add it to your existing database.
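MEDLINE lists each author as the last name followed by unpunctuated initials (e.g., MacLeod RS). A sketch of the two name styles described above might look like the following. It is written in Python rather than the awk the package uses, and the function names are hypothetical. Treating a final all-capitals word as the initials is an assumed heuristic that leaves entries such as group authors or "Smith J Jr" unchanged.

```python
import os


def format_author(medline_name, last_name_first=False):
    """Convert a MEDLINE author such as 'MacLeod RS' to 'R.S. MacLeod'
    (default) or 'MacLeod, R.S.' (the -l style)."""
    parts = medline_name.split()
    # Heuristic: only treat a final all-capitals word as initials.
    if len(parts) < 2 or not parts[-1].isupper():
        return medline_name
    last = " ".join(parts[:-1])
    initials = "".join(letter + "." for letter in parts[-1])
    return f"{last}, {initials}" if last_name_first else f"{initials} {last}"


def author_field(names, last_name_first=False):
    """BibTeX separates multiple authors with ' and '."""
    return " and ".join(format_author(n, last_name_first) for n in names)


def bib_filename(infilename):
    """Same base filename as the input, with a .bib extension."""
    return os.path.splitext(infilename)[0] + ".bib"


if __name__ == "__main__":
    print(author_field(["MacLeod RS", "Doe J"]))
    print(author_field(["MacLeod RS", "Doe J"], last_name_first=True))
    print(bib_filename("search.txt"))
```

Both styles are valid BibTeX author syntax, so the choice between them only affects how the names look in the .bib file itself.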

If you have problems with the Unix shell script, you can also run the awk script directly, as follows (for PubMed-formatted entries):

    gawk -f parsepubmed.awk [authortype=lastnamefirst] infilename >> outfilename
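where infilename and outfilename are the input and output files, respectively. The optional authortype=lastnamefirst assignment selects the Last-name, Initials style, like the -l option of parseref.sh. Because >> appends to outfilename, running the command twice with the same output file adds the entries twice.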

Note that the output file may also contain @String definitions for journal names we cite often, to keep journal names consistent across entries. The current set of these strings is as follows:

  @String{j-BME = "IEEE Trans Biomed Eng"}
  @String{j-CR = "Circ Res"}
  @String{j-C = "Circulation"}
  @String{j-AJP = "Am J Physiol"}
  @String{j-ABE = "Ann Biomed Eng"}
  @String{j-JE = "J Electrocardiol"}

These appear at the end of the output file, and you can either use them or replace the journal variables with whatever strings you prefer.
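In a BibTeX entry, a journal covered by one of these strings can be written as the bare macro name instead of a quoted or braced literal. The Python sketch below illustrates the idea. Matching on the MEDLINE journal abbreviation, the helper names, and the sample entry are assumptions for illustration, not how parseref does it. BibTeX expects an @String to be defined before the first entry that uses it, so if you use the macros, place the definitions ahead of the entries when you merge the output into your database.

```python
# The @String macros listed above, keyed by the journal name each expands to.
# Matching on the MEDLINE journal abbreviation (TA field) is an assumption.
JOURNAL_MACROS = {
    "IEEE Trans Biomed Eng": "j-BME",
    "Circ Res": "j-CR",
    "Circulation": "j-C",
    "Am J Physiol": "j-AJP",
    "Ann Biomed Eng": "j-ABE",
    "J Electrocardiol": "j-JE",
}


def journal_field(journal):
    """Return the macro name if one exists, otherwise a braced literal."""
    return JOURNAL_MACROS.get(journal, "{" + journal + "}")


def string_definitions():
    """Render the @String lines in the same form as the list above."""
    return "\n".join(
        f'@String{{{macro} = "{name}"}}' for name, macro in JOURNAL_MACROS.items()
    )


def format_entry(key, fields):
    """Render a minimal @Article entry from (field, value) pairs. Values are
    emitted verbatim, so literals must already carry braces or quotes."""
    body = ",\n".join(f"  {name} = {value}" for name, value in fields)
    return f"@Article{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # Definitions first, so BibTeX has seen each macro before it is used.
    print(string_definitions())
    print()
    # A hypothetical entry with placeholder values.
    print(format_entry("Doe2000", [
        ("author", "{J. Doe}"),
        ("title", "{A placeholder title}"),
        ("journal", journal_field("Circulation")),
        ("year", "2000"),
    ]))
```

Unmatched journals fall back to a braced literal, so an entry is still valid even when no macro applies.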

About this document ...

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)

The translation was initiated by Rob MacLeod on 2005-07-06.