Friday, June 30, 2017

Workflow - Merging, reconciling and separating out subject terms

When upgrading legacy metadata, I spend a lot of time wrestling with controlled and not-so-very-controlled terms: LCSH terms, local terms, red terms, blue terms, FAST terms, slow terms.

Here's a quick, not-overly-detailed sample workflow for 1) merging multiple columns containing subject terms, 2) reconciling subject terms to LC's vocabularies, and 3) splitting matched (LC) and unmatched (non-LC --> local) subject terms into two separate columns.

This workflow assumes that you have already set up a reconciliation service. If you have not, Freeyourmetadata.org has easy-to-follow instructions. I strongly recommend hosting a local data dump of whichever vocabulary you want to reconcile against so that 1) the process runs faster and 2) you aren't taxing someone else's server.
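To make the three steps concrete, here is a minimal Python sketch of the merge-and-split logic, run outside Open Refine. It is an illustration under stated assumptions, not a replacement for real reconciliation: the `||` delimiter, the column names, and the `lc_headings` set (a hypothetical stand-in for the terms your reconciliation service matched to LC) are all made up for the example.

```python
# Minimal sketch of the merge -> reconcile -> split workflow.
# Assumptions (hypothetical): "||" delimiter, the column names, and a
# pre-built set of LC headings standing in for reconciliation matches.

DELIM = "||"


def merge_columns(row, columns, delim=DELIM):
    """Step 1: merge several subject columns into one list of terms,
    skipping blanks and duplicates while keeping first-seen order."""
    terms = []
    for col in columns:
        cell = row.get(col) or ""
        for term in cell.split(delim):
            term = term.strip()
            if term and term not in terms:
                terms.append(term)
    return terms


def split_matched(terms, lc_headings):
    """Steps 2-3: terms found in lc_headings go to the LC list;
    everything else goes to the local list."""
    lc = [t for t in terms if t in lc_headings]
    local = [t for t in terms if t not in lc_headings]
    return lc, local


if __name__ == "__main__":
    row = {"subject_1": "Surfing||Beaches", "subject_2": "Surfing||Boardwalk fun"}
    lc_headings = {"Surfing", "Beaches"}  # hypothetical reconciled matches
    terms = merge_columns(row, ["subject_1", "subject_2"])
    lc, local = split_matched(terms, lc_headings)
    print(DELIM.join(lc))     # Surfing||Beaches
    print(DELIM.join(local))  # Boardwalk fun
```

In Open Refine itself, the "matched" test would come from the reconciliation results rather than a hard-coded set; the sketch only shows how the merged terms end up in two separate columns.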

Recipe - Removing the first or last terms from a delimited string

Given the following sample data:


If I wanted to remove the first delimited term from each row (e.g. "Santa Cruz Beach Boardwalk" from row 1 and "Surfing" from all the other rows), I would use the following recipe:

Edit cells->Transform

and the following GREL expression:

value.split("<delimiter>").slice(1).join("<delimiter>")

where <delimiter> is whatever delimiter is being used for your data string.
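If you want to check the split/slice/join logic outside Open Refine, here is a rough Python equivalent. The function name `remove_first_term` and the `||` delimiter in the example are hypothetical; edge cases such as empty terms may not behave exactly as they do in GREL.

```python
def remove_first_term(value, delimiter):
    """Drop the first delimited term, mirroring the GREL expression
    value.split(delimiter).slice(1).join(delimiter)."""
    # split() gives a list of terms; [1:] keeps everything from index 1 on,
    # and join() glues the remaining terms back together.
    return delimiter.join(value.split(delimiter)[1:])


print(remove_first_term("Surfing||Beaches||Santa Cruz", "||"))  # Beaches||Santa Cruz
```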

Alternatively, if you wanted to remove the last delimited term (in this case, "Santa Cruz"), you would do the transform as above but use this GREL expression:

value.split("<delimiter>").slice(0,length(value.split("<delimiter>"))-1).join("<delimiter>")

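The -1 is needed because of the way Open Refine indexes an array: indexes start at 0 and the second argument to slice() is exclusive, so slicing from 0 up to length - 1 keeps every term except the last. (I'll cover arrays and indexing in a future post.)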
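The same "stop one short of the end" idea in Python, as a sketch for checking the logic outside Open Refine. The function name `remove_last_term` and the `||` delimiter in the example are hypothetical.

```python
def remove_last_term(value, delimiter):
    """Drop the last delimited term, mirroring the GREL expression
    value.split(d).slice(0, length(value.split(d)) - 1).join(d)."""
    terms = value.split(delimiter)
    # Indexes start at 0 and the slice end is exclusive, so stopping at
    # len(terms) - 1 keeps indexes 0 .. len(terms) - 2: every term but the last.
    return delimiter.join(terms[0:len(terms) - 1])


print(remove_last_term("Surfing||Beaches||Santa Cruz", "||"))  # Surfing||Beaches
```

In plain Python, `terms[:-1]` is the idiomatic shorthand for the same slice; the explicit `len(terms) - 1` form is used here only so it lines up with the GREL expression.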

Introduction to the all-new blog

Most existing Open Refine documentation is aimed at people with a programming background. Rachel and I wanted a central repository for the documentation I developed for the Metadata Services department at UC Santa Cruz.

We also wanted a place for other librarians to archive their own Open Refine recipes and documentation. If you want yours posted, or if you have a URL to a useful recipe from someone else, please email tyleet@ucsc.edu or jaffer@ucsc.edu.