For example, in this import of Marcedit records, some of the subfield a values have an indicator in front of them.
In Marcedit, I'd normally handle this with a regular expression find and replace that uses capturing groups.
But I couldn't figure out how to do the equivalent in Open Refine. value.match seemed like a contender, but I couldn't figure out how to access the array elements. I finally figured it out today: the array is not stored in a variable that you name in order to access it. Instead, you put an index directly on the value.match expression to get hold of the elements. (You'll see what I'm talking about down below.)
Step 1 - Facet your data first, if you can, so that you only transform the rows that have the indicator. If you can't, use an if statement (which I'll cover in another post).
Step 2 - This one is pretty familiar if you're used to Marcedit: do a transform on the "Contents" column and put the same regular expression you would use in Marcedit inside value.match. Note that the syntax is value.match(/<reg ex>/), not value.match(<reg ex>).
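One thing worth knowing at this step: value.match only returns something when the regular expression matches the whole cell, not just part of it. Here is a rough Python imitation of that behavior, not GREL itself. The helper name grel_match, the sample cell, and the pattern are all made up for illustration.

```python
import re

def grel_match(value, pattern):
    """Imitate value.match(/pattern/): a list of groups, or None if no full match."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None

# Hypothetical cell from the "Contents" column: indicator, then subfield a
cell = "1 $aCollected poems"

# The pattern covers the whole cell, so we get one array entry per group
whole = grel_match(cell, r"(\d )(\$a)(.*)")

# The pattern stops short of the end of the cell, so there is no match at all
partial = grel_match(cell, r"(\d )(\$a)")

print(whole)
print(partial)
```

If your transform preview comes back empty, a pattern that does not reach the end of the cell is a likely cause; adding a final (.*) group usually sorts it out.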
Now, if you look at lines 23-24, you'll notice that instead of capturing groups, the data has been pushed into an array. This array is the equivalent of the Marcedit capturing groups. The difference between Marcedit and Open Refine is that instead of 1, 2, and 3 for the group labels, you use an array index that is one less: 0, 1, and 2. And instead of using $ to denote the group, you put the index right after value.match(/<capturing group reg ex>/).
Or:
$1 = value.match(/<capturing group reg ex>/)[0]
$2 = value.match(/<capturing group reg ex>/)[1]
$3 = value.match(/<capturing group reg ex>/)[2]
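To make the off-by-one mapping concrete, here is a small Python sketch. Python happens to number its groups from 1, like Marcedit's $1, while the list of groups is indexed from 0, like the Open Refine array. The sample cell and pattern are invented for the example.

```python
import re

# Hypothetical cell value and pattern, for illustration only
cell = "1 $aCollected poems"
m = re.fullmatch(r"(\d )(\$a)(.*)", cell)

# The zero-based list plays the role of the array value.match gives you
groups = list(m.groups())

# Pair each Marcedit-style label with the array index that holds the same text
pairs = [("$" + str(n), n - 1, m.group(n), groups[n - 1]) for n in (1, 2, 3)]

for label, index, by_number, by_index in pairs:
    print(label, "-> index", index, "->", repr(by_index))
```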
Step 3 - Now that you have a way of accessing the array elements, all you have to do to leave out the first group is concatenate the other two in Open Refine:
value.match(/<capturing group reg ex>/)[1] + value.match(/<capturing group reg ex>/)[2]
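The same idea as a Python sketch, again only an imitation of what the Open Refine expression does. The function name strip_indicator and the sample cells are assumptions. The second cell has no indicator, so the pattern does not match it and there is no array to index; the sketch leaves such cells alone, which is the job faceting (or an if statement) does in Step 1.

```python
import re

# Hypothetical pattern: group 0 = indicator, group 1 = subfield code, group 2 = the rest
PATTERN = r"(\d )(\$a)(.*)"

def strip_indicator(value):
    """Drop the first group and glue the other two back together."""
    m = re.fullmatch(PATTERN, value)
    if m is None:
        # No match means nothing to index, so keep the cell as it is
        return value
    g = m.groups()
    return g[1] + g[2]

# Made-up cells: one with an indicator, one without
cells = ["1 $aCollected poems", "$aSelected letters"]
cleaned = [strip_indicator(c) for c in cells]
print(cleaned)
```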
Now that there's a way to mimic the capturing groups functionality of Marcedit, you can do the same type of rearranging. Say that, in addition to removing the indicator, I wanted the $b in lines 22-24 to be moved to the end. I would first write my regex and check that the groups are divided up correctly, then concatenate the groups in the new order.
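Here is a Python sketch of that two-part approach: check the groups first, then put them back together in a different order. The cell contents and the pattern are invented. In Open Refine the second part would be along the lines of value.match(/<capturing group reg ex>/)[1] + value.match(/<capturing group reg ex>/)[3] + value.match(/<capturing group reg ex>/)[2].

```python
import re

# Hypothetical cell: indicator, then subfields a, b, and c
cell = "1 $aCollected poems$b1950-1960$cedited by A. Reader"

# [^$]* keeps each group from running into the next subfield
pattern = r"(\d )(\$a[^$]*)(\$b[^$]*)(\$c.*)"

# First, check that the groups are divided up the way we expect
groups = list(re.fullmatch(pattern, cell).groups())
print(groups)

# Then drop the indicator (index 0) and move $b (index 2) to the end
rearranged = groups[1] + groups[3] + groups[2]
print(rearranged)
```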
And if you wanted to add some text between subfields, that's easy enough to do, too.
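As a rough Python illustration, adding text is just one more string in the concatenation. The cell, the pattern, and the " : " separator are all made up. The Open Refine version would follow the same shape, with a quoted string between the two value.match(...)[n] pieces.

```python
import re

# Hypothetical cell: indicator, subfield a, subfield b
cell = "1 $aCollected poems$b1950-1960"
pattern = r"(\d )(\$a[^$]*)(\$b.*)"

groups = list(re.fullmatch(pattern, cell).groups())

# Leave out the indicator and put some literal text between the two subfields
with_text = groups[1] + " : " + groups[2]
print(with_text)
```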