Friday, July 28, 2017

Using the Marcedit capturing group Find/Replace in Open Refine, pt.2 : putting it into an if function.

This is a continuation from this post about how to mimic Marcedit capturing group Find/Replace in Open Refine.

The one problem with mimicking the Marcedit Find/Replace is that it will wipe out the data you don't want to alter unless you facet it out first. This is kind of cumbersome and still doesn't have the convenience of Marcedit's Find/Replace.

However, the if function takes care of that problem. I'm going to post the recipe here at the top and then go into detail further down. So, for your capturing group find of value.match(/<capturing group regex>/), do a transform cells, and drop it into this statement:

if(isNotNull(value.match(/<your find>/)), <your replace>, value)

 

So, for example, with the Marcedit data from my previous post, to get rid of the indicator before any beginning subfield a, I carved up my data into two capturing groups, using value.match(/(.)(\$.*)/)
This is the <capturing group regex> component of the statement.

I want to keep value.match(/(.)(\$.*)/)[1] only, so this is the <whatever manipulation you want to do with your capturing groups> component.

When I drop it in to the if function, it becomes this:
if(isNotNull(value.match(/(.)(\$.*)/)), value.match(/(.)(\$.*)/)[1], value)















How it all works
If you can write a function that returns a boolean, you're halfway to writing a complete if function.
Here's how an if works:

if ( <boolean generating GREL>, <if boolean = true, execute the functions here>, <if boolean=false, execute the functions here>)

Now, if we're trying to mimic find/replace, we have two problems:
1) value.match () doesn't generate a true/false. It generates null, or an array that mimics the capturing groups. (if you want the whys, see this post.)
2) we don't want to do anything to the strings that our find doesn't match.

1) is easy to solve. isNotNull() wrapped around value.match() will generate the true/false you need.
2) this is also easy. When you want to keep the current data, you just put in "value", so you do the same with the <if boolean=false, execute the functions here>.

Now, the if function becomes:
if(isNotNull(value.match(/<your find>/)), <your replace>, value)

Doing a Marcedit Find/Replace across columns
You can expand the Marcedit find/replace across columns, too. Here's my sample data:







With this set of data, the indicator is in the wrong column. So, in addition to getting rid of the indicator in the Content column, I want to join it to the data in the Indicator column.

The first thing I would do is join the indicator from the Content column to the Indicator column. To do this, I would first start a transform on the Indicator column.

If you remember, you can access the data other columns other than the one you're working on by using row.cells.<column name>.value, instead of value.  So, instead of using value.match, I would use row.cells.Content.value.match

Now, that I've accessed the data, all I want to do is put the indicator in a capturing group so that I can bring it over to the Indicators column. So, my regex would be: row.cells.Content.value.match(/(.)(\$.*)/)

Just to make sure I did it correctly, I typed it in and verified that everything was getting split the way I wanted it to:












The "\\" in element zero (aka the first capturing group) looks concerning, so to double check, I had Open Refine output it:
 row.cells.Content.value.match(/(.)(\$.*)/)[0]












As you noticed, the extra \ disappeared. My guess is that it was temporarily put in the array so that Open Refine wouldn't identify it as a special character, and it was removed once you accessed it.

I recommend doing this anytime you're unsure what elements will output from match.

Now, I'm going to grab the if statement recipe:
if(isNotNull(value.match(/<your find>/)), <your replace>, value)

And I'm going to drop in my match statement:
if(isNotNull(row.cells.Content.value.match(/(.)(\$.*)/), <your replace>, value)

Part two is figuring out what I want to do with the replace. I just want to concatenate the current indicator to the indicator in the first capturing group. So, that would be: value + row.cells.Content.value.match(/(.)(\$.*)[0]

Putting it all together:
if(isNotNull(row.cells.Content.value.match(/(.)(\$.*)/)), value + row.cells.Content.value.match(/(.)(\$.*)/) [0], value)





No comments: