Wednesday, October 4, 2017

Recipe: Populating a column with sequential call numbers

You can actually fill a column in Open Refine with sequential call numbers. All you have to do is take advantage of the fact that row.index returns an integer.

Our local accession numbers for video games are GVD<number>, so if I wanted to populate a spreadsheet with :

GVD1100
GVD1101
GVD1102
GVD1103
etc.

You'd do : "GVD" + (row.index + 1100)
or
"<prefix>" + (row.index + <starting call number>) 

Monday, October 2, 2017

Recipe and walkthrough - Joining the first two cells in a column and moving the third up

 If I had this sample spreadsheet:











And I wanted to transform it to this spreadsheet:



I can use the column array conversion, a trick with array arithmetic, and nested ifs to do so in one GREL expression. Or I can use three transforms.

Assuming that all data in the "Subject" column is three subject and only three subjects per call#, and that there are no blank lines in the column, do the following:

1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform with: if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], value)
5. Transform subject column again: if(mod(rowIndex, 3) == 1, row.record.cells["Subject"].value[rowIndex+1], value)
6. Last transform: if(mod(rowIndex, 3) == 2, "", value)
7. If you were in record view, switch back to rows. Delete placeholder column.

OR:

1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform on the subject column and use this piece of GREL:

if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], if(mod(rowIndex,3)== 1, row.record.cells["Subject"].value[rowIndex+1], ""))
5. If you were in record view, switch back to rows. Delete placeholder column.


Thursday, September 28, 2017

Recipe and walkthrough - Join selected cells in a column

Again, I'll be posting a recipe first and then a detailed walkthrough below.

Recipe:
Given this truncated sample data:








Assuming that there are more entries that aren't being shown, if I only want to join the two subject headings in Column 4 for Ammanati, Bartolomeo:

  1.  Make sure you're in the row view before you proceed
  2. Add column based on any column, with whatever name (I'm calling it index), and "" in the Expression box.
  3.  Edit the very first cell in index, put in a placeholder value. (NOTE: put in one placeholder value only, doing two won't work.)
  4. Move index to the first column position.
  5. Switch to record view. (Note : this appears to be optional in Open Refine 2.9, but you may need to do it for older versions.)
  6.   Because I found it hard to read recipes with other people's column names, I am referring to Column 2, as the <test column>. (Since this is the column you use as the test condition for the if statement). Column 4 will be referred to as the <edit column>, to indicate which column I want to edit.
    1. Transform on <edit column> : if(cells["<test column>"].value.contains("Ammanati"), row.record.cells["<edit column>"].value[rowIndex] + ";" + row.record.cells["<edit column>"].value[rowIndex+1], value)
  7.  Transform on <test column> and create a placeholder row: if(row.record.cells["<test column>"].value[rowIndex-1].contains("Ammanati"), "place", value)
  8.  Switch to row view.
  9. Custom text facet on <test column>: value.contains("place"), select true
  10. Star the placeholder rows.
  11.  Remove the custom facet and facet by star
  12. All->edit rows->remove matching rows. 
  13. Remove the index column.

 Long, detailed explanation:

Friday, September 15, 2017

Custom text faceting with booleans cheat sheet



Search Criteria (up to Open Refine 2.8)
Custom text facet
Facet on strings that have both <string one> and <string two>
and (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have either <string one> or <string two>
or (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have <string one> and <string two> and <string three>
and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>
or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>
not(value.contains("<string one>")
Facet on strings that don’t contain <string one> or <string two>
not(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that contain (<string one> and <string two>) or <string 3>
or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") )
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"
or(isNotNull(value.match(/.*Play[sS]tation.*/)), value.contains("Nintendo"))

Search Criteria (Open Refine 3.0)
Custom text facet
Facet on strings that have both <string one> and <string two>
toString(and (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have either <string one> or <string two>
toString(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have <string one> and <string two> and <string three>
toString(and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>
toString(or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>
toString(not(value.contains("<string one>"))
Facet on strings that don’t contain <string one> or <string two>
toString(not(or (value.contains("<string one>"), value.contains("<string two>"))))
Facet on strings that contain (<string one> and <string two>) or <string 3>
toString(or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") ))
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"
toString(or(isNotNull(value.match(/.*Play[sS]tation.*/)), value.contains("Nintendo")))