Thursday, September 28, 2017

Recipe and walkthrough - Join selected cells in a column

Again, I'll be posting a recipe first and then a detailed walkthrough below.

Recipe:
Given this truncated sample data:








Assuming that there are more entries that aren't being shown, if I only want to join the two subject headings in Column 4 for Ammanati, Bartolomeo:

  1.  Make sure you're in the row view before you proceed
  2. Add column based on any column, with whatever name (I'm calling it index), and "" in the Expression box.
  3.  Edit the very first cell in index, put in a placeholder value. (NOTE: put in one placeholder value only, doing two won't work.)
  4. Move index to the first column position.
  5. Switch to record view. (Note : this appears to be optional in Open Refine 2.9, but you may need to do it for older versions.)
  6.   Because I found it hard to read recipes with other people's column names, I am referring to Column 2, as the <test column>. (Since this is the column you use as the test condition for the if statement). Column 4 will be referred to as the <edit column>, to indicate which column I want to edit.
    1. Transform on <edit column> : if(cells["<test column>"].value.contains("Ammanati"), row.record.cells["<edit column>"].value[rowIndex] + ";" + row.record.cells["<edit column>"].value[rowIndex+1], value)
  7.  Transform on <test column> and create a placeholder row: if(row.record.cells["<test column>"].value[rowIndex-1].contains("Ammanati"), "place", value)
  8.  Switch to row view.
  9. Custom text facet on <test column>: value.contains("place"), select true
  10. Star the placeholder rows.
  11.  Remove the custom facet and facet by star
  12. All->edit rows->remove matching rows. 
  13. Remove the index column.

 Long, detailed explanation:

Friday, September 15, 2017

Custom text faceting with booleans cheat sheet



Search Criteria (up to Open Refine 2.8)
Custom text facet
Facet on strings that have both <string one> and <string two>
and (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have either <string one> or <string two>
or (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have <string one> and <string two> and <string three>
and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>
or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>
not(value.contains("<string one>")
Facet on strings that don’t contain <string one> or <string two>
not(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that contain (<string one> and <string two>) or <string 3>
or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") )
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"
or(isNotNull(value.match(/.*Play[sS]tation.*/)), value.contains("Nintendo"))

Search Criteria (Open Refine 3.0)
Custom text facet
Facet on strings that have both <string one> and <string two>
toString(and (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have either <string one> or <string two>
toString(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have <string one> and <string two> and <string three>
toString(and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>
toString(or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")))

Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>
toString(not(value.contains("<string one>"))
Facet on strings that don’t contain <string one> or <string two>
toString(not(or (value.contains("<string one>"), value.contains("<string two>"))))
Facet on strings that contain (<string one> and <string two>) or <string 3>
toString(or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") ))
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"
toString(or(isNotNull(value.match(/.*Play[sS]tation.*/)), value.contains("Nintendo")))