I've been working a lot with Alma Analytics, and before the February release, our 035 field was repeated. This unfortunately meant that if I used the "network number" field for Analytics, I would get this:
(CaPaEBR)ebr10975975; (OCoLC)654843296; (CU-SC)b44543517-01cdl_scr_inst
You could use a formula in Analytics to give you just the OCLC number, but you could also do it in Open Refine. The recipe is this:
forEach(value.split(";"), v, if (contains(v, "<prefix label for the number you want to split out>"), v, "")).join("")
So, for example, if I want to split out the OCLC number, I would use:
forEach(value.split(";"), v, if (contains(v, "(OCoLC)"), v, "")).join("")
If I wanted the CaPaEBR number, I would use:
forEach(value.split(";"), v, if (contains(v, "(CaPaEBR)"), v, "")).join("")
If I wanted our old bib number (which is the last number in this string), I would use:
forEach(value.split(";"), v, if (contains(v, "(CU-SC)"), v, "")).join("")
Showing posts with label recipe. Show all posts
Showing posts with label recipe. Show all posts
Wednesday, February 19, 2020
Friday, January 25, 2019
Recipe: Using filters to identify all the blank columns in a spreadsheet
Rachel had an interesting problem for me - she had a spreadsheet with a plethora of columns. Some had information, and some were blank. It was tedious trying to use faceting to identify the blank columns, so she wondered if there was a way to quickly identify them in Open Refine.
The answer is quite simple, first identify how many total rows you have on your spreadsheet. Then, go to any column in Open Refine.
The answer is quite simple, first identify how many total rows you have on your spreadsheet. Then, go to any column in Open Refine.
Then do facet->custom text facet.
In the box, enter the following GREL:
filter(row.columnNames,cn, isBlank(cells[cn].value))
This will make a facet out of each column, with the number of blank rows listed to the right:
There are 414 rows total on this spreadsheet, so any column with "414" next to it is completely blank. |
Wednesday, October 4, 2017
Recipe: Populating a column with sequential call numbers
You can actually fill a column in Open Refine with sequential call numbers. All you have to do is take advantage of the fact that row.index returns an integer.
Our local accession numbers for video games are GVD<number>, so if I wanted to populate a spreadsheet with :
GVD1100
GVD1101
GVD1102
GVD1103
etc.
You'd do : "GVD" + (row.index + 1100)
or
"<prefix>" + (row.index + <starting call number>)
Our local accession numbers for video games are GVD<number>, so if I wanted to populate a spreadsheet with :
GVD1100
GVD1101
GVD1102
GVD1103
etc.
You'd do : "GVD" + (row.index + 1100)
or
"<prefix>" + (row.index + <starting call number>)
Monday, October 2, 2017
Recipe and walkthrough - Joining the first two cells in a column and moving the third up
If I had this sample spreadsheet:
And I wanted to transform it to this spreadsheet:
I can use the column array conversion, a trick with array arithmetic, and nested ifs to do so in one GREL expression. Or I can use three transforms.
Assuming that all data in the "Subject" column is three subject and only three subjects per call#, and that there are no blank lines in the column, do the following:
1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform with: if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], value)
5. Transform subject column again: if(mod(rowIndex, 3) == 1, row.record.cells["Subject"].value[rowIndex+1], value)
6. Last transform: if(mod(rowIndex, 3) == 2, "", value)
7. If you were in record view, switch back to rows. Delete placeholder column.
OR:
1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform on the subject column and use this piece of GREL:
if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], if(mod(rowIndex,3)== 1, row.record.cells["Subject"].value[rowIndex+1], ""))
5. If you were in record view, switch back to rows. Delete placeholder column.
And I wanted to transform it to this spreadsheet:
I can use the column array conversion, a trick with array arithmetic, and nested ifs to do so in one GREL expression. Or I can use three transforms.
Assuming that all data in the "Subject" column is three subject and only three subjects per call#, and that there are no blank lines in the column, do the following:
1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform with: if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], value)
5. Transform subject column again: if(mod(rowIndex, 3) == 1, row.record.cells["Subject"].value[rowIndex+1], value)
6. Last transform: if(mod(rowIndex, 3) == 2, "", value)
7. If you were in record view, switch back to rows. Delete placeholder column.
OR:
1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform on the subject column and use this piece of GREL:
if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], if(mod(rowIndex,3)== 1, row.record.cells["Subject"].value[rowIndex+1], ""))
5. If you were in record view, switch back to rows. Delete placeholder column.
Thursday, September 28, 2017
Recipe and walkthrough - Join selected cells in a column
Recipe:
Given this truncated sample data:
Assuming that there are more entries that aren't being shown, if I only want to join the two subject headings in Column 4 for Ammanati, Bartolomeo:
- Make sure you're in the row view before you proceed
- Add column based on any column, with whatever name (I'm calling it index), and "" in the Expression box.
- Edit the very first cell in index, put in a placeholder value. (NOTE: put in one placeholder value only, doing two won't work.)
- Move index to the first column position.
- Switch to record view. (Note : this appears to be optional in Open Refine 2.9, but you may need to do it for older versions.)
- Because I found it hard to read recipes with other people's column names, I am referring to Column 2, as the <test column>. (Since this is the column you use as the test condition for the if statement). Column 4 will be referred to as the <edit column>, to indicate which column I want to edit.
- Transform on <edit column> : if(cells["<test column>"].value.contains("Ammanati"), row.record.cells["<edit column>"].value[rowIndex] + ";" + row.record.cells["<edit column>"].value[rowIndex+1], value)
- Transform on <test column> and create a placeholder row: if(row.record.cells["<test column>"].value[rowIndex-1].contains("Ammanati"), "place", value)
- Switch to row view.
- Custom text facet on <test column>: value.contains("place"), select true
- Star the placeholder rows.
- Remove the custom facet and facet by star
- All->edit rows->remove matching rows.
- Remove the index column.
Long, detailed explanation:
Subscribe to:
Posts (Atom)