I had a set of data where I needed to only process sentences that were more than a certain number of words. Fortunately, doing so is relatively easy.
First of all, the words in a sentence can be split up into an array using value.split(" ").
Secondly, the length() function in Open Refine returns the length, or total number of elements in an array. Since each element is a word, put these two together and the number of words will be calculated:
Showing posts with label test conditions for if. Show all posts
Showing posts with label test conditions for if. Show all posts
Monday, July 31, 2017
Tuesday, July 18, 2017
If structure cheat sheet
Updated 8/24/17
If structure cheat sheet
If structure cheat sheet
<operation> can be any function – reprinting a cell
value, concatenating cells values, any string processing function listed in the
wiki, and even a foreach, with, or another if function.
Do <operation> only if test
condition is false
|
if (test condition, value,
<operation>)
|
Do <operation> only if test
condition is true
|
if (test condition,
<operation>, value)
|
Do <operation1> if test
condition is true, otherwise do <operation 2>
|
if (test condition, <operation
1>, <operation 2>)
|
Do a Marcedit Find/Replace on string1 string2 string3 to transform
it into string2; string3; string1
|
if (isNotNull(value.match(/(regex for string1)(regex for string2)(regex
for string3)/)), value.match(/(regex
for string1)(regex for string2)(regex for string3)/)[1] + “;” + value.match(/(regex for string1)(regex for string2)(regex
for string3)/)[2] + “;” + value.match(/(regex for string1)(regex for string2)(regex for string3)/)[0],
value)
|
Test Condition Cheat Sheet
Example data to be used is
auto
autos
auto mechanics
automechanics
auto mechanics and aviation
planes
trains
automobiles
aviation
aviation and automechanics
auto
autos
auto mechanics
automechanics
auto mechanics and aviation
planes
trains
automobiles
aviation
aviation and automechanics
NOTE : Test
conditions can also be used for custom text faceting
Test Condition
|
Code
|
String must exactly match auto
|
value == “auto”
|
String can be auto, autos, auto mechanics, automechanics, or auto mechanics and aviation
|
value.contains("auto")
|
String can be auto mechanics, automechanics, auto mechanics, or auto mechanics and aviation.
|
isNotNull(value.match(/(.*auto.*mechanics.*)/))
|
String can be auto, autos, planes, trains, or automobiles.
|
isNull(value.match(/(.*auto.*mechanics.*)/))
|
String must match auto, autos, or planes. | or(isNotNull(value.match(/(^auto[s]/))), value == "planes") |
String must start with some variation of auto.
|
value.startsWith("auto")
|
Find all of the variations of auto mechanics and aviation
|
and(isNotNull(value.match(/(.*auto\s*mechanics.*)/)), value.contains("aviation"))
|
All strings that begin with "a" |
value.startsWith("a")
|
All strings that end with "s"
|
value.endsWith("s")
|
String must be less than 4 words
|
length (split(value, “ “)) < 4
|
String must contain any substring with variable spacing of “auto mechanics”
and must be less than 4 words
|
and(isNotNull(value.match(/.*auto.*mechanics.*/)),
length(split(value, " ")) < 4)
|
Cell is completely blank
|
isBlank(value)
|
Cell is not blank
|
isNonBlank(value)
|
Test conditions for dates
Test string is a single range of years, such as 1938-1968.
No spaces surrounding the hyphen.
First year must be greater than 1938
|
toNumber(substring(value, 0,4)) > 1938
|
First year must equal 1938
|
toNumber(substring(value, 0,4)) == 1938
|
First year must be less than 1938
|
toNumber(substring(value, 0,4)) < 1938
|
Second year must be less than 1968
|
toNumber(substring(value,5,9)) < 1968
|
Second year must be equal 1968
|
toNumber(substring(value,5,9)) == 1968
|
Second year must be less greater than 1968
|
toNumber(substring(value,5,9)) > 1968
|
Second year is greater than the first
|
toNumber(substring(value,5,9)) > toNumber(substring(value,0,4))
|
Labels:
cheat sheet,
if,
recipes,
test conditions for if
Subscribe to:
Posts (Atom)