Showing posts with label test conditions for if. Show all posts
Showing posts with label test conditions for if. Show all posts

Monday, July 31, 2017

How to calculate the number of words in a sentence and how to use that in facets/the if function

I had a set of data where I needed to only process sentences that were more than a certain number of words. Fortunately, doing so is relatively easy.

First of all, the words in a sentence can be split up into an array using value.split(" ").

Secondly, the length() function in Open Refine returns the length, or total number of elements in an array. Since each element is a word, put these two together and the number of words will be calculated:

Tuesday, July 18, 2017

If structure cheat sheet


Updated 8/24/17

If structure cheat sheet
<operation> can be any function – reprinting a cell value, concatenating cells values, any string processing function listed in the wiki, and even a foreach, with, or another if function.


Do <operation> only if test condition is false
if (test condition, value, <operation>)
Do <operation> only if test condition is true
if (test condition, <operation>, value)
Do <operation1> if test condition is true, otherwise do <operation 2>
if (test condition, <operation 1>, <operation 2>)
Do a Marcedit Find/Replace on string1 string2 string3 to transform it into string2; string3; string1
if (isNotNull(value.match(/(regex for string1)(regex for string2)(regex for string3)/)), value.match(/(regex for string1)(regex for string2)(regex for string3)/)[1] + “;” + value.match(/(regex for string1)(regex for string2)(regex for string3)/)[2] + “;” + value.match(/(regex for string1)(regex for string2)(regex for string3)/)[0], value)

Test Condition Cheat Sheet

Example data to be used is
auto
autos
auto mechanics
automechanics
auto mechanics and aviation
planes
trains
automobiles
aviation
aviation and automechanics

NOTE : Test conditions can also be used for custom text faceting

Test Condition
Code
String must exactly match auto
value == “auto”
String can be auto, autos, auto mechanics, automechanics, or auto mechanics and aviation
value.contains("auto")
String can be auto mechanics, automechanics, auto  mechanics, or auto mechanics and aviation.
isNotNull(value.match(/(.*auto.*mechanics.*)/))

String can be auto, autos, planes, trains, or automobiles.
isNull(value.match(/(.*auto.*mechanics.*)/))
String must match auto, autos, or planes. or(isNotNull(value.match(/(^auto[s]/))), value == "planes")
String must start with some variation of auto.
value.startsWith("auto")
Find all of the variations of auto mechanics and aviation
and(isNotNull(value.match(/(.*auto\s*mechanics.*)/)), value.contains("aviation"))
All strings that begin with "a"
value.startsWith("a")
All strings that end with "s"
value.endsWith("s")
String must be less than 4 words
length (split(value, “ “)) < 4
String must contain any substring with variable spacing of “auto mechanics” and must be less than 4 words
and(isNotNull(value.match(/.*auto.*mechanics.*/)), length(split(value, " ")) < 4)
Cell is completely blank
isBlank(value)
Cell is not blank
isNonBlank(value)

Test conditions for dates
Test string is a single range of years, such as 1938-1968. No spaces surrounding the hyphen.

First year must be greater than 1938
toNumber(substring(value, 0,4)) > 1938
First year must equal 1938
toNumber(substring(value, 0,4)) == 1938
First year must be less than 1938
toNumber(substring(value, 0,4)) < 1938
Second year must be less than 1968
toNumber(substring(value,5,9)) < 1968
Second year must be equal 1968
toNumber(substring(value,5,9)) == 1968
Second year must be less greater than 1968
toNumber(substring(value,5,9)) > 1968
Second year is greater than the first
toNumber(substring(value,5,9)) > toNumber(substring(value,0,4))