First of all, the words in a sentence can be split up into an array using value.split(" ").
Secondly, the length() function in OpenRefine returns the length of an array, that is, the total number of elements in it. Since each element is a word, putting the two together gives the number of words:
length(value.split(" "))
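For readers who want to check the logic outside OpenRefine, here is a minimal Python sketch of the same split-then-count idea. It is not GREL: the function name word_count is made up for illustration, and Python's str.split(" ") may treat repeated or trailing spaces differently from OpenRefine's split.

```python
def word_count(value: str) -> int:
    """Mirror of length(value.split(" ")): split on single spaces, count the pieces."""
    words = value.split(" ")  # e.g. ["The", "quick", "brown", "fox"]
    return len(words)         # number of elements in the list


print(word_count("The quick brown fox"))  # prints 4
```

The result is an integer rather than a string, which is what makes the comparisons in the rest of this post possible.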
How is this useful? Well, the value returned by length() is an actual number instead of a string. That means it can be used in comparisons that evaluate to true or false (arithmetic booleans).
For example: length(value.split(" ")) < 7 will give you true if your sentence is 6 words or fewer, and false if it's 7 words or more.
Other arithmetic booleans are:
length(value.split(" ")) > <some number> - yields true if the sentence has more than <some number> words, false otherwise
length(value.split(" ")) <= <some number> - yields true if the sentence has <some number> words or fewer, false otherwise
length(value.split(" ")) >= <some number> - yields true if the sentence has <some number> words or more, false otherwise
length(value.split(" ")) == <some number> - yields true only if the sentence has exactly <some number> words, false otherwise
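The comparisons listed above can be tried out with a short Python sketch. This only imitates the GREL expressions; the sample title and the threshold of 5 are arbitrary values chosen for illustration.

```python
def word_count(value: str) -> int:
    # Same idea as length(value.split(" ")) in GREL.
    return len(value.split(" "))


title = "A Tale of Two Cities"  # sample value, 5 words
n = word_count(title)

# Each comparison yields a plain boolean, just like the GREL versions.
checks = {
    "< 5": n < 5,
    "> 5": n > 5,
    "<= 5": n <= 5,
    ">= 5": n >= 5,
    "== 5": n == 5,
}

for expression, result in checks.items():
    print(f"word count {expression}: {result}")
```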
Now, say I only want titles that are 4 words or more. I could make a custom text facet with the function:
length(value.split(" ")) >= 4
All I have to do is click on the "True" facet and I've isolated the rows whose titles are 4 words or more.
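To picture what the facet is doing, here is a rough Python sketch that sorts a few titles into true and false buckets using the same test. The titles are invented sample data, and the grouping code is only an analogy for the facet, not how OpenRefine implements it.

```python
from collections import defaultdict


def word_count(value: str) -> int:
    # Same idea as length(value.split(" ")) in GREL.
    return len(value.split(" "))


# Hypothetical sample data standing in for a column of titles.
titles = [
    "Moby Dick",
    "Pride and Prejudice",
    "A Tale of Two Cities",
    "The Old Man and the Sea",
]

# Group each title under the boolean its test produces, like the facet's two choices.
facet = defaultdict(list)
for title in titles:
    facet[word_count(title) >= 4].append(title)

selected = facet[True]  # equivalent to clicking the "true" choice
print(selected)
```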
As you can see in the screenshot above, the mathematical expression generates true/false values. This means that you can use it as a test condition for an if statement.
Say I only wanted to edit titles that were 4 words or longer. My if function would be:
if(length(value.split(" ")) >= 4, <whatever editing expressions I needed>, value)
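The same pattern can be sketched in Python: edit the value when the test passes, otherwise return it unchanged. The function name edit_if_long is made up, and upper-casing is only a placeholder for whatever editing expression is actually needed.

```python
def edit_if_long(value: str, min_words: int = 4) -> str:
    """Apply an edit only when the value has at least min_words words."""
    if len(value.split(" ")) >= min_words:
        return value.upper()  # placeholder edit, stands in for the real expression
    return value              # shorter values are left unchanged


print(edit_if_long("A Tale of Two Cities"))  # edited, 5 words
print(edit_if_long("Moby Dick"))             # unchanged, 2 words
```

Returning value in the else branch matters: without it, the shorter titles would be blanked out instead of being left alone.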