Quantifiers

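Quantifiers specify how many times the preceding element may occur: + (one or more), * (zero or more), ? (zero or one), and {m,n} (between m and n times).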

Examples

grep("Oh+", c("O my gosh!", "Oh wow!", "Ohhhhh no!"), value=TRUE)
## [1] "Oh wow!"    "Ohhhhh no!"
grep("Oh*", c("O my gosh!", "Oh wow!", "Ohhhhh no!"), value=TRUE)
## [1] "O my gosh!" "Oh wow!"    "Ohhhhh no!"
grep("colou?r", c("Americans love 3 colors: red, white, and blue",
                  "Brits love 3 colours: red, white, and dark blue"), value=TRUE)
## [1] "Americans love 3 colors: red, white, and blue"  
## [2] "Brits love 3 colours: red, white, and dark blue"
grep("Bonds?\\?", c("Have you seen Barry Bonds? That guy can play",
                   "Have you seen James Bond? That guy is cool",
                   "Bond v United States, 529 U.S. 334 (2000)"), value=TRUE)
## [1] "Have you seen Barry Bonds? That guy can play"
## [2] "Have you seen James Bond? That guy is cool"
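
grep() only reports which strings contain a match. To see how much text each quantifier actually consumes, it can help to extract the matched substring itself. The sketch below uses base R's regexpr() and regmatches(); the variable names are illustrative only.

```r
x <- c("O my gosh!", "Oh wow!", "Ohhhhh no!")

# regexpr() locates the first match in each string;
# regmatches() extracts it and drops strings with no match.
m_star <- regmatches(x, regexpr("Oh*", x))  # zero or more h
m_plus <- regmatches(x, regexpr("Oh+", x))  # one or more h
m_opt  <- regmatches(x, regexpr("Oh?", x))  # zero or one h

m_star  # "O"  "Oh" "Ohhhhh"
m_plus  # "Oh" "Ohhhhh"
m_opt   # "O"  "Oh" "Oh"
```

Quantifiers are greedy: each one takes as many repetitions as it is allowed, which is why Oh* grabs every h in "Ohhhhh" while Oh? stops after the first.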

More examples

grep("10{1,2} ", c("10 dollars", "100 dollars", "1000 dollars"), value=TRUE)
## [1] "10 dollars"  "100 dollars"
grep("10{1,2}", c("10 dollars", "100 dollars", "1000 dollars"), value=TRUE)
## [1] "10 dollars"   "100 dollars"  "1000 dollars"
grep("10{2,}", c("10 dollars", "100 dollars", "1000 dollars"), value=TRUE)
## [1] "100 dollars"  "1000 dollars"
grep("[0-9]{3}-[0-9]{4}", c("My office number is 268-1884",
                            "Bryan's cell phone is 353-1890",
                            "The police's number is 911"), value=TRUE)
## [1] "My office number is 268-1884"   "Bryan's cell phone is 353-1890"
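
It may look surprising that 10{1,2} matches "1000 dollars". A minimal sketch, again using regexpr() and regmatches() with illustrative variable names, shows that the pattern only needs to match a substring, not the whole number.

```r
x <- c("10 dollars", "100 dollars", "1000 dollars")

# {1,2}: between one and two zeros after the 1
m_range <- regmatches(x, regexpr("10{1,2}", x))
# {2,}: two or more zeros
m_min   <- regmatches(x, regexpr("10{2,}", x))
# {2}: exactly two zeros
m_exact <- regmatches(x, regexpr("10{2}", x))

m_range  # "10"  "100" "100"   ("1000" matches on its first three characters)
m_min    # "100" "1000"
m_exact  # "100" "100"
```

The trailing space in the first example above ("10{1,2} ") is what rules out "1000 dollars": after two zeros the next character is another 0, not a space.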

Scope

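What exactly does a quantifier apply to? Only the element immediately before it: a single character, a bracket expression, or a parenthesized group. This is called its scope.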

grep("ha{2,}", c("haaa", "haha"), value=TRUE)
## [1] "haaa"
grep("(ha){2,}", c("haaa", "haha"), value=TRUE)
## [1] "haha"
grep("[0-9][[:alpha:]]{2}", c("2L2Q", "21YO"), value=TRUE)
## [1] "21YO"
grep("([0-9][[:alpha:]]){2}", c("2L2Q", "21YO"), value=TRUE)
## [1] "2L2Q"
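
The effect of scope can be seen by extracting the matched text. This is a small sketch with made-up inputs; it also illustrates that a quantified bracket expression repeats the class, not the particular character that was matched first.

```r
x <- c("haaa", "haha", "hahaha")

# Without parentheses, {2,} applies only to the "a"
m_char  <- regmatches(x, regexpr("ha{2,}", x))
# With parentheses, {2,} applies to the whole group "ha"
m_group <- regmatches(x, regexpr("(ha){2,}", x))

m_char   # "haaa"
m_group  # "haha"   "hahaha"

# [ab]{2} means "two characters, each an a or a b", not "the same one twice"
class_hits <- grepl("[ab]{2}", c("ab", "aa", "ac"))
class_hits  # TRUE TRUE FALSE
```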

More examples

grep("[0-9]{3}(-[0-9]{4})?", c("My office number is 268-1884",
                             "The police's number is 911",
                             "Wait, 911- isn't that the police?"), value=TRUE)
## [1] "My office number is 268-1884"      "The police's number is 911"       
## [3] "Wait, 911- isn't that the police?"
grep("[0-9]{3}-([0-9]{4})?", c("My office number is 268-1884",
                             "The police's number is 911",
                             "Wait, 911- isn't that the police?"), value=TRUE)
## [1] "My office number is 268-1884"      "Wait, 911- isn't that the police?"
grep("ton.*", c("ton", "tone", "ton ", "son"), value=TRUE)
## [1] "ton"  "tone" "ton "
grep("(ton.)*", c("ton", "tone", "ton ", "son"), value=TRUE)
## [1] "ton"  "tone" "ton " "son"
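
The reason (ton.)* matches "son" is that zero repetitions of the group is a valid match, so the pattern succeeds on an empty string at the start of every input. A sketch that inspects the match lengths reported by regexpr() makes this visible; the variable names are illustrative.

```r
x <- c("ton", "tone", "ton ", "son")

m   <- regexpr("(ton.)*", x)
pos <- as.integer(m)             # where each match starts
len <- attr(m, "match.length")   # how many characters it covers

pos  # 1 1 1 1
len  # 0 4 4 0   ("ton" and "son" match only the empty string)

# Requiring at least one repetition removes the empty matches
plus_hits <- grepl("(ton.)+", x)
plus_hits  # FALSE  TRUE  TRUE FALSE
```

Note that "ton" fails with + because the dot needs a fourth character to match.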

Anchoring

Anchors match a position rather than a character: ^ matches the start of the string and $ matches the end.

grep("^Win", c("Winning is my favorite pastime",
               "We love statistics", "I hate Windows"), value=TRUE)
## [1] "Winning is my favorite pastime"
grep("[a-z]$", c("I like lasers", "I like LASERS"), value=TRUE)
## [1] "I like lasers"
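
Using both anchors together forces the pattern to account for the entire string. The sketch below reuses the phone-number pattern from earlier with a few made-up inputs to contrast a substring match with a whole-string match.

```r
x <- c("268-1884", "My office number is 268-1884", "268-18845")

# Unanchored: the pattern may match anywhere inside the string
anywhere <- grepl("[0-9]{3}-[0-9]{4}", x)
# Anchored at both ends: the string must be exactly a 7-digit number
whole    <- grepl("^[0-9]{3}-[0-9]{4}$", x)

anywhere  # TRUE TRUE TRUE
whole     # TRUE FALSE FALSE
```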

More examples

grep("Do.*\\?$", c("Do you like cherries?", "Don't you like terriers?",
                   "Don't you know that terriers like cherries"), value=TRUE)
## [1] "Do you like cherries?"    "Don't you like terriers?"
grep("^<.+>|<.+>$", c("<HTML> hi", "bye </HTML>", "a <b> c"), value=TRUE)
## [1] "<HTML> hi"   "bye </HTML>"
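
In the last pattern, the alternation operator | has lower precedence than the anchors, so ^<.+>|<.+>$ reads as "a tag at the start, or a tag at the end". Grouping changes the meaning. The following sketch, with one extra made-up input, compares the two forms.

```r
x <- c("<HTML> hi", "bye </HTML>", "a <b> c", "<b>")

# Each anchor belongs to only one branch of the alternation
either_end <- grepl("^<.+>|<.+>$", x)
# Parentheses make both anchors apply: the whole string must be a tag
whole_tag  <- grepl("^(<.+>)$", x)

either_end  # TRUE  TRUE FALSE  TRUE
whole_tag   # FALSE FALSE FALSE  TRUE
```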