Number of characters

To count the characters in a string, use nchar(), not length(). length() returns the number of elements in a vector, so it gives 1 for any single string

nchar("coffee")
## [1] 6
nchar("code monkey")
## [1] 11
length("code monkey")
## [1] 1
length(c("coffee", "code monkey"))
## [1] 2

nchar() vectorizes

You can pass a vector of strings to nchar(), and it returns the character count of each element. This is called vectorization

nchar(c("coffee", "code monkey"))
## [1]  6 11
nchar(c("Spider-Man", "does whatever", "a spider can"))
## [1] 10 13 12
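
Because nchar() returns one count per element, its result can be used directly to index the original vector. A minimal sketch, where the vector words is made up for illustration:

```r
# words is a hypothetical example vector, not part of the notes above
words = c("coffee", "tea", "code monkey", "R")
counts = nchar(words)           # One character count per element
long.words = words[counts > 3]  # Keep only the strings longer than 3 characters
long.words
## [1] "coffee"      "code monkey"
```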

Reminder: vectorization

Some basic examples of vectorization

c(1,2,3) + c(1,2,3)
## [1] 2 4 6
1:10 - 1 # This is an example of recycling
##  [1] 0 1 2 3 4 5 6 7 8 9
1:10 * -1 # So is this
##  [1]  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
abs(-3:3)
## [1] 3 2 1 0 1 2 3
log(1:5)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
log(exp(1:7)) # Notice two vectorizations happening here
## [1] 1 2 3 4 5 6 7
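
Recycling also applies when the shorter operand has more than one element: R reuses it from the start until it matches the longer one. A small sketch with made-up vectors x and y:

```r
# x and y are hypothetical example vectors
x = c(1, 2, 3, 4)
y = c(10, 20)   # Shorter vector, reused as 10, 20, 10, 20
z = x + y
z
## [1] 11 22 13 24
```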

Getting a substring

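Grab a subsequence of characters from a string, called a substring, using substr(). Its arguments are the string, the start position, and the stop position (both inclusive, counting from 1)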

phrase = "Give me a break"
substr(phrase, 1, 4)
## [1] "Give"
substr(phrase, nchar(phrase)-4, nchar(phrase))
## [1] "break"
substr(phrase, nchar(phrase)+1, nchar(phrase)+10)
## [1] ""

substr() vectorizes

Just like nchar() (and many other functions)

presidents = c("Clinton", "Bush", "Reagan", "Carter", "Ford")
substr(presidents, 1, 2) # Grab the first 2 letters from each
## [1] "Cl" "Bu" "Re" "Ca" "Fo"
substr(presidents, 1:5, 1:5) # Grab the 1st letter of the 1st name, the 2nd of the 2nd, etc.
## [1] "C" "u" "a" "t" ""
substr(presidents, 1, 1:5) # Grab the first, first 2, first 3, etc.
## [1] "C"    "Bu"   "Rea"  "Cart" "Ford"
substr(presidents, nchar(presidents)-1, nchar(presidents)) # Grab the last 2 letters from each
## [1] "on" "sh" "an" "er" "rd"
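
The "last 2 letters" pattern above generalizes to the last n letters. One possible way to wrap it, assuming a helper named last.n (a hypothetical name, not a base R function):

```r
# last.n is a hypothetical helper, written here only for illustration
last.n = function(x, n) {
  len = nchar(x)               # Vectorized: one length per string
  substr(x, len - n + 1, len)  # Start n-1 characters before the end
}
last.three = last.n(c("Clinton", "Bush", "Ford"), 3)
last.three
## [1] "ton" "ush" "ord"
```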

Replacements

To replace a character, or a substring, use substr() on the left-hand side of an assignment

phrase
## [1] "Give me a break"
substr(phrase, 1, 1) = "L"
phrase # "G" changed to "L"
## [1] "Live me a break"
substr(phrase, 1000, 1001) = "R"
phrase # Nothing happened
## [1] "Live me a break"
substr(phrase, 1, 4) = "Show"
phrase # "Live" changed to "Show"
## [1] "Show me a break"

Vectorized replacements

Another example of substr() vectorizing

presidents
## [1] "Clinton" "Bush"    "Reagan"  "Carter"  "Ford"
first.letters = substr(presidents, 1, 1)
first.letters.scrambled = sample(first.letters) # Random shuffle, so results vary from run to run
substr(presidents, 1, 1) = first.letters.scrambled
presidents
## [1] "Flinton" "Cush"    "Ceagan"  "Barter"  "Rord"
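
A deterministic variant of the same idea: read the first letter of every element, transform it, and write it back in one vectorized assignment. This sketch uses a made-up vector animals:

```r
# animals is a hypothetical example vector
animals = c("cat", "dog", "emu")
# Replace each first letter with its uppercase version
substr(animals, 1, 1) = toupper(substr(animals, 1, 1))
animals
## [1] "Cat" "Dog" "Emu"
```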

Some replacement quirks

You can only replace exactly as many letters as you specify, and never more than the replacement string supplies. The length of the string never changes

phrase
## [1] "Show me a break"
substr(phrase, 1, 4) = "Provide"
phrase # Only replaced the first 4 letters
## [1] "Prov me a break"
substr(phrase, nchar(phrase)-4, nchar(phrase)) = "cat"
phrase # Only replaced the first 3 letters of the last word
## [1] "Prov me a catak"
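
If the replacement should be allowed to change the length of the string, one workaround is to rebuild the string from its pieces with paste0() instead of assigning into substr(). A rough sketch, assuming a helper named replace.range (a hypothetical name, not a base R function) and a fresh variable phrase2:

```r
# replace.range is a hypothetical helper, written here only for illustration
replace.range = function(x, start, stop, value) {
  # Everything before start, then the new text, then everything after stop
  paste0(substr(x, 1, start - 1), value, substr(x, stop + 1, nchar(x)))
}
phrase2 = "Show me a break"
longer = replace.range(phrase2, 1, 4, "Provide")
longer
## [1] "Provide me a break"
shorter = replace.range(phrase2, nchar(phrase2) - 4, nchar(phrase2), "cat")
shorter
## [1] "Show me a cat"
```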