Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by 10pm on Thursday this week.

This week’s agenda: basic string manipulations; more vectorization; practice reading in and summarizing real text data (Shakespeare); just a little bit of regular expressions.

Some string basics

tolower("I'M NOT ANGRY I SWEAR")         # Convert to lower case
## [1] "i'm not angry i swear"
toupper("Mom, I don't want my veggies")  # Convert to upper case
## [1] "MOM, I DON'T WANT MY VEGGIES"
toupper("Hulk, sMasH")                   # Convert to upper case
## [1] "HULK, SMASH"
tolower("R2-D2 is in prime condition, a real bargain!") # Convert to lower case
## [1] "r2-d2 is in prime condition, a real bargain!"
presidents = c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase = "Give me a break"
ingredients = "chickpeas, tahini, olive oil, garlic, salt"
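The three variables above are set up for practice with base R string functions. A minimal sketch of a few operations on them, using paste(), substr(), strsplit() and nchar(); the result names (presidents_string, first_word, ingredient_vec, name_lengths) are hypothetical, and these are illustrations rather than the lab's intended answers:

```r
presidents = c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase = "Give me a break"
ingredients = "chickpeas, tahini, olive oil, garlic, salt"

# Collapse the vector into a single comma-separated string
presidents_string = paste(presidents, collapse = ", ")
print(presidents_string)

# Extract characters 1 through 4 of the phrase, i.e. the first word "Give"
first_word = substr(phrase, start = 1, stop = 4)

# Split the ingredient list on ", "; strsplit() returns a list, so take element 1
ingredient_vec = strsplit(ingredients, split = ", ")[[1]]
print(ingredient_vec)

# nchar() is vectorized: one character count per president's name
name_lengths = nchar(presidents)
print(name_lengths)
```

Note that strsplit() always returns a list (one element per input string), which is why the [[1]] is needed even though there is only one ingredient string.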

Shakespeare’s complete works

Project Gutenberg offers over 50,000 free online books whose copyright has expired, especially older works of classic literature. We’re going to look at the complete works of William Shakespeare, taken from the Project Gutenberg website.

To avoid hitting the Project Gutenberg server over and over again, we’ve grabbed a text file from them that contains the complete works of William Shakespeare and put it on our course website. Visit http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/shakespeare.txt in your web browser and just skim through this text file a little bit to get a sense of what it contains (a whole lot!).

Reading in text, basic exploratory tasks
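One way to read a plain-text file into R is base R's readLines(), which returns a character vector with one element per line. The sketch below assumes that approach; summarize_lines() is a hypothetical helper, not part of the lab. Because network access may not be available, the call that reads the course URL is commented out and a few hand-typed lines stand in for the file.

```r
# Hypothetical helper: a few basic summaries of a character vector of lines
summarize_lines = function(lines) {
  list(
    n_lines = length(lines),                  # total number of lines
    n_empty = sum(lines == ""),               # lines with no characters at all
    n_chars = sum(nchar(lines)),              # total characters across all lines
    longest = lines[which.max(nchar(lines))]  # the (first) longest line
  )
}

# With network access, the course file could be read like this:
# shakespeare_lines = readLines("http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/shakespeare.txt")
# summarize_lines(shakespeare_lines)

# Toy stand-in for the file, typed by hand
toy_lines = c("THE SONNETS", "", "From fairest creatures we desire increase,")
toy_summary = summarize_lines(toy_lines)
print(toy_summary)
```

On the real file, counting empty lines this way gives a quick sense of how much of the text is blank spacing between headers, speeches, and sonnets.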

Computing word counts
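A common recipe for word counts is to collapse the lines into one string, split it on spaces, drop empty strings, and tabulate. A minimal sketch of that recipe, assuming a simple split on single spaces; count_words() is a hypothetical helper and the two toy lines stand in for the Shakespeare text:

```r
# Hypothetical helper: turn lines of text into a sorted table of word counts
count_words = function(lines) {
  text = paste(lines, collapse = " ")        # join all lines into one long string
  words = strsplit(text, split = " ")[[1]]   # split on single spaces
  words = words[words != ""]                 # drop empties left by repeated spaces
  words = tolower(words)                     # treat "To" and "to" as the same word
  sort(table(words), decreasing = TRUE)      # most frequent words first
}

toy_lines = c("To be, or not to be:", "that is the question")
word_counts = count_words(toy_lines)
print(word_counts)
```

Splitting only on spaces leaves punctuation attached, so "be," and "be:" are counted as two different words here. Lowercasing is a design choice; skip it if capitalization matters for your analysis.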

A tiny bit of regular expressions
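Regular expressions let strsplit() split on a whole class of characters instead of a single fixed one. A minimal sketch, assuming we want to treat any whitespace or punctuation character as a word boundary; split_words_regex() and starts_with_to are hypothetical names, and the toy lines match the word-count example so the effect on "be" is easy to see:

```r
# Split on any single whitespace or punctuation character, using POSIX classes
split_words_regex = function(lines) {
  text = tolower(paste(lines, collapse = " "))
  words = strsplit(text, split = "[[:space:]]|[[:punct:]]")[[1]]
  words[words != ""]      # adjacent separators (e.g. ", ") leave empty strings behind
}

toy_lines = c("To be, or not to be:", "that is the question")
words = split_words_regex(toy_lines)
print(table(words))

# grepl() tests each line against a pattern; "^To" only matches at the start of a line
starts_with_to = grepl("^To", toy_lines)
print(starts_with_to)
```

With punctuation treated as a separator, both "be," and "be:" become "be". One side effect: apostrophes are punctuation too, so a contraction such as "I'M" splits into "i" and "m".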