Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by Sunday 11:59pm this week. Make sure to complete your weekly check-in (which can be done by coming to lecture, recitation, lab, or any office hour), as this will count for a small number of points toward your lab score.

This week’s agenda: basic string manipulations; practice reading in and summarizing real text data (Shakespeare); practice with iteration; just a little bit of regular expressions.

Some string basics

"I'M NOT ANGRY I SWEAR"         # Convert to lower case
## [1] "I'M NOT ANGRY I SWEAR"
"Mom, I don't want my veggies"  # Convert to upper case
## [1] "Mom, I don't want my veggies"
"Hulk, sMasH"                   # Convert to upper case
## [1] "Hulk, sMasH"
"R2-D2 is in prime condition, a real bargain!" # Convert to lower case
## [1] "R2-D2 is in prime condition, a real bargain!"
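The strings above print unchanged because no conversion function has been applied yet. As a minimal sketch, assuming the intent is the one stated in each comment, base R's `tolower()` and `toupper()` would do the job. The variable names here are made up for illustration.

```r
# tolower() and toupper() return a new string; the input is left untouched.
calm   = tolower("I'M NOT ANGRY I SWEAR")
loud   = toupper("Mom, I don't want my veggies")
hulk   = toupper("Hulk, sMasH")
seller = tolower("R2-D2 is in prime condition, a real bargain!")

print(calm)
print(loud)
print(hulk)
print(seller)
```

Characters with no upper or lower case form, such as digits and punctuation, pass through unchanged.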
presidents = c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase = "Give me a break"
ingredients = "chickpeas, tahini, olive oil, garlic, salt"
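The three variables above are convenient for trying out a few basic string functions. The sketch below is only one possible set of manipulations, chosen for illustration and not necessarily the tasks this lab asks for; the result names (`name.lengths`, `first.three`, and so on) are hypothetical.

```r
presidents = c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase = "Give me a break"
ingredients = "chickpeas, tahini, olive oil, garlic, salt"

# nchar() and substr() are vectorized over the elements of a character vector
name.lengths = nchar(presidents)
first.three = substr(presidents, 1, 3)

# paste() with collapse glues a vector into a single string
one.string = paste(presidents, collapse = ", ")

# strsplit() returns a list with one element per input string, so take [[1]]
phrase.words = strsplit(phrase, split = " ")[[1]]
ingredient.list = strsplit(ingredients, split = ", ")[[1]]

print(first.three)
print(ingredient.list)
```

Note that splitting `ingredients` on `", "` (comma plus space) keeps "olive oil" together as one item, whereas splitting on `" "` alone would break it apart.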

Shakespeare’s complete works

Project Gutenberg offers over 50,000 free online books, especially old books (classic literature), for which copyright has expired. We’re going to look at the complete works of William Shakespeare, taken from the Project Gutenberg website.

To avoid hitting the Project Gutenberg server over and over again, we’ve grabbed a text file from them that contains the complete works of William Shakespeare and put it on our course website. Visit http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/shakespeare.txt in your web browser and just skim through this text file a little bit to get a sense of what it contains (a whole lot!).

Reading in text, basic exploratory tasks
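One way to get started is `readLines()`, which reads a text source into a character vector with one element per line. The sketch below wraps it in a hypothetical helper, `summarize.lines()`, and runs it on a tiny in-memory sample so that it works without a network connection. The sample lines are stand-ins, not a claim about what the file's first lines look like.

```r
# Hypothetical helper: read all lines from a connection (or a file path / URL
# given as a string) and report a few basic facts about them.
summarize.lines = function(con) {
  lines = readLines(con)
  list(n.lines = length(lines),
       n.empty = sum(lines == ""),
       longest = lines[which.max(nchar(lines))])
}

# Small stand-in for the real file
sample.con = textConnection(c("THE SONNETS",
                              "",
                              "From fairest creatures we desire increase,"))
sample.summary = summarize.lines(sample.con)
close(sample.con)
print(sample.summary)

# For the real data, the course URL can be passed directly (needs internet):
# shakespeare.summary = summarize.lines(
#   "http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/shakespeare.txt")
```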

Computing word counts
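A common recipe for word counts is to collapse the lines into one long string, split it into words, drop the empty strings, and tabulate. The sketch below tries this on two sample lines; the variable names and the choice to split on any whitespace or punctuation character are assumptions made for illustration.

```r
sample.text = c("To be, or not to be, that is the question:",
                "Whether 'tis nobler in the mind to suffer")

# Collapse all lines into a single string, separated by spaces
all.text = paste(sample.text, collapse = " ")

# Lower-case first so that "To" and "to" count as the same word, then split
# on any single whitespace or punctuation character
words = strsplit(tolower(all.text), split = "[[:space:]]|[[:punct:]]")[[1]]

# Adjacent separators (e.g. a comma followed by a space) produce empty strings
words = words[words != ""]

# Tabulate and sort from most to least frequent
word.counts = sort(table(words), decreasing = TRUE)
print(head(word.counts))
```

Splitting on punctuation also breaks contractions such as "don't" into "don" and "t", so whether to treat apostrophes as separators is a judgment call worth revisiting for the real text.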

A tiny bit of regular expressions
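As a small taste of regular expressions in R, `grep()` finds the elements of a character vector that match a pattern, and `sub()` replaces the first match in each element. The lines in `sample.lines` below are made up for illustration and may not match the formatting of the actual Shakespeare file.

```r
sample.lines = c("ACT I",
                 "SCENE I. Elsinore.",
                 "Enter two Sentinels",
                 "ACT II",
                 "SCENE II. A room.")

# ^ anchors the start of the line, $ the end; [IVX]+ matches one or more
# Roman-numeral characters. grep() returns the indices of matching lines.
act.lines = grep("^ACT [IVX]+$", sample.lines)

# value = TRUE returns the matching lines themselves instead of indices
scene.lines = grep("^SCENE", sample.lines, value = TRUE)

# Strip the "SCENE <numeral>. " prefix; the literal dot is escaped as \\.
scene.names = sub("^SCENE [IVX]+\\. ", "", scene.lines)

print(act.lines)
print(scene.names)
```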

Where are Shakespeare’s plays, in this massive text?

Extracting and analysing a couple of plays