Exploring data

class: center, middle, inverse, title-slide

# Exploring data
## Into the tidyverse
### June 7th, 2022

---

## Data Science workflow

According to [Hadley Wickham](http://hadley.nz/) in [`R` for Data Science](https://r4ds.had.co.nz/):

.center[![](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png)]

__First two weeks__: data wrangling and visualization

--
Aspects of data __wrangling__:

--
- __import__: reading in data (e.g. `read_csv()`)

--
- __tidy__: rows = observations, columns = variables (i.e. __tabular__ data)

--
- __transform__: filter observations, create new variables, summarize, etc.

---

## What is Exploratory Data Analysis (EDA)?

--
_(broadly speaking)_ EDA = questions about data + wrangling + visualization

--
[`R` for Data Science](https://r4ds.had.co.nz/): _"EDA is a state of mind"_, an iterative cycle:

- generate questions

- answer via transformations and visualizations

--
Example of questions?

--
- What type of __variation__ do the variables display?

- What type of __relationships__ exist between variables?

__Goal__: develop understanding and become familiar with your data

EDA is __NOT__ a replacement for statistical inference and learning

EDA is an __important__ and __necessary__ step to build intuition

--
Now for an example...

---

## Exploring MLB batting statistics

__Import__ `Batting` table of historical MLB statistics from the [`Lahman` package](https://cran.r-project.org/web/packages/Lahman/index.html), explore using the [`tidyverse`](https://www.tidyverse.org/)

```r
library(tidyverse) # Load the tidyverse suite of packages
library(Lahman) # Load the Lahman package to access its datasets
Batting <- as_tibble(Batting) # Initialize the Batting dataset
```

--
Basic info about the `Batting` dataset:

```r
dim(Batting) # displays same info as c(nrow(Batting), ncol(Batting))
```

```
## [1] 110495     22
```

```r
class(Batting)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

--
`tbl` (pronounced `tibble`) is the `tidyverse` way of storing tabular data, like a spreadsheet or `data.frame`

---

__Always look at your data__: view the first 6 (by default) rows with `head()`

```r
head(Batting) # Try just typing Batting into your console, what happens?
```

```
## # A tibble: 6 x 22
##   playerID  yearID stint teamID lgID      G    AB     R     H   X2B   X3B    HR   RBI
##   <chr>      <int> <int> <fct>  <fct> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 abercda01   1871     1 TRO    NA        1     4     0     0     0     0     0     0
## 2 addybo01    1871     1 RC1    NA       25   118    30    32     6     0     0    13
## 3 allisar01   1871     1 CL1    NA       29   137    28    40     4     5     0    19
## 4 allisdo01   1871     1 WS3    NA       27   133    28    44    10     2     2    27
## 5 ansonca01   1871     1 RC1    NA       25   120    29    39    11     3     0    16
## 6 armstbo01   1871     1 FW1    NA       12    49     9    11     2     1     0     5
## # … with 9 more variables: SB <int>, CS <int>, BB <int>, SO <int>, IBB <int>,
## #   HBP <int>, SH <int>, SF <int>, GIDP <int>
```

Is our `Batting` dataset __tidy__?
--

- Each row = a player's season stint with a team (i.e. players can play for multiple teams in year)

- Each column = different measurement or recording about the player-team-season observation (can print out column names directly with `colnames(Batting)` or `names(Batting)`)

--
__Can we explore how baseball has changed over time with `Batting`?__

---

## Let the data wrangling begin...

__Summarize__ _continuous_ (e.g. `yearID`, `AB`) and _categorical_ (e.g. `teamID`, `lgID`) variables in different ways

Compute __summary statistics__ for _continuous_ variables with the `summary()` function:

```r
summary(Batting$yearID)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1871    1938    1977    1968    2002    2021
```

Compute __counts__ of _categorical_ variables with `table()` function:

```r
table("Leagues" = Batting$lgID) # be careful it ignores NA values!
```

```
## Leagues
##    AA    AL    FL    NA    NL    PL    UA 
##  1893 50965   472   737 55945   149   334
```

_How do we remove the other leagues?_

---

[`dplyr`](https://dplyr.tidyverse.org/) is a package within the `tidyverse` with functions for data wrangling

_"Grammar of data manipulation"_: `dplyr` functions are __verbs__, datasets are __nouns__

- __We can [`filter()`](https://dplyr.tidyverse.org/reference/filter.html) our dataset to choose observations meeting conditions__

```r
mlb_batting <- filter(Batting, lgID %in% c("AL", "NL"))
nrow(Batting) - nrow(mlb_batting) # Difference in rows
```

```
## [1] 3585
```

- __We can [`select()`](https://dplyr.tidyverse.org/reference/select.html) variables of interest__

```r
sel_batting <- select(Batting, yearID, lgID, G, AB, R, H, HR, BB, SO)
head(sel_batting, n = 3)
```

```
## # A tibble: 3 x 9
##   yearID lgID      G    AB     R     H    HR    BB    SO
##    <int> <fct> <int> <int> <int> <int> <int> <int> <int>
## 1   1871 NA        1     4     0     0     0     0     0
## 2   1871 NA       25   118    30    32     0     4     0
## 3   1871 NA       29   137    28    40     0     2     5
```

---

- __We can [`arrange()`](https://dplyr.tidyverse.org/reference/arrange.html) our dataset to sort observations by variables__

```r
hr_batting <- arrange(Batting, desc(HR)) # use desc() for descending order
head(hr_batting, n = 3)
```

```
## # A tibble: 3 x 22
##   playerID  yearID stint teamID lgID      G    AB     R     H   X2B   X3B    HR   RBI
##   <chr>      <int> <int> <fct>  <fct> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 bondsba01   2001     1 SFN    NL      153   476   129   156    32     2    73   137
## 2 mcgwima01   1998     1 SLN    NL      155   509   130   152    21     0    70   147
## 3 sosasa01    1998     1 CHN    NL      159   643   134   198    20     0    66   158
## # … with 9 more variables: SB <int>, CS <int>, BB <int>, SO <int>, IBB <int>,
## #   HBP <int>, SH <int>, SF <int>, GIDP <int>
```

- __We can [`summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) our dataset to one row based on functions of variables__

```r
summarize(Batting, max(stint), median(AB))
```

```
## # A tibble: 1 x 2
##   `max(stint)` `median(AB)`
##          <int>        <int>
## 1            5           45
```

---

- __We can [`mutate()`](https://dplyr.tidyverse.org/reference/mutate.html) our dataset to create new variables__ (mutate is a weird name...)

```r
new_batting <- mutate(Batting, batting_avg = H / AB, so_to_bb = SO / BB)
head(new_batting, n = 1)
```

```
## # A tibble: 1 x 24
##   playerID  yearID stint teamID lgID      G    AB     R     H   X2B   X3B    HR   RBI
##   <chr>      <int> <int> <fct>  <fct> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 abercda01   1871     1 TRO    NA        1     4     0     0     0     0     0     0
## # … with 11 more variables: SB <int>, CS <int>, BB <int>, SO <int>, IBB <int>,
## #   HBP <int>, SH <int>, SF <int>, GIDP <int>, batting_avg <dbl>, so_to_bb <dbl>
```

How do we perform several of these actions?

```r
head(arrange(select(mutate(Batting, BA = H / AB), playerID, BA), desc(BA)), n = 1)
```

```
## # A tibble: 1 x 2
##   playerID    BA
##   <chr>    <dbl>
## 1 snowch01     1
```

That's awfully annoying to do, and also difficult to read...

---

## Enter the pipeline

The `%>%` (_pipe_) operator is used in the `tidyverse` (from [`magrittr`](https://magrittr.tidyverse.org/articles/magrittr.html)) to chain commands together

`%>%` directs the __data analyis pipeline__: output of one function pipes into input of the next function

```r
Batting %>%
  filter(lgID %in% c("AL", "NL"),
         AB > 300) %>%
  mutate(batting_avg = H / AB) %>%
  arrange(desc(batting_avg)) %>%
  select(playerID, yearID, batting_avg) %>%
  head(n = 5)
```

```
## # A tibble: 5 x 3
##   playerID  yearID batting_avg
##   <chr>      <int>       <dbl>
## 1 duffyhu01   1894       0.440
## 2 barnero01   1876       0.429
## 3 lajoina01   1901       0.426
## 4 keelewi01   1897       0.424
## 5 hornsro01   1924       0.424
```

---

## More pipeline actions!

Instead of `head()`, __we can [`slice()`](https://dplyr.tidyverse.org/reference/slice.html) our dataset to choose the observations based on the position__

```r
Batting %>%
  filter(lgID %in% c("AL", "NL"),
         AB > 300) %>%
  mutate(so_to_bb = SO / BB) %>%
  arrange(so_to_bb) %>%
  select(playerID, yearID, so_to_bb) %>%
  slice(c(1, 2, 10, 100))
```

```
## # A tibble: 4 x 3
##   playerID  yearID so_to_bb
##   <chr>      <int>    <dbl>
## 1 roweja01    1882   0     
## 2 seweljo01   1932   0.0536
## 3 holloch01   1922   0.0862
## 4 collied01   1918   0.178
```

---

## Grouped operations

__We [`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) to split our dataset into groups based on a variable's values__

```r
Batting %>%
  filter(lgID %in% c("AL", "NL")) %>%
  group_by(yearID) %>%
  summarize(hr = sum(HR), so = sum(SO), bb = sum(BB)) %>%
  arrange(desc(hr)) %>%
  slice(1:5)
```

```
## # A tibble: 5 x 4
##   yearID    hr    so    bb
##    <int> <int> <int> <int>
## 1   2019  6776 42823 15895
## 2   2017  6105 40104 15829
## 3   2021  5944 42145 15794
## 4   2000  5693 31356 18237
## 5   2016  5610 38982 15088
```

`group_by()` is only useful in a pipeline (e.g. with `summarize()`), and pay attention to its behavior

`ungroup()` can solve your problems afterwards

---

## Putting it all together...

We'll create a __tidy__ dataset where each row = a year with the following variables:

- total HRs (homeruns), SOs (strikeouts), and BBs (walks)
- year's BA = total H / total AB
- only want AL and NL leagues

```r
year_batting_summary <- Batting %>%
  filter(lgID %in% c("AL", "NL")) %>%
  group_by(yearID) %>%
  summarize(total_hits = sum(H, na.rm = TRUE),
            total_hrs = sum(HR, na.rm = TRUE),
            total_sos = sum(SO, na.rm = TRUE),
            total_walks = sum(BB, na.rm = TRUE),
            total_atbats = sum(AB, na.rm = TRUE)) %>%
  mutate(batting_avg = total_hits / total_atbats)
head(year_batting_summary, n = 2)
```

```
## # A tibble: 2 x 7
##   yearID total_hits total_hrs total_sos total_walks total_atbats batting_avg
##    <int>      <int>     <int>     <int>       <int>        <int>       <dbl>
## 1   1876       5338        40       589         336        20121       0.265
## 2   1877       3705        24       726         345        13667       0.271
```

---

Top three years with the most HRs?

```r
year_batting_summary %>%
  arrange(desc(total_hrs)) %>%
  slice(1:3)
```

```
## # A tibble: 3 x 7
##   yearID total_hits total_hrs total_sos total_walks total_atbats batting_avg
##    <int>      <int>     <int>     <int>       <int>        <int>       <dbl>
## 1   2019      42039      6776     42823       15895       166651       0.252
## 2   2017      42215      6105     40104       15829       165567       0.255
## 3   2021      39484      5944     42145       15794       161941       0.244
```

--
Top three years with highest batting average?

```r
year_batting_summary %>%
  arrange(desc(batting_avg)) %>%
  slice(1:3)
```

```
## # A tibble: 3 x 7
##   yearID total_hits total_hrs total_sos total_walks total_atbats batting_avg
##    <int>      <int>     <int>     <int>       <int>        <int>       <dbl>
## 1   1894      17809       629      3333        5870        57577       0.309
## 2   1895      16827       488      3621        5120        56788       0.296
## 3   1930      25597      1565      7934        7654        86571       0.296
```

---

Best and worst strikeout to walk ratios?
--

```r
year_batting_summary %>%
  mutate(so_to_bb = total_sos / total_walks) %>%
  arrange(so_to_bb) %>%
  slice(c(1, n()))
```

```
## # A tibble: 2 x 8
##   yearID total_hits total_hrs total_sos total_walks total_atbats batting_avg so_to_bb
##    <int>      <int>     <int>     <int>       <int>        <int>       <dbl>    <dbl>
## 1   1893      15913       460      3341        6143        56898       0.280    0.544
## 2   1879       6171        58      1843         508        24155       0.255    3.63
```

--
_We can make better looking tables..._

--
__[`rename()`](https://dplyr.tidyverse.org/reference/select.html) variables in our dataset__

```r
year_batting_summary %>%
  select(yearID, batting_avg) %>%
  rename(Year = yearID, `Batting AVG` = batting_avg) %>%
  slice(c(1, n()))
```

```
## # A tibble: 2 x 2
##    Year `Batting AVG`
##   <int>         <dbl>
## 1  1876         0.265
## 2  2021         0.244
```

---

## Grammar of tables preview

We can go one step further - __and use the new [`gt`](https://gt.rstudio.com/articles/intro-creating-gt-tables.html) package__ to create a nice-looking table for presentation

.pull-left[

```r
library(gt)
year_batting_summary %>%
  select(yearID, batting_avg) %>%
  rename(Year = yearID, 
         `Batting AVG` = batting_avg) %>%
  arrange(desc(`Batting AVG`)) %>%
  slice(c(1:3, (n()-2):n())) %>%
  gt() %>%
  tab_header(
    title = "Best / worst MLB Seasons by AVG",
    subtitle = "Top / bottom three are presented"
  )
```
]

.pull-right[

#kxlzlbldlv .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#kxlzlbldlv .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kxlzlbldlv .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#kxlzlbldlv .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 4px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#kxlzlbldlv .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kxlzlbldlv .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#kxlzlbldlv .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#kxlzlbldlv .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#kxlzlbldlv .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#kxlzlbldlv .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#kxlzlbldlv .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#kxlzlbldlv .gt_group_heading {
  padding: 8px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

#kxlzlbldlv .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#kxlzlbldlv .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#kxlzlbldlv .gt_from_md > :first-child {
  margin-top: 0;
}

#kxlzlbldlv .gt_from_md > :last-child {
  margin-bottom: 0;
}

#kxlzlbldlv .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#kxlzlbldlv .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 12px;
}

#kxlzlbldlv .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kxlzlbldlv .gt_first_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
}

#kxlzlbldlv .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#kxlzlbldlv .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#kxlzlbldlv .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#kxlzlbldlv .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kxlzlbldlv .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding: 4px;
}

#kxlzlbldlv .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#kxlzlbldlv .gt_sourcenote {
  font-size: 90%;
  padding: 4px;
}

#kxlzlbldlv .gt_left {
  text-align: left;
}

#kxlzlbldlv .gt_center {
  text-align: center;
}

#kxlzlbldlv .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#kxlzlbldlv .gt_font_normal {
  font-weight: normal;
}

#kxlzlbldlv .gt_font_bold {
  font-weight: bold;
}

#kxlzlbldlv .gt_font_italic {
  font-style: italic;
}

#kxlzlbldlv .gt_super {
  font-size: 65%;
}

#kxlzlbldlv .gt_footnote_marks {
  font-style: italic;
  font-size: 65%;
}
</style>
<div id="kxlzlbldlv" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="2" class="gt_heading gt_title gt_font_normal" style>Best / worst MLB Seasons by AVG</th>
    </tr>
    <tr>
      <th colspan="2" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border" style>Top / bottom three are presented</th>
    </tr>
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1">Year</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">Batting AVG</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr>
      <td class="gt_row gt_center">1894</td>
      <td class="gt_row gt_right">0.3093075</td>
    </tr>
    <tr>
      <td class="gt_row gt_center">1895</td>
      <td class="gt_row gt_right">0.2963126</td>
    </tr>
    <tr>
      <td class="gt_row gt_center">1930</td>
      <td class="gt_row gt_right">0.2956764</td>
    </tr>
    <tr>
      <td class="gt_row gt_center">1908</td>
      <td class="gt_row gt_right">0.2389593</td>
    </tr>
    <tr>
      <td class="gt_row gt_center">1888</td>
      <td class="gt_row gt_right">0.2387601</td>
    </tr>
    <tr>
      <td class="gt_row gt_center">1968</td>
      <td class="gt_row gt_right">0.2366924</td>
    </tr>
  </tbody>
  
  
</table></div>
]

_Note the `gt` display is different in these slides due to the [`xaringan`](https://github.com/yihui/xaringan) package formatting_

--
__Enough with tables!__

---

## Data visualization

.pull-left[

_"The simple graph has brought more information to the data analyst’s mind than any other device."_ — Tukey

- __TOMORROW__: the __grammar of graphics__

- Use [`ggplot2`](https://ggplot2.tidyverse.org/) to visually explore our data

- More intuitive than base `R` plotting!

- Will walkthrough different types of visualizations for 1D, 2D, continuous, categorical, facetting, etc.

- `tidyverse` verbs and `%>%` leads to natural pipeline for EDA
]

.pull-right[

]