class: middle, right, title-slide

.title[
# Tidy data wrangling
]
.author[
### Athanasia Monika Mowinckel
]

---

layout: true

<div class="my-sidebar"></div>

---
class: dark, center
background-image: url(img/ggplot2.png), url(img/dplyr.png), url(img/pipe.png)
background-size: 15%
background-position: 32% 65%, 50% 65%, 68% 65%

# Part 1
## Tidy data wrangling

---
class: middle, inverse

## Tidy data wrangling

<ul style="color: white;">

- **plotting data with [ggplot2](https://ggplot2.tidyverse.org/) (~25 min)**
- **sub-setting data with [dplyr](https://dplyr.tidyverse.org/) (~25 min)**
- **chaining commands with the pipe `|>` (~10 min)**
- **adding and altering variables with [dplyr](https://dplyr.tidyverse.org/) (~25 min)**

---
background-image: url(img/ggplot2.png)
background-size: 8%
background-position: 95% 5%
name: ggplot

# ggplot2
## grammar of graphics

---
background-image: url(img/ggplot2.png)
background-size: 8%
background-position: 95% 5%

## ggplot2 setting

.pull-left[

```r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm)) +
  geom_histogram(
*   fill = "forestgreen"
  )
```
]

.pull-right[
![](001-tidy-wrangling_files/figure-html/penguin-plot2-rend-1.png)<!-- -->
]

---
background-image: url(img/ggplot2.png)
background-size: 8%
background-position: 95% 5%

## ggplot2 mapping

.pull-left[

```r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
*                    fill = species)) +
  geom_histogram(
  )
```
]

.pull-right[
![](001-tidy-wrangling_files/figure-html/penguin-plot3-rend-1.png)<!-- -->
]

---
class: inverse, middle, center

## Go to RStudio
### live demo

---
class: inverse, middle, center

## Go to plotting exercises
### `learnr::run_tutorial("001-plotting", "tidyquintro")`
---
class: dark, middle, center
background-image: url(img/dplyr.png)
background-size: 15%
background-position: 50% 95%
name: dplyr-subset

# dplyr
## data subsetting

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

# dplyr
## grammar of data manipulation

provides a consistent set of verbs that help you solve the most common data manipulation challenges:

<div style="background-color: #94c11faa;">

`select()` picks variables based on their names.

`filter()` picks cases based on their values.

</div>

`mutate()` - adds or alters variables that are functions of existing variables

`summarise()` reduces multiple values down to a single summary.

`arrange()` changes the ordering of the rows.

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%
name: filter

# dplyr

.pull-left[
### `filter()` - subsetting rows

Reducing the number of rows in a data set based on some logic.

- `filter()` evaluates a condition to a logical (`TRUE` or `FALSE`) for each row and keeps the rows where it is `TRUE`

]

--

.pull-right[
![](gifs/filtering.gif)<!-- -->
]

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

## dplyr - comparison to base-R

#### tidy

```r
filter(penguins, bill_length_mm > 40)
```

#### base

```r
penguins[penguins$bill_length_mm > 40, ]

# or

subset(penguins, bill_length_mm > 40)
```

<div style="font-size: 15px;">
<a href="https://dplyr.tidyverse.org/articles/base.html">https://dplyr.tidyverse.org/articles/base.html</a>
</div>

---
class: inverse, middle, center

## Go to RStudio
### live demo

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%
name: select

# dplyr

.pull-left[
### `select()` - reduce columns

Reducing the number of columns (or rearranging columns).

Can be used with column names, integer indices, or tidyselect functions.

tidyselect helpers:

- `ends_with("string")` - column names ending with "string"
- `starts_with("string")` - column names starting with "string"
- `contains("string")` - column names containing "string"

]

--

.pull-right[
![](gifs/selecting.gif)<!-- -->
]

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

## dplyr - comparison to base-R

#### tidy

```r
select(penguins, species, island, ends_with("mm"))
```

#### base

```r
penguins[c(1, 2, grep("mm$", names(penguins)))]

# or

subset(penguins,
       select = c("species", "island",
                  "bill_length_mm", "bill_depth_mm",
                  "flipper_length_mm"))
```

<div style="font-size: 15px;">
<a href="https://dplyr.tidyverse.org/articles/base.html">https://dplyr.tidyverse.org/articles/base.html</a>
</div>

---
class: inverse, middle, center

## Go to RStudio
### live demo

---
class: inverse, middle, center

## Go to subsetting exercises
### `learnr::run_tutorial("002-subsetting", "tidyquintro")`
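---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

## dplyr - subsetting sketch

A minimal sketch of what `filter()` and `select()` do, assuming dplyr is installed. The `mini` data set and its values are hypothetical stand-ins for `penguins`, kept tiny so the logical vector behind the row subsetting can be inspected directly.

```r
library(dplyr)

# Hypothetical mini data set standing in for `penguins` (values are made up)
mini <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  island         = c("Torgersen", "Biscoe", "Biscoe", "Dream"),
  bill_length_mm = c(39.1, NA, 47.5, 49.0),
  bill_depth_mm  = c(18.7, 17.4, 14.2, 19.5),
  body_mass_g    = c(3750L, 3800L, 5200L, 3800L)
)

# The condition is evaluated once per row and gives a logical vector
keep <- mini$bill_length_mm > 40
keep  # FALSE NA TRUE TRUE

# filter() keeps rows where the condition is TRUE; NA rows are dropped
long_bills <- filter(mini, bill_length_mm > 40)

# select() keeps the named column plus every column ending in "mm"
mm_cols <- select(mini, species, ends_with("mm"))
```

Unlike `penguins[penguins$bill_length_mm > 40, ]`, which returns an all-`NA` row when the condition is `NA`, `filter()` drops that row.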
---
class: dark, center
background-image: url(img/pipe.png)
background-size: 15%
background-position: 50% 65%
name: pipe

## magrittr
### the pipe - chaining commands

---
background-image: url(img/pipe.png)
background-size: 8%
background-position: 95% 5%

### the pipe - chaining commands

- Common to many programming languages
- Sends the output from one function straight into another, without saving the intermediary
- Only really works when the input is the _first_ argument to a function
- This is not the case for most base-R functions, but is _always_ the case with tidyverse functions
- The commonly used pipe in R, `|>`, is built into R itself (version 4.1.0 and later); the older `%>%` pipe originally comes from the magrittr package, and also comes with dplyr

???

Arguably, the chaining of commands is one of the things that makes the tidyverse most powerful.
Chaining commands is a common programming concept, where you send the output of one command directly into another, without saving the intermediary.
This saves you from overcrowding your workspace with lots of new objects you will never use.
It is commonly referred to as a "pipe", and in R the common pipe is `|>`.

---
background-image: url(img/pipe.png)
background-size: 8%
background-position: 95% 5%

### Use

.pull-left[

```r
# standard
select(penguins, species,
       island, ends_with("mm"))
```

```r
# piped
penguins |>
  select(species, island,
         ends_with("mm"))
```
]

.pull-right[

```
## # A tibble: 344 × 5
##    species island    bill_leng…¹ bill_…² flipp…³
##    <fct>   <fct>           <dbl>   <dbl>   <int>
##  1 Adelie  Torgersen        39.1    18.7     181
##  2 Adelie  Torgersen        39.5    17.4     186
##  3 Adelie  Torgersen        40.3    18       195
##  4 Adelie  Torgersen        NA      NA        NA
##  5 Adelie  Torgersen        36.7    19.3     193
##  6 Adelie  Torgersen        39.3    20.6     190
##  7 Adelie  Torgersen        38.9    17.8     181
##  8 Adelie  Torgersen        39.2    19.6     195
##  9 Adelie  Torgersen        34.1    18.1     193
## 10 Adelie  Torgersen        42      20.2     190
## # … with 334 more rows, and abbreviated
## #   variable names ¹bill_length_mm,
## #   ²bill_depth_mm, ³flipper_length_mm
```
]

---
class: inverse, middle, center

## Go to RStudio
### live demo

---
class: inverse, middle, center

## Go to chaining exercises
### `learnr::run_tutorial("003-chaining", "tidyquintro")`
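---
background-image: url(img/pipe.png)
background-size: 8%
background-position: 95% 5%

### the pipe - chaining sketch

A small sketch of chaining two verbs, assuming dplyr is installed and R is version 4.1.0 or later (needed for `|>`). The `mini` data set is a hypothetical stand-in for `penguins`. It compares a nested call with the same steps written as a pipe.

```r
library(dplyr)

# Hypothetical mini data set standing in for `penguins` (values are made up)
mini <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo"),
  island         = c("Torgersen", "Biscoe", "Biscoe"),
  bill_length_mm = c(39.1, 38.2, 47.5),
  bill_depth_mm  = c(18.7, 17.4, 14.2)
)

# Nested: has to be read from the inside out
nested <- select(filter(mini, species == "Adelie"), species, ends_with("mm"))

# Piped: each result becomes the first argument of the next call,
# so the steps read top to bottom and no intermediate object is saved
piped <- mini |>
  filter(species == "Adelie") |>
  select(species, ends_with("mm"))

# Both approaches give the same result
identical(nested, piped)
```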
---
class: dark, center
background-image: url(img/dplyr.png)
background-size: 15%
background-position: 50% 65%
name: dplyr-mutate

## dplyr
### data wrangling / manipulation

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

# dplyr
## grammar of data manipulation

provides a consistent set of verbs that help you solve the most common data manipulation challenges:

`select()` picks variables based on their names.

`filter()` picks cases based on their values.

<div style="background-color: #94c11faa;">

`mutate()` - adds or alters variables that are functions of existing variables

</div>

`summarise()` reduces multiple values down to a single summary.

`arrange()` changes the ordering of the rows.

---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

## dplyr - comparison to base-R

#### tidy

```r
penguins |>
  mutate(
    new_column = 1,
    bill_ld_ratio = bill_length_mm/bill_depth_mm
  )
```

#### base

```r
penguins$new_column <- 1
penguins$bill_ld_ratio <- penguins$bill_length_mm/penguins$bill_depth_mm
```

<div style="font-size: 15px;">
<a href="https://dplyr.tidyverse.org/articles/base.html">https://dplyr.tidyverse.org/articles/base.html</a>
</div>

---
class: inverse, middle, center

## Go to RStudio
### live demo

---
class: inverse, middle, center

## Go to mutating exercises
### `learnr::run_tutorial("004-mutating", "tidyquintro")`
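---
background-image: url(img/dplyr.png)
background-size: 8%
background-position: 95% 5%

## dplyr - mutating sketch

A minimal sketch of adding versus altering variables with `mutate()`, assuming dplyr is installed and R is version 4.1.0 or later. The `mini` data set and the column names `bill_length_cm` and `bill_length_m` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical mini data set standing in for `penguins` (values are made up)
mini <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm  = c(18.7, 14.2, 19.5)
)

changed <- mini |>
  mutate(
    bill_length_cm = bill_length_mm / 10,  # a new name adds a column
    species        = toupper(species),     # an existing name alters that column
    bill_length_m  = bill_length_cm / 100  # may use a column created just above
  )

# mutate() returns a new data frame; `mini` itself is left untouched
mini$species
changed$species
```

This differs from the base-R `penguins$new_column <- 1` form, which modifies the object in place; with `mutate()` the result has to be assigned to keep it.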
---
class: dark, middle, center

# End of part 1
## 30 minute lunch break