Part II: Organising your files and workflow

class: middle, right, title-slide

# Part II: Organising your files and workflow
## - R project management series -
### Athanasia Monika Mowinckel
### 03.03.2022

---

layout: true

---

# Part I- recap
## Why do we care about project management (in R)?

.pull-left[
**Portability**  
_The ability to move the project without breaking code or needing adapting_
- you will change computers
- you will reorganise your file structure
- you will share your code with others
]

.pull-right[
**Reproducibility**  
_The ability to rerun the entire process from scratch_
- not just for reviews
- not just for best-practice science
- also for future (or even present) you
- and for your collaborators/helpers
]

---

# Part I- recap
## Project workflows

- All necessary files contained in the project and referenced relatively

- All necessary outputs are created by code in the project

- All code can be run in fresh sessions and produce the same output

- Does not force other users to alter their own work setup

---

## Portability - the `here`-package
### example My talks project

**Project root**

```r
library(here)
```

```
## here() starts at /Users/athanasm/workspace/r-stuff/talks
```

```r
here()
```

```
## [1] "/Users/athanasm/workspace/r-stuff/talks"
```

**Build path to the slide file**

```r
here("slides/2022.02.03-rproj/index.Rmd")
```

```
## [1] "/Users/athanasm/workspace/r-stuff/talks/slides/2022.02.03-rproj/index.Rmd"
```

**List all files in project root**

```r
list.files(here())
```

```
##  [1] "_footer.html"    "_site.yml"       "drmowinckel.css" "index.html"     
##  [5] "index.Rmd"       "LICENSE"         "search.json"     "site_libs"      
##  [9] "slides"          "talks.Rproj"
```

---
class: dark, middle

.center[
# Series talks
]

&nbsp; _Part I_   &nbsp; RStudio projects  
&nbsp; _Part II_  &nbsp; Organising your files and workflow  
&nbsp; _Part III_ &nbsp; Package / Library management  
&nbsp; _Part IV_  &nbsp; git & GitHub crash course!

---

## Folder structure

.pull-left[
- **data**
  - all raw data files, organised in meaningful ways
  - never, ever write _back_ to this folder, read only
  - if using git, never commit to history, place in `.gitignore`
- **results**
  - write all analysis etc. results to
  - treat as disposable, can be overwritten
  - may also include figures etc if wanted
]

.pull-right[
- **docs**
  - documentation
  - Rmarkdown files
- **R (advanced)**
  - if you write functions that are used in several places
  - this is the standard R folder for keeping these files that might be called with `source()` calls
- **scripts/analysis**
  - files with full analysis pipelines
  - might have `source` calls to files in R
]

---

## Files

.pull-left[
- **README.Rmd/README.md**
  - markdown file describing the project content and intent
  - maybe also explains which files to look in for what
  - ideal if storing on GitHub
  - `usethis::use_readme_rmd()`
- **DESCRIPTION**
  - R specific file with information about the project
  - can be created with `lcbcr::create_paper_project()` or
  with the project wizard from the [lcbcr-package](https://github.com/LCBC-UiO/lcbcr)
]

.pull-right[
**if you plan on putting it online (GitHub or equivalent)**
- **LICENCE**
  - dictates how project content can be reused
  - ask me at need, or check out [Choose a license](https://choosealicense.com/) 
- **CITATION**
  - gives users/readers a clear instruction on how to cite if they use the project content
  - ask me at need, or read on [GitHub docs](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files)
]

---

## File naming

.pull-left[
Organising files in `data/`, `results/`, `docs/`, and `scripts/` require some ideas of how to name files for:
- easy machine reading
- easy human reading
- easy understanding of file content
- choosing the correct type of file to store
]

.pull-right[
If you are using the `R/` folder to store R-functions, these might need somewhat different naming conventions than the other folders, as these are functions you can use across the other files.

Here, naming should be particularly thought in terms of **content** rather than structural organisation.
]

---

## File naming - Machine reading

.pull-left[
Machines are clever, but extremely pedantic.

Be consistent, be meticulous.

Some machines are more clever than others, so name files in a way that the "dumbest" of them can deal with.
]

.pull-right[
- **don't use white space in file names**
  - decide on a separator and use consistently
  - recommend the dash `-`
- **use small case letters** 
  - certain machines care about capitalization
- **use numbers smartly** 
  - numbers are awesome to use and can help organise files meaningfully
  - but needs some thinking about before implementing
]

---

## File naming - Machine reading

.pull-left[
- Machines list files in numeric-alphabetic order.
- they'll first list files starting with numbers (ascdendingly)
- then in alphabetic order.

```
1_file.txt
2_file.txt
file_one.txt
file_three.txt
file_two.txt
```
]

.pull-right[
- But they wont understand the difference between 1 and 10 when sorting

```
10_file.txt
1_file.txt
2_file.txt
file_1.txt
file_10.txt
file_2.txt
```
]

---

## File naming - Machine reading

.pull-left[
- 'zero-padding' is a way of preserving file order

```
01_file.txt
02_file.txt
10_file.txt
file_01.txt
file_02.txt
file_10.txt
```
]

.pull-right[
- using dates in file names may also ensure decent organisation
- but be consistent. Recommend using YYYY-MM-DD formatting

```
13-11-21_initial-submission-results.txt
22-01-03_revised-results.txt
2022-02-28_results.txt
```

vs.

```
2021-11-13_initial-submission-results.txt
2022-02-28_results.txt
2022-03-01_revised-results.txt
```
]

---

## File naming - Machine reading

- Consider using different space separators for different parts of the file name
- This way you can use the file name it self, programatically, if needed

```
2021-11-13_initial_submission_results.txt
2022-02-28_results.txt
2022-03-01_revised_results.txt
```

```r
file_names <- c(
  "2021-11-13_initial_submission_results.txt",
  "2022-02-28_results.txt",
  "2022-03-01_revised_results.txt")

stringr::str_split(file_names, "_")
```

```
## [[1]]
## [1] "2021-11-13"  "initial"     "submission"  "results.txt"
## 
## [[2]]
## [1] "2022-02-28"  "results.txt"
## 
## [[3]]
## [1] "2022-03-01"  "revised"     "results.txt"
```

---

## File naming - Machine reading

- Consider using different space separators for different parts of the file name
- This way you can use the file name it self, programatically, if needed

```
2021-11-13_initial-submission-results.txt
2022-02-28_results.txt
2022-03-01_revised-results.txt
```

```r
file_names <- c(
  "2021-11-13_initial-submission-results.txt",
  "2022-02-28_results.txt",
  "2022-03-01_revised-results.txt")

stringr::str_split(file_names, "_")
```

```
## [[1]]
## [1] "2021-11-13"                     "initial-submission-results.txt"
## 
## [[2]]
## [1] "2022-02-28"  "results.txt"
## 
## [[3]]
## [1] "2022-03-01"          "revised-results.txt"
```

---

## File naming - Human reading & understanding

Optimising file names for computers is great, but ultimately its us humans that need to choose files to work it.
Naming files in a way that makes the file content obvious (or at least give an idea of content) by the file name is good for such interactions.

.pull-left[
```
2021-11-13_final-results.txt
2022-02-28_finalfinal-results.txt
2022-03-01_finished-results.txt
```
]

.pull-right[
```
2021-11-13_first-submission-results.txt
2022-02-28_revision-round1-results.txt
2022-03-01_revision-round2-results.txt
2022-03-01_revision-round2-no-sex-results.txt
```
]

---

## File naming - Human & machine

Choosing file formats is important.

.pull-left[
If you are saving data tables (data of columns and rows),
save as text-files

common file extensions for tables:
- `.tab`/`.tsv` : tabulation separated (`\t`)
- `.csv` - comma (`,`) or semicolon (`;`) separated
- `.dat` - fixed-width (quite uncommon type)
- `.txt` - space (` `) separated
]

.pull-right[
Other formats can be saved on need.
From R, we can sometimes want to save entire
R-objects or environments as `.rda`, `.rds` or `.RData`

**But be careful!** If you are saving output from a statistical model,
the R-object also contains the data used, and you will have
exposed raw data!
]

---

# Using files across your project

- use `here()` to find your files relative to your project root.

- use `source()` to load in functions or run entire pipelines from other files
  - `source(here("R", "ggplot-funcs.R"))`

- use `here()` to save your outputs to designated places
  - `readr::write_tsv(my_data_frame, here("results", "2022-03-03_results.txt")`
  - `ggsave(ggplot_object, here("figures", "2022-03-03_spaghetti_age_cvlt.svg")`

---
class: middle, center, inverse

# RStudio demo

---

Notes:

- make new lcbc paper project
- have a look around the files
- save iris data to data folder
- read in iris data to the Rmd
- show with and without here()
- save data from Rmd with and without here()
- create convenience function for ggplot theme in R
- source the file in the Rmd