In nine weeks, you will learn the basics of data handling with R and details about regression techniques in the context of statistical inference. We will also cover the connection between these concepts and research philosophy. During every lecture, we will cover a different theoretical topic. In addition to the lectures, there will also be a weekly computer lab exercise that connects the statistical theory to practice. You will also attend weekly workgroup meetings wherein you will work on solving motivating, real-world case studies.

The final grade is computed as follows:

Grade Component | Weight |
---|---|
Linear Regression Assignment | 25% |
Logistic Regression Assignment | 25% |
Written Exam | 50% |

In addition to the grade components listed above, you will also do `R` exercises for the first 7 weeks of the course. These exercises will develop the skills needed to successfully complete the assignments.

To pass the course:

- Your final exam grade must be 5.5 or higher
- Both of your assignment grades must be 5.5 or higher

Week # | Topic | `R` Exercise | Workgroup | Reading |
---|---|---|---|---|
1 | The elemental building blocks of `R` | Assigning values to objects; Creating vectors, matrices, data frames, and lists | Receive instructions and form groups; Locate a data set for predictive modeling | |
2 | Data manipulation; Least squares | Data manipulation; Using pipes to simplify workflows | Get approval for data; Beginning data processing, cleaning, and exploration; Formulate a research hypothesis | |
3 | Linear model 1; Data visualization | The `lm()` function in `R`; Visualizing bivariate relations | Specify a linear model; Fit your defined model | ISL 3.1 & 6.1, Blog Post 1, Blog Post 2, Lecture Notes |
4 | Linear model 2; Assumptions; Diagnostics | Investigating the assumptions of the linear model | Check the assumptions of your model; Use your model to test your hypotheses; Continue the project in `rmarkdown` | ISL 3.2 – 3.4 |
5 | Model building; Prediction; Cross-validation | Tying the analytic pieces together into a full regression analysis | Evaluate and, if possible, improve your model; Prepare Assignment 1; Evaluate the final linear model on your own data | ISL 5.1, Document |
6 | Generalized linear model; Logistic regression 1 | The `glm()` function in `R`; Logistic regression modeling; Classification | Formulate a research hypothesis and define a logistic model; Fit your defined model | ISL 4.1 – 4.3 (except 4.3.5), Webpage |
7 | Logistic regression 2 | Finish exercise from last week | Check the assumptions of your model; Use your model to test your hypotheses | Webpage |
8 | Summary, catch-up, and questions | None | Evaluate and, if possible, improve your model; Prepare Assignment 2; Evaluate the final logistic model on your own data | |

Regression techniques are widely used to quantify the relationship between two or more variables, and investigating such relations is common in data science. Linear and logistic regression are well-established and powerful techniques for analyzing the relations between a set of (predictor) variables and a single (outcome) variable. However, you must understand how and when it is appropriate to apply these regression techniques before you can use them in any beneficial way. In this course, you will learn exactly that: how and when to apply linear and logistic regression with the statistical software package `R`.
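To give a first impression of what the two techniques look like in `R`, here is a minimal sketch. It is only an illustration, not course material: it assumes the built-in `mtcars` data set, and the choice of outcome and predictor variables is arbitrary.

```r
# Illustration only: mtcars ships with R; the variable choices are assumptions.

# Linear regression: continuous outcome (mpg) predicted from weight and horsepower
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(fit_lm))

# Logistic regression: binary outcome (am: 0 = automatic, 1 = manual)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(fit_glm))

# Predicted probability of a manual transmission for a car with wt = 3
# (wt is recorded in units of 1000 lbs)
p_manual <- predict(fit_glm, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```

The linear model describes the expected value of a continuous outcome, whereas the logistic model describes the probability of a binary outcome, which is why `predict()` is asked for `type = "response"` in the second case.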

This course gives students a new set of tools that they can apply to real-world data to explore interesting issues and problems. The course will introduce students to the principles of analytic data science, linear and logistic regression, and the basics of statistical learning. These techniques will be presented in the context of estimation, testing, and prediction. Students will learn to think carefully and critically about statistical inference, quantifying uncertainty, and measuring the accuracy of statistical estimates. Students will also develop fundamental `R` programming skills and will gain experience with `tidyverse`: visualizing data with `ggplot2` and performing basic data wrangling with `dplyr`. This course will prepare students for basic research tasks (e.g., junior researcher or research assistant) or further education in research, such as a (research) Master's program.
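As a rough idea of what data wrangling with `dplyr` and visualization with `ggplot2` involve, consider the following sketch. It assumes both packages are installed and uses the built-in `mtcars` data set purely for illustration; the object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Wrangling: average fuel efficiency per cylinder count
# (mtcars ships with R, so no download is needed)
cars_by_cyl <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%   # treat cylinder count as categorical
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

print(cars_by_cyl)

# Visualization: scatter plot of weight against fuel efficiency
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p)
```

The pipe (`%>%`) passes the result of each step to the next one, so the wrangling code reads from top to bottom in the order in which the operations are applied.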

Students will form groups to work on two assignments. Students will need to perform calculations and write `R` code for these assignments. All work must be combined into an understandable and insightful `R` project and must be submitted to the Surfdrive file drop environment.

Each assignment will be graded on the quality of the following components:

- The methodological application
- The model evaluation and assumption checking
- The code and scripts

Students will be evaluated on the following aspects:

Apply and interpret the basic methodological and statistical concepts underlying predictive and/or inferential research.

- Explain concepts from inferential statistics, such as probability, inference, and modeling; apply these concepts in practice.
- Make an informed choice of research designs that are suitable for regression analyses.
- Apply and explain the choice of techniques for investigating data problems.
- Apply and explain the concepts of linearity and non-linearity.
- Interpret statistical software output, and report software output following APA reporting guidelines.
- Explain and conceptualize statistical inference and its relation to statistical theory.
- Perform the different steps of solving basic regression analysis problems and report on these steps.

Apply and interpret important techniques in linear and logistic regression analysis.

- Perform, interpret, and evaluate quantitative (causal) analyses on data with the statistical software platform `R`.
- Perform analyses in statistical software.

In this course, skills and knowledge are evaluated in three separate ways:

The exam evaluates the knowledge of methodological and statistical concepts (learning goals 1a, 1d, 1f), as well as the application of these concepts to research scenarios (learning goals 1b and 1c). During the exam, students will need to interpret statistical software output (learning goal 1e).

The practical labs test if the student has sufficient skills to solve basic analysis problems and execute quantitative analyses on real-life data sets (learning goals 2a and 2b).

The workgroups focus on applying the newly gained knowledge and skills to solving relevant data analysis problems and reporting on the steps taken to obtain a solution (learning goal 1g).

Hello All,

This semester, you will participate in the **Fundamental Techniques in Data Science with R** course at Utrecht University. In this course, you will use both `R` and `RStudio`. The steps below guide you through installing both `R` and `RStudio`. Please do so before the first meeting.

Regards,

Instructor Team

Bring a laptop computer to the course and make sure that you have full write access and administrator rights on the machine. We will explore programming and compiling in this course, so you will need full access to your machine. Some corporate laptops come with limited access for their users; I therefore advise you to bring a personal laptop to the workgroup meetings.

`R`

You can obtain a copy of `R` here. We won’t use `R` directly in the course. Rather, we’ll call `R` through `RStudio`. Therefore, you also need to install `RStudio`.

`RStudio` Desktop

`RStudio` is an Integrated Development Environment (IDE) for `R`. You can download `RStudio` as stand-alone software here. The free and open-source `RStudio Desktop` version is sufficient.

Open `RStudio`, and copy-paste the following lines of code into the console window to execute them.

- If nothing happens after you paste the code, try hitting the “Enter/Return” key.

```
# Install the packages used in the course, together with their dependencies
install.packages(c("ggplot2",
                   "tidyverse",
                   "magrittr",
                   "micemd",
                   "jomo",
                   "pan",
                   "lme4",
                   "knitr",
                   "rmarkdown",
                   "plotly",
                   "shiny",
                   "devtools",
                   "boot",
                   "class",
                   "car",
                   "MASS",
                   "ggplot2movies",
                   "ISLR",
                   "DAAG",
                   "mice"),
                 dependencies = TRUE)
```
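Once the installation has finished, you may want to check that everything arrived. The following sketch is an optional suggestion rather than part of the official instructions: it uses base `R`'s `requireNamespace()` to test each package from the `install.packages()` call above, and the object names (`pkgs`, `installed`, `missing_pkgs`) are hypothetical.

```r
# Package names from the install.packages() call above
pkgs <- c("ggplot2", "tidyverse", "magrittr", "micemd", "jomo", "pan",
          "lme4", "knitr", "rmarkdown", "plotly", "shiny", "devtools",
          "boot", "class", "car", "MASS", "ggplot2movies", "ISLR",
          "DAAG", "mice")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that could not be found
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as not installed, rerunning `install.packages()` for just those names is usually the simplest next step.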

If you are not sure where to paste the code, use the following figure to identify the console: