In nine weeks, you will learn the basics of data handling with R and details about regression techniques in the context of statistical inference. We will also cover the connection between these concepts and research philosophy. During every lecture, we will cover a different theoretical topic. In addition to the lectures, there will also be a weekly computer lab exercise that connects the statistical theory to practice. You will also attend weekly workgroup meetings wherein you will work on solving motivating, real-world case studies.
The final grade is computed as follows:
Grade Component | Weight |
---|---|
Linear Regression Assignment | 25% |
Logistic Regression Assignment | 25% |
Written Exam | 50% |
In addition to the grade components listed above, you will also do R exercises for the first 7 weeks of the course. These exercises will develop the skills needed to successfully complete the assignments.
To pass the course:
Week # | Topic | R Exercise | Workgroup | Reading |
---|---|---|---|---|
1 | The elemental building blocks of R | Assigning values to objects; Creating vectors, matrices, data frames, and lists | Receive instructions and form groups; Locate a data set for predictive modeling | |
2 | Data manipulation; Least squares | Data manipulation; Using pipes to simplify workflows | Get approval for data; Beginning data processing, cleaning, and exploration; Formulate a research hypothesis | |
3 | Linear model 1; Data visualization | The `lm()` function in R; Visualizing bivariate relations | Specify a linear model; Fit your defined model | ISL 3.1 & 6.1, Blog Post 1, Blog Post 2, Lecture Notes |
4 | Linear model 2; Assumptions; Diagnostics | Investigating the assumptions of the linear model | Check the assumptions of your model; Use your model to test your hypotheses; Continue the project in rmarkdown | ISL 3.2 – 3.4 |
5 | Model building; Prediction; Cross-validation | Tying the analytic pieces together into a full regression analysis | Evaluate and, if possible, improve your model; Prepare Assignment 1; Evaluate the final linear model on your own data | ISL 5.1, Document |
6 | Generalized linear model; Logistic regression 1 | The `glm()` function in R; Logistic regression modeling; Classification | Formulate a research hypothesis and define a logistic model; Fit your defined model | ISL 4.1 – 4.3 (except 4.3.5), Webpage |
7 | Logistic regression 2 | Finish exercise from last week | Check the assumptions of your model; Use your model to test your hypotheses | Webpage |
8 | Summary, catch-up, and questions | None | Evaluate and, if possible, improve your model; Prepare Assignment 2; Evaluate the final logistic model on your own data | |
Regression techniques are widely used to quantify the relationship between two or more variables, and investigating such relations is common in data science. Linear and logistic regression are well-established and powerful techniques for analyzing the relations between a set of (predictor) variables and a single (outcome) variable. However, you must understand how and when it is appropriate to apply these regression techniques before you can use them in any beneficial way. In this course, you will learn exactly that: how and when to apply linear and logistic regression with the statistical software package R.
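As a rough illustration of the two model types, the sketch below fits one linear and one logistic regression. It is only a sketch: it uses R's built-in `mtcars` data set rather than any course data, and the choice of outcome and predictor variables is an assumption made for this example.

```r
# Illustrative only: mtcars ships with base R; it is not a course data set.

# Linear regression: continuous outcome (miles per gallon)
# predicted from a single predictor (car weight).
lin_fit <- lm(mpg ~ wt, data = mtcars)

# Logistic regression: binary outcome (am: 0 = automatic, 1 = manual)
# predicted from the same predictor.
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Estimated regression coefficients of each model.
coef(lin_fit)
coef(log_fit)
```

Calling `summary(lin_fit)` or `summary(log_fit)` prints the standard errors and test statistics that go with these estimates.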
This course gives students a new set of tools that they can apply to real-world data to explore interesting issues and problems. The course will introduce students to the principles of analytic data science, linear and logistic regression, and the basics of statistical learning. These techniques will be presented in the context of estimation, testing, and prediction. Students will learn to think carefully and critically about statistical inference, quantifying uncertainty, and measuring the accuracy of statistical estimates. Students will also develop fundamental R programming skills and will gain experience with `tidyverse`: visualizing data with `ggplot2` and performing basic data wrangling with `dplyr`. This course will prepare students for basic research tasks (e.g., as a junior researcher or research assistant) or further education in research, such as a (research) Master's program.
Students will form groups to work on two assignments. Students will need to perform calculations and write R code for these assignments. All work must be combined into an understandable and insightful R project and must be submitted to the Surfdrive file drop environment.
Each assignment will be graded on the quality of the following components:
Students will be evaluated on the following aspects:
Apply and interpret the basic methodological and statistical concepts underlying predictive and/or inferential research.
Apply and interpret important techniques in linear and logistic regression analysis.
In this course, skills and knowledge are evaluated in three separate ways:
The exam evaluates knowledge of methodological and statistical concepts (learning goals 1a, 1d, 1f), as well as the application of these concepts to research scenarios (learning goals 1b and 1c). During the exam, students will need to interpret statistical software output (learning goal 1e).
The practical labs test whether the student has sufficient skills to solve basic analysis problems and execute quantitative analyses on real-life data sets (learning goals 2a and 2b).
The workgroups focus on applying the newly gained knowledge and skills to solving relevant data analysis problems and reporting on the steps taken to obtain a solution (learning goal 1g).
Hello All,
This semester, you will participate in the Fundamental Techniques in Data Science with R course at Utrecht University. In this course, you will use both R and RStudio. The steps below guide you through installing both R and RStudio. Please complete them before the first meeting.
Regards,
Instructor Team
Bring a laptop computer to the course and make sure that you have full write access and administrator rights on the machine. We will explore programming and compiling in this course, so you will need full access to your machine. Some corporate laptops come with limited access for their users; I therefore advise you to bring a personal laptop to the workgroup meetings.
R
You can obtain a copy of R here. We won’t use R directly in the course. Rather, we’ll call R through RStudio. Therefore, you also need to install RStudio.
RStudio Desktop
RStudio is an Integrated Development Environment (IDE) for R. You can download RStudio as stand-alone software here. The free and open-source RStudio Desktop version is sufficient.
Open RStudio and copy-paste the following lines of code into the console window to execute them.
```r
# Install the packages used in the course, together with their dependencies.
install.packages(c("ggplot2",
                   "tidyverse",
                   "magrittr",
                   "micemd",
                   "jomo",
                   "pan",
                   "lme4",
                   "knitr",
                   "rmarkdown",
                   "plotly",
                   "shiny",
                   "devtools",
                   "boot",
                   "class",
                   "car",
                   "MASS",
                   "ggplot2movies",
                   "ISLR",
                   "DAAG",
                   "mice"),
                 dependencies = TRUE)
```
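Once the installation has finished, you may want to check that everything arrived. The lines below are a minimal sketch of one way to do that, not an official course script; the object names `course_pkgs` and `missing_pkgs` are made up for this example.

```r
# The packages requested in the installation command above.
course_pkgs <- c("ggplot2", "tidyverse", "magrittr", "micemd", "jomo", "pan",
                 "lme4", "knitr", "rmarkdown", "plotly", "shiny", "devtools",
                 "boot", "class", "car", "MASS", "ggplot2movies", "ISLR",
                 "DAAG", "mice")

# installed.packages() returns a matrix with one row per installed package;
# setdiff() keeps the requested packages that do not appear in it.
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, running `install.packages(missing_pkgs, dependencies = TRUE)` retries only those.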
If you are not sure where to paste the code, use the following figure to identify the console: