In nine weeks, you will learn the basics of data handling with R and details about regression techniques in the context of statistical inference. We will also cover the connection between these concepts and research philosophy. During every lecture, we will cover a different theoretical topic. In addition to the lectures, there will also be a weekly computer lab exercise that connects the statistical theory to practice. You will also attend weekly workgroup meetings wherein you will work on solving motivating, real-world case studies.
The final grade is computed as follows:

| Grade component | Weight |
|---|---|
| Group assignment 1: Linear regression | 25% |
| Group assignment 2: Logistic regression | 25% |
| Written exam | 50% |
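To illustrate the weighting, the sketch below computes a final grade as a weighted average of the three components. It assumes a 1 to 10 grading scale and that no rounding or per-component minimum applies; the course description does not specify either. `final_grade()` is a hypothetical helper, not part of any package.

```r
# Weights taken from the grading table (assumption: plain weighted average)
grade_weights <- c(assignment1 = 0.25, assignment2 = 0.25, exam = 0.50)

# Hypothetical helper: weighted average of the three component grades
final_grade <- function(assignment1, assignment2, exam) {
  grades <- c(assignment1 = assignment1, assignment2 = assignment2, exam = exam)
  # Match each grade to its weight by name, then sum the weighted parts
  sum(grades * grade_weights[names(grades)])
}

final_grade(assignment1 = 8, assignment2 = 6, exam = 7)
# returns 7 (0.25 * 8 + 0.25 * 6 + 0.50 * 7)
```

Because the exam carries half of the weight, one point on the exam moves the final grade as much as one point on each of the two assignments combined.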
In addition to the grade components listed above, you will also do R exercises for the first 7 weeks of the course. These exercises will develop the skills needed to successfully complete the assignments.
To pass the course:
During this course, you will attend 8 workgroup sessions and hand in 7 practical assignments. We expect you to attend at least 7 out of 8 workgroup sessions and to hand in at least 6 out of 7 practical assignments before the deadline. If you do not meet these requirements, you lose the right to resit the exam.
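We will use two open-source books in this course:

- R for Data Science (R4DS) by Hadley Wickham and Garrett Grolemund
- Applied Statistics with R (ASWR) by David Dalpiaz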
There is no need to purchase these books. The freely available online versions are sufficient. The relevant chapters will be linked in this dashboard where the reading is assigned. We will also use several external webpages and web apps. These resources will also be linked in this dashboard.
| Week | Topic | R exercise | Workgroup | Reading |
|---|---|---|---|---|
| 1 | The basics of R | How to work with R via scripts, projects, and markdown; How to import external data into R; How to write your own functions; How to iterate repetitive tasks | Form groups; Search for a dataset for the two group assignments; Formulate research questions | R4DS: Chapter 11, Chapter 27, Chapter 19, and Chapter 21 |
| 2 | Programmatic data manipulation 1 | Data types and objects in R; Data transformation; Working with pipes | Perform data transformations on the dataset you found | R4DS: Chapter 5, Chapter 10, Chapter 14 (only 14.1 and 14.2), Chapter 15, Chapter 18, and Chapter 20 |
| 3 | Programmatic data manipulation 2 | Data visualization; Data inspection; Data cleaning | Continue with data inspection and cleaning | R4DS: Chapter 3 and Chapter 7; ASWR: Chapter 4 |
| 4 | Multiple linear regression | Estimating linear models in R using the `lm()` function; Model fit and model comparison; Categorical predictors; Moderation | Find a best-fitting model; Test your hypotheses | ASWR: Chapter 7 (only 7.1–7.4), Chapter 9 (only 9.1–9.4), Chapter 11 (only 11.1–11.3), and Chapter 16 (only 16.1 without 16.1.4, and 16.2) |
| 5 | Model assumptions and diagnostics | Assumptions of the linear model; Leverage, outliers, and influential cases | Check the assumptions of your model and inspect for unusual observations; Make adjustments if necessary; Draw conclusions; Submit Assignment 1 | ASWR: Chapter 13 |
| 6 | Generalized linear model and logistic regression | Estimating generalized linear models in R using the `glm()` function; Definition, estimation, and interpretation of logistic regression models | Perform data inspection and cleaning for the second assignment; Formulate hypotheses; Find a best-fitting model and test your hypotheses | ASWR: Chapter 17 (only 17.1–17.3); This webpage |
| 7 | Logistic regression assumptions and classification | Logistic regression assumptions; Classification; Confusion matrix | Check the assumptions of your model and make adjustments if necessary; Make classifications | ASWR: Chapter 17 (only 17.4); This webpage |
| 8 | Summary, catch-up, and questions | - | Interpret your final model as well as the confusion matrix; Draw conclusions; Submit Assignment 2 | - |
Regression techniques are widely used to quantify the relationship between two or more variables. In data science, linear and logistic regression are common and powerful techniques for evaluating such relations. These techniques are only useful, however, once you understand when and how to apply them. In this course, students will learn how to apply linear and logistic regression with the R statistical software package.
This course will introduce students to the principles of analytical data science, linear and logistic regression, and the basics of statistical learning. Students will develop fundamental R programming skills and will gain experience with the tidyverse: visualizing data with ggplot2 and performing basic data wrangling with dplyr. This course helps prepare students for an entry-level research career (e.g., junior researcher or research assistant) or further education in research (e.g., a (research) Master's program or a PhD).
At the end of this course, students are able to:

- use the R statistical software platform to perform basic statistical programming, data manipulation, data visualization, and basic data wrangling;
- use the R statistical software platform to perform, interpret, and evaluate linear and logistic regression analyses on real-world data;
- interpret R output and use the results to answer research questions;
- use R Markdown to document the results of a statistical analysis.

In this course, skills and knowledge are evaluated with two types of assignments.
In eight weeks, you will learn the basics of data handling and statistical programming with R and details about regression techniques in the context of statistical inference, prediction, and classification. Each week will comprise three class activities: a lecture, a computer lab exercise, and a workgroup meeting.
Group assignment 1: Linear regression

Type of assignment: Group (4 students)
Grading: 25% of your final grade
Deadline: Monday December 18, 17:00
What to submit: A ZIP archive containing the complete R project (dataset, RMD, HTML)
Where to submit: This Surfdrive folder
Description: For this assignment, you perform and report a multiple linear regression analysis in an R markdown document. The assignment will be graded on the following five dimensions:
Group assignment 2: Logistic regression

Type of assignment: Group (4 students)
Grading: 25% of your final grade
Deadline: Thursday January 18, 17:00
What to submit: A ZIP archive containing the complete R project (dataset, RMD, HTML)
Where to submit: This Surfdrive folder
Description: For this assignment, you perform and report a multiple logistic regression analysis in an R markdown document. The assignment will be graded on the following five dimensions:
This semester, you will participate in the Fundamental Techniques in Data Science with R course at Utrecht University. In this course, you will use both R and RStudio. The steps below will guide you through installing both R and RStudio. Please do so before the first meeting.
Bring a laptop computer to the course, and make sure that you have full write access and administrator rights on the machine. We will explore programming and compiling in this course, so you will need full access to your machine. Some corporate laptops come with limited access for their users; I therefore advise you to bring a personal laptop to the workgroup meetings.
R
You can obtain a copy of R here. We won't use R directly in the course. Rather, we'll call R through RStudio. Therefore, you also need to install RStudio.
RStudio Desktop

RStudio is an Integrated Development Environment (IDE) for R. You can download RStudio as stand-alone software here. The free and open-source RStudio Desktop version is sufficient.
Open RStudio and copy-paste the following lines of code into the console window to execute them.
```r
# Install all packages used in the course, including their dependencies
install.packages(c("ggplot2",
                   "tidyverse",
                   "magrittr",
                   "micemd",
                   "jomo",
                   "pan",
                   "lme4",
                   "knitr",
                   "rmarkdown",
                   "plotly",
                   "devtools",
                   "class",
                   "car",
                   "MASS",
                   "ISLR",
                   "mice"),
                 dependencies = TRUE)
```
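Once the installation has finished, you may want to confirm that every package can actually be loaded. The following is a minimal sketch, not part of the official course instructions: `course_packages` and `check_packages()` are hypothetical names, and the check relies on base R's `requireNamespace()`, which returns `FALSE` instead of throwing an error when a package is unavailable.

```r
# Packages installed by the install.packages() call above
course_packages <- c("ggplot2", "tidyverse", "magrittr", "micemd", "jomo",
                     "pan", "lme4", "knitr", "rmarkdown", "plotly",
                     "devtools", "class", "car", "MASS", "ISLR", "mice")

# Hypothetical helper: returns a named logical vector that is TRUE
# for every package whose namespace can be loaded
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_packages(course_packages)
not_installed <- names(status)[!status]

if (length(not_installed) == 0) {
  message("All course packages are available.")
} else {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
}
```

Any package reported as not installed can be installed again with `install.packages()`.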
If you are not sure where to paste the code, use the following figure to identify the console: