Date and Time: 01:00 pm – 05:00 pm on March 31 and April 1, 2022 (EDT)

Format: Online via Zoom (Zoom Meeting ID: 444 776 7930, passcode: 220331)

Website for interactive exercises: https://edstem.org/us/courses/16127/lessons/

Course Description

R is one of the most popular programming languages for statistical computing and graphics among data scientists. This short course consists of six lessons covering programming and data visualization techniques for data science in R. Each lesson includes a lecture introducing the techniques and a hands-on practice session. The hands-on sessions will use an interactive online platform.

Registration

Registration is required through Eventbrite.

Course Agenda

All times are in Eastern Daylight Time (EDT).

Day 1: Thursday, March 31, 2022

Time | Topic | Materials
01:00 pm – 02:30 pm | Introduction to R and Data Structures | link
02:30 pm – 03:30 pm | Base R | link
03:30 pm – 05:00 pm | Functional Programming | link

Day 2: Friday, April 1, 2022

Time | Topic | Materials
01:00 pm – 02:30 pm | dplyr and Tidy Data | link
02:30 pm – 03:30 pm | Data Visualization and ggplot | link
03:30 pm – 05:00 pm | Relational Data | link

Course Content

  1. Introduction to R and data structures: An extensive introduction to R data structures. Covered topics include:

    • Basic data structures in R, such as vectors and lists;
    • Object attributes and how to create, test, coerce, and retrieve attributes;
    • S3 vectors, including the special cases of factors, dates, date-times, and durations;
    • Data frames and their modern counterpart, tibbles, which are commonly used in statistics and data science;
    • Constructing and printing data frames, and related operations;
    • Subsetting different data structures for complex operations.
  2. Base R: Fundamentals of the R programming language, including control flow, functions, and operators. Covered topics include:

    • Choices: if, switch, and the vectorized ifelse.
    • Loops: early termination (break and next) and common pitfalls.
    • Function basics: components, creation, and use.
    • Function composition, including the modern pipe operator (%>%).
    • Advanced function topics: lazy evaluation and the special argument.
    • Exiting a function, invisible return values, and error handling.
  3. Functional programming: Efficient, transparent programming that avoids complex control flow and allows better error handling. Covered topics include:

    • Functionals, which take a function and a vector as input, to replace many explicit loops and streamline your code, especially in data analysis.
    • Common functionals: map, reduce, accumulate, and predicate functionals.
    • Function factories for simplifying code.
    • Function operators for error handling, e.g., safely and quietly.
  4. dplyr and tidy data: Intuitive and user-friendly grammar for data manipulation. Covered topics include:

    • Efficient data manipulation with the dplyr package.
    • Five key “verbs” for common data manipulation tasks (filter, arrange, select, mutate, and summarize) that help you translate your thoughts into code.
    • How to tidy your data so that its structure is consistent and takes advantage of R’s vectorized nature.
    • Two powerful functions, gather and spread, for resolving common problems in messy data.
  5. Data visualization and ggplot: Clear and accurate visualization using ggplot2. Covered topics include:

    • Basic principles of data visualization and of communicating your “story”.
    • Scatterplots, and the use of aesthetics and faceting to display a third variable in a 2D plot.
    • Bar charts, line charts, and boxplots via geoms.
    • Labels, axes, annotations, and legends for plot customization.
    • Advanced techniques, e.g., customized colors, zooming, and add-on themes.
  6. Relational data: Analysis and manipulation of multiple data frames linked by pairwise relations. Covered topics include:

    • Two families of verbs for relational data: mutating joins, which add new variables to one data frame from matching observations in another, and filtering joins, which filter observations from one data frame based on whether they match an observation in another.
    • Types of mutating joins and how to handle duplicate keys.
    • Working with dates and times, including creating them from strings, rounding, time zones, and arithmetic (i.e., subtraction, addition, and division) with times and periods.
    • Working with factors, including modifying, collapsing, and reordering levels.
    • Regular expressions for describing and matching patterns in strings.