Coursera - Data Science - Johns Hopkins

Greg Foletta

2020-04-08

Course 1 - The Data Scientist’s Toolbox

  • Notes - Data science process, types of data analysis, experimental design, big data.

Course 2 - The R Language

  • Week 1 Notes - Overview of R, data types, data frames, attributes, subsetting.
  • Week 2 Notes - R control structures, for and while loops, functions, scoping rules, dates and times.
  • Week 3 Notes - Loop functions, lapply, sapply, apply, tapply, mapply, split, debugging.
  • Week 4 Notes - str() function, simulation, permutation, profiling.

Course 3 - Getting and Cleaning Data

  • Week 1 Notes - Motivations and goals, raw and processed data, subset, joins.
  • Week 2 Notes - Reading from MySQL, reading HDF5, web scraping, APIs.
  • Week 3 Notes - Summarisation, new variables, reshaping, merging.
  • Week 4 Notes - Fixing character vectors, working with dates.

Course 4 - Exploratory Data Analysis

  • Week 1 Notes - Principles, exploratory graphs, plotting systems, graphics devices.
  • Week 2 Notes - Lattice plotting system, ggplot2.
  • Week 3 Notes - Hierarchical clustering, K-means clustering, dimension reduction, colours and palettes.

Course 5 - Reproducible Research

  • Week 1 Notes - Concepts and ideas, structure of data analysis.
  • Week 2 Notes - Coding standards, RMarkdown, knitr.
  • Week 3 Notes - Communicating results, RPubs, reproducible research checklist, evidence-based data analysis.
  • Week 4 Notes - Caching computations.

Course 6 - Statistical Inference

  • Week 1 Notes - Probability, CDF and survival functions, quantiles, conditional probability, expected values, sample means.
  • Week 2 Notes - Variability, standard error, distributions, central limit theorem, confidence intervals.
  • Week 3 Notes - T-confidence intervals, hypothesis testing, t-tests, p-values.
  • Week 4 Notes - Power, t-test power, multiple testing, resampling.

Course 7 - Regression Models

  • Week 1 Notes - Least Squares, Covariance, Correlation, Regression to the Mean.
  • Week 2 Notes - Interpreting coefficients, residuals, heteroskedasticity, slope and intercept variance, prediction vs slope intervals.
  • Week 3 Notes - Multivariable linear models, influence measures, variance inflation, nested models.
  • Week 4 Notes - Generalised linear models, logistic regressions, Poisson regressions.

Course 8 - Practical Machine Learning

  • Week 1 Notes - Prediction, Cross-Validation, Types of Errors.
  • Week 2 Notes - Data Slicing, Training, Standardising, Imputation, Covariate Creation, PCA.
  • Week 3 Notes - Prediction with Trees, Bagging, Random Forests, Boosting, LDA, Naive Bayes.
  • Week 4 Notes - Thresholding, Ridge Regression, Lasso, Forecasting, Unsupervised Prediction.

Course 9 - Developing Data Products