Introduction to Data Science and Machine Learning

24-26 September, University of Wyoming Conference Center

Summary

Intensive 3-Day Workshop with Lectures, Demos, and Exercises

This workshop is intended for students, researchers, and practitioners with basic experience in data science and machine learning who want to take their skills to the next level. It will give you the theoretical knowledge and hands-on skills to apply machine learning and data science techniques to real problems: analyzing data, building predictive models, and optimizing the performance of those models.

  • Data Analysis
  • Supervised and Unsupervised Machine Learning
  • Evaluation of Machine Learning Models
  • Parameter Tuning
  • Ensembles, Boosting, Feature Selection
  • mlr R Package (presented by the authors of mlr)
  • Basic Knowledge of Statistics, Programming, and R required

Presenters


Bernd Bischl

Julia Moosbauer

Martin Binder

Stefan Coors

Prof. Bernd Bischl leads the computational statistics group at LMU Munich and directs the Munich Center for Machine Learning. He is one of the principal authors of the mlr Machine Learning package, which the other presenters also contribute to. All presenters have extensive experience developing machine learning and data science approaches and applying them to real-world problems.

The mlr package is the most comprehensive machine learning package in R and has a rapidly growing user base. It is installed more than 15,000 times per month and the source code repository has more than 1,200 stars on GitHub. Version 3 is a complete reimplementation that takes the many lessons learned with previous versions into account to make machine learning easier, more flexible, and more efficient.

Schedule

Tuesday, September 24

8.45 — 9.00
welcome and opening remarks
9.00 — 9.30
lecture — recap of risk minimization, linear models, logistic regression
9.30 — 11.00
lecture — train/test split, resampling, overfitting
11.00 — 11.15
break
11.15 — 12.30
lecture — CART and random forests
12.30 — 13.30
lunch break
13.30 — 15.15
lecture — ROC and introduction to mlr
15.15 — 15.30
break
15.30 — 17.00
exercises
17.15 —
networking reception

Wednesday, September 25

9.00 — 11.00
lecture — regularization, tuning, nested resampling
11.00 — 11.15
break
11.15 — 12.30
demo — tuning with mlr
12.30 — 13.30
lunch break
13.30 — 15.00
lecture — boosting
15.00 — 15.30
break
15.30 — 17.00
exercises

Thursday, September 26

9.00 — 11.00
lecture — feature extraction and preprocessing
11.00 — 11.15
break
11.15 — 12.30
demo — pipelines, graphs, preprocessing
12.30 — 13.30
lunch break
13.30 — 15.00
lecture — clustering and principal component analysis
15.00 — 15.30
break
15.30 — 17.00
exercises

Prerequisites

You must have R, RStudio, and the mlr3 and mlr3learners packages (including all packages they list under "Suggests") installed before the workshop; we will not provide any help with installation during the workshop. You should have basic familiarity with programming and R. You can find a list of curated resources on how to install and get started with R at https://www.rstudio.com/online-learning/. You should also be able to install R packages from CRAN, as we will install additional packages during the workshop. To get started with mlr3, have a look at the mlr3 book (work in progress).
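
The package installation described above could look roughly like the following in an R session. This is a minimal sketch rather than official instructions: the CRAN mirror URL is an assumption (any mirror works), and dependencies = TRUE is what asks install.packages to also install the suggested packages of the two named packages.

```r
# Packages named in the prerequisites.
workshop_packages <- c("mlr3", "mlr3learners")

# Install from CRAN. dependencies = TRUE also installs the packages that
# mlr3 and mlr3learners list under "Suggests", not only the required ones.
# The mirror URL is an assumption; substitute the mirror you normally use.
install.packages(
  workshop_packages,
  dependencies = TRUE,
  repos = "https://cloud.r-project.org"
)

# Check that both packages can be loaded after installation.
installed_ok <- vapply(
  workshop_packages,
  requireNamespace,
  logical(1),
  quietly = TRUE
)
print(installed_ok)
```

If either entry prints FALSE, the installation did not complete and should be repeated before the workshop.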

You should be familiar with basic concepts in data science and machine learning. We assume that you know the material covered in the first two days of the online introduction to machine learning course at https://compstat-lmu.github.io/lecture_i2ml/articles/content.html. The course provides lecture videos, slides, and exercises with solutions. Please work through this material before September 24; otherwise you will not be able to follow the workshop. Focus on understanding the theoretical concepts; the exercises serve to illustrate them.

Materials

Workshop materials (slides, etc.) are available here. Introduction slides are available here.