Academic Year 2018/19, Term 1
School of Mathematics, The University of Manchester
Lecturer: Korbinian Strimmer
Tutors: Beatriz Costa Gomes, Jack Mckenzie
Overview and syllabus:
For an outline of this course unit see MATH38161: Multivariate statistics and machine learning, or download the course description as PDF.
Dates and location:
The course starts on 26th September 2018 and runs until 14th December 2018. The first computer lab is on 5th October 2018 and the first tutorial on 12th October 2018. All lectures, tutorials and computer labs are held in the Alan Turing Building (ATB). The course takes place at the following dates and locations:
Session | Time slot (location) | Term week |
---|---|---|
Lectures: | Wednesday 11am-12 noon (ATB G207) and Friday 12 noon-1pm (ATB G107) | 1-5 and 7-12 |
Computer labs: | Friday 4pm-6pm (ATB G105) | 2, 4, 8, 10, 12 |
Tutorials: | Friday 4pm-5pm (ATB G205) | 3, 5, 7, 9, 11 |
Office hour: | Friday 3pm-4pm (ATB 2.221) | 1-5, 7-12 |
Course works and exam:
Each of the two course works is worth 25% and requires data analysis and simulation in R, together with a corresponding report written in R Markdown. The written exam (1.5 hours) is worth the remaining 50% and is concerned with theory and methods.
Assessment | Date | Term week |
---|---|---|
Course work #1 (25%): | announced Tuesday 23 October 2018; submission Tuesday 6 November 2018, 12 noon | 5 and 7 |
Course work #2 (25%): | announced Tuesday 4 December 2018; submission Tuesday 8 January 2019, 12 noon | 11 and CVAC 04 |
Written exam (50%): | 14-28 January 2019 (date to be determined) | exam period |
Statistical computing:
In this course strong emphasis is put on computation. All methods introduced and discussed in the lectures will be tried and tested on the computer.
Log into R Studio on the minerva computational statistics server.
In the bi-weekly computer labs we will work in R Studio, using R for statistical data analysis and R Markdown for project reporting. Students are strongly encouraged to install R and the R Studio software on their own computers. Course participants will also get an account on the School of Mathematics computational statistics cloud server minerva to access R Studio in a web browser (for use in the computer labs and to facilitate the coursework).
Prerequisites:
Suggested readings to refresh knowledge in statistics, matrices and R:
a) Dekking et al. 2005. A modern introduction to probability and statistics: understanding why and how. Springer.
b) Petersen and Pedersen. 2012. The matrix cookbook. TU Denmark.
c) R Core Team. 2018. An introduction to R. The R Foundation.
d) Peng. 2016. R programming for data science. Leanpub.
This course assumes students are familiar with the foundations of probability, statistical learning (e.g. maximum likelihood) and matrix theory (e.g. matrix notation and algebra, eigenvalues, singular values, spectral decomposition, rank, condition etc.). Furthermore, basic experience in statistical programming and data analysis using R is expected.
Course material:
This course uses material from several textbooks, all of which can be downloaded freely from within the University network:
- Härdle and Simar. 2015. Applied multivariate statistical analysis. 4th edition.
- Hastie, Tibshirani and Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. Springer.
- James, Witten, Hastie and Tibshirani. 2013. An introduction to statistical learning with applications in R. Springer.
For learning R Markdown please study the following references:
- The R markdown homepage.
- R Studio. 2014. R markdown reference guide.
- Shalizi. 2016. Using R markdown for class reports.
- Xie, Allaire and Grolemund. 2018. R markdown: the definitive guide.
The timetable below will be updated at the end of each week linking the presented material to specific chapters in these books. In addition, the scanned material (visualiser) from the lectures will be available on Blackboard. Furthermore, the automated lecture capture system is active for this module so lectures can be revisited online.
Lecture timetable and contents:
The lectures are divided into five sections, each dealing with a different area in multivariate statistics:
Term week | Lecture (Date) | Content | Reading material |
---|---|---|---|
1, 2 | 1-4 (26 Sept to 5 Oct) | Multivariate normal model: basic multivariate statistics, multivariate normal distribution and properties, further multivariate distributions (categorical, multinomial, Dirichlet, Wishart), estimation of covariance and correlation matrices in both large and small sample settings (using likelihood and regularised/shrinkage estimation). | Lecture notes for Week 1 (9 pages) and Week 2 (11 pages) |
3, 4 | 5-8 (10 Oct to 19 Oct) | Dimension reduction and latent variable models: variable transformations, location-scale transformation, corresponding transformation of mean, variance and probability density, coloring transformation, Mahalanobis transformation, whitening transformations (ZCA, PCA, Cholesky and variations), Principal Component Analysis, Canonical Correlation Analysis (CCA). | Lecture notes for Week 3 (11 pages) and Week 4 (11 pages) |
5, 7 | 9-12 (24 Oct to 9 Nov) | Unsupervised learning / clustering: model-based clustering (finite normal mixture models), algorithmic approaches (e.g. K-means, hierarchical clustering). | |
8, 9, 10 | 13-18 (14 Nov to 30 Nov) | Supervised learning / prediction and classification: multivariate regression, Diagonal, Linear and Quadratic Discriminant Analysis (DDA, LDA, QDA) and regularised versions for high-dimensional data analysis. Further approaches to classification (e.g. support vector machines). | |
11, 12 | 19-22 (5 Dec to 14 Dec) | Nonlinear and nonparametric models: decision trees, random forests, etc. | |
Computer labs timetable and contents:
Term week | Lab (Date) | Topic | Work material |
---|---|---|---|
2 | 1 (5 Oct) | Introduction to the minerva computer system, overview of R Studio (server), introduction to R Markdown, exploring the multivariate normal density and estimation of covariances. | The material for Lab 1 is available on Blackboard. |
4 | 2 (19 Oct) | Simulation of multivariate normal data, comparison of whitening procedures, PCA and dimension reduction. | The material for Lab 2 is available on Blackboard. |
8 | 3 (16 Nov) | Unsupervised learning, clustering. | |
10 | 4 (30 Nov) | Supervised learning, classification. | |
12 | 5 (14 Dec) | Nonlinear and nonparametric models. | |
Tutorials timetable and contents:
Term week | Tutorial (Date) | Example sheets |
---|---|---|
3 | 1 (12 Oct) | The material for Sheet 1 is available on Blackboard. |
5 | 2 (26 Oct) | Sheet 2 |
7 | 3 (9 Nov) | Sheet 3 |
9 | 4 (23 Nov) | Sheet 4 |
11 | 5 (7 Dec) | Sheet 5 |
Coursework timetable and contents:
Term week | Coursework | Submission date | Task |
---|---|---|---|
7 | 1 | 6 Nov 12 noon | Task 1 |
CVAC 4 | 2 | 8 Jan 12 noon | Task 2 |