STAT 479 -- Machine Learning (Fall 2018)
Table of Contents
- Course Logistics
- Course Description
- Resources
- Grading
- Class Project
- Other Important Course Information
- Schedule
Project Award Winners
Teaching this class was a pleasure, and I am especially happy about how awesome the class projects turned out. Listed below are the winners of the three award categories as determined by ~210 votes. Congratulations!
Course Logistics
When
- Tue 8:00-9:15 am
- Thu 8:00-9:15 am
Where
- SMI 331
Instructors
- Instructor: Sebastian Raschka
- Teaching Assistant: Shan Lu
Office Hours
- Sebastian Raschka: Tue 3:00-4:00 pm, Room MSC 1171
- Shan Lu: Wed 3:00-4:00 pm, Room MSC B248
Course Description
Credits: 3
Course Description:
This course will cover the key concepts of machine learning, including classification, regression analysis, clustering, and dimensionality reduction. Students will learn about the fundamental mathematical concepts underlying machine learning algorithms, but this course will equally focus on the practical use of machine learning algorithms using open source libraries from the Python programming ecosystem. Students are expected to participate in a team-based final class project (the topic can be flexibly chosen by the students within the scope of the material taught in this class) and apply the learned concepts to real-world problem-solving.
Along with introducing the concepts of machine learning, the lectures will provide a refresher on relevant concepts from calculus and linear algebra; a calculus background (e.g., Math 221) and a linear algebra background (e.g., Math 340) are recommended. While this course will also provide an introduction to the basics of the Python programming language for machine learning, it is highly recommended that students are familiar with basic programming and have completed an introductory programming class.
Learning Outcomes:
- Understanding the different fields of machine learning, such as supervised and unsupervised learning, and identifying scenarios where it makes sense to apply machine learning for real-world problem-solving.
- Building a repertoire of different algorithms and approaches to machine learning (data and algorithmic models / parametric and nonparametric models) and understanding their various strengths and weaknesses.
- Learning how to use the Python programming language and Python’s scientific computing stack for implementing machine learning algorithms to 1) enhance the learning experience, 2) conduct research and be able to develop novel algorithms, and 3) apply machine learning to problem-solving in various fields and application areas.
- Being able to approach problems with the desired outcome in mind and to effectively navigate the typical trade-offs between computational efficiency, model interpretability, and predictive accuracy.
- Applying both the theoretical and practical concepts taught in this class to creative, real-world problem-solving and completing a project that can optionally be listed on a resume.
Course Prerequisites: Consent of instructor.
Course Audience: Students majoring in math or statistics or those wishing to take additional statistics courses.
Resources
Python Machine Learning, 2nd Edition (highly recommended)
- Raschka, S., & Mirjalili, V. (2017). Python Machine Learning, 2nd Ed. Birmingham, UK: Packt Publishing. ISBN-13: 978-1787125933
- Many of the hands-on code examples, topics, and figures discussed in class were adopted from this book; hence, it is highly recommended to read through the chapters in this book.
- Code examples and figures are freely available online under an open source license at https://github.com/rasbt/python-machine-learning-book-2nd-edition.
Elements of Statistical Learning (recommended)
- Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning (Vol. 1, No. 10). New York, NY, USA: Springer series in statistics. ISBN-13: 978-0387848570
- Throughout this course, several chapters will be recommended as further reading material for interested students. Since this book covers more advanced material that is more appropriate for a graduate-level course, material from this book will be recommended, not required.
- A free PDF version of this book is available at https://web.stanford.edu/~hastie/ElemStatLearn/.
Illustrated Guide to Python (recommended)
- “Illustrated Guide to Python 3: A Complete Walkthrough of Beginning Python with Unique Illustrations Showing how Python Really Works. Now covering Python 3.6 (Treading on Python) (Volume 1)” by Matt Harrison, ISBN-13: 978-1977921758.
This book will not be covered in class. However, some readers asked me for good Python resources to prepare for this class, and this is one of the resources I would recommend. There are also many other Python learning resources available online.
For instance, another great book is Allen Downey’s Think Python 2e (free PDF available at https://greenteapress.com/wp/think-python-2e/).
Depending on your preferred learning style, also consider learning Python interactively instead of, or in addition to, reading a Python book. A great interactive resource for learning Python is Codecademy: https://www.codecademy.com. In particular, there is a free interactive course that takes less than 10 hours: https://www.codecademy.com/learn/learn-python.
Class Project
Overview
The goal of working on a class project is three-fold. First, it will provide you with the opportunity to apply the concepts learned in this class creatively, which helps you understand the material more deeply. Second, designing and working on a unique project in a team is something that you will encounter sooner rather than later in life (if you haven’t already), and this course project helps you prepare for that. Third, along with the opportunity to practice and the satisfaction of working creatively, students can use this project to enhance their portfolio or resume.
Note about grading
There is no “perfect project.” While you are encouraged to be ambitious, the most important aspect of this project is your learning experience. Hence, you don’t want to pick something that is too easy for you, but similarly, you don’t want to choose a project that may be out of the scope of this class. The project is not graded by how exciting it is but by whether you meet the objectives of the project proposal, project presentation, and project report. For instance, if your project ends up being unsuccessful (for example, if you design a classifier and it doesn’t achieve the desired accuracy), it will not negatively affect your grade as long as you are honest, describe the potential issues well, and suggest improvements or further experiments. Again, the objective of this project is to provide you with hands-on practice and an opportunity to learn.
The project consists of 3 parts: a project proposal, a short project presentation, and a project report. The expectations for each part will be discussed in the following sections.
1) Project Proposal
The main purpose of the project proposal is to receive feedback from the TAs/the instructor regarding whether your project is feasible and whether it is within the scope of this class. Also, the project proposal offers a chance to receive useful feedback and suggestions on your project.
For this project, you will be working in a team consisting of three students. The members of each team will be randomly assigned by the TAs/instructor. If you have any concerns about working with someone in your group, please talk to a TA or the instructor about accommodations.
Proposal Format:
- The project proposal is a 1-3 page document (800-1200 words) excluding references.
- You are encouraged (not required) to use 1-2 figures to illustrate technical concepts.
- The proposal must be formatted and submitted as a PDF document (the submission deadline will be announced later via the schedule and email).
Introduction:
- Describe what you are planning to do.
- Briefly describe related work (if applicable).
Motivation:
- Describe why your project is interesting. E.g., you can describe why your project could have a broader societal impact. Or, you may describe the motivation from a personal learning perspective.
Evaluation:
- What would the successful outcome of your project look like? In other words, under which circumstances would you consider your project to be “successful?”
- How do you measure success, specific to this project, from a technical standpoint?
Resources:
- What resources are you going to use (datasets, computer hardware, computational tools, etc.)?
Contributions:
You are expected to share the workload evenly, and every group member is expected to participate in both the experiments and writing. (As a group, you only need to submit one proposal and one report, though. So you need to work together and coordinate your efforts.)
- Clearly indicate what computational and writing task each member of your group will be participating in.
It is crucial that you talk to each other regularly! Schedule regular meetings and/or use online communication tools (e.g., Gitter, Slack, or email) to stay in touch with your group members throughout the semester regarding the progress of your project.
Modifications to the Proposal. After you have received feedback from the TAs/the instructor and your project proposal has been graded, you are advised to stick to the project outline in the proposal as closely as possible. However, if there is a concept introduced in a later lecture (for instance, a machine learning algorithm that you think is more appropriate than the one you proposed), you have the option to modify your proposal, but you are not penalized if you don’t. If you wish to update your project outline, talk to a TA first.
2) Project Presentation
During the last three lectures, you will be presenting your project to the class. The presentation is “free form” but should cover the following:
- introduce the topic to a general audience (your class);
- summarize the main approach or method;
- highlight the outcomes of your project.
The presentation should be 8-10 minutes long, plus 2 minutes will be reserved for questions. All members of the group should participate in the presentation.
- To encourage attendance, we will use a random number generator in class to determine the order in which the groups will present.
- Please bring your own device for the presentation (the projector has a VGA and an HDMI cable). Further, I will provide the following connectors: DisplayPort-to-HDMI, DisplayPort-to-VGA, USB-C-to-VGA, USB-C-to-HDMI, Lightning-to-HDMI (for iPad).
- There will be 3 awards:
- Best Oral Presentation
- Most Creative Project
- Best Visualizations
- The awards will be determined by voting: each student will fill out a card in class (I will provide the cards) and rate each presentation on a scale from 1 to 10 in each of the 3 categories (10 is best); I will collect the cards at the end of the lecture.
The voting card should be filled out as follows:
- Title of the Presentation, x/10, y/10, z/10
- Title of the Presentation, x/10, y/10, z/10 …
where
- x are the points for 1. Best Oral Presentation
- y are the points for 2. Most Creative Project
- z are the points for 3. Best Visualizations
Each award goes to the project with the highest total number of points in that category. However, a single project can receive at most one of the prizes.
Each of the three cards handed in will provide 3 bonus points towards your project report grade (9 pts in total).
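For illustration, the tallying rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure actually used in class: each card line follows the `Title, x/10, y/10, z/10` format, category totals are plain sums over all cards, and the "one prize per project" rule is resolved greedily by awarding the categories in the order listed above and skipping projects that have already won. The syllabus does not specify this assignment order or a tie-breaking rule, so `parse_card_line`, `tally_awards`, and the greedy order are hypothetical.

```python
from collections import defaultdict

# Award categories in the order listed in the syllabus.
CATEGORIES = ["Best Oral Presentation", "Most Creative Project", "Best Visualizations"]


def parse_card_line(line):
    """Parse 'Title, x/10, y/10, z/10' into (title, [x, y, z])."""
    # Split from the right so that commas inside a title are preserved.
    title, *scores = [part.strip() for part in line.rsplit(",", 3)]
    points = [int(score.split("/")[0]) for score in scores]
    return title, points


def tally_awards(cards):
    """Sum the points per category over all cards; each project wins at most one award."""
    totals = defaultdict(lambda: [0, 0, 0])
    for card in cards:
        for line in card:
            title, points = parse_card_line(line)
            for i, p in enumerate(points):
                totals[title][i] += p

    winners = {}
    already_won = set()
    # Hypothetical greedy rule: award categories in the listed order and skip
    # projects that already received a prize. Ties go to the project that
    # appeared first on the cards, because max() keeps the first maximum.
    for i, category in enumerate(CATEGORIES):
        candidates = [t for t in totals if t not in already_won]
        best = max(candidates, key=lambda t: totals[t][i])
        winners[category] = best
        already_won.add(best)
    return winners, dict(totals)
```

With a single (made-up) card where "Project A" scores highest in both of the first two categories, "Project A" only receives Best Oral Presentation and Most Creative Project passes to the runner-up. Under this greedy rule the award order matters; a different assignment scheme could yield different winners.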
3) Project Report
The project report is expected to be 6-8 pages long (excluding references) and should contain the following sections:
- Introduction
- Related Work
- Proposed Method
- Experiments
- Results and Discussion
- Conclusions
- Contributions
More details are provided in the LaTeX report template at https://github.com/rasbt/stat479-machine-learning-fs18.
Also, you are required to submit all the code, computations, and experiments you developed and conducted for this project. Note that the quality of your code will not have any influence on your grade; the code merely serves as a basis to establish that the report contains original and “real” results.
Optional: Sharing your Project
You are encouraged to share your project/final project report online after you complete the course, for example, via GitHub or on a personal website. In addition, I would be happy to write a blog article summarizing each project in a few sentences, including a link to your project website (if applicable). However, note that your project will only be included with your explicit consent, and if you don’t want to share your project online, that’s totally fine.
Grading
The final grade will be computed using the following weighted grading scheme:
- 30% Problem Sets
- 40% Exams:
- 15% Midterm Exam
- 25% Final Exam
- 30% Class Project:
- 5% Project proposal
- 10% Project presentation
- 15% Project report
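As a quick illustration of how the weighted scheme above combines into a final grade, here is a minimal Python sketch. It assumes every component score is expressed as a percentage from 0 to 100; the component names and the example scores are hypothetical. The 9 project bonus points are not modeled here, because the syllabus does not state the point scale of the report grade.

```python
import math

# Weights from the grading scheme above (they sum to 100%).
WEIGHTS = {
    "problem_sets": 0.30,
    "midterm": 0.15,
    "final_exam": 0.25,
    "proposal": 0.05,
    "presentation": 0.10,
    "report": 0.15,
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def final_grade(scores):
    """Return the weighted final grade; each component score is a percentage (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical example scores.
example = {
    "problem_sets": 90,
    "midterm": 80,
    "final_exam": 85,
    "proposal": 100,
    "presentation": 90,
    "report": 88,
}
print(round(final_grade(example), 2))  # 87.45
```

Here the contributions are 27 + 12 + 21.25 + 5 + 9 + 13.2 = 87.45, which shows that the final exam (25%) carries more weight than the entire class project report (15%).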
Other Important Course Information
RULES, RIGHTS & RESPONSIBILITIES
See the Guide’s Rules, Rights and Responsibilities
ACADEMIC INTEGRITY
By enrolling in this course, each student assumes the responsibilities of an active participant in UW-Madison’s community of scholars in which everyone’s academic work and behavior are held to the highest academic integrity standards. Academic misconduct compromises the integrity of the university. Cheating, fabrication, plagiarism, unauthorized collaboration, and helping others commit these acts are examples of academic misconduct, which can result in disciplinary action. This includes but is not limited to failure on the assignment/course, disciplinary probation, or suspension. Substantial or repeated cases of misconduct will be forwarded to the Office of Student Conduct & Community Standards for additional review. For more information, refer to studentconduct.wiscweb.wisc.edu/academic-integrity/.
ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES
McBurney Disability Resource Center syllabus statement: “The University of Wisconsin-Madison supports the right of all enrolled students to a full and equal educational opportunity. The Americans with Disabilities Act (ADA), Wisconsin State Statute (36.12), and UW-Madison policy (Faculty Document 1071) require that students with disabilities be reasonably accommodated in instruction and campus life. Reasonable accommodations for students with disabilities is a shared faculty and student responsibility. Students are expected to inform faculty [me] of their need for instructional accommodations by the end of the third week of the semester, or as soon as possible after a disability has been incurred or recognized. Faculty [I], will work either directly with the student [you] or in coordination with the McBurney Center to identify and provide reasonable instructional accommodations. Disability information, including instructional accommodations as part of a student’s educational record, is confidential and protected under FERPA.” http://mcburney.wisc.edu/facstaffother/faculty/syllabus.php
DIVERSITY & INCLUSION
Institutional statement on diversity: “Diversity is a source of strength, creativity, and innovation for UW-Madison. We value the contributions of each person and respect the profound ways their identity, culture, background, experience, status, abilities, and opinion enrich the university community. We commit ourselves to the pursuit of excellence in teaching, research, outreach, and diversity as inextricably linked goals.
The University of Wisconsin-Madison fulfills its public mission by creating a welcoming and inclusive community for people from every background – people who as students, faculty, and staff serve Wisconsin and the world.” https://diversity.wisc.edu/
Schedule
Note that this is a tentative schedule subject to changes.
Below is a list of topics we aim to cover. However, we will take our time, and it is more important to build a good understanding of the core concepts and the field in general rather than covering one more algorithm. Keep in mind that a good foundation will enable you to study and understand additional algorithms if the need arises.
Topics
Part I: Introduction
- Lecture 1: What is Machine Learning? An Overview. [Lecture Material]
- Lecture 2: Intro to Supervised Learning: Nearest Neighbor Methods [Lecture Material]
Part II: Computational Foundations
- Lecture 3: Using Python, Anaconda, IPython, Jupyter Notebooks [Lecture Material]
- Lecture 4: Scientific Computing with NumPy, SciPy, and Matplotlib [Lecture Material]
- Lecture 5: Data Preprocessing and Machine Learning with Scikit-Learn [Lecture Material]
Part III: Tree-Based Methods
- Lecture 6: Decision Trees [Lecture Material]
- Lecture 7: Ensemble Methods [Lecture Material]
Part IV: Evaluation
- Lecture 8: Model Evaluation 1 – Introduction to Overfitting and Underfitting [Lecture Material]
- Lecture 9: Model Evaluation 2 – Confidence Intervals and Resampling [Lecture Material]
- Lecture 10: Model Evaluation 3 – Model Selection and Cross-Validation [Lecture Material]
- Lecture 11: Model Evaluation 4 – Statistical Tests and Algorithm Selection [Lecture Material]
- Lecture 12: Model Evaluation 5 – Performance Metrics [Lecture Material]
Part V: Dimensionality Reduction
- Lecture 13: Feature Selection [Lecture Material]
- Lecture 14: Feature Extraction [Lecture Material]
Part VI: Bayesian Learning
- Bayes Classifiers
- Text Data & Sentiment Analysis
- Naive Bayes Classification
Part VII: Regression and Unsupervised Learning
- Regression Analysis
- Clustering
The following topics will be covered at the beginning of the Deep Learning class next spring.
Part VIII: Introduction to Artificial Neural Networks
- Perceptron
- Adaline & Logistic Regression
- SVM
- Multilayer Perceptron
Calendar
Date | Event | Description | Lecture Material | Announcements |
---|---|---|---|---|
Part I: Introduction | | | | |
Thu, Sep 06 | Day 1 | ● Course Overview ● L01: ML Intro & Overview | [L01 Intro - Notes] [L01 Intro - Slides] | |
Tue, Sep 11 | Day 2 | ● L01 Cont'd | | |
Thu, Sep 13 | Day 3 | ● L02: Intro to Supervised Learning: KNN | [L02 KNN - Notes] [L02 KNN - Slides] [L02 KNN - Demo] | |
Part II: Computational Foundations | | | | |
Tue, Sep 18 | Day 4 | ● L03: Using Python | [L03 Python - Notes] | Problem Set 1 Available |
Thu, Sep 20 | Day 5 | ● L04: Python's Scientific Computing Stack | [L04 - Notes (ipynb)] [L04 - Notes (pdf)] | |
Tue, Sep 25 | Day 6 | ● L05: Data Preprocessing and Machine Learning with Scikit-Learn | [L05 - Slides] [L05 - Notes (ipynb)] | |
Part III: Tree-Based Methods | | | | |
Thu, Sep 27 | Day 7 | ● L05 Cont'd ● L06: Decision Trees | [L06 Trees - Slides] [L06 Trees - Notes] [L06 Trees - Demo] | |
Tue, Oct 02 | Day 8 | ● HW1 Discussion ● L06 Cont'd | | Problem Set 1 Due |
Thu, Oct 04 | Day 9 | ● L06 Cont'd ● L07: Ensemble Methods | [L07 Ensembles - Notes] [L07 Ensembles - Slides] | |
Tue, Oct 09 | Day 10 | ● L07 Cont'd | | |
Thu, Oct 11 | Day 11 | ● L07 Cont'd | | |
Tue, Oct 16 | Day 12 | ● L08: Model Eval. 1 - Overfitting | [L08 Model Eval 1 - Slides] [L08 Model Eval 1 - Notes] | |
Thu, Oct 18 | Day 13 | Midterm Exam | | Midterm Exam |
Tue, Oct 23 | Day 14 | ● L08 Cont'd | | Problem Set 2 Available |
Thu, Oct 25 | Day 15 | ● L09: Model Eval. 2 - Conf. Intervals | [L09 Model Eval 2 - Slides] [L09 Model Eval 2 - Notes] [L09 Model Eval 2 - Code] | Project Proposal Due |
Tue, Oct 30 | Day 16 | ● L09 Cont'd | | |
Thu, Nov 1 | Day 17 | ● L10: Model Eval. 3 - Cross-Validation | [L10 Model Eval 3 - Notes] [L10 Model Eval 3 - Slides] [L10 Model Eval 3 - Code] | |
Tue, Nov 6 | Day 18 | ● L10 Cont'd | | |
Thu, Nov 8 | Day 19 | ● L11: Model Eval. 4 - Algorithm Selection | [L11 Model Eval 4 - Notes] [L11 Model Eval 4 - Slides] [L11 Model Eval 4 - Code] | Problem Set 2 Due |
Tue, Nov 13 | Day 20 | ● L11 Cont'd | | |
Thu, Nov 15 | Day 21 | ● L12: Model Eval. 5 - Perform. Metrics | [L12 Model Eval 5 - Slides] | |
Tue, Nov 20 | Day 22 | ● L12 Cont'd ● L13: Feature Selection | [L13 Feat. Select. - Slides] [L13 Feat. Select. - Code] | Problem Set 3 Available |
Thu, Nov 22 | -- | No class: Thanksgiving, university closed | | |
Tue, Nov 27 | Day 23 | ● L13 Cont'd | | |
Thu, Nov 29 | Day 24 | ● L14: Feature Extraction | [L14 Feat. Extract. - Slides] [L14 Feat. Extract. - Code] | Problem Set 3 Due on Dec 3 |
Part X: Wrapping it Up -- Final Exam and Class Project | | | | |
Tue, Dec 04 | Day 25 | Project Presentations | | |
Thu, Dec 06 | Day 26 | Project Presentations | | |
Tue, Dec 11 | Day 27 | Project Presentations | [Project Report Template] | Project Report Due |
Thu, Dec 13 | -- | No Class: Official UW Study Day | | |
Thu, Dec 20 | Final Exam 7:45 am - 9:45 am Ingraham Hall 19 | | | |