# #stat432

This web page will serve as the syllabus for the course. Please read it carefully. You should become familiar with these policies. To do so, you will likely need to return to the syllabus several times throughout the semester. After the start of the semester, this document may continue to be updated. Any such changes will be announced.

# Course Name and Number

• Main: STAT 432 - Basics of Statistical Learning
• Cross-list: ASRM 451 - Basics of Statistical Learning

For simplicity, the course staff will exclusively refer to the course as STAT 432.

# Location and Time

This Spring 2020 version of the course is online.

• Location: Wherever you are!
• Note: Exams and office hours will take place on-campus.
• Time: Whenever you’d like!

# Course Staff

Please refer to the course staff by their given names. For example, your instructor is named David. If you refer to the staff as “Professor” or “TA,” we might refer to you as “student,” which seems odd.

# Course Content

## Course Description

STAT 432 provides a broad overview of machine learning, through the eyes of a statistician. As a first course in machine learning, core ideas are stressed, and specific details are de-emphasized. After completing the course, students should be able to train and evaluate statistical models. While we will not discuss an exhaustive list of methods, given the framework developed throughout the course, students should feel comfortable exploring new methods and models on their own. Previous experience with R programming is necessary for success in the course, as students will be tested on their ability to apply the methods discussed using a statistical computing environment.

## Topics

Tentative subjects include:

• Basics: Supervised and Unsupervised Learning, Parametric vs Non-Parametric Methods, Bias-Variance Trade-Off, Cross-Validation, Model Selection and Evaluation
• Regression: Linear Regression, Trees, KNN, Penalized Regression
• Classification: Logistic Regression, Trees, KNN, LDA, QDA, Naive Bayes
• Modern Methods: Regularization (Ridge, Lasso, Elastic Net), Ensemble Learning (Bagging, Boosting, Random Forests)
• Unsupervised: PCA, K-Means Clustering, Hierarchical Clustering, Mixture Models, EM Algorithm

## Learning Objectives

After this course, students are expected to be able to …

• identify supervised (regression and classification) and unsupervised (clustering) learning problems.
• understand some fundamental theory behind statistical learning methods.
• implement learning methods using a statistical computing environment.
• formulate practical, real-world, problems as statistical learning problems.
• evaluate effectiveness of learning methods when used as a tool for data analysis.

Note: These objectives are similar to the objectives for the Society of Actuaries Exam PA: Predictive Analytics. (See details in their linked syllabus.) While STAT 432 was not specifically designed to prepare students for the SOA Exam PA, the coverage may be sufficient to sit for the exam, although some additional exam-specific study may be required.

## Textbooks

The main text for the course will be BSL. Within BSL, readings from ISL will be assigned. If BSL and ISL provide conflicting guidance, we will defer to BSL in this course. When reading ISL, you do not need to read the sections dedicated to R. We will follow the R conventions only from BSL.

## Prerequisites

A course which covers linear regression that uses R, such as STAT 420 or STAT 425. Basic knowledge of probability and linear algebra is also assumed. A working knowledge of the material from the following three texts would also be sufficient.

# Course Communication

We will use several forms of communication for this course. The website will be the one-stop-shop for all course information. Canvas will be used to send announcements which will also be sent via email.

If you would like to communicate with the course staff, our preferred methods of communication, in order, are:

1. Office Hours
2. Piazza
3. Email

## Office Hours

• Main Office Hours
• CA Quiz Help Office Hours

The main office hours will be all hands on deck. It is a big block of time, in a very nice room. We hope that these are well utilized. The course instructor will staff these office hours for the entire three hours, while additional course staff will be in-and-out throughout that time.

The CA Quiz Help office hours are designed for getting help with the quizzes. The course instructor will likely not be present during this time, so administrative issues are less likely to be resolved then.

This course will use the Queue system for office hours.

Office hours are by far our preferred forum for discussing individual specific questions. In office hours, our response time will be literally instant. Also, since we are both present in the same physical location, follow-up is both expected, and easy. Using electronic forums of communication such as Piazza or email will have a slow response rate and a much lower communication bandwidth. In other words, please come to office hours!

If you would like to schedule a private meeting outside of regular office hours, please send an email suggesting two possible times, on two different days. (A total of four suggested times.) We have a preference for time-slots directly adjacent to current office hours. Please also indicate a brief agenda for the meeting. Requests to schedule a meeting at a time less than 24 hours in the future are unlikely to be granted.

## Piazza

This course will use Piazza for some course communications.

The course staff will attempt to check Piazza at least once a day, so you can often expect a response within 24 hours, except on weekends. If you need a quicker response, you should consider office hours as an alternative.

The course staff would strongly prefer the use of Piazza to GroupMe. The course staff feels that a GroupMe may exclude members of the course, whereas all are welcome on Piazza.

While anonymous posting is allowed, the course staff reserves the right to revoke this privilege if it is abused. Often, it is helpful to post using your name simply to make it easy to follow the conversation.

Private posts have been disabled. Any private matters should be discussed over email where your identity is known and private.

Additional Piazza policy can be found in a pinned post on Piazza.

## Email Policy

Due to the large size of this course, we follow a strict email policy. Instead of email, consider Piazza! Quick, non-private communication should take place there.

If you’d like to email the instructor or course staff, consider the following:

• Is your question about part of an assignment? First and foremost: You should ask it in office hours. After that, consider Piazza. As a last resort, use email.

If you choose to send an email, you must follow these three rules. If you do not, your email will be considered less important than other emails which follow the rules, and response time will be slower.

• All email must originate from an @illinois.edu email address or appear as sent on behalf of an @illinois.edu address.
  • Depending on the situation, failure to follow this rule may make a response impossible.
• Your subject line must begin with exactly the following: [STAT 432]
  • While ASRM 451 is a valid cross-listed course, please use STAT 432 for all communication purposes.
• After the above, put a single space, followed by a useful but short description of your message.

    # good
    [STAT 432] Grade feedback question
    # bad
    # improper format
    # non-descriptive subject
    [stat432] hi
    # bad
    # improper format
    [STAT432] Grade feedback question
    # bad
    # improper format
    # subject too long
    # information found in syllabus or website
    [STAT 432]when is the first CBTF exam and what is covered on the exam?
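
If you would like to double-check a subject line before sending, the format rules above can be sketched as a small check. This is only an illustration, not a tool the course staff uses: the function name `is_well_formed_subject` and the 40-character limit are assumptions (the syllabus only says "short"), and no program can judge whether a description is actually useful.

```python
import re

# Hypothetical checker for the subject-line rules; not an official course tool.
# Rule: exactly "[STAT 432]", then a single space, then a description.
SUBJECT_PATTERN = re.compile(r"^\[STAT 432\] \S.*$")

# Assumed cutoff for "short"; the syllabus does not give a number.
MAX_DESCRIPTION_LENGTH = 40


def is_well_formed_subject(subject: str) -> bool:
    """Return True if the subject has the required prefix and a short description."""
    if SUBJECT_PATTERN.match(subject) is None:
        return False
    description = subject[len("[STAT 432] "):]
    return len(description) <= MAX_DESCRIPTION_LENGTH


if __name__ == "__main__":
    for candidate in [
        "[STAT 432] Grade feedback question",
        "[stat432] hi",
        "[STAT432] Grade feedback question",
        "[STAT 432]when is the first CBTF exam and what is covered on the exam?",
    ]:
        print(is_well_formed_subject(candidate), candidate)
```

Only the first of the four subject lines passes this check. The second would still be a poor subject line even with the correct prefix, because "hi" is not descriptive.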

If your email is sent between 9:00 AM Monday and 11:59 PM Thursday, and you follow the above directions, we will try our best to respond within 24 hours. Questions about an assessment sent the same day the assessment is due will likely not receive a response before the assessment is due. Plan accordingly.

## Code Discussion

If your question is technical in nature, there are several steps you can take to ensure a speedy response on Piazza or in email.

First and foremost, you should ask Google before you ask the course staff. Take the error message you obtained and search it with Google. The ability to solve problems this way is an extremely valuable skill, possibly one of the most important you should learn (but are not taught) during your academic career. Make a legitimate effort to solve the problem on your own. You won’t always be able to, and if you can’t, send an email. (Or better yet, stop by office hours.)

If you need to ask the course staff, include the following in your Piazza post or email:

• The offending line of code, as well as a few previous lines for context, preferably enough to re-create the error.
• The easier you make it to recreate the error, the more likely course staff is to help, and find a solution. If we can’t recreate the error, we can’t fix it!
• The exact error message received.
• As a last resort, attach the file containing the code, zipped with any external files needed to run the code. However, if this is the level of assistance that you need, office hours would be much better.
• The course staff is happy to help, but we are not debugging machines.

Do not use screenshots of code and error messages to communicate about them. Copy-paste them so that others can copy-paste them as well.

## Course Staff Emails

| Role | Name | Email |
|---|---|---|
| Instructor | David Dalpiaz | |
| Teaching Assistant | Yinyin Chen | |
| Teaching Assistant | Zihe Liu | |
| Course Aid | Norman Dewantoro | |
| Course Aid | Jingyu Li | |
| Course Aid | Jonathan Lu | |

# Assessments

With the exception of exams, all course assignments are due at 11:59 PM, Central time, on the listed due date. An overview of assignment deadlines can be found at the bottom of the course website’s homepage.

## PrairieLearn Quizzes

Throughout the semester, quizzes (10 for undergraduates, 12 for graduate students) will be administered through the PrairieLearn system. These will be low-stakes, unlimited attempt quizzes. These quizzes will serve as practice for exams. No quizzes will be dropped. Instead, there will be an opportunity to earn buffer points with each quiz. (Buffer points will allow you to obtain over 100% for a particular assignment, but your percentage on quizzes overall cannot exceed 100%.)

The buffer point and late submission details can be seen in the details of each quiz on PrairieLearn. As an example, consider Quiz 01:

• Released By: Monday, January 20
• 110% Credit: Friday, January 24, 11:59 PM
• 100% Credit: Friday, January 31, 11:59 PM
• 80% Credit: Friday, February 7, 11:59 PM

To obtain the 110% credit, you must achieve a score of 100% before the “due” date for 110% credit. (The “due” date for 110% credit will be the listed due date on the course website.)
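
As a rough illustration of how the Quiz 01 schedule above plays out, here is a minimal sketch. It is not how PrairieLearn computes credit internally. The function name `quiz_credit` is hypothetical, the year 2020 is taken from the Spring 2020 term, the sketch considers a single submission rather than unlimited attempts, and the treatment of submissions after the last listed deadline (no credit) is an assumption, since the syllabus does not say what happens then.

```python
from datetime import datetime

# Quiz 01 deadlines from the syllabus (Central time; year 2020 assumed).
BONUS_DEADLINE = datetime(2020, 1, 24, 23, 59)    # 110% credit
FULL_DEADLINE = datetime(2020, 1, 31, 23, 59)     # 100% credit
REDUCED_DEADLINE = datetime(2020, 2, 7, 23, 59)   # 80% credit


def quiz_credit(raw_score: float, submitted_at: datetime) -> float:
    """Hypothetical credited percentage for a single Quiz 01 submission."""
    if submitted_at <= BONUS_DEADLINE:
        # The 110% credit requires a score of 100% before this deadline.
        return 110.0 if raw_score == 100.0 else raw_score
    if submitted_at <= FULL_DEADLINE:
        return raw_score
    if submitted_at <= REDUCED_DEADLINE:
        return raw_score * 0.8
    # Assumption: no credit after the last listed deadline.
    return 0.0


if __name__ == "__main__":
    print(quiz_credit(100.0, datetime(2020, 1, 23, 12, 0)))  # early and perfect
    print(quiz_credit(100.0, datetime(2020, 2, 5, 12, 0)))   # late but perfect
```

The takeaway is simply that finishing a quiz with a perfect score by the first deadline is the only way to bank buffer points.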

### PrairieLearn

Quizzes and exams will both use the PrairieLearn system. Use the link below to sign up and add STAT 432.

## CBTF Exams

Exams in STAT 432 will be scheduled and administered through the Computer-Based Testing Facility. (This is a facility on the University of Illinois at Urbana-Champaign campus. While the course is online, you are still required to sit for exams in this facility.) Dates of these exams can be found on the course website and the CBTF website.

### Computer-Based Testing Facility

STAT 432 uses the College of Engineering Computer-Based Testing Facility (CBTF) for its quizzes and exams.

The policies of the CBTF are the policies of this course, and academic integrity infractions related to the CBTF are infractions in this course.

If you have accommodations identified by Disability Resources and Educational Services (DRES) for exams, please take your Letter of Accommodation (LOA) to the CBTF proctors in person before you make your first exam reservation. The proctors will advise you as to whether the CBTF provides your accommodations or whether you will need to make other arrangements with your instructor.

Any problem with testing in the CBTF must be reported to CBTF staff at the time the problem occurs. If you do not inform a proctor of a problem during the test then you forfeit all rights to redress.

## Data Analyses

There will be three data analyses (DA) throughout the semester. Each data analysis will consist of four assignments which together total 10 points:

• [2] Report
  • A written report.
• [5] Quiz
  • Objective verification of some numeric results.
• [2] Code
  • A review of code style and structure.
• [2] Reflection
  • A self-review of your performance.

Specific policies and directions will be released with each analysis. The above points add to 11, meaning that there is one buffer point possible with each analysis.

# Course Technology

## Statistical Computing

R and RStudio are required software for this course. You will need access to a computer where you have the ability to install and update this software.

• R is a freely available language and environment for statistical computing and graphics.
• RStudio is a free and open-source integrated development environment (IDE) for R.

It is your responsibility to make sure you are using the most recent version of both R and RStudio. Failure to use the most recent version of R will result in an inability to complete the quizzes.

## Learning Management

Canvas will be used to distribute grades and for assignment submissions.

## Assessment Weights

| Assessment | Percentage |
|---|---|
| Quizzes | 50 |
| CBTF Exam 01 | 16 |
| CBTF Exam 02 | 16 |
| Data Analysis 01 | 6 |
| Data Analysis 02 | 6 |
| Data Analysis 03 | 6 |
The quiz sub-score will be the average of the 10 quizzes for undergraduates. (It will be the average of 12 quizzes for graduate students.) If your quiz sub-score is above 100 as a result of buffer points, it will be recorded as 100.
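
For anyone who wants to estimate their standing, the weights and the quiz cap described above can be sketched as follows. This is an unofficial illustration: the names `WEIGHTS`, `quiz_subscore`, and `overall_percentage` are hypothetical, and the sketch caps only the quiz sub-score, since the syllabus does not say how buffer points on other assessments are treated.

```python
# Weights from the Assessment Weights table (percent of the final grade).
WEIGHTS = {
    "Quizzes": 50,
    "CBTF Exam 01": 16,
    "CBTF Exam 02": 16,
    "Data Analysis 01": 6,
    "Data Analysis 02": 6,
    "Data Analysis 03": 6,
}


def quiz_subscore(quiz_scores):
    """Average the individual quiz percentages and record at most 100."""
    return min(sum(quiz_scores) / len(quiz_scores), 100.0)


def overall_percentage(scores):
    """Weighted average of component percentages, keyed like WEIGHTS."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    example = {
        "Quizzes": quiz_subscore([110.0] * 10),
        "CBTF Exam 01": 80.0,
        "CBTF Exam 02": 90.0,
        "Data Analysis 01": 100.0,
        "Data Analysis 02": 100.0,
        "Data Analysis 03": 100.0,
    }
    print(overall_percentage(example))
```

Note that ten quizzes at 110% still produce a quiz sub-score of 100, so buffer points can offset a weaker quiz but cannot offset a weaker exam.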

| | A | B | C | D |
|---|---|---|---|---|
| Plus | TBD | 87 | 77 | 67 |
| Neutral | 93 | 83 | 73 | 63 |
| Minus | 90 | 80 | 70 | 60 |

The instructor reserves the right to lower, but not raise, grade cutoffs. (However, this policy should not create an expectation that this will happen. Asking for a change in cutoffs will make any change in cutoffs less likely.) The grade of A+ will be reserved for the top five students in the course.
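
The cutoff table can be read as a simple lookup, sketched below. This is an illustration only: the function name `letter_grade` is hypothetical, the sketch assumes no rounding is applied before comparing to a cutoff, it assumes scores below 60 map to F (the table does not list F), and it ignores A+ because that cutoff is listed as TBD.

```python
# Cutoffs from the grade table, highest first. A+ is omitted because it is TBD.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(percentage):
    """Return the first letter whose cutoff the percentage meets."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    # Assumption: anything below the lowest listed cutoff is an F.
    return "F"


if __name__ == "__main__":
    for pct in (95.2, 89.9, 60.0):
        print(pct, letter_grade(pct))
```

Because cutoffs may be lowered but not raised, a lookup like this gives the lowest letter grade you could receive for a given percentage under the listed cutoffs.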

Grading in the course is not competitive. (With the exception of A+, which has no effect on GPA.) There is nothing (other than some statistical realities) that would prevent the entire class from receiving a grade of A.

All grade disputes must be discussed with the course instructor. Teaching Assistants and Course Aids do not have authority to modify grades.

The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in this course. Any violations will be dealt with in a swift, fair, and strict manner. In short, do not cheat; it is not worth the risk. You are more likely to get caught than you believe. If you think you may be operating in a grey area, you most likely are.

Policies about specific assessment types will be released with directions for those assessments. Two heuristics to keep in mind:

• Do not share files with other students. Do not copy-paste from any source other than the course notes and website.
• Use spoken language to exchange ideas, not code.

Under no circumstances should course materials be provided to Course Hero or any similar for-profit website. The course staff will seek the harshest possible academic integrity penalty for any students who do so.

## Disability Accommodations

To obtain disability-related academic adjustments or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 217-333-4603, e-mail disability@illinois.edu or go to the DRES website.

To ensure appropriate accommodation is provided in a timely manner, please provide your Letter of Accommodation during the first week of class. Letters received after a relevant assessment has been administered will likely cause logistical issues that could result in an inability to accommodate.

Please also be aware of the note above about accommodations in the CBTF.

## Diversity Statement

The University of Illinois is committed to equal opportunity for all persons, regardless of race, ethnicity, religion, sex, gender identity or expression, creed, age, ancestry, national origin, handicap, sexual orientation, political affiliation, marital status, developmental disability, or arrest or conviction record. We value diversity in all of its definitions, including who we are, how we think, and what we do. We cultivate an accessible, inclusive, and equitable culture where everyone can pursue their passions and reach their potential in an intellectually stimulating and respectful environment. We will continue to create an inclusive campus culture where different perspectives are respected and individuals feel valued.

## The Extended Syllabus

For some thoughts on teaching philosophy, some explanation of policies, and some general tips for success, please see The Extended Syllabus.

## Changes

The instructor reserves the right to make any changes he considers academically advisable. Such changes, if any, will be announced. Please note that it is your responsibility to keep track of the course proceedings.