Location: Enzi STEM 315
Meeting Time: M/W/F 8:00am - 8:50am
Office Hours: M 9:00-10:30 & T 9:30-11:00 By Appointment (bit.ly/dmb-ohs)
Instructor: Dr. Mike Borowczak
Office: Engineering 4071B
Email: mborowcz@uwyo.edu
Website: cs.uwyo.edu/~mborowcz/cosc-4570
This course explores data mining theories, tools, and real-world applications. It consists of traditional lectures, flipped-classroom activities, research surveys, mini-projects (homework), a culminating final project, and a final exam.
Students will be able to:
This data mining course uses the 2nd edition of Mining of Massive Datasets (MMDS) by J. Leskovec, A. Rajaraman and J. Ullman, which is available for free at http://www.mmds.org/; print versions are also available. In addition to MMDS, the following texts are recommended supplemental resources: Computer Science Theory for the Information Age (CSFTIA) by J. Hopcroft and R. Kannan, and The Elements of Statistical Learning (TESL) by T. Hastie, R. Tibshirani and J. Friedman. You are expected to complete the assigned reading prior to class.
Because data science is a constantly evolving field, we'll also use current and seminal papers, forum posts, documents and other work to ground our discussion. Complete these readings before class as well; otherwise, our discussions will be rather one-sided. If I make a mistake, or if you have a question, ask, so that we can get on the same page. I won't have all the answers to all of your questions; in those scenarios, you can either 1) wait for me to find the answer or 2) find the answer yourself and build up our community of knowledge. We'll use Piazza for collaboration and discussions on class topics, homework, and projects. Our Piazza course site is: https://piazza.com/uwyo/spring2017/cosc4570_5010.
COSC 4570 homework requires the use of a computer, preferably your own, with a virtual machine player, e.g., VMware Player (Windows/Mac) or KVM (Linux). The CS computer labs should have the needed virtual machine software, but it may be impractical to download and save VM images to those accounts; consider investing in a larger external USB drive to store your VM images. Depending on funding, we may use an external service (Qubole) to spin up and host data science machines on Amazon Web Services; if so, you'll use your @UWYO email account for access.
Your grade is computed as a direct, unweighted sum of all the in-class participation, homework, mini-project, final project, final presentation, and exam scores. The following point boundaries are used to determine final grades:
Points | Letter Grade |
>899 | A |
800-899 | B |
700-799 | C |
600-699 | D |
<600 | F |
If necessary, any or all results can be curved. The curve can only ever be upward (i.e., only ever in your favor). Final numerical grades are rounded to the nearest whole number (that is, 799.5 becomes 800 and a B, while 799.4 becomes 799 and a C). I may relax these grade boundaries, but only ever in your favor (e.g., the A grade boundary might end up being 880 instead of 900).
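The grading rule above (an unweighted sum, rounded half up, then compared against fixed point boundaries) can be sketched as a small Python function. This is a hypothetical illustration, not an official grade calculator: the name `letter_grade` and the `boundaries` parameter are assumptions, and the default boundaries simply mirror the table above. Note that Python's built-in `round()` rounds halves to the nearest even integer, so the sketch rounds half up explicitly to match the 799.5 → 800 example.

```python
import math

# Hypothetical default boundaries: minimum rounded points for each letter,
# copied from the point table above. A relaxed boundary would lower a value.
DEFAULT_BOUNDARIES = [(900, "A"), (800, "B"), (700, "C"), (600, "D")]


def letter_grade(scores, boundaries=DEFAULT_BOUNDARIES):
    """Return (rounded_total, letter) for an unweighted sum of scores."""
    total = sum(scores)
    # Round half up (799.5 -> 800, 799.4 -> 799); round() would instead
    # send exact halves to the nearest even integer.
    rounded = math.floor(total + 0.5)
    for minimum, letter in boundaries:
        if rounded >= minimum:
            return rounded, letter
    return rounded, "F"


print(letter_grade([799.5]))  # (800, 'B')
print(letter_grade([799.4]))  # (799, 'C')
```

A relaxed A boundary of 880 would be modeled by passing a modified list, e.g. `letter_grade([885], [(880, "A"), (800, "B"), (700, "C"), (600, "D")])`.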
A summary of your grades will be posted on UW’s WyoCourses site. Please review your scores and report any discrepancies to me.
Late work is accepted for credit only if it is submitted within 24 hours of the assignment due date, and it receives a maximum of 75% of the earned points. For example, if an assignment is worth 25 points, is submitted 22 hours after the due date, and would have received 20 points if submitted on time, the late score is computed as $20 \times 0.75 = 15$ points.
Late work submitted more than 24 hours after the due date, but before the final exam, will remain ungraded until the end of the semester. At the end of the semester, such late work will be graded, at the sole discretion of the instructor, only if it affects whether you pass or fail the course. The maximum course grade you can receive in this scenario is a C.
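The late-work policy can be summarized as a small Python sketch. The function name `late_score` is hypothetical, and two details are assumptions not stated in the policy: a submission at exactly 24 hours counts as inside the 75% window, and `None` stands for "held ungraded until the end of the semester" (when grading is at the instructor's discretion).

```python
def late_score(earned_points, hours_late):
    """Return the recorded score, or None if the work is held ungraded.

    earned_points: points the work would have received if submitted on time.
    hours_late: hours after the due date (0 or negative means on time).
    """
    if hours_late <= 0:
        return earned_points          # on time: full credit
    if hours_late <= 24:
        return earned_points * 0.75   # within 24 hours: 75% of earned points
    return None                       # later: ungraded until semester end


print(late_score(20, 22))  # 15.0, matching the worked example above
```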
No separate extra-credit assignments will be offered. Rather, individual homework assignments may contain an opportunity to earn extra credit.
You are expected to attend class regularly, and your presence in class will positively affect your grade. As an active and engaged learner, you are expected to attend and arrive punctually to our scheduled classes; engagement throughout the class is critical to your ultimate learning. Your participation and attendance will account for 10% of your overall score.
The University of Wyoming is built upon a strong foundation of integrity, respect and trust. All members of the university community have a responsibility to be honest and the right to expect honesty from others. Any form of academic dishonesty is unacceptable to our community and will not be tolerated. Teachers and students should report suspected violations of standards of academic honesty to the instructor, department head, or dean.
Any and all suspicions of academic dishonesty shall be investigated in accordance with UW Regulation 6-802 (www.uwyo.edu/generalcounsel/_files/docs/uw-reg-6-802.pdf). Evidence of academic dishonesty will result in one or more of the recommended sanctions, in accordance with UW Regulation 6-802 6.A.
“There are several misconceptions about intellectual diversity and academic freedom... ...the narrower concept of academic freedom does not mean the freedom to say anything that one wants. For example, freedom of speech does not mean that one can say something that causes physical danger to others. In a learning context, one must both respect those who disagree with one and also maintain an atmosphere of civility. Anything less creates a hostile environment that limits intellectual diversity and, therefore, the quality of learning.”
Association of American Colleges and Universities
Board of Directors Statement on Academic Freedom and Responsibility 12/21/05
If you have a physical, learning, sensory or psychological disability and require accommodations, please let me know as soon as possible. You will need to register with, and possibly provide documentation of your disability to, University Disability Support Services (UDSS) in SEO, room 109 Knight Hall. You may also contact UDSS at (307) 766-6189 or udss@uwyo.edu. Visit their website for more information: www.uwyo.edu/udss.
You are expected to treat all members of the class and your instructor with respect. Plan to attend class, take an active part in discussion or teamwork, and complete all readings and assignments by the deadlines listed in the syllabus.
I will follow a professional code of behavior and responsibility. I will treat all members of the class with respect. I will attend class and take an active part in your learning. In each class I will ask: 1) What do I want you - my students - to learn? 2) How will you learn it? 3) What do I want you to do with the information? and 4) How will I assess your learning?
This syllabus is only a guide for the course and is subject to change with advance notice.
There are 39 scheduled meetings, with two weeks in March that have no scheduled meetings: one for project work and another for spring break. The course breaks down into roughly six overarching topics of 2-3 weeks each, including: an overview of data mining and statistics, similarity, clustering, dimension reduction, recommenders, links and graphs, streams, and large-scale machine learning. There will be 5-6 programming homework assignments, an overarching project with checkpoints throughout the semester, and an auto-graded content knowledge assessment.
Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
Jan 23rd (1) Intro | 24th | 25th (2) Stats | 26th | 27th (3) Uplevel | 28th |
30th (4) MapReduce | 31st | Feb 1st (5) MapReduce | 2nd | 3rd (6) Similarity: Jaccard | 4th |
6th (7) Similarity: MinHash | 7th | 8th (8) Similarity: LSH | 9th | 10th (9) Similarity: Distances | 11th HW1 |
13th (10) Similarity: SIFT, ANN vs LSH | 14th | 15th (11) Clustering: Overview | 16th | 17th (12) Clustering: Hierarchical | 18th |
20th (13) Clustering: K-Means | 21st Project Proposal Due | 22nd (14) Clustering: CURE | 23rd | 24th (15) DimRedux: | 25th HW2 |
27th (16) DimRedux: PCA | 28th | Mar 1st (17) DimRedux: SVD | 2nd | 3rd (18) DimRedux: CUR | 4th |
6th Project Work Week | 7th Data Collection Due | 8th Project Work Week | 9th | 10th Project Work Week | 11th HW3 |
13th Spring Break | 14th | 15th Spring Break | 16th | 17th Spring Break | 18th |
20th (19) Recommender: | 21st | 22nd (20) Recommender: | 23rd | 24th (21) Recommender: | 25th HW4 |
27th (22) Recommender: | 28th Intermediate Report Due | 29th (23) Links: | 30th | 31st (24) Links: | Apr 1st |
3rd (25) Links: | 4th | 5th (26) Massive Graphs: | 6th | 7th (27) Massive Graphs: | 8th HW5 |
10th (28) Massive Graphs: | 11th | 12th (29) Massive Graphs: | 13th | 14th (30) Massive Graphs: | 15th |
17th (31) Streams: | 18th | 19th (32) Streams: | 20th | 21st (33) Streams: | 22nd HW6 |
24th (34) Streams: | 25th Report Due | 26th (35) Lg. Scale ML | 27th | 28th (36) Lg. Scale ML | 29th Poster Outline Due |
May 1st (37) Lg. Scale ML | 2nd | 3rd (38) Lg. Scale ML | 4th | 5th (39) Poster & Presentation Due | 6th |
8th (40) | 9th | 10th (41) | 11th | 12th (42) Final Exam | 13th |
Each assignment will include a specific grading rubric. Generally, you will be expected to turn in:
The preferred format for code submissions is a link to a public git/cvs/svn repository. Alternatively, provide a zip file containing all code and dependencies (with a Makefile if needed). Homework is due no later than 2PM (Mountain) on the given due date (generally a Saturday).
HW1: This assignment will be available no later than January 27th and will be due on February 11th.
HW2: This assignment will be available no later than February 10th and will be due on February 25th.
HW3: This assignment will be available no later than February 24th and will be due on March 11th.
HW4: This assignment will be available no later than March 3rd and will be due on March 25th.
HW5: This assignment will be available no later than March 21st and will be due on April 8th.
HW6: This assignment will be available no later than April 7th and will be due on April 22nd.
Objective: Data-mine real-world datasets to provide something new to the community.
This course will provide you with an overview of Data Mining fundamentals, but in order to truly understand the nuances and complexity of Data Mining, you have to work on real data, solving real problems. This project gives you a real-world experience that you can bring to an interview, your own research, or a personal project. As with any real-world endeavor, you must be able to communicate your work effectively to your peers (experts and non-experts alike).
You will work in teams of 2 (teams of n=1 are highly discouraged barring special permission/requirements that should be brought to my attention as soon as possible).
All project components, except for the poster presentation, are due no later than 11:59PM (Mountain) on the given due date (generally a Tuesday, except for the poster components). The poster presentation will be held during our final day of class. In the event of a weather calamity day, the exam period will be split to accommodate the poster presentations. Project guidelines and a scoring rubric will be provided no later than February 4th.