Machine Learning Techniques for Data Mining (DCS327)



Instructor: Dr. Christof Monz
Teaching Assistant: Sirvan Yahyaei
Time and location: Tuesday 10am-12pm (lecture, CS338), Tuesday 4pm-6pm (tutorial, Queens EB4)

Summary: The steady growth of digital data offers new possibilities for finding and linking pieces of information, discovering previously unknown relationships, and classifying novel information. These issues are investigated in the research area of data mining. Data mining software is used by numerous businesses and government organizations, including online vendors, news agencies, investment firms, and health care providers. Data mining techniques are used to support informed decision making, tailor marketing strategies, or detect fraudulent activities. As the vast majority of data mining approaches rely heavily on machine learning techniques, it is important to understand these techniques in order to develop new data mining methods or adapt existing ones to novel problems.

Assessment: 80% final exam, 20% coursework (hand in weekly homework exercises)

Reading material:
  • Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann Series in Data Management Systems. 2005.
  • Selected chapters from: Tom M. Mitchell. Machine Learning. McGraw-Hill. 1997.
Optional Background Reading to Refresh your Math:
Data Sets
Software
Announcements: The tutorials have been moved to Queens EB4 (basement of the Queen's Building).

Schedule (not completed yet):

Week Topic Reading Slides Homework
Week 1: Introduction to Data Mining and Machine Learning
Learning outcomes: basic understanding of the main concepts in data mining; understanding of potential applications of data mining techniques
Reading: Witten & Frank, chapters 1 & 2
Slides: 2 per page / 4 per page
Homework: none
Week 2: Fundamentals of Probability Theory; Linear Regression
Learning outcomes: review of basic concepts in probability and information theory; understanding of numerical prediction with linear regression methods
Reading: -
Slides: 2 per page / 4 per page
Homework: due in week 3
Week 3: Decision Tree Learning
Learning outcomes: construction algorithms for decision trees; learning bias; dealing with missing values; data overfitting and pruning; learning with numeric attributes and classes
Reading: Witten & Frank, chapters 3.1, 4.3, 6.1 & 6.5; Mitchell, chapter 3
Slides: 2 per page / 4 per page
Homework: due in week 4
Week 4: k-Nearest Neighbor and Naive Bayes Classification
Learning outcomes: lazy learning approaches; similarity between instances; classification with probabilities
Reading: Witten & Frank, chapter 3.8; Mitchell, chapter 8.2
Slides: 2 per page / 4 per page
Homework: due in week 5
Week 5: Neural Networks
Learning outcomes: perceptrons; gradient descent algorithm; multi-layer neural networks; backpropagation algorithm
Reading: Mitchell, chapter 4
Slides: 2 per page / 4 per page
Homework: due in week 6
Week 6: Neural Networks (continued)
Reading: see previous week
Slides: see previous week
Homework: see previous week
Week 7: K-Means and Hierarchical Clustering
Learning outcomes: understand the difference between clustering and classification; understand the algorithmic details of K-Means and its extensions; understand the underlying ideas of agglomerative hierarchical clustering
Reading: -
Slides: 2 per page / 4 per page
Homework: due in week 8
Week 8: Hidden Markov Models (Part I)
Learning outcomes: understand the relationship between HMMs and Bayes' theorem; understand the forward algorithm; understand the Viterbi algorithm; understand Baum-Welch parameter estimation
Reading: Rabiner's Tutorial (Sections I-III)
Slides: 2 per page / 4 per page
Week 9: Hidden Markov Models (Part II)
Reading: see previous week
Slides: see previous week
Homework: due in week 10
Week 10: Association Analysis
Learning outcomes: understand the motivation and use of association analysis; understand how support and confidence are used for association analysis; understand the Apriori algorithm; understand support- and confidence-based pruning strategies
Reading: -
Slides: 2 per page / 4 per page
Homework: due in week 11
Week 11: Support Vector Machines
Learning outcomes: understand the advantages of SVM classification over perceptrons; understand constrained optimization and the use of Lagrange multipliers; understand the dual reformulation
Reading: -
Slides: 2 per page / 4 per page
Homework: none
Week 12: Revision Class


Contact: For inquiries, please contact Christof Monz () or Sirvan Yahyaei (sirvan@dcs.qmul.ac.uk). Note: Please send all email from your DCS account, as emails from Yahoo, Gmail, Hotmail, etc. are likely to get stuck in the departmental spam filter.

Christof Monz's office hours are Monday 3pm-4pm and Wednesday 11am-12pm or by appointment.