Data mining, one of the popular fields in computer science, is often defined as the discovery of useful but hidden patterns or relationships in a database. Finding patterns, trends, and outliers in datasets and summarizing them with simple quantitative models is one of the grand challenges of the information age: turning data into knowledge.
Data mining programs search through data for hidden relationships and patterns. This approach is particularly relevant to intelligent transportation systems, where it can help traffic researchers and managers solve traffic problems.
This course provides an introduction to data mining as applied to transportation systems. It intends to cover the basic concepts of data mining as well as specific applications to transportation systems.
The objectives of the course are to present the basic concepts of data mining and the principles and ideas underlying its practice, including data preprocessing, instance-based learning, decision trees, support vector machines, outlier mining, and ensemble learning.
After completing this course, students will understand the fundamental terms and concepts of data mining and be able to use the methods taught in class to analyze and process real transportation data.
Week 1. Introduction to Data Mining
1.1 What is Data Mining?
1.2 Data Mining Functionality
1.3 Data Mining Techniques
1.4 Summary
Courseware
Week 2. Data Preprocessing
2.1 Why Preprocess the Data?
2.2 Data Cleaning
2.3 Data Integration
2.4 Data Reduction
2.5 Data Transformation
2.6 Summary
Courseware
Week 3. Instance-based Learning
3.1 Overview of IBL
3.2 Components of kNN
3.3 Variants of kNN
3.4 Summary
Courseware
Week 4. Decision Trees
4.1 Decision Tree Representation
4.2 Constructing a Decision Tree
4.3 Overfitting and Tree Pruning
4.4 Pros and Cons of DTs
Courseware
Week 5. Support Vector Machine
5.1 Linear SVMs
5.2 Non-linear SVMs
5.3 Multiclass
5.4 Support Vector Regression
5.5 Summary
Courseware
Week 6. Outlier Mining
6.1 Background of Outlier Detection
6.2 Statistics-based Method
6.3 Distance-based Method
6.4 Density-based Method
6.5 Conclusions
Courseware
Week 7. Ensemble Learning
7.1 General Idea on Ensemble Methods
7.2 Popular Ensemble Methods
7.3 Class-Imbalanced Data
7.4 Summary
Courseware
Knowledge of probability, statistics and linear algebra at the undergraduate level
Basic knowledge of traffic engineering and basic programming skills
Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd edition, 2011.
Ian H. Witten, Eibe Frank and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 3rd edition, 2011.
Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015.
Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson, 1st Edition, 2005.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics series, Springer, 2006.