DATA WAREHOUSING AND DATA MINING

Course Objectives:

The main objectives of the course are to:

  • Introduce basic concepts and techniques of data warehousing and data mining

  • Examine the types of data to be mined and apply pre-processing methods to raw data

  • Discover interesting patterns, analyze supervised and unsupervised models and estimate the accuracy of the algorithms.


Course Outcomes:

By the end of the course, students will be able to:

  • Illustrate the importance of Data Warehousing and Data Mining and their functionalities, and design schemas for real-time data warehousing applications.

  • Demonstrate various data preprocessing techniques, viz. data cleaning, data integration, data transformation and data reduction, and process raw data to make it suitable for various data mining algorithms.

  • Choose an appropriate classification technique to perform classification, model building and evaluation.

  • Make use of association rule mining techniques, viz. the Apriori and FP-Growth algorithms, and analyze frequent itemset generation.

  • Identify and apply various clustering algorithms (with open-source tools), and interpret, evaluate and report the results.

UNIT I: Data Warehousing and Online Analytical Processing: Data Warehouse: Basic Concepts, Data Warehouse Modelling: Data Cube and OLAP, Data Warehouse Design and Usage, Data Warehouse Implementation. Introduction to Data Mining: Why Data Mining, What Is Data Mining, What Kinds of Data Can Be Mined, What Kinds of Patterns Can Be Mined, Which Technologies Are Used, Which Kinds of Applications Are Targeted.

UNIT II: Data Pre-processing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization.

UNIT III: Classification: Basic Concepts, General Approach to Solving a Classification Problem, Decision Tree Induction: Attribute Selection Measures, Tree Pruning, Scalability and Decision Tree Induction, Visual Mining for Decision Tree Induction.

UNIT IV: Association Analysis: Problem Definition, Frequent Itemset Generation, Rule Generation: Confidence-Based Pruning, Rule Generation in the Apriori Algorithm, Compact Representation of Frequent Itemsets, FP-Growth Algorithm.

UNIT V: Cluster Analysis: Overview, Basics and Importance of Cluster Analysis, Clustering Techniques, Different Types of Clusters; K-means: The Basic K-means Algorithm, K-means: Additional Issues, Bisecting K-means.