- Course type: Paid course
- All Levels
- 22 hours
- 136 lessons
- Available on completion
- Course author: Geoffrey Hubona, Ph.D.
- Understand how to implement and evaluate a variety of predictive data mining models in three different domains, each presented as an extended case study: (1) harmful plant growth; (2) fraudulent transaction detection; and (3) stock market index changes.
- Perform sophisticated data mining analyses using the "Data Mining with R" (DMwR) package and R software.
- Gain a greatly expanded understanding of R software as a comprehensive data mining tool and platform.
- Understand how to implement and evaluate supervised, semi-supervised, and unsupervised learning algorithms.
Case Studies in Data Mining was originally taught as three separate online data mining courses. We examine three case studies which together present a broad-based tour of the basic and extended tasks of data mining in three different domains: (1) predicting algae blooms; (2) detecting fraudulent sales transactions; and (3) predicting stock market returns. Together, the three courses comprise fifteen hands-on sessions that showcase Luis Torgo's amazingly useful "Data Mining with R" (DMwR) package and R software. Everything that you see on-screen is included with the course: all of the R scripts, all of the data files and R objects used or referenced, and the documentation for all of the R packages. You can complete the course successfully even if you are new to R software, to data mining, or to both.

The first case study, Predicting Algae Blooms, provides instruction in the many useful, unique data mining functions contained in the R 'DMwR' package. For the algae blooms prediction case, we look specifically at the tasks of data pre-processing, exploratory data analysis, and predictive model construction. For individuals completely new to R, the first two sessions of the algae blooms case (almost 4 hours of video and materials) provide an accelerated introduction to R and RStudio and to basic techniques for inputting and outputting data and text.

Detecting Fraudulent Transactions is the second extended data mining case study that showcases the DMwR package. The case is specific, but it generalizes to a common business problem: how does one sift through mountains of data (401,124 records, in this case) and identify suspicious data entries, or "outliers"? The problem is very unstructured, and the case walks through a wide variety of approaches and techniques for discriminating the "normal", or "ok", transactions from the abnormal, suspicious, or "fraudulent" ones. It presents a large number of alternative modeling approaches, some appropriate for supervised, some for unsupervised, and some for semi-supervised data scenarios.

The third extended case, Predicting Stock Market Returns, is a data mining case study in the domain of automatic stock trading systems. Its four sessions address the task of building an automated stock trading system based on prediction models that use daily stock quote data. The goal is to predict future returns for the S&P 500 market index. The resulting predictions are combined with a trading strategy to decide when to generate market buy and sell orders. The case examines prediction problems that stem from the time ordering among data observations, that is, from the use of time series data. It also illustrates the difficulties involved in translating model predictions into decisions and actions in the context of 'real-world' business applications.