- understand problem perspective, competing objectives and constraints
- uncover important factors influencing the outcome
Assess situation
Once the problem and the criteria for its successful solution are well defined, we have to assess all important aspects surrounding the problem:
- what expertise or background knowledge do we have about the problem, and do we understand the problem terminology well enough;
- data is the central item in a data mining problem, so we have to be aware of its potential for solving the problem;
- it is good to define specific terminology for the problem (problem domain terminology and related data mining terminology), in order to improve communication between domain experts and data mining experts;
- we must estimate the potential cost (duration) and benefits of the data mining project to be sure that it is feasible.
|problem solving goal||data mining goal|
|increase sales||determine customer properties with respect to their purchasing power|
|prevent credit card fraud||find critical patterns for fraudulent card usage; build an accurate algorithm for automatic fraud detection|
The definition of the problem, and the data mining goal derived from it, is directly related to the basic division of data mining problem types (discussed more thoroughly in the Modelling section):
- data description and summarization
- association discovery
- dependency analysis
Outputs of the data mining process differ depending on the techniques used, so once the problem type(s) are defined it is good to describe the intended data mining outputs of the project. Success criteria should also be specified in data mining terms: we can require a certain level of predictive accuracy (for classification and prediction problems) or a certain propensity or lift, or, when the aim is new insight into the problem, define specific criteria together with a domain expert.
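
To make success criteria such as predictive accuracy and lift concrete, the following is a minimal Python sketch of how they might be computed and checked. The function names, the sample labels and scores, and the 0.5 decision threshold are illustrative assumptions, not part of any particular tool. Lift is taken here in its common sense: the response rate among the top-scored fraction of cases divided by the overall response rate.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases where the predicted class equals the actual class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def lift_at(y_true: Sequence[int], scores: Sequence[float], fraction: float) -> float:
    """Lift for the top `fraction` of cases, ranked by model score.

    y_true holds 1 for a positive case (e.g. a responding customer), else 0.
    """
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("inputs must be non-empty and of equal length")
    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    k = max(1, round(n * fraction))  # number of top-scored cases to inspect
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical data: actual outcomes and model scores for ten customers.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
lift_20 = lift_at(y_true, scores, fraction=0.2)
print(f"accuracy = {acc:.2f}, lift at top 20% = {lift_20:.2f}")
```

A project could then state its success criterion as, for example, "lift at the top 20% of at least some agreed value", with the actual value agreed with the domain experts rather than taken from this sketch.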
© 2001 LIS - Rudjer Boskovic Institute
Last modified: September 09 2015 14:17:42.