Problem Understanding

Determining objective

The first important step in the whole data mining process is understanding why data mining is needed, i.e. understanding the problem we have to solve. This is the objective of the data mining effort. Problems can be diverse: optimizing customer response to a marketing campaign, preventing fraudulent use of credit cards, detecting hostile logins on computer systems, etc. To solve the problem efficiently we also have to:

Define success criteria

Once the problem is defined, it is advisable to define the success criteria: what makes our data mining successful. Criteria can be objective (quantitative): for example, an increased number of detected deviations, an improved customer response rate to a marketing campaign, or the percentage of correct patient diagnoses. Criteria can also be subjective (qualitative). In that case a domain expert assesses the results of the data mining effort with respect to existing background knowledge about the problem, and the results must contain some new and useful insight into the relationships among domain variables.
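
A quantitative criterion such as an improved response rate can be written down as an explicit check before any modelling starts. The following is only a minimal sketch: the function names, the campaign counts and the 20% threshold are hypothetical values chosen for illustration, not figures from any real project.

```python
def relative_improvement(baseline_rate, new_rate):
    """Relative change of new_rate over baseline_rate (0.25 means +25%)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


def criterion_met(baseline_rate, new_rate, required_improvement):
    """True if the observed improvement reaches the agreed threshold."""
    return relative_improvement(baseline_rate, new_rate) >= required_improvement


# Hypothetical numbers: responses per 10,000 contacted customers.
baseline = 200 / 10_000   # response rate of the previous campaign
campaign = 260 / 10_000   # response rate of the data-mining-driven campaign

# Assumed success criterion: at least a 20% relative improvement.
print(criterion_met(baseline, campaign, 0.20))
```

Fixing the threshold in advance keeps the later evaluation from being adjusted to fit whatever result is obtained.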

Assess situation

Once the problem and the criteria for its successful solution are well defined, we have to assess all important aspects surrounding the problem:

Determine data mining goals

We have determined the problem and the criteria for its successful solution. We now have to "translate" project goals into data mining terms. Data mining goals differ from overall problem-solving goals, as illustrated below:

Problem-solving goal         Data mining goal
increase sales               determine customer properties with respect to their purchasing power
prevent credit card fraud    find critical patterns for fraudulent card usage;
                             build an accurate algorithm for automatic fraud detection

The definition of the problem, and the data mining goal derived from it, is directly related to the basic division of data mining problem types (discussed more thoroughly in the Modelling section):

Outputs of the data mining process differ depending on the techniques used, so once the problem type(s) are defined it is good to describe the intended data mining outputs of the project. Success criteria should also be specified in data mining terms: we can require a certain level of predictive accuracy (classification and prediction problems), propensity or lift, or define specific criteria for a domain expert to apply when we want new insight into the problem.
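
Predictive accuracy and lift can both be computed from a small labelled sample, which makes it easier to state a concrete target. The sketch below is illustrative only: the helper names, the toy labels and scores, the 0.5 decision threshold and the 20% targeting fraction are all assumptions. Lift is taken here as the response rate among the top-scored fraction of cases divided by the overall response rate.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted class equals the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def lift_at_fraction(y_true, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("inputs must be non-empty and of equal length")
    k = max(1, int(round(n * fraction)))
    # Rank cases from highest to lowest model score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / overall_rate


# Toy data (hypothetical): 1 = customer responded, 0 = did not respond.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print("accuracy:", accuracy(y_true, y_pred))
print("lift in top 20%:", lift_at_fraction(y_true, scores, 0.2))
```

A lift above 1 means that targeting the top-scored cases finds responders more often than contacting cases at random; a project could, for instance, require a minimum lift in the top fraction as its data mining success criterion.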

Produce a project plan

Finally, we can make a plan. We have to set out the major steps to be performed, with deliverables defined at each step. We can also plan which techniques will be used at each stage.

© 2001 LIS - Rudjer Boskovic Institute
Last modified: September 09 2015 14:17:42.