How to Prepare an Input Data File?
N - number of examples .............. max. 250
M - number of input attributes ....... max. 50
W - number of characters in an attribute name or value .... max. 30
The input data file for the rule induction process is an ASCII file of examples. Examples are the inductive system's only source of information about the domain. The data file should include as many examples as possible (up to 250 for DMS); they should be good representatives of the domain, and they should be as diverse as possible. The examples in the input data file are sometimes also referred to as learning examples, in contrast to test examples, which are used to test how well the induced rules (knowledge) predict.
Every example is represented in the data file by exactly one line. A line ends with a Carriage Return (ASCII value 13 decimal), a New Line (ASCII value 10 decimal), or both. An input file with N examples has N+1 lines. The first line defines the attribute names, and the remaining N lines present the examples, each described by its attribute values. Every example is described by M+1 attribute values: one for the target attribute and one for each of the M input attributes. The input file therefore represents a table with N+1 rows and M+1 columns. The order of attribute values must be the same in all examples, and it must correspond to the order of attribute names in the first row. The maximal length of an attribute name or attribute value is W characters. Attribute names in the first line and attribute values in lines 2 to N+1 must be separated by a delimiter. The server allows four different delimiter types, but they cannot be mixed in one data file.
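The layout rules above can be checked before a file is submitted. The following Python sketch is only an illustration, not part of the server: the function name check_data_text is hypothetical, and the tab default for the delimiter is an assumption, since this page does not list the four delimiter types the server accepts. Skipping empty lines is also an assumption made for convenience.

```python
import re

MAX_EXAMPLES = 250     # N, limit stated at the top of this page
MAX_INPUT_ATTRS = 50   # M
MAX_CHARS = 30         # W


def check_data_text(text, delimiter="\t"):
    """Return a list of problems found in the data-file text (empty if none).

    The delimiter default (tab) is an assumption; pass whichever single
    delimiter the file actually uses.
    """
    # Lines end with CR, LF, or CR+LF. Empty lines (for example after a
    # trailing line end) are skipped, which is an assumption of this sketch.
    lines = [ln for ln in re.split(r"\r\n|\r|\n", text) if ln != ""]
    if len(lines) < 2:
        return ["need one line of attribute names and at least one example"]

    problems = []
    names = lines[0].split(delimiter)
    n_examples = len(lines) - 1   # N: every line after the first
    n_inputs = len(names) - 1     # M: all columns except the target

    if n_examples > MAX_EXAMPLES:
        problems.append(f"{n_examples} examples, maximum is {MAX_EXAMPLES}")
    if n_inputs > MAX_INPUT_ATTRS:
        problems.append(f"{n_inputs} input attributes, maximum is {MAX_INPUT_ATTRS}")

    for line_no, line in enumerate(lines, start=1):
        fields = line.split(delimiter)
        # Every row must have the same M+1 columns as the first line.
        if len(fields) != len(names):
            problems.append(
                f"line {line_no}: {len(fields)} fields, expected {len(names)}"
            )
        # Names and values are limited to W characters each.
        for field in fields:
            if len(field) > MAX_CHARS:
                problems.append(
                    f"line {line_no}: value longer than {MAX_CHARS} characters"
                )
    return problems
```

Calling check_data_text on the contents of a file returns an empty list when the row count, column count, and name and value lengths all respect the limits. It does not detect mixed delimiters directly; a file that mixes them will usually show up as rows with the wrong number of fields.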
In the data preparation phase, the user has to select one (and only one) attribute to be the object of the modeling process. This attribute is called the target attribute. All other attributes are called input attributes. The result of the data mining process is information about how the target attribute is related to the input attributes. This server accepts only problems in which the target attribute has exactly two classes: the positive class and the negative class. Every example must belong to one of these classes. The result of the data mining process is one or more models (rules) that describe the positive target class, in contrast to the negative target class, by properties of the input attributes.
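The two-class requirement can also be verified ahead of time. The Python sketch below is an illustration under stated assumptions: the function name target_class_counts is hypothetical, the target attribute is identified by its name in the first line, and the rows are assumed to be already split into lists of values. How the server itself is told which attribute is the target, and which class is the positive one, is not described on this page.

```python
from collections import Counter


def target_class_counts(names, examples, target):
    """Count examples per target class; raise ValueError unless there are two.

    names    -- attribute names from the first line of the data file
    examples -- rows of attribute values, in the same order as names
    target   -- name of the attribute chosen as the target (assumed input)
    """
    column = names.index(target)  # raises ValueError if the name is absent
    counts = Counter(example[column] for example in examples)
    if len(counts) != 2:
        raise ValueError(
            f"target attribute '{target}' has {len(counts)} classes, expected 2"
        )
    return dict(counts)


names = ["class", "color", "size"]
examples = [
    ["pos", "red", "3"],
    ["neg", "blue", "5"],
    ["pos", "green", "4"],
]
counts = target_class_counts(names, examples, "class")
```

The returned counts also show how the examples are distributed between the two classes, which is worth a look before submitting the file.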
© 2001 LIS - Rudjer Boskovic Institute
Last modified: September 08 2015 09:28:57.