What does noise detection mean?
The noise detection option lets you search explicitly for outliers in your input data file. When this option is selected, a separate noise detection block runs before rule induction. The results of noise detection have no influence on the rule induction process. If you do not need explicit noise detection, do not select this option, because the search may be time-consuming.
The result of noise detection is a list of up to five examples selected as potential noise (outliers) in the submitted example set. Each example is referenced by its row number in the input file. Please note that empty rows in the input file are not counted, and that the first example is in the second row because the first row is reserved for attribute names. Potentially noisy examples are listed in order of importance.
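The row-counting rule above can be sketched in a few lines of Python. This is only an illustration and not part of ILLM: the function name `find_example`, the comma-separated sample data, and the choice to treat whitespace-only rows as empty are all assumptions made for the example.

```python
def find_example(lines, reported_row):
    """Map a row number reported by noise detection to a physical line.

    Counting rule from the text: empty rows are skipped, and row 1 holds
    the attribute names, so the first example is row 2.
    Assumption: a row containing only whitespace counts as empty.
    """
    if reported_row < 2:
        raise ValueError("row 1 holds attribute names; examples start at row 2")
    counted = 0
    for physical, text in enumerate(lines, start=1):
        if not text.strip():
            continue  # empty rows are not counted
        counted += 1
        if counted == reported_row:
            return physical, text
    raise IndexError("reported row is beyond the end of the file")


# Hypothetical input file content, one string per physical line.
sample = ["age,income,class", "", "25,low,yes", "40,high,no", "", "31,mid,yes"]
print(find_example(sample, 3))
```

In this sample the reported row 3 is the fourth physical line, because the empty second line is skipped when counting.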
ILLM uses a heuristic noise detection algorithm, which guarantees neither that all noisy examples will be detected nor that every detected example is actually noisy. Slight modifications of the input file can easily change the result of the noise detection process significantly.
In this implementation the user cannot adjust the parameters that influence the search process. The parameter values used are defaults determined from the number of examples in each class. If a class contains fewer than 5 examples, no noise can be detected in that class. For classes with fewer than 50 examples, the noise sensitivity used is about 35% lower than for larger classes.
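The class-size thresholds can be summarized as a small Python function. This is a sketch of the rule as described, not the ILLM code: the function name is hypothetical, and the factor 0.65 is only a reading of "about 35% lower", since the actual parameter values are not documented here.

```python
def relative_noise_sensitivity(class_size):
    """Return the sensitivity relative to large classes, or None.

    None means noise detection is not possible for this class.
    The factor 0.65 is an approximation of "about 35% lower";
    the exact value used by ILLM is not stated.
    """
    if class_size < 5:
        return None   # too few examples: no noise can be detected
    if class_size < 50:
        return 0.65   # small class: reduced sensitivity (approximate)
    return 1.0        # 50 or more examples: full default sensitivity


for size in (3, 20, 200):
    print(size, relative_noise_sensitivity(size))
```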
The easiest way to use the results of this option is to eliminate the detected examples from the training set. You do not need to delete the examples from the input file for that. It is sufficient to change their class to the unknown state by inserting '?' as the first character of their target attribute value. The advantage of this approach is that many different experiments can be run with the same input file. In addition, the row numbers of the examples do not change, which makes it easy to compare the results of different experiments.
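Marking the detected examples can also be done with a short script. The following Python sketch assumes a comma-separated file with the target attribute in the last column; the delimiter, the column position, and the function name `mark_unknown` are assumptions for illustration and should be adapted to your actual input file.

```python
def mark_unknown(lines, reported_rows, target_index=-1, delimiter=","):
    """Prefix '?' to the target value of the reported rows.

    Rows are counted as in the noise detection report: empty rows are
    skipped and row 1 is the attribute-name header. No line is deleted,
    so the row numbers of all examples stay the same.
    Assumptions: `delimiter` and `target_index` describe the file layout.
    """
    wanted = set(reported_rows)
    result = []
    counted = 0
    for text in lines:
        if not text.strip():
            result.append(text)  # keep empty rows untouched
            continue
        counted += 1
        if counted >= 2 and counted in wanted:
            fields = text.split(delimiter)
            if not fields[target_index].startswith("?"):
                fields[target_index] = "?" + fields[target_index]
            text = delimiter.join(fields)
        result.append(text)
    return result


# Hypothetical file content; row 3 was reported as potential noise.
sample = ["age,income,class", "", "25,low,yes", "40,high,no"]
print(mark_unknown(sample, [3]))
```

Because the function only edits the target value in place, running it again with the same row numbers leaves the already marked examples unchanged.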
© 2001 LIS - Rudjer Boskovic Institute
Last modified: September 09 2015 14:17:42.