Data preparation for Unsupervised Feature Ranking

Instances are defined by a set of feature values. Each instance is presented in one row of the data file.

Feature values may be nominal or numerical. Examples of valid nominal values are A, val, and ha_12. Examples of valid numerical values are 7, -3.1, and -3333.22.

Feature values may be unknown. An unknown value must be stated explicitly by a string whose first character is '?'. Both numerical and nominal features may include unknown values.
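The three kinds of values described above can be told apart with a few lines of Python. This is only a sketch for checking a file before upload: the function name classify_value is hypothetical, and using Python's float() as the test for "numerical" is an assumption, since the server's own parsing rules are not documented here. In particular, float() also accepts strings such as "1e5", "inf", and "nan", which the server may treat differently.

```python
def classify_value(token: str) -> str:
    """Classify one feature value as 'unknown', 'numerical', or 'nominal'."""
    # Any string starting with '?' marks an unknown value.
    if token.startswith("?"):
        return "unknown"
    # Assumption: a value is numerical if Python can parse it as a float.
    try:
        float(token)
    except ValueError:
        return "nominal"
    return "numerical"


if __name__ == "__main__":
    for value in ["A", "ha_12", "7", "-3333.22", "?", "?missing"]:
        print(value, classify_value(value))
```

Because the check for '?' comes first, a string such as ?missing is reported as unknown regardless of what follows the question mark.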

All instances must have the same number of features, which is defined by the number of values in the first row of the data file. A formally correct data file therefore contains N rows with A values each, where N is the number of instances and A is the number of features. On this server the maximum value of N is 25000 and the maximum value of A is 10000.

Feature values must be separated by delimiters. Valid delimiters are a comma, a semicolon, and one or more spaces. The characters ',', ';', space, and TAB may not be used within feature values. If, for example, an input value consists of two strings separated by a space, the server will interpret it as two nominal values and the row will have more values than expected. In this situation the server immediately stops processing the data and reports an error. This is the most common cause of problems with this server.
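Since mismatched row lengths are the typical failure, it may help to check a file locally before sending it. The following Python sketch splits each row on the delimiters listed above and compares every row with the first one. The names split_row and check_rows are hypothetical, and several details are assumptions rather than documented server behaviour: TAB is treated as a separator, whitespace around a comma or semicolon is ignored, and blank lines are skipped.

```python
import re

MAX_INSTANCES = 25000  # maximum N stated for this server
MAX_FEATURES = 10000   # maximum A stated for this server

# Assumption: a comma or semicolon (with optional surrounding whitespace)
# or a run of spaces/TABs separates two values.
_DELIMITER = re.compile(r"\s*[,;]\s*|\s+")


def split_row(line):
    """Split one row of the data file into its feature values."""
    return _DELIMITER.split(line.strip())


def check_rows(lines):
    """Return a list of problem messages; an empty list means no problem found."""
    rows = [split_row(line) for line in lines if line.strip()]
    if not rows:
        return ["no data rows found"]
    problems = []
    expected = len(rows[0])  # the first row defines the number of features
    if expected > MAX_FEATURES:
        problems.append(f"{expected} features, maximum is {MAX_FEATURES}")
    if len(rows) > MAX_INSTANCES:
        problems.append(f"{len(rows)} instances, maximum is {MAX_INSTANCES}")
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append(f"row {number}: {len(row)} values, expected {expected}")
    return problems


if __name__ == "__main__":
    sample = ["Croatia 7 -3.1", "New York 3 4"]
    print(check_rows(sample))
```

In the sample, the value New York contains a space, so the second row is split into four values instead of three and is reported as a problem. Replacing the space with an underscore (New_York) would make the row valid.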

We have prepared a set of illustrative data files that show what correct input data files should look like. The instances are 155 countries. The first file contains 106 export characteristics of the countries for the year 2012, while the second file contains 105 socio-economic characteristics of the same countries for the same year. You may download these files to your computer and then send them to the server to test its functionality.

© 2016 LIS - Rudjer Boskovic Institute
Last modified: January 11 2016 15:51:37.