Working with Weka - Data Preprocessing - Error: “Array Index Out of Bounds Exception”

Problem

Once the data was properly indexed in SSMS (SQL Server Management Studio), the next step was to load a subset of it into Weka. As mentioned in my previous post on loading data in Weka, I first had to write a query to extract the relevant data from SSMS and then convert it to .arff format. My CSV (comma-separated values) file had 8,777,75 rows of data in it. When I tried to open this CSV file in the Arff viewer of Weka, I got an error message: “Array Index Out Of Bounds Exception”. At first I thought it was because of the huge size of the data. Unable to find a solution, I sought help on the Weka List.

SOS Help

Try the Weka List if you are working with Weka and have a problem. I must say, this list is managed by some very helpful folks who go to the length of explaining the solutions they propose.

Solution

Apparently Weka treats a nominal value as a single entity. So if I code an attribute as a string datatype and give it a value with a space in it, Weka will throw an IOException. That is what happened with the data value TUMUYON KHULLEN: I had coded the village name as nominal with string datatype but given it values with spaces between the words. So I removed the space between the words, and I did this for all the strings in the dataset. Weka then accepted the values.
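For anyone facing the same cleanup on a large file, here is a minimal Python sketch of the fix described above. It is not the exact procedure I used, only an illustration: the function names (strip_inner_spaces, quote_for_arff, clean_csv) and the sample column "pop" are made up for the example. The second helper shows an alternative I am assuming from the ARFF format rules, where a value containing spaces is wrapped in quotes instead of being altered.

```python
import csv
import io


def strip_inner_spaces(value: str) -> str:
    """Remove all whitespace from a value, so two words become one token."""
    return "".join(value.split())


def quote_for_arff(value: str) -> str:
    """Alternative (assumption): keep the space, but single-quote the value
    when writing it into an .arff file."""
    if any(ch.isspace() for ch in value):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return value


def clean_csv(text: str, fix=strip_inner_spaces) -> str:
    """Apply `fix` to every cell of a CSV document (header row included)
    and return the cleaned CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([fix(cell) for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical two-column sample; only the village name comes from the post.
    sample = "village,pop\nTUMUYON KHULLEN,120\n"
    print(clean_csv(sample))
    print(quote_for_arff("TUMUYON KHULLEN"))
```

Removing the spaces changes the values themselves, which was fine for my purposes; quoting keeps the original names readable but only helps if the file being loaded is an .arff file written with those quotes.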

Problem solved.