A Java-based approach to elegant data mining

A few years back, I worked for a technical contact center. While working there, I realised that there should be a system in place that could extract (some predefined) keywords from the call logs, count their frequency, and map them to the relevant knowledge base articles on technical solutions. This mapping could then be uploaded to the company’s website.
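To make the idea concrete, here is a minimal sketch of the keyword step: count a predefined set of keywords in a call log and pair each one with a knowledge base article. It is only an illustration and not the program presented further down. The class name KeywordArticleSketch, the keywords, the sample log text and the KB-0001 style article ids are all made up for the example, and keywords are assumed to be lower-case.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch: the keywords and article ids are invented for illustration
class KeywordArticleSketch {

    // Count occurrences of each predefined (lower-case) keyword in the call log,
    // ignoring case and any punctuation attached to the start or end of a word
    static Map<String, Integer> countKeywords(String callLog, Map<String, String> keywordToArticle) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywordToArticle.keySet()) {
            counts.put(keyword, 0);
        }
        try (Scanner words = new Scanner(callLog)) {
            while (words.hasNext()) {
                // Strip leading/trailing non-letters so "router," matches "router"
                String word = words.next().toLowerCase(Locale.ROOT)
                        .replaceAll("^[^a-z]+|[^a-z]+$", "");
                if (counts.containsKey(word)) {
                    counts.put(word, counts.get(word) + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> keywordToArticle = new LinkedHashMap<>();
        keywordToArticle.put("router", "KB-0001");   // made-up article ids
        keywordToArticle.put("password", "KB-0002");

        String callLog = "Customer cannot reach the router. Reset the router, then the password.";
        Map<String, Integer> counts = countKeywords(callLog, keywordToArticle);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + keywordToArticle.get(entry.getKey())
                    + " (" + entry.getValue() + " mentions)");
        }
    }
}
```

For the sample log above, "router" is counted twice and "password" once. A real call log would need a more careful tokenizer than trimming punctuation from each whitespace-separated token.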
So, equipped with this idea, I developed the Java program below.
It is a versatile program that uses the Collections API and XML to extract the frequently occurring words in a given text file and then write them out in XML format.
Mining big data has always fascinated me, as have the programming languages that help facilitate this herculean task. Python is an equally strong contender for data mining tasks. Please note, however, that even though this program performs a data mining task, it does so without using Java’s predefined data mining API.
If you want to know more about the “javax.datamining” API, please refer to the website The Data Mining Java API.
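To show what working without a dedicated data mining API can look like, the core counting step needs nothing beyond java.util. The sketch below is a hedged illustration, separate from the full listing that follows. FrequentWordsSketch, countWords and mostFrequent are hypothetical names, and ranking the words by frequency is an addition that the listing itself does not perform.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original program
class FrequentWordsSketch {

    // Count how often each whitespace-separated word occurs in the text
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Return up to `limit` words, most frequent first (ties broken alphabetically)
    static List<String> mostFrequent(Map<String, Integer> counts, int limit) {
        List<String> words = new ArrayList<>(counts.keySet());
        Collections.sort(words, (a, b) -> {
            int byCount = counts.get(b).compareTo(counts.get(a));
            return (byCount != 0) ? byCount : a.compareTo(b);
        });
        return words.subList(0, Math.min(limit, words.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("reset modem reset router reset modem");
        System.out.println(mostFrequent(counts, 2)); // prints [reset, modem]
    }
}
```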


import java.io.*;
import java.util.*;

/** This program is a prototype for data mining.
 * It reads a text file that is supposed to be the call log of a technical contact center.
 * It then reads every word into an ArrayList and counts how often each distinct word occurs,
 * which yields a unique keyword list. It also searches the file for a particular word
 * (set in the constructor) and displays how many times it occurs.
 * Finally, it writes the unique words and their counts in an XML format so that they can be
 * used in future as a database.
 * @author Ashish Dutt
 */
public class WordFinder{
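    private String path;
    String wordToFind;
    PrintWriter output;
    Map<String, Integer> mp;   // word -> number of matches recorded
    File file;
    static List<String> list;  // every word read from the input file, in order
    static int count;          // running count of matches for wordToFind

    // Initialise the class variables
    WordFinder() {
        mp = new HashMap<String, Integer>();
        path = "C:\\Users\\Documents\\Java Pract\\carroll-alice.txt";
        file = new File(path);
        list = new ArrayList<String>();
        count = 0;
        wordToFind = "Alice";
    } // End constructor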

    public void checkFileExists() {
        System.out.println("Does the file exist?: " + file.exists());
        System.out.println("Is the file readable?: " + file.canRead());
        //System.out.println("Length: " + file.length());
        System.out.println("Path: " + file.getAbsolutePath());
    } // End method checkFileExists

    public ArrayList<String> populateList() throws FileNotFoundException
    {
        // Read the existing file word by word and populate the list with its contents;
        // try-with-resources closes the Scanner once reading is done
        try (Scanner input = new Scanner(new File(path))) {
            while (input.hasNext()) {
                String word = input.next();
                list.add(word);
            }
        }
        return (ArrayList<String>) list;
    } // End method populateList

    public void checkWordInFile() throws FileNotFoundException {
        int remaining = list.size(); // number of words still to compare
        output = new PrintWriter("C:\\Users\\Documents\\Java Pract\\aliceTest.xml");
        System.out.println(list);
        System.out.println("List size: " + remaining); // This will print the list size
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner console = new Scanner(new File(path))) {
            while (console.hasNext() && remaining != 0) {
                if (console.next().equals(wordToFind)) {
                    count = findMatchedString(); // one more match for wordToFind
                }
                --remaining; // one word fewer left to compare
            }
            writeMatchWordToFile(); // write the word frequencies as XML
        } finally {
            output.close();
        }
        System.out.println("The word " + wordToFind + " occurs " + count + " times.");
    } // End method checkWordInFile

    public int findMatchedString() {
        // Record one more occurrence of the search word in the frequency map
        Integer freq = mp.get(wordToFind);
        mp.put(wordToFind, (freq == null) ? 1 : freq + 1);
        count++;
        return count;
    } // End method findMatchedString

    public void writeMatchWordToFile()
    {
        String xmlStr;
        xmlStr = FrequentlyOccuringWords();
        output.write(xmlStr);
        //output.println(mp);
    } // End method writeMatchWordToFile
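    public static String FrequentlyOccuringWords()
    {
        String[] dump = list.toArray(new String[]{});
        Map<String, Integer> hashmap = new HashMap<String, Integer>(dump.length);
        // NOTE: the loop body and the root tag were lost from this listing.
        // The counting loop and the root element name "words" are assumptions.
        for (int i = 0; i < dump.length; i++) {
            Integer freq = hashmap.get(dump[i]);
            hashmap.put(dump[i], (freq == null) ? 1 : freq + 1);
        }
        StringBuffer xmlString = new StringBuffer("<words>\n");
        Set<String> keys = hashmap.keySet();
        Iterator<String> it = keys.iterator();

        while (it.hasNext())
        {
            String key = it.next();
            int value = hashmap.get(key);
            xmlString.append("<").append(key).append(">");
            xmlString.append(value);
            xmlString.append("</").append(key).append(">");
        }
        xmlString.append("\n</words>"); // closing tag of the assumed root element
        //System.out.println(xmlString);
        return xmlString.toString(); // StringBuffer being converted to String when returned
    } // End method FrequentlyOccuringWords

    public static void main(String[] args)
    {
        WordFinder findWord = new WordFinder();
        findWord.checkFileExists();
        try {
            findWord.populateList();
            findWord.checkWordInFile();
        } catch (FileNotFoundException ex) {
            ex.printStackTrace();
        } // end catch
    } // end Main method
} // End class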

I hope to further refine this program, especially the XML file part, but I believe this should suffice for now.
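One possible direction for refining the XML part, offered as a sketch rather than a finished design: the listing uses each word as an element name, which produces malformed XML for tokens that carry punctuation (for example a word with a leading quotation mark). Writing the word as an escaped attribute value avoids that. The names WordXmlSketch, toXml and escape, and the words/word/text/count vocabulary, are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical refinement; the element and attribute names are assumptions
class WordXmlSketch {

    // Escape the characters that are special inside a double-quoted XML attribute value
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render the counts as <words><word text="..." count="..."/>...</words>
    static String toXml(Map<String, Integer> counts) {
        StringBuilder xml = new StringBuilder("<words>\n");
        // Copying into a TreeMap gives a stable, sorted order in the output
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            xml.append("  <word text=\"").append(escape(e.getKey()))
               .append("\" count=\"").append(e.getValue()).append("\"/>\n");
        }
        return xml.append("</words>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("Alice", 3);
        counts.put("\"Oh", 1); // a token that would be an illegal element name
        System.out.print(toXml(counts));
    }
}
```

This sketch only escapes the four markup characters. Control characters that XML forbids would still need to be filtered out, and a DOM or StAX writer from the JDK would be the sturdier choice for anything beyond a prototype.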

Do let me know what you think of it.
