A simple program to count the frequency of words using Scala in Spark

Continuing from my previous post on how to install Spark in a Windows environment, the next logical task was to run a simple hello-world-type program. In this post I attempt to count the frequency of words in the complete works of William Shakespeare. This is my first attempt at understanding how Spark works, so let's get started.

I downloaded the complete works of William Shakespeare from here in plain-text format and saved it in the data folder of my Spark installation (‘D:\spark-1.2.1\data’). The size of the text file is 5.4 MB.

I then started the Spark shell by opening the command prompt, navigating to the bin directory, and executing the command ‘spark-shell’.

At the Scala prompt I begin by creating three variables, filePath, countLove and countBlind:
scala> val filePath = sc.textFile("D:\\spark\\data\\william.txt")
scala> val countLove = filePath.filter(line => line.contains("love"))
scala> val countBlind = filePath.filter(line => line.contains("blind"))
Now I use the count function to count how many lines contain each of these two words in the whole text:
scala> countLove.count()
res0: Long = 2484  // 2484 lines containing the word 'love' found in the text
scala> countBlind.count()
res1: Long = 100   // 100 lines containing the word 'blind' found in the text
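
The filters above count lines that contain a given word rather than building a true word-frequency table. As a rough sketch of what a full word count could look like on the same filePath RDD (the variable name wordCounts, the whitespace splitting and the lower-casing are my own additions, not part of the original run, and punctuation is not stripped), something along these lines should work in the shell:
scala> val wordCounts = filePath.flatMap(line => line.split("\\s+")).filter(_.nonEmpty).map(word => (word.toLowerCase, 1)).reduceByKey(_ + _)
scala> wordCounts.sortBy(_._2, ascending = false).take(10)  // the ten most frequent words
Here reduceByKey sums the per-word ones across the whole file, so a single pass over the RDD gives the frequency of every word rather than just 'love' and 'blind'.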

[Screenshot: Spark UI]
