PySpark in PyCharm on a remote server

Use Case: I want to use my laptop (running Windows 7 Professional) to connect to a CentOS 6.4 master server using PyCharm.

Objective: To write the code in PyCharm on the laptop, then send the job to the server, which does the processing and returns the result to the laptop or to any other visualization API.

My solution was to get the PyCharm Professional edition (you can download it as a 30-day evaluation version), which lets you configure a remote interpreter. In PyCharm, press Ctrl+Alt+S to open the Settings window. From there, click the + sign next to Project: [your project name] — in my case the project name is Remote_Server — as shown:
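Before pointing PyCharm at the server, the server-side Python environment needs to be able to find PySpark. A minimal sketch of the environment variables usually required — the install path here is an assumption, so adjust it to your server's layout:

```shell
# Assumed Spark install location -- adjust to where Spark lives on your server
export SPARK_HOME=/usr/local/spark

# Make the pyspark package and its bundled py4j library visible to Python
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-*-src.zip:$PYTHONPATH"
```

Adding these lines to the remote user's shell profile (or to the interpreter's environment variables in PyCharm) ensures `from pyspark import SparkContext` resolves on the server.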

[Screenshot: pyspark-config-1]

[Screenshot: 1-pyspark configuration]

Now click OK and write a sample program to test the connectivity. A sample program is given below:


# The Spark master URL (note: this is a master URL, not the SPARK_HOME install directory)
SPARK_MASTER = "spark://IP_ADDRESS_OF_YOUR_SERVER:7077"

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("PySpark import success")
except ImportError as e:
    print("Error importing Spark modules", e)

try:
    conf = SparkConf()
    conf.setMaster(SPARK_MASTER)
    conf.setAppName("First_Remote_Spark_Program")
    sc = SparkContext(conf=conf)
    print("Connection succeeded with master", conf)
    data = [1, 2, 3, 4, 5]
    distData = sc.parallelize(data)
    print(distData)
except Exception as e:
    print("Unable to connect to the remote server:", e)
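To actually pull a result back to the driver (the objective above), you would run an action on `distData`, such as `reduce` or `collect`. As a sanity check that needs no cluster, Python's `functools.reduce` computes the same sum locally — a sketch assuming the sample data above:

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# On the cluster you would write:
#   total = distData.reduce(lambda a, b: a + b)   # sums the RDD on the workers
#   values = distData.collect()                   # brings all elements to the driver
# The driver-side equivalent of that reduce, in plain Python:
total = reduce(lambda a, b: a + b, data)
print(total)  # 15
```

If the remote `reduce` returns the same value as this local computation, the round trip from laptop to cluster and back is working.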

Now, when you run this code, you should see the PySpark interpreter output as shown:

[Screenshots: pyspark-config-4, pyspark-config-5]
