Apache Spark | Run your first Spark program
In this post we are going to run a sample Spark program. If you want to set up Spark locally, go to this post. In the last post we created a slave (worker) node with 2 cores and 2 GB of memory. The master will use this worker node to run your Spark jobs.
A Spark installation ships with some sample programs that can be run on a cluster. Their sources can be found at the location below.
rohan@rohan:/opt/spark/spark-2.3.1-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples$ ll
total 64
drwxrwxr-x 6 rohan rohan 4096 Jun 1 22:49 ./
drwxrwxr-x 3 rohan rohan 4096 Jun 1 22:49 ../
-rw-rw-r-- 1 rohan rohan 4506 Jun 1 22:49 JavaHdfsLR.java
-rw-rw-r-- 1 rohan rohan 4683 Jun 1 22:49 JavaLogQuery.java
-rw-rw-r-- 1 rohan rohan 4303 Jun 1 22:49 JavaPageRank.java
-rw-rw-r-- 1 rohan rohan 1943 Jun 1 22:49 JavaSparkPi.java
-rw-rw-r-- 1 rohan rohan 2721 Jun 1 22:49 JavaStatusTrackerDemo.java
-rw-rw-r-- 1 rohan rohan 3477 Jun 1 22:49 JavaTC.java
-rw-rw-r-- 1 rohan rohan 1968 Jun 1 22:49 JavaWordCount.java
drwxrwxr-x 2 rohan rohan 4096 Jun 1 22:49 ml/
drwxrwxr-x 2 rohan rohan 4096 Jun 1 22:49 mllib/
drwxrwxr-x 4 rohan rohan 4096 Jun 1 22:49 sql/
drwxrwxr-x 2 rohan rohan 4096 Jun 1 22:49 streaming/
We will run the JavaWordCount program on our local Spark cluster.
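To get a feel for what a word count job computes before submitting it, here is a minimal plain-Java sketch of the same idea: split each line into words, then sum the occurrences per word. This is not the bundled JavaWordCount source and it does not use Spark at all; the class name WordCountSketch, the method countWords, and the choice to skip empty tokens are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration class, not part of the Spark distribution.
public class WordCountSketch {
    // Words are separated by single spaces (an assumption for this sketch).
    private static final Pattern SPACE = Pattern.compile(" ");

    // Counts how often each word occurs across all lines.
    public static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // Turn every line into its words.
                .flatMap(line -> Arrays.stream(SPACE.split(line)))
                // Skip empty tokens produced by repeated spaces.
                .filter(word -> !word.isEmpty())
                // Start each word at 1 and add the counts for equal words;
                // a TreeMap keeps the result sorted by word.
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello spark", "hello cluster");
        countWords(lines).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

On a cluster the same split-and-sum steps are spread across the worker's cores instead of running in a single JVM.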
Spark programs can be submitted to a cluster using the ./bin/spark-submit script. More info is available on the official website.
Place a sample words file under any directory (/opt/spark/spark-poc/words.txt for this example). Now run the command below.
rohan@rohan:/opt/spark/spark-2.3.1-bin-hadoop2.7/bin$ ./spark-submit --class "org.apache.spark.examples.JavaWordCount" --master spark://rohan-Latitude-E5470:7077 /opt/spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar /opt/spark/spark-poc/words.txt
As you can see, spark-submit takes the following arguments, in this order.
--class [main-class of spark program]
--master [master node url]
[app-jar]
[app command-line arguments]
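The order matters: options such as --class and --master come before the application jar, and everything after the jar is passed to the program itself. The following sketch assembles the command from this post in that order and prints it. It only builds a list of strings and does not launch anything; the class name SubmitCommandSketch and the helper buildCommand are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: shows the argument order only.
public class SubmitCommandSketch {
    // Builds the argument list: script, options, app jar, then app arguments.
    public static List<String> buildCommand(String mainClass, String masterUrl,
                                            String appJar, String... appArgs) {
        List<String> command = new ArrayList<>();
        command.add("./spark-submit");
        command.add("--class");
        command.add(mainClass);                  // main class of the Spark program
        command.add("--master");
        command.add(masterUrl);                  // master node URL
        command.add(appJar);                     // application jar
        command.addAll(Arrays.asList(appArgs));  // arguments handed to the program
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "org.apache.spark.examples.JavaWordCount",
                "spark://rohan-Latitude-E5470:7077",
                "/opt/spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
                "/opt/spark/spark-poc/words.txt");
        System.out.println(String.join(" ", command));
    }
}
```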
Check the Spark master UI (by default served on port 8080 of the master host) to see the status of your Spark program.