“Every problem becomes childish when once it is explained to you”

Friday, July 8, 2016

Spark Hello World with Java 8 and Maven





Java 8 is a major release of Java. Support for functional programming, through new additions like lambda expressions and the Stream API, makes Java a much richer language. Here we will see how these new features can be leveraged to write big data applications using the popular Spark framework.
            Spark is written in Scala, which is naturally a functional programming language. Even though most of the Spark libraries can be accessed via its Java API, it wasn't really straightforward to write Spark programs with Java 7 due to the lack of functional features in Java 7.
Java 8 makes it easy to write Spark programs with its functional features.
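To see what that buys you before we touch Spark at all, here is a small, plain-JDK comparison (the class name and helper are illustrative, not part of any Spark API) of the Java 7 anonymous-class style against a Java 8 lambda for the same function:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LambdaComparison {

    // Java 7 style: a verbose anonymous inner class implementing Function
    static final Function<String, String> TO_UPPER_OLD = new Function<String, String>() {
        @Override
        public String apply(String s) {
            return s.toUpperCase();
        }
    };

    // Java 8 style: the same function written as a one-line lambda
    static final Function<String, String> TO_UPPER_NEW = s -> s.toUpperCase();

    // Apply a function to every element of a list
    static List<String> upperCase(List<String> names, Function<String, String> fn) {
        return names.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Paul", "Gavin");
        System.out.println(upperCase(names, TO_UPPER_OLD)); // [JOHN, PAUL, GAVIN]
        System.out.println(upperCase(names, TO_UPPER_NEW)); // [JOHN, PAUL, GAVIN]
    }
}
```

Both versions do exactly the same work; the lambda just drops the ceremony. Spark's Java API accepts lambdas in the same way, as we will see below.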

The program below is written in Java 8 on Apache Spark. Those who want to see the same program in Java 7 can refer to my previous blog.

Eclipse IDE is used to create and run this program.

Follow the steps below to quickly set up a sample project.

  • Create a simple Maven project and update pom.xml with the configuration below


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.semanticeyes.sparkhelloworld</groupId>
  <artifactId>spark-helloworld</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>Spark Helloworld</name>
  <packaging>jar</packaging>
  <repositories>
    <repository>
      <id>apache</id>
      <url>https://repository.apache.org/content/repositories/releases</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.4</version>
    </dependency>
  </dependencies>
  <properties>
    <java.version>1.8</java.version>
  </properties>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>${java.version}</source>
            <target>${java.version}</target>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <version>2.2.2</version>
          <configuration>
            <descriptors>
              <descriptor>src/main/assembly/assembly.xml</descriptor>
            </descriptors>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>


  • Create a HelloWorld Java class

package com.semanticeyes.sparkhelloworld;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HelloSpark {
    public static void main(String[] args) {

        // Run Spark in local mode
        JavaSparkContext sc = new JavaSparkContext("local", "HelloSpark");

        // Build an RDD from an in-memory list of names
        String[] arr = new String[] { "John", "Paul", "Gavin", "Rahul", "Angel" };
        List<String> inputList = Arrays.asList(arr);
        JavaRDD<String> inputRDD = sc.parallelize(inputList);

        // Print each element using a Java 8 lambda expression
        inputRDD.foreach(x -> System.out.println(x));

        // Shut down the context cleanly
        sc.stop();
    }
}
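The same lambda style carries over to Spark's transformations such as `filter` and `map` on a `JavaRDD`. As a rough, Spark-free illustration of the pattern, the sketch below (class and method names are my own, not Spark's) runs the equivalent pipeline with a plain Java 8 Stream on the same list of names; the comment shows roughly what the corresponding RDD calls would look like:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineSketch {

    // Mirrors, roughly: inputRDD.filter(n -> n.startsWith("J")).map(String::toUpperCase)
    static List<String> namesStartingWithJ(List<String> names) {
        return names.stream()
                .filter(name -> name.startsWith("J"))   // keep names beginning with 'J'
                .map(String::toUpperCase)               // upper-case the survivors
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("John", "Paul", "Gavin", "Rahul", "Angel");
        System.out.println(namesStartingWithJ(input)); // [JOHN]
    }
}
```

The key point is that the lambdas themselves are identical in both worlds; only the execution engine differs, with Spark distributing the work across a cluster instead of a single JVM.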


That's it. Go ahead and execute it.





