A simple Spark test project

Author: NickYang   Category: Big Data, Technical Articles   Published: 2016-08-10 20:31

I started learning Spark to process some log files. Here is a simple example.

For instructions on building Spark, see http://spark.apache.org/docs/latest/building-spark.html

Scala file

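import org.apache.spark.SparkContext
// Only required for pair-RDD operations such as reduceByKey on Spark versions before 1.3
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    // Read the input file as an RDD of lines
    val input = sc.textFile("/home/nickyang/develop/spark/spark-1.6.1/README.md")
    // Split each line into words on single spaces
    val words = input.flatMap(line => line.split(" "))
    // Pair every word with 1, then sum the 1s for each distinct word
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Write one "(word,count)" line per pair into the result directory
    counts.saveAsTextFile("/home/nickyang/develop/spark/spark-1.6.1/examples/wordCount/result")
    sc.stop()
  }
}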
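To see what flatMap, map and reduceByKey do without starting Spark, here is a rough local equivalent on ordinary Scala collections. It is only a sketch: the object name LocalWordCount is made up, and a foldLeft over an immutable Map stands in for reduceByKey, which in Spark sums the values for each key across partitions.

```scala
object LocalWordCount {
  // Mirrors input.flatMap(...).map(word => (word, 1)).reduceByKey(_ + _) on a local Seq
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // like RDD.flatMap: one element per word
      .map(word => (word, 1))             // pair each word with a count of 1
      .foldLeft(Map.empty[String, Int]) { case (acc, (word, n)) =>
        // add this occurrence to the running total for the word (the reduceByKey step)
        acc.updated(word, acc.getOrElse(word, 0) + n)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("Spark is fast", "Spark is general")
    // Sorted by word; uppercase sorts before lowercase, so this prints
    // (Spark,2), (fast,1), (general,1), (is,2) on separate lines
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Like the Spark job, split(" ") turns an empty line into a single empty string, so blank lines in the input show up as a count for the empty word.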

sbt file (this example is built with sbt):

name := "SampleApp"
version := "0.0.1"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"

Then build the jar and submit it:

sbt package
YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master "local[1]" target/scala-2.10/sampleapp_2.10-0.0.1.jar

The result is written to the result directory, which contains two files: _SUCCESS, an empty marker file showing that the job finished successfully, and "part-00000", which contains each word and its count in the input file. Some of its lines:

(package,1)
(For,2)
(Programs,1)
(processing.,1)
(Because,1)
(The,1)
(cluster.,1)
(its,1)
([run,1)
(APIs,1)
(have,1)
(Try,1)
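Each line in part-00000 is just the toString of a (word, count) tuple. If you want to load the result back into a program, a minimal sketch in plain Scala (no Spark needed) could look like this. The object name ReadWordCountResult, the default directory name "result" in main, and the last-comma parsing rule are my own assumptions, not something Spark provides.

```scala
import java.io.File
import scala.io.Source

object ReadWordCountResult {
  // Parses a line such as "(Spark,14)" into ("Spark", 14).
  // Splits on the LAST comma, because a word produced by split(" ") may itself contain commas.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    val comma = trimmed.lastIndexOf(',')
    if (trimmed.startsWith("(") && trimmed.endsWith(")") && comma > 0) {
      val word = trimmed.substring(1, comma)
      val count = trimmed.substring(comma + 1, trimmed.length - 1).trim
      if (count.nonEmpty && count.forall(_.isDigit)) Some((word, count.toInt)) else None
    } else None
  }

  // Reads every part-* file in the output directory; the empty _SUCCESS marker is ignored.
  // A missing or unreadable directory yields an empty map.
  def readCounts(dir: File): Map[String, Int] = {
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
    parts.toList.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().toList.flatMap(line => parseLine(line).toList)
      finally src.close()
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: pass the result directory as the first argument
    val dir = new File(args.headOption.getOrElse("result"))
    // Print the ten most frequent words
    readCounts(dir).toSeq.sortBy(-_._2).take(10).foreach(println)
  }
}
```

Because reduceByKey leaves each word in exactly one partition, merging several part-* files with toMap should not lose any counts.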


BTW, this article was written on Ubuntu (English version), which doesn't have a Chinese input method, so it is in English.

