WordCount application using hadoop map reduce algorithm Java Programs and Examples with Output

WordCount application using hadoop map reduce algorithm

Posted by Raju Gupta at 3:00 PM

WordCount is a simple application that counts the number of occurrences of each word in a given input set using the Hadoop MapReduce framework.
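Before looking at the Hadoop code, the data flow can be sketched in plain Java with no cluster. This is a hypothetical, single-process simulation (WordCountSimulation and its methods are illustrative names, not part of Hadoop) that assumes Java 9 or later. It mirrors the map, shuffle/sort, and reduce steps using the two sample lines from file01 and file02:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the WordCount data flow.
class WordCountSimulation {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token,
    // mirroring what WordCount.Map does with StringTokenizer.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values by key. The TreeMap keeps keys sorted,
    // similar to how the framework sorts keys before they reach the reducer.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the grouped values for each key, like WordCount.Reduce.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Runs all three phases over the given input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(shuffle(allPairs));
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(run(input)); // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

On a real cluster the map calls run in parallel across input splits and the shuffle moves data between machines; this sketch only shows what each phase computes.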

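// Package name matches the class name passed to "bin/hadoop jar" (org.myorg.WordCount).
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;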

public class WordCount {

 // Mapper: for each input line, emits a (word, 1) pair for every whitespace-separated token.
 // With TextInputFormat the input key is the byte offset of the line; it is not used here.
 public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  // Reused across calls to avoid allocating a new Text object per token.
  private Text word = new Text();

  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
   String line = value.toString();
   StringTokenizer tokenizer = new StringTokenizer(line);
   while (tokenizer.hasMoreTokens()) {
    word.set(tokenizer.nextToken());
    output.collect(word, one);
   }
  }
 }

 // Reducer (also used as the combiner): sums all counts received for one word.
 public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
   int sum = 0;
   while (values.hasNext()) {
    sum += values.next().get();
   }
   output.collect(key, new IntWritable(sum));
  }
 }

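 public static void main(String[] args) throws Exception {
  if (args.length != 2) {
   System.err.println("Usage: WordCount <input path> <output path>");
   System.exit(2);
  }

  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  // Types of the final (key, value) output: word -> count.
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(Map.class);
  // Summing is associative and commutative, so Reduce can also pre-aggregate map output locally.
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  // Read plain text line by line; write one "key<TAB>value" line per record.
  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

  // Submit the job and block until it finishes.
  JobClient.runJob(conf);
 }
}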
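The conf.setCombinerClass(Reduce.class) line lets Hadoop run Reduce on each map task's output before that output is sent across the network. This is safe here because integer addition is associative and commutative, so partial sums do not change the final totals; Hadoop treats the combiner as an optional optimization and may apply it zero or more times. The hypothetical, Hadoop-free sketch below (CombinerSketch is an illustrative name; Java 9 or later assumed) shows the effect for file01, assuming for illustration that file01 is handled by its own map task:

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical sketch of what the combiner does to one map task's output.
class CombinerSketch {

    // Number of (word, 1) pairs the map task emits when no combiner runs.
    static int pairsWithoutCombiner(List<String> lines) {
        int pairs = 0;
        for (String line : lines) {
            pairs += new StringTokenizer(line).countTokens();
        }
        return pairs;
    }

    // Locally sum the pairs per word, as the combiner does; the result holds
    // one (word, partialCount) pair per distinct word in this task's input.
    static Map<String, Integer> combine(List<String> lines) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                partial.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> file01 = List.of("Hello World Bye World");
        System.out.println(pairsWithoutCombiner(file01)); // 4
        System.out.println(combine(file01));              // {Bye=1, Hello=1, World=2}
    }
}
```

Without the combiner this task ships four (word, 1) pairs; with it, three, because the two World pairs collapse into (World, 2). On large inputs with many repeated words the saving in shuffle traffic is much bigger.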

Usage

Assuming HADOOP_HOME is the root of the installation and HADOOP_VERSION is the Hadoop version installed, compile WordCount.java and create a jar:

$ mkdir wordcount_classes

$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java

$ jar -cvf /usr/raj/wordcount.jar -C wordcount_classes/ .

Assuming that:

/usr/raj/wordcount/input is the input directory in HDFS

/usr/raj/wordcount/output is the output directory in HDFS (it must not exist yet; the job fails if the output directory is already present)

Sample text files as input:

$ bin/hadoop dfs -ls /usr/raj/wordcount/input/

/usr/raj/wordcount/input/file01

/usr/raj/wordcount/input/file02

$ bin/hadoop dfs -cat /usr/raj/wordcount/input/file01

Hello World Bye World

$ bin/hadoop dfs -cat /usr/raj/wordcount/input/file02

Hello Hadoop Goodbye Hadoop

Run the application:

$ bin/hadoop jar /usr/raj/wordcount.jar org.myorg.WordCount /usr/raj/wordcount/input /usr/raj/wordcount/output

Output:

$ bin/hadoop dfs -cat /usr/raj/wordcount/output/part-00000

Bye 1

Goodbye 1

Hadoop 2

Hello 2

World 2
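The words come out sorted because the framework sorts map output by key before it reaches the reducer, and with the default of a single reduce task every key ends up in part-00000. TextOutputFormat separates each key and value with a tab by default. The helper below is a hypothetical illustration (PartFileFormat is not a Hadoop class; Java 9 or later assumed) that renders counts in that layout; a TreeMap stands in for the framework's sort, which for plain ASCII words gives the same order as Hadoop's byte-wise Text comparison:

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper: renders final counts the way part-00000 lays them out.
class PartFileFormat {

    // One "key<TAB>value" line per word, in sorted key order.
    static String format(Map<String, Integer> counts) {
        StringJoiner lines = new StringJoiner("\n");
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        // Map.of has no defined iteration order; format() sorts the keys itself.
        Map<String, Integer> counts = Map.of("Hello", 2, "World", 2, "Bye", 1, "Hadoop", 2, "Goodbye", 1);
        System.out.println(format(counts));
    }
}
```

With more than one reduce task, each part file would be sorted on its own, but the words would be partitioned across several files rather than collected in one.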
