MapReduce is a simple programming model that almost always comes up when we talk about Hadoop, a framework for storing, retrieving, processing and manipulating data sets that can run to multiple terabytes. Hadoop uses the MapReduce model to process this vast amount of data in parallel, spread across a large set of commodity hardware nodes known as a cluster. MapReduce is especially useful for aggregating or segregating a large dataset in parallel, so that the processing power of every node in the cluster is put to use and results arrive quickly. Here, in this blog, I have tried to explain MapReduce programming in the context of the Hadoop framework.
The fundamental data type on which MapReduce operates is the <Key, Value> pair. These key-value pairs are analogous to a HashMap in Java, associative arrays in PHP, a Hash in Ruby, and so on. A MapReduce job (program) usually (though not necessarily) consists of two types of processes working in tandem: Map (referred to as the Mapper) and Reduce (referred to as the Reducer). The input dataset, which does not necessarily have to be in <Key, Value> pairs, is fed to the Mapper. The input is broken into independent chunks, turned into <Key, Value> pairs, and handed to the Map jobs. A Map job runs exactly once for each input pair, in parallel across the nodes, and may break the <Key, Value> pairs down further. In Hadoop, each input <K, V> pair can produce zero or more output <K, V> pairs: a single input pair may emit one pair, many pairs, or none at all. Once this operation completes on all nodes running Map jobs, Hadoop sorts the output of the Maps and passes it on to the Reducer job(s), which may also run in parallel on various cluster nodes. While sorting the Map output, the values that share the same key are collected into a list (an array, to be more specific), forming a new pair of the shape <K, [V1, V2, V3...]>. This aggregated <K, V> pair is then input to the Reduce jobs, which process it further to produce the final <K, V> pairs.
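The sort-and-shuffle grouping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (the function name `shuffle` is my own), not Hadoop code:

```python
from collections import defaultdict

def shuffle(pairs):
    """Group intermediate (key, value) pairs by key and sort by key,
    mimicking Hadoop's sort-and-shuffle phase between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Each key now maps to the list of all values emitted for it.
    return sorted(groups.items())

print(shuffle([("a", 1), ("b", 1), ("a", 1)]))
# → [('a', [1, 1]), ('b', [1])]
```

Note how two pairs with the key "a" collapse into a single <a, [1, 1]> pair, which is exactly the shape the Reducer receives.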
Let me explain this with an example. I am taking the small and simple WordCount 1.0 program, which can be found on the Hadoop site. This example will get you started with MapReduce programming and give you a feel for how it works.
The WordCount program simply counts the number of occurrences of each word in a given input set (two text files in this case).
So, let's assume we have two text files:
File01.txt
Hello World Bye World
File02.txt
Hello Hadoop Goodbye Hadoop
Before these input files are fed to the Map jobs, they are broken into chunks separated by newlines, so each Map job receives exactly one line at a time to process. It then splits the line along the whitespace and creates a key-value pair for each word found, of the form <(Word), 1>. In our case the output of the Map jobs will be:
<Hello, 1>
<World, 1>
<Bye, 1>
<World, 1>
<Hello, 1>
<Hadoop, 1>
<Goodbye, 1>
<Hadoop, 1>
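The map step above can be sketched in plain Python as a single function. This is a simulation of what the Mapper in WordCount does, not the actual Hadoop Java code:

```python
def word_count_map(line):
    """Emit a (word, 1) pair for every whitespace-separated token,
    mirroring the Mapper in Hadoop's WordCount example."""
    return [(word, 1) for word in line.split()]

print(word_count_map("Hello World Bye World"))
# → [('Hello', 1), ('World', 1), ('Bye', 1), ('World', 1)]
```

Note that the Mapper does no counting at all; it just tags every word with a 1 and leaves the arithmetic to the Reducer.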
Once all the Map jobs are done, these key-value pairs are sorted by key and pushed to the Reducer jobs. The Reducer aggregates these pairs to emit new key-value pairs like this:
<Bye, 1>
<Goodbye, 1>
<Hadoop, 2>
<Hello, 2>
<World, 2>
Note the output: the keys are lexicographically sorted and the values aggregated. This output can be saved to output text files or, depending on your needs, pushed into a database or another data store. In Hadoop, a MapReduce program can have a configurable number of Mapper and Reducer jobs. Reducer jobs are entirely optional and can be omitted for certain jobs.
I hope this small tutorial on MapReduce programming helped you understand the concept and get started. In the meantime, if you have any questions or concerns, you can contact me at developer.vineet@gmail.com or comment here. You can also check out this content at my blog http://www.technologywithvineet.com
