In this project, you are asked to work with the MapReduce framework. As covered in the lecture, which you should refer to, MapReduce is one of the important techniques for solving Big Data problems.
It has two main phases: the Map phase and the Reduce phase, each of which has sub-phases. Briefly, on a cluster of nodes/cores, during the Map phase the cluster nodes running the map program emit key-value pairs based on the split chunks of the input file. These key-value pairs are then consumed by the cluster nodes running the reduce program. The reduce component usually summarizes the data received from the map phase and produces the final output after combining the output coming from several nodes.
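To make the flow of the two phases concrete, here is a minimal single-process sketch in Python. It is not the project's required implementation: word counting is an assumed example task, the "nodes" are simulated by ordinary function calls, and all names (split_input, map_chunk, shuffle, reduce_group, word_count) are hypothetical. The grouping step between map and reduce is included on the assumption that it is one of the sub-phases the lecture refers to.

```python
from collections import defaultdict


def split_input(text, num_chunks):
    """Split the input into chunks of lines, one chunk per simulated map node."""
    chunks = [[] for _ in range(num_chunks)]
    for i, line in enumerate(text.splitlines()):
        chunks[i % num_chunks].append(line)
    return chunks


def map_chunk(chunk):
    """Map phase: emit a (word, 1) key-value pair for every word in the chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(all_pairs):
    """Group emitted values by key so each key is handled by one reduce call."""
    grouped = defaultdict(list)
    for key, value in all_pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce phase: summarize all values received for one key."""
    return key, sum(values)


def word_count(text, num_chunks=2):
    """Run split, map, shuffle, and reduce in sequence (no real cluster involved)."""
    chunks = split_input(text, num_chunks)
    # Combine the pairs emitted by every simulated map node.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = "big data big ideas\nmap reduce map\nbig map"
    print(word_count(sample))
```

In a real cluster the map_chunk calls would run in parallel on different nodes and the grouped pairs would be sent over the network to the reduce nodes; the sketch only shows how the data moves from one phase to the next.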
Please see the attached file.