Concept – The Hadoop concept is simple. To provide scalable, reliable and flexible large capacity for storing and analyzing Big Data, a job is split into a number of small tasks and allotted to a number of hardware devices that run in parallel.
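This split-and-parallelize idea can be sketched in a few lines of plain Python (a toy illustration only; the job, chunk size and process pool are chosen purely for demonstration):

```python
from concurrent.futures import ProcessPoolExecutor

# A large job: summing a million numbers, split into ten small tasks.
def small_task(chunk):
    return sum(chunk)

if __name__ == "__main__":
    # Split the job into small chunks, one per task.
    chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
    # Run the small tasks in parallel, as Hadoop does across machines.
    with ProcessPoolExecutor() as pool:
        partial_sums = list(pool.map(small_task, chunks))
    # Combine the partial results into the final answer.
    print(sum(partial_sums))  # 499999500000
```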
The HDFS system splits the data into small information packets and distributes them across hard disk storage on different machines, ensuring sufficient redundancy. The Map-Reduce system works in two steps. Mapping of the information packets, coordinated by the job tracker, is done in key-value format, where the key indicates the location and type of information and the value represents the actual information. The map step converts the key-value input list into an output list according to some predefined criteria. The reduce step aggregates the results from the output list to provide the desired summary information as the final output.
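This key-value flow can be made concrete with a minimal word-count sketch in plain Python (the classic Map-Reduce illustration, not the actual Hadoop API; the packets and the counting criterion are assumed for demonstration):

```python
from collections import defaultdict

# Map step: turn one information packet (here, a line of text) into a
# key-value output list; the key is a word, the value a count of one.
def map_step(packet):
    return [(word, 1) for word in packet.split()]

# Reduce step: aggregate all values sharing a key into summary totals.
def reduce_step(mapped_pairs):
    totals = defaultdict(int)
    for key, value in mapped_pairs:
        totals[key] += value
    return dict(totals)

# In a real cluster each packet would sit on a different machine;
# here they are simply items in a list.
packets = ["big data needs big storage", "big clusters process data"]

# Map every packet (done in parallel on a real cluster), then feed the
# combined output list to the reduce step for the final summary.
mapped = [pair for packet in packets for pair in map_step(packet)]
print(reduce_step(mapped))  # {'big': 3, 'data': 2, ...}
```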
As information from Big Data sources is collected continuously, the information flow can properly be described in terms of data volume (bandwidth) varying with time. In that sense, Hadoop manages streaming data, and the terms input list and output list can be interpreted as input and output data streams.
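The stream interpretation can be sketched the same way with Python generators (an illustrative assumption about the source and the window, not Hadoop's actual streaming interface): packets are consumed one at a time and reduced over a bounded window rather than over a finite list.

```python
import itertools

# A continuous data source, modelled as a never-ending generator of packets.
def packet_stream():
    n = 0
    while True:
        yield f"sensor{n % 2} reading{n % 3}"
        n += 1

# Map each packet as it arrives, instead of materializing an input list.
def map_stream(stream):
    for packet in stream:
        for token in packet.split():
            yield (token, 1)

# Reduce over a bounded window of the stream, since a true stream never ends.
def reduce_window(pairs, window=100):
    totals = {}
    for key, value in itertools.islice(pairs, window):
        totals[key] = totals.get(key, 0) + value
    return totals

print(reduce_window(map_stream(packet_stream())))
```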
Significance for India – The Hadoop system is thus based on the parallel processing concept. Though it is employed for the collection and processing of Big Data using large clusters of computers, the same concept can be used to handle large-scale computational requirements for monitoring countrywide development projects and a variety of large-scale, complex environmental and social challenges.
India is planning the development of 100 smart cities. The administration of such cities needs web-based central control software to cater for infrastructural services, energy and ecosystem monitoring, health, education and business requirements. Cloud computing with the Hadoop system would be essential for such projects.
As regards the Digital India objective, Hadoop provides many avenues for progress. Major software companies in India such as Infosys, Wipro and TCS, as well as call centres, already use human resources in a similar fashion. These companies can improve their functioning to achieve scalability, redundancy and cost optimization by adopting the Hadoop methodology.
The workload of projects can be split into tiny tasks and distributed to a large number of small software companies located in villages and small towns, with sufficient replication to safeguard the project execution timeline even if some providers fail to deliver output of the desired quality. As the workforce would be scattered rather than located in costly urban centres, the salary burden can be greatly reduced.
Moreover, this will give a big boost to small software companies in semi-urban and rural areas, which are struggling for survival owing to transient staff and a paucity of good projects. Strengthening distributed small software companies will help achieve the goal of Digital India by providing live project training to educated but unemployed youth who cannot leave their homes due to agricultural or family commitments.
Hadoop relies on distributed parallel processing of Big Data using computers. We can use the same concept for handling large-scale projects of any type by splitting the work into a large number of small work packets and replacing the computers with a skilled but scattered human workforce. Fortunately, internet connectivity and the popularity of mobile and tablet devices have provided the necessary hardware support for integrating such tasks.
Thus the Hadoop system is not only a Big Data computing system for large projects but can also provide a new way of achieving distributed, sustainable development in India.