Hadoop was developed to compile and analyze large amounts of varied data. It may appear that the technique would be useful only for web search engines, countrywide projects, or big multinational companies. However, the volume of data depends not on the scale or extent of an operation but on the precision level of data monitoring. For example, space research may deal with Big Data composed of information about planets, stars, and galaxies across the universe, yet in the study of molecular physics, or in DNA research in bioinformatics, the information has the same characteristics as Big Data.
I remember reading a book, “The Nature of the Physical World” by Eddington (if I remember correctly), which starts with the example of a table that gives different perceptions depending on the viewer's tool. To our eyes it looks like a piece of furniture with definite dimensions, but seen through an electron microscope it turns out to be a vast cluster of millions of atoms and molecules. The information present depends on our probing tool and could be conceived as a small set of data units or as a very large store of data comparable to Big Data.
Even in an educational institute, a city corporation, or a business organization, there are plenty of information sources which, if monitored minutely, would amount to data of Big Data proportions. The only reason we do not deal with such multifarious data is that we do not have a large-scale information processing system of that kind. Hence we consider only easily manageable data units and build our decision support systems on the analysis of such small data sets.
The Hadoop programming model gives us an effective tool for compiling, storing, and analyzing large data sets, with sufficient redundancy to protect against data loss, flexibility in handling varying volumes and types of data, and remarkable speed of data crunching and analysis. This is made possible through distributed, parallel storage and processing on a scalable cluster of computers.
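To make that idea concrete, here is a minimal sketch of the classic word-count MapReduce job in Java, adapted from the standard Hadoop tutorial; the input and output paths are placeholders supplied on the command line, not anything specific to this post:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each mapper processes one split of the input and emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all counts for the same word arrive at one reducer and are summed.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Hadoop stores the input files in HDFS as replicated blocks (which provides the redundancy), runs one mapper per block in parallel across the cluster, shuffles the intermediate (word, count) pairs to the reducers, and writes the summed results back to HDFS.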
If that is the case, then why not employ this effective tool to solve seemingly small domain problems, by expanding the data sources to cover all the minute features which affect the system's behaviour?
Let us take the example of a college-level educational institute. There are ample data resources regarding infrastructure, faculty, students, curriculum, courses, amenities, and events which are neither explored in detail nor considered in planning effective administration. Such an institute generally has a large pool of computers which remain idle except during practicals. The data generated by students through seminars and research projects is rarely compiled and converted into an asset. Archiving of student records over long periods, tracking the whereabouts of alumni, and communication and collaboration between departments and outside agencies go unattended in the majority of cases, owing to administrative work overload and limited data collection.
If all such data were compiled and processed for effective administration of the institute, by developing a data centre with a Hadoop system that utilizes the computers already present, a significant improvement in work efficiency could be achieved. The data backup can be linked with cloud storage to safeguard against loss of data due to total system failure from any cause.
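As a rough sketch of that backup idea (my illustration, not part of any existing setup), Hadoop's FileSystem API can copy a directory from the institute's HDFS cluster to a cloud object store; the namenode address, bucket name, and paths below are hypothetical placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CloudBackup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Source: a directory on the institute's own HDFS cluster (hypothetical address).
    Path src = new Path("hdfs://namenode:8020/institute/records");
    FileSystem hdfs = FileSystem.get(src.toUri(), conf);

    // Destination: a cloud bucket reached through the s3a connector
    // (needs the hadoop-aws module and credentials configured separately).
    Path dst = new Path("s3a://institute-backup/records");
    FileSystem cloud = FileSystem.get(dst.toUri(), conf);

    // Recursively copy the directory; 'false' keeps the source data in place.
    FileUtil.copy(hdfs, src, cloud, dst, false, conf);
  }
}

For large volumes, the same transfer would normally be done with the hadoop distcp tool, which itself runs as a MapReduce job across the cluster, but the principle is the same.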
Thus the Hadoop system may prove to be the Big Next Change for many small and big organizations, if proper deployment and customization are done to suit domain-specific requirements. This will increase efficiency, reduce infrastructure cost, and provide reliability and flexibility in operation.