Friday, September 5, 2014

Small-domain Big Data scenario for Hadoop



The Hadoop system was developed to compile and analyze large amounts of varying data. It may appear that the technique would be useful only for web search engines, countrywide projects or big multinational companies. However, the volume of data depends not on the scale or extent of an operation but on the precision level of data monitoring. For example, space research may deal with Big Data composed of information about planets, stars and galaxies across the universe; yet in the study of molecular physics, or in DNA research in bioinformatics, the information has the same characteristics as Big Data.

I remember reading a book, “The Nature of the Physical World” by Eddington (if I remember correctly), which starts with the example of a table that gives different perceptions depending on the viewer's tool. To our eyes, it looks like a piece of furniture with certain dimensions, but seen through an electron microscope it turns out to be a vast cluster of millions of atoms and molecules. The information present depends on our probing tool: the same object can be conceived as a small set of data units or as a vast store of data comparable to Big Data.

Even an educational institute, a city corporation or a business organization has a lot of information sources which, if monitored minutely, would amount to Big Data in size. The only reason we do not deal with such multifarious data is that we do not have a sufficiently large information processing system. Hence we consider only the easily manageable data units and build our decision support systems on the analysis of such small data sets.

The Hadoop programming model provides an effective tool for compiling, storing and analyzing large data sets, with sufficient redundancy to protect against loss of data, flexibility in handling varying volumes and types of data, and astonishing speed of data crunching and analysis. This is made possible through distributed, parallel storage and processing on a scalable cluster of computing devices.
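To make the model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The mapper emits a (word, 1) pair for every word it sees, the reducer sums the pairs per word, and the framework takes care of splitting the input, replicating the data and running the phases in parallel across the cluster. The input and output paths are whatever HDFS directories are passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in this node's input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts for each word, in parallel across the cluster.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, it would be launched with something like hadoop jar wordcount.jar WordCount /input /output, where /input and /output are hypothetical HDFS paths.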

If such is the case, then why not employ this effective tool to solve seemingly small-domain problems, by expanding the data sources to cover all the minute features which affect the system's behavior?

Let us take the example of a college-level educational institute. There are ample data resources regarding infrastructure, faculty, students, curriculum, courses, amenities and events which are neither explored in detail nor considered in planning effective administration. Such an institute generally has a large pool of computers which remain idle except during practical sessions. The data generated by students through seminars and research projects is rarely compiled and converted into an asset. Archiving student records over long periods, tracking the whereabouts of alumni, and communication and collaboration between departments and outside agencies go unattended in the majority of cases due to administrative work overload and limited data collection.

If all such data are compiled and processed by developing a data centre with a Hadoop system that utilizes the institute's existing computers, the institute can achieve a significant improvement in its work efficiency. The data backup can be linked with cloud storage to safeguard against loss of data due to total system failure from any cause, as sketched below.
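As a hedged sketch of what the storage side might look like, assuming a small HDFS cluster is already running on the institute's lab machines and that cloud (S3) credentials have been configured in core-site.xml (the paths and bucket name below are hypothetical), records could be loaded into HDFS and periodically copied out with Hadoop's stock distcp tool:

```sh
# Load the institute's records into HDFS; each block is replicated
# (3 copies by default) across the cluster's otherwise idle machines.
hdfs dfs -mkdir -p /college/records
hdfs dfs -put student-records.csv /college/records/

# Optionally raise the replication factor for critical directories
# (-w waits until the extra copies are in place).
hdfs dfs -setrep -w 3 /college/records

# Periodic off-site backup: copy the directory tree to cloud storage
# with the built-in distcp tool (bucket name is hypothetical).
hadoop distcp /college/records s3n://my-backup-bucket/college/records
```

Since distcp itself runs as a MapReduce job, even the backup copy is parallelized across the same machines it is protecting.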

Thus the Hadoop system may prove to be the Next Big Change for many small and big organizations, if it is properly deployed and customized to suit domain-specific requirements. It will increase efficiency, reduce infrastructure cost and provide reliability and flexibility in operation.
