12.06.11
Hadoop Harnesses Huge Data

[Cartoon]
If you are not aware of Hadoop, then the world of IT is about to pass you by. If you haven’t already, you should read some articles about Hadoop and consider how it might work for you.
Have you ever been on the job, entered an account number or product code, and waited for a response from your corporate system? Have you ever wondered how Google can seemingly search the entire web, containing hundreds of millions of documents, and respond in less than a second? It doesn’t seem to make any sense. You can give Google any combination of words and phrases without seeming to slow it down. Ask your corporate system for anything extra and you will pay for it by waiting.
The key difference between accessing corporate data and accessing Google data is in the way the data is stored and accessed. Corporate data is centralized and structured according to the relational model. Google’s data is highly distributed and is processed by batch jobs that run in parallel.
Exactly how Google’s search works is a trade secret, but Hadoop is built on the same concepts, which Google described in its published papers on MapReduce and the Google File System. Data can be stored in thousands of files and accessed in parallel. The results are as astonishing as running a search on Google.
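To make the idea of "many files, read in parallel" a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop and does not use any Hadoop API. The file names, the sample text, and the functions `map_count`, `reduce_counts`, and `word_count` are all made up for illustration. It only shows the general shape of the approach: each file is processed on its own, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "files" (an assumption for this sketch).
# A real Hadoop cluster would read blocks spread across many machines.
FILES = {
    "part-0": "hadoop stores data in many files",
    "part-1": "many files can be read in parallel",
    "part-2": "parallel reads keep search fast",
}


def map_count(text):
    """Map step: count the words in one file, independently of the others."""
    return Counter(text.split())


def reduce_counts(partials):
    """Reduce step: merge the per-file counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(files):
    """Run the map step over every file concurrently, then reduce."""
    # Each file is handed to a worker thread; no file depends on another.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_count, files.values()))
    return reduce_counts(partials)


if __name__ == "__main__":
    totals = word_count(FILES)
    print(totals["files"], totals["parallel"])
```

Threads on one machine are only a stand-in here. The point of a system like Hadoop is that the same split-then-merge pattern can be spread over many machines, so adding more files means adding more workers rather than waiting longer.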
With corporate information growing at an accelerating rate and more and more of it unstructured, Hadoop may be the solution needed. There has certainly been a lot of hoopla about Hadoop, with both praise and reservations.
Enterprise Architects are already evaluating Hadoop. For them this is just another great innovation that is available in the open source environment. They pay attention because they see that Oracle, IBM, SAP, and other big names are also paying attention.
While Hadoop is being praised, database administrators who run centralized systems are less than convinced. They see Hadoop as an uncontrolled environment that would not stand up to the rigor of Master Data Management. They see the potential for a turf battle brewing and are ready to go the distance.
Hadoop is the kind of innovation that does not come along too often. It is the kind of innovation that, when applied, will change the core of how organizations manage their data. Instead of thinking in terms of big, centralized, relational databases, organizations can think of information as a distributed asset.
For decades, the big business software suppliers convinced us that all data communications should be controlled and operate from a centralized computer. Those were the days of leased lines, TCAM, Bisync, and other forgotten concepts. These concepts were blown away by the distributed concepts of the internet.
It appears that the centralized concepts of data management may give way to a distributed approach. Hadoop may be the beginning of that approach. Enterprise Architects are certainly considering the potential. It could be an upheaval in data management methods.

Enterprise Architects are well-aware of the continuing evolution of technology. They creatively look for technology convergence that can provide breakthroughs in thinking. We are at one of those convergent junctions today. What is about to happen will give non-professional information technologists control of their use of automation in their business. No longer will they simply peer through windows and see only what applications let them see. They will be able to go inside, see how things work, and control their automation. – Enterprise Architects Masters of the Unseen City
Closing the Business / IT gap.

