Getting Started with Apache Hadoop
MapReduce refers to a framework that runs on a computational cluster to mine large datasets. The name derives from the application of map() and reduce() functions repurposed from functional programming languages. The main advantage of a MapReduce framework is that it allows the application programmers to focus on building business logic without conerning themselves with infrastructure, scalability, network traffic, or multiprocessing issues.
Apache Hadoop is an open source Java framework for implementing reliable and scalable computational networks. It comprises the tools and utilities for data serialization, file system access, and interprocess communication pertaining to MapReduce implementations. So... how do you get started? Check out DZone's Refcard no. 117 and the topics it covers:
- Introduction to MapReduce
- Apache Hadoop and its components
- Hadoop cluster building blocks
- Tools and commands quick reference guide
- App development quick HOWTO
Here are the links for the companion Scalability and High Availability Systems Refcard and the NoSQL and Data Scalability Refcard - enjoy!
Scalable Systems Newsletter
Subscribe to the newsletter and get every issue mailed free - with access to the latest system scalability, high availability, and performance news.