Hadoop Books

These books are listed in order of publication, most recent first. The Apache Software Foundation does not endorse any specific book. The links to Amazon are affiliated with the specific author. That said, we also encourage you to support your local bookshops by buying the book from a local outlet, especially an independent one.

Books in Print

Here are the books that are currently in print, in order of publication, along with the Hadoop version they were written against. One problem anyone writing a book will encounter is that Hadoop is a very fast-moving target. Change is usually for the better: when a book says "Hadoop can't", it really means "the version of Hadoop we worked with couldn't", and the situation may have improved since then. If you have any questions about Hadoop, don't be afraid to ask on the relevant user mailing lists.

{{{#!wiki comment/dotted Attention people adding new entries.
# Only reference books about Hadoop and related programs, not random PHP stuff.
# Please include publishing date and version of Hadoop the book is relevant to.
# Please write this in a neutral voice, not "this book will help you", as that implies that the ASF has opinions on the matter. Someone will just edit the claims out.
# Please do not go overboard in exaggerating the outcome of reading a book, "readers of this book will become experts in advanced production-scale Hadoop Algorithms". Such claims will be edited out and not replaced.
# Please don't have tracking URLs. We'll only cut them.
}}}

Hands-On Big Data Processing with Hadoop 3 (Video)

Name: Hands-On Big Data Processing with Hadoop 3 (Video)

Author: Sudhanshu Saxena

Publisher: Packt

Date of Publishing: October 2018

Perform real-time data analytics, stream and batch processing on your application using Hadoop

Modern Big Data Processing with Hadoop

Name: Modern Big Data Processing with Hadoop

Authors: V. Naresh Kumar, Prashant Shindgikar

Publisher: Packt

Date of Publishing: March 2018

A comprehensive guide to design, build and execute effective Big Data strategies using Hadoop

Deep Learning with Hadoop

Name: Deep Learning with Hadoop

Author: Dipayan Dev

Publisher: Packt

Date of Publishing: February 2017

Build, implement and scale distributed deep learning models for large-scale datasets.

Hadoop Blueprints

Name: Hadoop Blueprints

Authors: Anurag Shrivastava, Tanmay Deshpande

Publisher: Packt

Date of Publishing: September 2016

Use Hadoop to solve business problems by learning from a rich set of real-life case studies.

Hadoop: Data Processing and Modelling

Name: Hadoop: Data Processing and Modelling

Authors: Garry Turkington, Tanmay Deshpande, Sandeep Karanth

Publisher: Packt

Date of Publishing: August 2016

Unlock the power of your data with the Hadoop 2.X ecosystem and its data warehousing techniques across large data sets.

Hadoop Explained (Free eBook Download)

Name: Hadoop Explained

Author: Aravind Shenoy

Publisher: Packt Publishing

Learn how MapReduce organizes and processes large sets of data, discover the advantages of Hadoop, from scalability to security, and see how Hadoop handles huge amounts of data with care.

Hadoop Real-World Solutions Cookbook - Second Edition

Name: Hadoop Real-World Solutions Cookbook - Second Edition

Author: Tanmay Deshpande

Publisher: Packt Publishing

Date of Publishing: March 2016

The book covers recipes that are based on the latest versions of Apache Hadoop 2.X, YARN, Hive, Pig, Sqoop, Flume, Apache Spark, Mahout etc.

Hadoop Security: Protecting Your Big Data Platform

Name: Hadoop Security: Protecting Your Big Data Platform

Authors: Ben Spivey, Joey Echeverria

Publisher: O'Reilly Media

Date of Publishing: June 2015

Covers Hadoop security from a high level, down to how to set up a secure Hadoop cluster and the individual services within it.

Hadoop and Kerberos: The Madness Beyond the Gate

Name: Hadoop and Kerberos: The Madness Beyond the Gate

Author: Steve Loughran

Date of Publishing: June, 2015 +

This is an ongoing ebook project attempting to cover the internals of Hadoop + Kerberos. It is targeted at developers and people trying to understand obscure Kerberos-related stack traces.

Apache Oozie Essentials

Name: Apache Oozie Essentials

Author: Jagat Jasjit Singh

Publisher: Packt Publishing

Date of Publishing: December, 2015

This book covers automating data and ML pipelines via Apache Oozie.

Data Lake Development with Big Data

Name: Data Lake Development with Big Data

Authors: Pradeep Pasupuleti, Beulah Salome Purra

Publisher: Packt Publishing

Date of Publishing: November, 2015

This book is for architects and senior managers building a strategy around their current data architecture, helping them identify the need for a Data Lake implementation in an enterprise context.

Elasticsearch for Hadoop

Name: Elasticsearch for Hadoop

Author: Vishal Shukla

Publisher: Packt Publishing

Date of Publishing: October, 2015

Elasticsearch for Hadoop covers integrating Elasticsearch into Hadoop to visualize and analyze your data.

YARN Essentials

Name: YARN Essentials

Authors: Amol Fasale, Nirmal Kumar

Publisher: Packt Publishing

Date of Publishing: February, 2015

YARN Essentials is for developers who have little knowledge of Hadoop 1.x and want to start afresh with YARN.

Learning YARN

Name: Learning YARN

Authors: Akhil Arora, Shrey Mehrotra

Publisher: Packt Publishing

Date of Publishing: August, 2015

Learning YARN is intended for those who want to understand what YARN is and how to efficiently use it for the resource management of large clusters.

Big Data Forensics: Learning Hadoop Investigations

Name: Big Data Forensics: Learning Hadoop Investigations

Author: Joe Sremack

Publisher: Packt Publishing

Date of Publishing: August, 2015

Big Data Forensics: Learning Hadoop Investigations will guide statisticians and forensic analysts with basic knowledge of digital forensics to conduct Hadoop forensic investigations.

Learning Hadoop 2

Name: Learning Hadoop 2

Authors: Garry Turkington, Gabriele Modena

Publisher: Packt Publishing

Date of Publishing: February, 2015

Learning Hadoop 2 is an introductory guide to building data-processing applications with the wide variety of tools supported by Hadoop 2.

Hadoop MapReduce v2 Cookbook - Second Edition

Name: Hadoop MapReduce v2 Cookbook - Second Edition

Author: Thilina Gunarathne

Publisher: Packt Publishing

Date of Publishing: February, 2015

Hadoop MapReduce v2 Cookbook - Second Edition is a beginner's guide to exploring the Hadoop MapReduce v2 ecosystem and gaining insights from very large datasets.

Scaling Big Data with Hadoop and Solr - Second Edition

Name: Scaling Big Data with Hadoop and Solr - Second Edition

Author: Hrishikesh Vijay Karambelkar

Hadoop Version: 2.6

Publisher: Packt Publishing

Date of Publishing: April, 2015

Scaling Big Data with Hadoop and Solr - Second Edition is aimed at developers, designers, and architects who would like to build big data enterprise search solutions for their customers or organizations.

Hadoop for Finance Essentials

Name: Hadoop for Finance Essentials

Author: Rajiv Tiwari

Publisher: Packt Publishing

Date of Publishing: April, 2015

Hadoop for Finance Essentials is for developers who would like to perform big data analytics with Hadoop for the financial sector.

Monitoring Hadoop

Name: Monitoring Hadoop

Author: Gurmukh Singh

Publisher: Packt Publishing

Date of Publishing: April 28, 2015

Monitoring Hadoop is for Hadoop administrators who want to learn how to monitor and diagnose their clusters.

Hadoop Backup and Recovery Solutions

Name: Hadoop Backup and Recovery Solutions

Authors: Gaurav Barot, Chintan Mehta, Amij Patel

Hadoop Version: 2.7.x

Publisher: Packt Publishing

Date of Publishing: July 28, 2015

Hadoop Backup and Recovery Solutions demonstrates strategies for recovering data from Hadoop backup clusters and for troubleshooting problems.

Hadoop Essentials

Name: Hadoop Essentials

Author: Shiva Achari

Hadoop Version: 2.6

Publisher: Packt Publishing

Date of Publishing: April 29, 2015

Hadoop Essentials explains the key concepts of Hadoop and gives a thorough understanding of the Hadoop ecosystem.

Hadoop in Practice, Second Edition

Name: Hadoop in Practice, Second Edition

Author: Alex Holmes

Hadoop Version: 2.x

Publisher: Manning

Date of Publishing: Fall 2014.

Sample Chapters: Chapter 2: Introduction to YARN, Chapter 9: SQL on Hadoop

The second edition of Hadoop in Practice includes over 100 Hadoop techniques. This edition covers Hadoop 2 (YARN and MapReduce 2) and updates include new techniques that show how to integrate Kafka, Impala, and Spark SQL with Hadoop.

Optimizing Hadoop for MapReduce

Name: Optimizing Hadoop for MapReduce

Author: Khaled Tannir

Publisher: Packt Publishing

Date of Publishing: February 21, 2014

Sample Chapter: Chapter 3: Detecting System Bottlenecks

Optimizing Hadoop for MapReduce is an example-based tutorial on optimizing Hadoop for MapReduce job performance.

Scaling Big Data with Hadoop and Solr

Name: Scaling Big Data with Hadoop and Solr

Author: Hrishikesh Karambelkar

Publisher: Packt Publishing

Date of Publishing: August 26, 2013

Sample Chapter: Chapter 2: Understanding Solr

Scaling Big Data with Hadoop and Solr is a step-by-step guide to building a search engine while scaling data. Starting with the basics of Apache Hadoop and Solr, this book then dives into advanced topics of optimizing search with some real-world use cases and sample Java code.

Hadoop Operations and Cluster Management Cookbook

Name: Hadoop Operations and Cluster Management Cookbook

Author: Shumin Guo

Hadoop Version: 2.x

Publisher: Packt Publishing

Date of Publishing: July 24, 2013

Sample Chapter: Chapter 3: Configuring a Hadoop Cluster

Hadoop Operations and Cluster Management Cookbook is a guide for designing and managing a Hadoop cluster.

Hadoop Beginner's Guide

Name: Hadoop Beginner's Guide

Author: Garry Turkington

Hadoop Version: 1.0.x

Publisher: Packt Publishing

Date of Publishing: February 22, 2013

Sample Chapter: Chapter 4: Developing MapReduce Programs

Written for complete beginners to Hadoop, this book covers how to install and run Hadoop on a local Ubuntu host or create an on-demand Hadoop cluster on Amazon Web Services (EC2), before getting to grips with MapReduce.

Hadoop Real World Solutions Cookbook

Name: Hadoop Real World Solutions Cookbook

Authors: Jonathan Owens, Brian Femiano, Jon Lentz

Hadoop Version: CDH3

Publisher: Packt Publishing

Date of Publishing: February 7, 2013

Sample Chapter: Chapter 6: Big Data Analysis

A collection of real-world code analytics and design patterns using various tools from the Hadoop community. Each recipe walks the reader through the implementation, or in some cases debugging and configuration tuning. The book covers various tools including MapReduce, Hive, Pig, MRUnit, serialization using Avro/Thrift/Protocol Buffers, Giraph, Accumulo and several others.

Hadoop MapReduce Cookbook

Name: Hadoop MapReduce Cookbook

Authors: Srinath Perera, Thilina Gunarathne

Hadoop Version: 1.0.x

Publisher: Packt Publishing

Date of Publishing: January 25, 2013

Sample Chapter: Chapter 6: Analytics

Hadoop MapReduce Cookbook is a guide to processing large and complex data sets using Hadoop MapReduce.

Hadoop Operations

Name: Hadoop Operations

Author: Eric Sammer

Hadoop Version: 1.x, CDH3.x

Publisher: O'Reilly Media

Date of Publishing: September 2012.

A guide to running large-scale Hadoop clusters, written by someone who has practical experience in such deployments.

Hadoop in Practice

Name: Hadoop in Practice

Author: Alex Holmes

Hadoop Version: 1.0

Publisher: Manning

Date of Publishing: Fall 2012.

Sample Chapter: Chapter 1

Hadoop: The Definitive Guide, 3rd Edition

Name: Hadoop: The Definitive Guide, 3rd Edition

Author: Tom White

Hadoop Version: 1.x

Publisher: O'Reilly

Date of Publishing: May 2012

Sample Chapter: Sample Chapter

Hadoop in Action

Name: Hadoop in Action

Author: Chuck Lam

Hadoop Version: 0.19-0.20

Publisher: Manning

Date of Publishing: December, 2010

Sample Chapter: Chapter 1

Hadoop in Action introduces the subject and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming.

Hadoop: The Definitive Guide, 2nd Edition

Name: Hadoop: The Definitive Guide, 2nd Edition

Author: Tom White

Hadoop Version: 0.20-0.21

Publisher: O'Reilly

Date of Publishing: September 2010

Pro Hadoop

Name: Pro Hadoop

Author: Jason Venner

Hadoop Version: 0.20

Publisher: Apress

Date of Publishing: June 22, 2009

Jason says "This book is a step by step guide to writing, running and debugging Map/Reduce jobs using Hadoop, and to installing and managing Hadoop Clusters. It is ideal for training new Map/Reduce users and Cluster administrators and for polishing existing Hadoop skills."

Hadoop: The Definitive Guide

Name: Hadoop: The Definitive Guide

Author: Tom White

Hadoop Version: 0.20

Publisher: O'Reilly

Date of Publishing: June 19, 2009

Forthcoming Books

Hadoop in Action, Second Edition

Name: Hadoop in Action, Second Edition

Authors: Chuck P. Lam, Mark W. Davis

Hadoop Version: 2.x

Publisher: Manning

Date of Publishing (est.): October 2015



Hadoop Videos


Hands-On Big Data Analysis with Hadoop 3 (Video)

Name: Hands-On Big Data Analysis with Hadoop 3 (Video)

Author: Tomasz Lelek

Publisher: Packt

Date of Publishing: August 2018

Perform real-time data analytics with Hadoop


Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video)

Name: Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video)

Author: Milind Jagre

Publisher: Packt

Date of Publishing: July 2018

Effectively store, manage, and analyze large datasets with HDFS, Sqoop, YARN, and MapReduce


Hadoop Administration and Cluster Management (Video)

Name: Hadoop Administration and Cluster Management (Video)

Author: Gurmukh Singh

Publisher: Packt

Date of Publishing: May 2018

Planning, deploying, managing, monitoring and performance-tuning your Hadoop cluster with Apache Hadoop


Solving 10 Hadoop'able Problems (Video)

Name: Solving 10 Hadoop'able Problems (Video)

Author: Tomasz Lelek

Publisher: Packt

Date of Publishing: February 2018

Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop


Learn By Example: Hadoop, MapReduce for Big Data problems (Video)

Name: Learn By Example: Hadoop, MapReduce for Big Data problems (Video)

Author: Loonycorn

Publisher: Packt

Date of Publishing: January 2018

A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel"


The Ultimate Hands-on Hadoop (Video)

Name: The Ultimate Hands-on Hadoop (Video)

Author: Frank Kane

Publisher: Packt

Date of Publishing: June 2017

Design distributed systems that manage Big Data using Hadoop and related technologies.


Getting Started with Hadoop 2.x (Video)

Name: Getting Started with Hadoop 2.x (Video)

Author: A K M Zahiduzzaman

Publisher: Packt

Date of Publishing: April 30, 2017

Build a strong foundation by exploring the Hadoop ecosystem with real-world examples.


Taming Big Data with MapReduce and Hadoop - Hands On! (Video)

Name: Taming Big Data with MapReduce and Hadoop - Hands On! (Video)

Author: Frank Kane

Publisher: Packt

Date of Publishing: September 12, 2016

Master the art of processing Big Data using Hadoop and MapReduce with the help of real-world examples.



