...

This tutorial will appeal to Nutch administrators looking to improve runtime speed whilst maintaining MapReduce's ability to scale to petabytes of data. Readers are encouraged to share their experiences of using Nutch on Tez.

Related JIRA Tickets

NUTCH-2838 (ASF JIRA)

What is Apache Tez?

Apache Tez is described as an application framework that processes data by executing a complex directed acyclic graph (DAG) of tasks. It is currently built atop Apache Hadoop YARN.
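The DAG idea can be illustrated without Tez itself. The sketch below is plain Python using the standard library's `graphlib` module (Python 3.9+); it is not the Tez API, and the task names are hypothetical. Each task lists the tasks whose output it consumes, and a topological sort yields an order in which every task runs only after all of its inputs. Such an order exists only because the graph has no cycles.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG of tasks: each key maps a task to the tasks whose output it consumes.
# Two independent inputs feed a join, whose result feeds an aggregation.
EXAMPLE_DAG = {
    "read_a": set(),
    "read_b": set(),
    "join": {"read_a", "read_b"},
    "aggregate": {"join"},
}


def execution_order(dag):
    """Return one valid execution order: every task appears after all of its inputs.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. is not a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    # read_a and read_b may appear in either order; join and aggregate always follow.
    print(execution_order(EXAMPLE_DAG))
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

Independent tasks such as `read_a` and `read_b` have no ordering constraint between them, which is what lets a DAG engine run them in parallel.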

...

Running the Generator job on Tez

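| Run # | YARN Engine | # of URLs | Elapsed Time (hh:mm:ss) |
|-------|-------------|-----------|-------------------------|
| 1     | MapReduce   | 11322     | 00:01:19                |
| 2     | MapReduce   | 11322     | 00:01:18                |
| 3     | MapReduce   | 11322     | 00:01:22                |
| 4     | MapReduce   | 11322     | 00:01:23                |
| 5     | Tez         | N/A       | N/A                     |
| 6     | Tez         | N/A       | N/A                     |
| 7     | Tez         | N/A       | N/A                     |
| 8     | Tez         | N/A       | N/A                     |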
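The Tez runs produced no timings, so the MapReduce runs are the only baseline available. As a small worked example, the sketch below converts the elapsed times of runs 1 to 4 from hh:mm:ss to seconds and averages them; the function names are illustrative only.

```python
# Elapsed times of the MapReduce runs (runs 1 to 4) from the table, as hh:mm:ss.
MAPREDUCE_ELAPSED = ["00:01:19", "00:01:18", "00:01:22", "00:01:23"]


def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mean_seconds(times):
    """Arithmetic mean of a list of 'hh:mm:ss' durations, in seconds."""
    return sum(to_seconds(t) for t in times) / len(times)


if __name__ == "__main__":
    print(mean_seconds(MAPREDUCE_ELAPSED))  # 80.5 (about 1 min 20.5 s)
```

This mean is the figure a working Tez run of the Generator job would be compared against.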

At the time of writing, the Generator job was found to be incompatible with Tez. The job execution log below details the outcome.

...