org.apache.hadoop.mapreduce
Class Job

java.lang.Object
  extended by org.apache.hadoop.mapreduce.task.JobContextImpl
      extended by org.apache.hadoop.mapreduce.Job
All Implemented Interfaces:
JobContext

public class Job
extends JobContextImpl
implements JobContext

The job submitter's view of the Job. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods only work until the job is submitted; afterwards they throw an IllegalStateException.
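A minimal driver sketch showing the typical configure-then-submit flow. `MyJobDriver`, `MyMapper`, `MyReducer`, and the `in`/`out` paths are placeholders; the output types assume a word-count-style job.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "myjob");
    job.setJarByClass(MyJobDriver.class);   // ship the jar containing this class

    // All setters must be called before submission; afterwards
    // they throw IllegalStateException.
    job.setMapperClass(MyMapper.class);     // MyMapper/MyReducer are placeholders
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path("in"));
    FileOutputFormat.setOutputPath(job, new Path("out"));

    // Submit, print progress, and block until the job finishes.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```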


Nested Class Summary
static class Job.JobState
           
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
conf, credentials, ugi
 
Fields inherited from interface org.apache.hadoop.mapreduce.JobContext
CACHE_ARCHIVES_VISIBILITIES, CACHE_FILE_VISIBILITIES, COMBINE_CLASS_ATTR, INPUT_FORMAT_CLASS_ATTR, JAR_UNPACK_PATTERN, JOB_ACL_MODIFY_JOB, JOB_ACL_VIEW_JOB, JOB_CANCEL_DELEGATION_TOKEN, JOB_NAMENODES, MAP_CLASS_ATTR, MAP_MEMORY_PHYSICAL_MB, MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MAPREDUCE_TASK_CLASSPATH_PRECEDENCE, OUTPUT_FORMAT_CLASS_ATTR, PARTITIONER_CLASS_ATTR, REDUCE_CLASS_ATTR, REDUCE_MEMORY_PHYSICAL_MB, SHUFFLE_CONSUMER_PLUGIN_ATTR, USER_LOG_RETAIN_HOURS
 
Constructor Summary
Job()
           
Job(org.apache.hadoop.conf.Configuration conf)
           
Job(org.apache.hadoop.conf.Configuration conf, java.lang.String jobName)
           
 
Method Summary
 void failTask(TaskAttemptID taskId)
          Fail indicated task attempt.
 Counters getCounters()
          Gets the counters for this job.
static Job getInstance()
          Creates a new Job with a generic Configuration.
static Job getInstance(org.apache.hadoop.conf.Configuration conf)
          Creates a new Job with a given Configuration.
static Job getInstance(org.apache.hadoop.conf.Configuration conf, java.lang.String jobName)
          Creates a new Job with a given Configuration and a given jobName.
 java.lang.String getJar()
          Get the pathname of the job's jar.
 TaskCompletionEvent[] getTaskCompletionEvents(int startFrom)
          Get events indicating completion (success/failure) of component tasks.
 java.lang.String getTrackingURL()
          Get the URL where some job progress information will be displayed.
 boolean isComplete()
          Check if the job is finished or not.
 boolean isSuccessful()
          Check if the job completed successfully.
 void killJob()
          Kill the running job.
 void killTask(TaskAttemptID taskId)
          Kill indicated task attempt.
 float mapProgress()
          Get the progress of the job's map-tasks, as a float between 0.0 and 1.0.
 float reduceProgress()
          Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0.
 void setCancelDelegationTokenUponJobCompletion(boolean value)
          Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion.
 void setCombinerClass(java.lang.Class<? extends Reducer> cls)
          Set the combiner class for the job.
 void setCombinerKeyGroupingComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
          Define the comparator that controls which keys are grouped together for a single call to the combiner's Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
 void setGroupingComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
          Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
 void setInputFormatClass(java.lang.Class<? extends InputFormat> cls)
          Set the InputFormat for the job.
 void setJarByClass(java.lang.Class<?> cls)
          Set the Jar by finding where a given class came from.
 void setJobName(java.lang.String name)
          Set the user-specified job name.
 void setMapOutputKeyClass(java.lang.Class<?> theClass)
          Set the key class for the map output data.
 void setMapOutputValueClass(java.lang.Class<?> theClass)
          Set the value class for the map output data.
 void setMapperClass(java.lang.Class<? extends Mapper> cls)
          Set the Mapper for the job.
 void setMapSpeculativeExecution(boolean speculativeExecution)
          Turn speculative execution on or off for this job for map tasks.
 void setNumReduceTasks(int tasks)
          Set the number of reduce tasks for the job.
 void setOutputFormatClass(java.lang.Class<? extends OutputFormat> cls)
          Set the OutputFormat for the job.
 void setOutputKeyClass(java.lang.Class<?> theClass)
          Set the key class for the job output data.
 void setOutputValueClass(java.lang.Class<?> theClass)
          Set the value class for job outputs.
 void setPartitionerClass(java.lang.Class<? extends Partitioner> cls)
          Set the Partitioner for the job.
 void setReducerClass(java.lang.Class<? extends Reducer> cls)
          Set the Reducer for the job.
 void setReduceSpeculativeExecution(boolean speculativeExecution)
          Turn speculative execution on or off for this job for reduce tasks.
 void setSortComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
          Define the comparator that controls how the keys are sorted before they are passed to the Reducer.
 void setSpeculativeExecution(boolean speculativeExecution)
          Turn speculative execution on or off for this job.
 float setupProgress()
          Get the progress of the job's setup, as a float between 0.0 and 1.0.
 void setUserClassesTakesPrecedence(boolean value)
          Set whether the user's classpath or the system classpath takes precedence when tasks are launched.
 void setWorkingDirectory(org.apache.hadoop.fs.Path dir)
          Set the current working directory for the default file system.
 void submit()
          Submit the job to the cluster and return immediately.
 boolean waitForCompletion(boolean verbose)
          Submit the job to the cluster and wait for it to finish.
 
Methods inherited from class org.apache.hadoop.mapreduce.task.JobContextImpl
getArchiveClassPaths, getArchiveTimestamps, getCacheArchives, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJobID, getJobName, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getProfileTaskRange, getReducerClass, getSortComparator, getSymlink, getUser, getWorkingDirectory, setJobID, userClassesTakesPrecedence
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapreduce.JobContext
getArchiveClassPaths, getArchiveTimestamps, getCacheArchives, getCacheFiles, getCombinerClass, getCombinerKeyGroupingComparator, getConfiguration, getCredentials, getFileClassPaths, getFileTimestamps, getGroupingComparator, getInputFormatClass, getJobID, getJobName, getJobSetupCleanupNeeded, getLocalCacheArchives, getLocalCacheFiles, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getMaxMapAttempts, getMaxReduceAttempts, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getProfileEnabled, getProfileParams, getReducerClass, getSortComparator, getSymlink, getUser, getWorkingDirectory, userClassesTakesPrecedence
 

Constructor Detail

Job

public Job()
    throws java.io.IOException
Throws:
java.io.IOException

Job

public Job(org.apache.hadoop.conf.Configuration conf)
    throws java.io.IOException
Throws:
java.io.IOException

Job

public Job(org.apache.hadoop.conf.Configuration conf,
           java.lang.String jobName)
    throws java.io.IOException
Throws:
java.io.IOException
Method Detail

getInstance

public static Job getInstance()
                       throws java.io.IOException
Creates a new Job with a generic Configuration.

Returns:
the Job
Throws:
java.io.IOException

getInstance

public static Job getInstance(org.apache.hadoop.conf.Configuration conf)
                       throws java.io.IOException
Creates a new Job with a given Configuration. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.

Parameters:
conf - the Configuration
Returns:
the Job
Throws:
java.io.IOException

getInstance

public static Job getInstance(org.apache.hadoop.conf.Configuration conf,
                              java.lang.String jobName)
                       throws java.io.IOException
Creates a new Job with a given Configuration and a given jobName. The Job makes a copy of the Configuration so that any necessary internal modifications do not reflect on the incoming parameter.

Parameters:
conf - the Configuration
jobName - the job instance's name
Returns:
the Job
Throws:
java.io.IOException
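A small sketch of the copy semantics described above: the Job holds its own copy of the Configuration, so later mutation of the caller's instance has no effect. The property name `example.key` is made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
conf.set("example.key", "before");            // "example.key" is a made-up property
Job job = Job.getInstance(conf, "copy-demo");

conf.set("example.key", "after");             // mutating the original afterwards...
// ...does not affect the job, which works on its own copy:
String v = job.getConfiguration().get("example.key");   // "before"
```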

setNumReduceTasks

public void setNumReduceTasks(int tasks)
                       throws java.lang.IllegalStateException
Set the number of reduce tasks for the job.

Parameters:
tasks - the number of reduce tasks
Throws:
java.lang.IllegalStateException - if the job is submitted

setWorkingDirectory

public void setWorkingDirectory(org.apache.hadoop.fs.Path dir)
                         throws java.io.IOException
Set the current working directory for the default file system.

Parameters:
dir - the new current working directory.
Throws:
java.lang.IllegalStateException - if the job is submitted
java.io.IOException

setInputFormatClass

public void setInputFormatClass(java.lang.Class<? extends InputFormat> cls)
                         throws java.lang.IllegalStateException
Set the InputFormat for the job.

Parameters:
cls - the InputFormat to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setOutputFormatClass

public void setOutputFormatClass(java.lang.Class<? extends OutputFormat> cls)
                          throws java.lang.IllegalStateException
Set the OutputFormat for the job.

Parameters:
cls - the OutputFormat to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setMapperClass

public void setMapperClass(java.lang.Class<? extends Mapper> cls)
                    throws java.lang.IllegalStateException
Set the Mapper for the job.

Parameters:
cls - the Mapper to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setJarByClass

public void setJarByClass(java.lang.Class<?> cls)
Set the Jar by finding where a given class came from.

Parameters:
cls - the example class

getJar

public java.lang.String getJar()
Get the pathname of the job's jar.

Specified by:
getJar in interface JobContext
Overrides:
getJar in class JobContextImpl
Returns:
the pathname

setCombinerClass

public void setCombinerClass(java.lang.Class<? extends Reducer> cls)
                      throws java.lang.IllegalStateException
Set the combiner class for the job.

Parameters:
cls - the combiner to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setReducerClass

public void setReducerClass(java.lang.Class<? extends Reducer> cls)
                     throws java.lang.IllegalStateException
Set the Reducer for the job.

Parameters:
cls - the Reducer to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setPartitionerClass

public void setPartitionerClass(java.lang.Class<? extends Partitioner> cls)
                         throws java.lang.IllegalStateException
Set the Partitioner for the job.

Parameters:
cls - the Partitioner to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setMapOutputKeyClass

public void setMapOutputKeyClass(java.lang.Class<?> theClass)
                          throws java.lang.IllegalStateException
Set the key class for the map output data. This allows the user to specify the map output key class to be different from the final output key class.

Parameters:
theClass - the map output key class.
Throws:
java.lang.IllegalStateException - if the job is submitted

setMapOutputValueClass

public void setMapOutputValueClass(java.lang.Class<?> theClass)
                            throws java.lang.IllegalStateException
Set the value class for the map output data. This allows the user to specify the map output value class to be different from the final output value class.

Parameters:
theClass - the map output value class.
Throws:
java.lang.IllegalStateException - if the job is submitted
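A sketch of a job whose intermediate (shuffle) types differ from its final output types; the concrete Writable types here are illustrative only.

```java
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// The shuffle serializes map output with these types...
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// ...while the reducer emits a different final value type:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
```

If the map output classes are not set explicitly, they default to the final output classes.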

setOutputKeyClass

public void setOutputKeyClass(java.lang.Class<?> theClass)
                       throws java.lang.IllegalStateException
Set the key class for the job output data.

Parameters:
theClass - the key class for the job output data.
Throws:
java.lang.IllegalStateException - if the job is submitted

setSpeculativeExecution

public void setSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job.

Parameters:
speculativeExecution - true if speculative execution should be turned on, else false.

setMapSpeculativeExecution

public void setMapSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for map tasks.

Parameters:
speculativeExecution - true if speculative execution should be turned on for map tasks, else false.

setReduceSpeculativeExecution

public void setReduceSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for reduce tasks.

Parameters:
speculativeExecution - true if speculative execution should be turned on for reduce tasks, else false.

setOutputValueClass

public void setOutputValueClass(java.lang.Class<?> theClass)
                         throws java.lang.IllegalStateException
Set the value class for job outputs.

Parameters:
theClass - the value class for job outputs.
Throws:
java.lang.IllegalStateException - if the job is submitted

setSortComparatorClass

public void setSortComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
                            throws java.lang.IllegalStateException
Define the comparator that controls how the keys are sorted before they are passed to the Reducer.

Parameters:
cls - the raw comparator
Throws:
java.lang.IllegalStateException - if the job is submitted
See Also:
setCombinerKeyGroupingComparatorClass(Class)

setCombinerKeyGroupingComparatorClass

public void setCombinerKeyGroupingComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
                                           throws java.lang.IllegalStateException
Define the comparator that controls which keys are grouped together for a single call to the combiner's Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)

Parameters:
cls - the raw comparator to use
Throws:
java.lang.IllegalStateException - if the job is submitted

setGroupingComparatorClass

public void setGroupingComparatorClass(java.lang.Class<? extends org.apache.hadoop.io.RawComparator> cls)
                                throws java.lang.IllegalStateException
Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)

Parameters:
cls - the raw comparator to use
Throws:
java.lang.IllegalStateException - if the job is submitted
See Also:
setCombinerKeyGroupingComparatorClass(Class)
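The three comparator-related setters are often used together for the secondary-sort pattern. `NaturalKeyPartitioner`, `CompositeKeyComparator`, and `NaturalKeyGroupingComparator` below are hypothetical user-supplied classes, not part of Hadoop.

```java
// Secondary sort sketch: keys are (natural, secondary) composites.
job.setPartitionerClass(NaturalKeyPartitioner.class);       // route by natural key only
job.setSortComparatorClass(CompositeKeyComparator.class);   // sort by (natural, secondary)
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
// One reduce() call per natural key, with values arriving in secondary order.
```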

setJobName

public void setJobName(java.lang.String name)
                throws java.lang.IllegalStateException
Set the user-specified job name.

Parameters:
name - the job's new name.
Throws:
java.lang.IllegalStateException - if the job is submitted

setUserClassesTakesPrecedence

public void setUserClassesTakesPrecedence(boolean value)
Set whether the user's classpath or the system classpath takes precedence when tasks are launched.

Parameters:
value - pass true if user's classes should take precedence

getTrackingURL

public java.lang.String getTrackingURL()
Get the URL where some job progress information will be displayed.

Returns:
the URL where some job progress information will be displayed.

setupProgress

public float setupProgress()
                    throws java.io.IOException
Get the progress of the job's setup, as a float between 0.0 and 1.0. When the job setup is completed, the function returns 1.0.

Returns:
the progress of the job's setup.
Throws:
java.io.IOException

mapProgress

public float mapProgress()
                  throws java.io.IOException
Get the progress of the job's map-tasks, as a float between 0.0 and 1.0. When all map tasks have completed, the function returns 1.0.

Returns:
the progress of the job's map-tasks.
Throws:
java.io.IOException

reduceProgress

public float reduceProgress()
                     throws java.io.IOException
Get the progress of the job's reduce-tasks, as a float between 0.0 and 1.0. When all reduce tasks have completed, the function returns 1.0.

Returns:
the progress of the job's reduce-tasks.
Throws:
java.io.IOException

isComplete

public boolean isComplete()
                   throws java.io.IOException
Check if the job is finished or not. This is a non-blocking call.

Returns:
true if the job is complete, else false.
Throws:
java.io.IOException
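Because isComplete() is non-blocking, it combines naturally with submit() and the progress methods for custom monitoring; a sketch (5-second poll interval chosen arbitrarily):

```java
Job job = Job.getInstance(new Configuration());
// ... configure mapper, reducer, input/output paths ...
job.submit();                        // returns immediately

while (!job.isComplete()) {          // non-blocking status check
  System.out.printf("map %.0f%%  reduce %.0f%%%n",
      job.mapProgress() * 100, job.reduceProgress() * 100);
  Thread.sleep(5000);                // poll every 5 seconds
}
System.out.println(job.isSuccessful() ? "succeeded" : "failed");
```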

isSuccessful

public boolean isSuccessful()
                     throws java.io.IOException
Check if the job completed successfully.

Returns:
true if the job succeeded, else false.
Throws:
java.io.IOException

killJob

public void killJob()
             throws java.io.IOException
Kill the running job. Blocks until all job tasks have been killed as well. If the job is no longer running, it simply returns.

Throws:
java.io.IOException

getTaskCompletionEvents

public TaskCompletionEvent[] getTaskCompletionEvents(int startFrom)
                                              throws java.io.IOException
Get events indicating completion (success/failure) of component tasks.

Parameters:
startFrom - index to start fetching events from
Returns:
an array of TaskCompletionEvents
Throws:
java.io.IOException
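Since events are fetched from an index, callers typically page through them by advancing the index. A sketch, assuming the job has already finished (for a running job an empty batch only means no new events yet):

```java
int from = 0;
while (true) {
  TaskCompletionEvent[] events = job.getTaskCompletionEvents(from);
  if (events.length == 0) {
    break;                         // no more events for a finished job
  }
  for (TaskCompletionEvent event : events) {
    System.out.println(event.getTaskAttemptId() + ": " + event.getStatus());
  }
  from += events.length;           // next page starts after the last event seen
}
```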

killTask

public void killTask(TaskAttemptID taskId)
              throws java.io.IOException
Kill indicated task attempt.

Parameters:
taskId - the id of the task to be terminated.
Throws:
java.io.IOException

failTask

public void failTask(TaskAttemptID taskId)
              throws java.io.IOException
Fail indicated task attempt.

Parameters:
taskId - the id of the task to be terminated.
Throws:
java.io.IOException

getCounters

public Counters getCounters()
                     throws java.io.IOException
Gets the counters for this job.

Returns:
the counters for this job.
Throws:
java.io.IOException
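Counters are usually read after completion via findCounter. The group and counter names below are illustrative; "MyApp"/"SKIPPED_RECORDS" stand in for a custom counter incremented from a task via context.getCounter(...).

```java
Counters counters = job.getCounters();
// Look up a counter by group name and counter name (names are illustrative):
long skipped = counters.findCounter("MyApp", "SKIPPED_RECORDS").getValue();
System.out.println("records skipped: " + skipped);
```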

setCancelDelegationTokenUponJobCompletion

public void setCancelDelegationTokenUponJobCompletion(boolean value)
Sets the flag that will allow the JobTracker to cancel the HDFS delegation tokens upon job completion. Defaults to true.


submit

public void submit()
            throws java.io.IOException,
                   java.lang.InterruptedException,
                   java.lang.ClassNotFoundException
Submit the job to the cluster and return immediately.

Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException

waitForCompletion

public boolean waitForCompletion(boolean verbose)
                          throws java.io.IOException,
                                 java.lang.InterruptedException,
                                 java.lang.ClassNotFoundException
Submit the job to the cluster and wait for it to finish.

Parameters:
verbose - whether to print progress to the user
Returns:
true if the job succeeded
Throws:
java.io.IOException - thrown if the communication with the JobTracker is lost
java.lang.InterruptedException
java.lang.ClassNotFoundException


Copyright © 2009 The Apache Software Foundation