org.apache.hadoop.mapred.lib
Class InputSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.InputSampler<K,V>
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class InputSampler<K,V>
extends java.lang.Object
implements org.apache.hadoop.util.Tool

Utility for collecting samples and writing a partition file for TotalOrderPartitioner.


Nested Class Summary
static class InputSampler.IntervalSampler<K,V>
          Sample from s splits at regular intervals.
static class InputSampler.RandomSampler<K,V>
          Sample from random points in the input.
static interface InputSampler.Sampler<K,V>
          Interface to sample using an InputFormat.
static class InputSampler.SplitSampler<K,V>
          Samples the first n records from s splits.
 
Constructor Summary
InputSampler(JobConf conf)
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
static void main(java.lang.String[] args)
           
 int run(java.lang.String[] args)
          Driver for InputSampler from the command line.
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
static <K,V> void writePartitionFile(JobConf job, InputSampler.Sampler<K,V> sampler)
          Write a partition file for the given job, using the Sampler provided.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputSampler

public InputSampler(JobConf conf)
Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

writePartitionFile

public static <K,V> void writePartitionFile(JobConf job,
                                            InputSampler.Sampler<K,V> sampler)
                               throws java.io.IOException
Write a partition file for the given job, using the Sampler provided. Queries the sampler for a set of sample keys, sorts them with the job's output key comparator, selects the key at each rank, and writes the selected keys to the path returned by TotalOrderPartitioner.getPartitionFile(org.apache.hadoop.mapred.JobConf).

Throws:
java.io.IOException
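
The sort-and-select step described for writePartitionFile can be illustrated without any Hadoop dependency. The following is a minimal sketch, not the library's implementation: the class name PartitionKeySketch and the method selectSplitPoints are hypothetical, the samples are held in an in-memory list rather than read through a Sampler, and evenly spaced ranks are an assumption about how "the keys for each rank" are chosen. It ignores duplicate keys and returns the split points instead of writing them to a partition file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks partition boundary keys
// from a set of sampled keys.
final class PartitionKeySketch {

    // Sorts the samples and returns numPartitions - 1 keys taken at evenly
    // spaced ranks. Keys below the first split point would go to partition 0,
    // keys at or above the last split point to the final partition.
    static <K> List<K> selectSplitPoints(List<K> samples,
                                         Comparator<? super K> comparator,
                                         int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        if (samples.size() < numPartitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }

        // Copy before sorting so the caller's list is left untouched.
        List<K> sorted = new ArrayList<>(samples);
        sorted.sort(comparator);

        List<K> splitPoints = new ArrayList<>();
        double stepSize = sorted.size() / (double) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            // Rank of the i-th boundary in the sorted sample.
            int rank = (int) Math.round(stepSize * i);
            splitPoints.add(sorted.get(rank));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(50, 10, 40, 20, 30, 60, 80, 70);
        // Four partitions need three boundary keys.
        System.out.println(
            selectSplitPoints(samples, Comparator.<Integer>naturalOrder(), 4));
        // prints [30, 50, 70]
    }
}
```

In a real job the number of partitions would normally correspond to the number of reduce tasks, and the comparator to the job's output key comparator.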

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Driver for InputSampler from the command line. Configures a JobConf instance and calls writePartitionFile(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.lib.InputSampler.Sampler).

Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
java.lang.Exception

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception


Copyright © 2009 The Apache Software Foundation