org.apache.hadoop.examples.terasort
Class TeraGen
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.examples.terasort.TeraGen
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class TeraGen
- extends org.apache.hadoop.conf.Configured
- implements org.apache.hadoop.util.Tool
Generate the official terasort input data set.
The user specifies the number of rows and the output directory, and this
class runs a map/reduce program to generate the data.
The format of the data is:
- (10 bytes key) (10 bytes rowid) (78 bytes filler) \r \n
- The keys are random characters from the set ' ' .. '~'.
- The rowid is the right-justified row id as an int.
- The filler consists of 7 runs of 10 characters from 'A' to 'Z', plus a final run of 8 characters, for 78 bytes in total.
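The row layout above can be illustrated with a small standalone sketch. It is not the TeraGen.SortGenMapper source: the names TeraRowSketch and buildRow are hypothetical, and it assumes that the row id is padded with spaces, that the filler letter is derived from the row id, that an eighth, shorter run brings the filler to 78 bytes, and that java.util.Random is an acceptable stand-in for the key generator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;

// Illustrative only: builds one 100-byte row in the documented layout.
// This is a sketch under stated assumptions, not the Hadoop implementation.
final class TeraRowSketch {
    static final int KEY_LEN = 10;
    static final int ROWID_LEN = 10;
    static final int FILLER_LEN = 78;

    // Assumes 0 <= rowId < 10^10 so the id fits in its 10-byte field.
    static byte[] buildRow(long rowId, Random random) {
        StringBuilder row = new StringBuilder(KEY_LEN + ROWID_LEN + FILLER_LEN + 2);

        // 10-byte key: random printable ASCII characters from ' ' to '~'.
        for (int i = 0; i < KEY_LEN; i++) {
            row.append((char) (' ' + random.nextInt('~' - ' ' + 1)));
        }

        // 10-byte row id, right-justified (assumption: padded with spaces).
        row.append(String.format(Locale.ROOT, "%10d", rowId));

        // 78-byte filler: 7 runs of 10 identical letters, then a run of 8.
        // Assumption: the first letter depends on the row id.
        int base = (int) Math.floorMod(rowId, 26L);
        for (int run = 0; run < 8; run++) {
            char letter = (char) ('A' + (base + run) % 26);
            int runLength = (run < 7) ? 10 : 8;
            for (int j = 0; j < runLength; j++) {
                row.append(letter);
            }
        }

        // Every row ends with carriage return and line feed.
        row.append("\r\n");
        return row.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] row = buildRow(42L, new Random());
        System.out.println("row length in bytes: " + row.length);
    }
}
```

Fixed-width rows mean that any record can be located by multiplying its index by the row length, which is why each field is padded to an exact size.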
To run the program:
bin/hadoop jar hadoop-examples-*.jar teragen 10000000000 in-dir
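The row count in the command above follows from the row format: each row is 10 + 10 + 78 + 2 = 100 bytes, so 10,000,000,000 rows come to 10^12 bytes, or one decimal terabyte. The sketch below only checks that arithmetic; the class and method names are hypothetical and are not part of the Hadoop API.

```java
// Illustrative arithmetic only; the per-row size is taken from the documented format.
final class TeraGenSizeSketch {
    // key + rowid + filler + "\r\n"
    static final long BYTES_PER_ROW = 10 + 10 + 78 + 2;

    // Returns the total output size in bytes, failing loudly on overflow.
    static long outputBytes(long rows) {
        return Math.multiplyExact(rows, BYTES_PER_ROW);
    }

    public static void main(String[] args) {
        long rows = 10_000_000_000L; // row count from the example command
        System.out.println(outputBytes(rows) + " bytes"); // prints "1000000000000 bytes"
    }
}
```

To size a smaller test run, divide the target byte count by 100 to get the row argument.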
Nested Class Summary
static class TeraGen.SortGenMapper
- The Mapper class that, given a row number, generates the appropriate output line.
Method Summary
static void main(java.lang.String[] args)
int run(java.lang.String[] args)
Methods inherited from class org.apache.hadoop.conf.Configured
- getConf, setConf
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
- getConf, setConf
TeraGen
public TeraGen()
run
public int run(java.lang.String[] args) throws java.io.IOException
- Specified by:
- run in interface org.apache.hadoop.util.Tool
- Parameters:
- args - the CLI arguments
- Throws:
- java.io.IOException
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
- java.lang.Exception
Copyright © 2009 The Apache Software Foundation