Here is an example of the typical usage of the BlurOutputFormat. The Blur table has to be created before the MapReduce job is started. The setupJob method configures the following:
- The reducer class to be DefaultBlurReducer
- The number of reducers to be equal to the number of shards in the table
- The output key class to be a standard Text writable from the Hadoop library
- The output value class to be a BlurMutate writable from the Blur library
- The output format to be BlurOutputFormat
- The TableDescriptor in the Configuration
- The output path to be the TableDescriptor.getTableUri() value

The job will also use the BlurOutputCommitter class to commit or roll back the MapReduce job.
// Look up the existing table; it must be created before the job starts.
Iface client = BlurClient.getClient("controller1:40010");
TableDescriptor tableDescriptor = client.describe(tableName);

// Standard MapReduce job setup with a CSV mapper reading text input.
Job job = new Job(jobConf, "blur index");
job.setJarByClass(BlurOutputFormatTest.class);
job.setMapperClass(CsvBlurMapper.class);
job.setInputFormatClass(TextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(input));
CsvBlurMapper.addColumns(job, "cf1", "col");

// Configure the reducer, output classes, output format, and output path.
BlurOutputFormat.setupJob(job, tableDescriptor);

// Optional settings.
BlurOutputFormat.setIndexLocally(job, true);
BlurOutputFormat.setOptimizeInFlight(job, false);

job.waitForCompletion(true);
The following options can be set on the job:
- Local indexing (setIndexLocally): Enabled by default. The index is built locally on the machine where the task is running, and when the RecordWriter closes, the index is copied to the remote destination in HDFS.
- Maximum document buffer size: Sets the maximum number of documents that the buffer will hold in memory before overflowing to disk. The default is 1000, which will probably be very low for most systems.
- Optimize in flight (setOptimizeInFlight): Enabled by default. The index is optimized while it is copied from the local index to the remote destination in HDFS. Used in conjunction with setIndexLocally.
- Reducer multiplier: Multiplies the number of reducers for this job. For example, if the table has 256 shards, the normal number of reducers is 256. However, if the reducer multiplier is set to 4, the number of reducers will be 1024 and each shard will get 4 new segments instead of the normal 1.
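
The reducer multiplier arithmetic above can be checked with a small standalone sketch. The class and method names here (ReducerMultiplierExample, reducerCount) are hypothetical helpers written for illustration and are not part of the Blur API. The sketch only assumes what the text states: the reducer count is the shard count times the multiplier, and each shard receives one new segment per unit of the multiplier.

```java
class ReducerMultiplierExample {

    // Hypothetical helper (not a Blur method): total reducers for a job,
    // computed as shardCount * multiplier, as described in the text.
    static int reducerCount(int shardCount, int multiplier) {
        if (shardCount < 1 || multiplier < 1) {
            throw new IllegalArgumentException("shardCount and multiplier must be >= 1");
        }
        // multiplyExact throws ArithmeticException on int overflow.
        return Math.multiplyExact(shardCount, multiplier);
    }

    public static void main(String[] args) {
        int shards = 256;

        // Multiplier of 1: one reducer and one new segment per shard.
        System.out.println(reducerCount(shards, 1)); // prints 256

        // Multiplier of 4: four reducers and four new segments per shard.
        System.out.println(reducerCount(shards, 4)); // prints 1024
    }
}
```

A larger multiplier spreads the indexing work over more reducers, at the cost of more new segments per shard.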