I am just starting out with Hadoop's implementation of MapReduce, so I don't yet know its ins and outs. I do understand conceptually how it is supposed to work.
My problem is to find a specific search string within a set of files I have been given. I am not concerned about the files themselves; that part is sorted. But how would you go about asking the user for the search string? Would you ask in the JobConf section of the program? If so, how would I pass the string into the job?
If it's done within the map() function, how would you implement it? Wouldn't it just ask for a search string every time map() is called?
Here are the main method and the JobConf section, which should give you an idea:
public static void main(String[] args) throws IOException {
    // This produces an output file in which each line contains a separate word followed by
    // the total number of occurrences of that word in all the input files.
    JobConf job = new JobConf();
    FileInputFormat.setInputPaths(job, new Path("input"));
    FileOutputFormat.setOutputPath(job, new Path("output"));
    // Output from reducer maps words to counts.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    // The output of the mapper is a map from words (including duplicates) to the value 1.
    job.setMapperClass(InputMapper.class);
    // The output of the reducer is a map from unique words to their total counts.
    job.setReducerClass(CountWordsReducer.class);
    JobClient.runJob(job);
}
And the map() function:
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
    // The key is the character offset within the file of the start of the line, ignored.
    // The value is a line from the file.
    // This is me trying to hard-code it. I would prefer an explanation of how to get interactive input!
    String inputString = "data";
    String line = value.toString();
    // Check the line once, up front. Testing inside the loop without consuming a
    // token would spin forever on lines that do not contain the search string.
    if (!line.contains(inputString)) {
        return;
    }
    Scanner scanner = new Scanner(line);
    while (scanner.hasNext()) {
        String line1 = scanner.next();
        output.collect(new Text(line1), new LongWritable(1));
    }
    scanner.close();
}
I am led to believe that I don't need a reducer stage for this problem. Any advice/explanations much appreciated!
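For reference, here is a minimal, Hadoop-free sketch of the pattern this question is circling around: the driver obtains the search string once (here from the command line), stores it in the job configuration, and each mapper reads it back once during setup rather than on every map() call. In this sketch java.util.Properties is only a stand-in for JobConf, and SearchSketch, SearchMapper, run and the property name "search.string" are hypothetical names chosen for illustration. In the old Hadoop API the corresponding calls would presumably be job.set(name, value) in the driver and job.get(name) inside the mapper's configure(JobConf) method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hadoop-free sketch: Properties stands in for JobConf, and SearchMapper
// mimics the old-API mapper lifecycle (configure once, then map per line).
class SearchSketch {
    // Hypothetical property name; any key unique to the job would do.
    static final String SEARCH_KEY = "search.string";

    static class SearchMapper {
        private String searchString;

        // Called once per task, before any map() call.
        // Hadoop analogue (assumed): configure(JobConf job) using job.get(SEARCH_KEY).
        void configure(Properties conf) {
            searchString = conf.getProperty(SEARCH_KEY);
        }

        // Called once per input line; collects the line if it contains the string.
        void map(long offset, String line, List<String> output) {
            if (line.contains(searchString)) {
                output.add(line);
            }
        }
    }

    // Driver side: put the string into the configuration, then feed lines to the mapper.
    static List<String> run(String searchString, List<String> lines) {
        Properties conf = new Properties();
        conf.setProperty(SEARCH_KEY, searchString); // Hadoop analogue (assumed): job.set(SEARCH_KEY, searchString)

        SearchMapper mapper = new SearchMapper();
        mapper.configure(conf);

        List<String> output = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, output);
            offset += line.length() + 1; // rough character offset of the next line
        }
        return output;
    }

    public static void main(String[] args) {
        // The search string arrives once, as a command-line argument to the driver.
        String searchString = args.length > 0 ? args[0] : "data";
        List<String> lines = Arrays.asList("some data here", "nothing relevant", "more data");
        for (String match : run(searchString, lines)) {
            System.out.println(match);
        }
    }
}
```

Because the string is read in configure() rather than in map(), nothing is prompted for or looked up per record. If no reducer is needed, JobConf.setNumReduceTasks(0) is the usual way to make a job map-only, so that mapper output is written directly to the output path.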