Monday, 20 April 2015

Manipulating a user input string in MapReduce

I am just starting out with the Hadoop implementation of MapReduce, so I don't yet know its ins and outs. I do understand how it is supposed to work conceptually.

My problem is to find a specific search string within a set of files I have been given. I am not concerned with the files themselves; that part is sorted. But how would you go about asking for the search string? Would you ask in the JobConf section of the program? If so, how would I pass the string into the job?

If it's within the map() function, how would you go about implementing it? Wouldn't it just ask for a search string every time the map() function is called?

Here's the main method and JobConf() section that should give you an idea:

public static void main(String[] args) throws IOException {

    // This produces an output file in which each line contains a separate word followed by
    // the total number of occurrences of that word in all the input files.

    JobConf job = new JobConf();

    FileInputFormat.setInputPaths(job, new Path("input"));
    FileOutputFormat.setOutputPath(job, new Path("output"));

    // Output from reducer maps words to counts.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // The output of the mapper is a map from words (including duplicates) to the value 1.
    job.setMapperClass(InputMapper.class);

    // The output of the reducer is a map from unique words to their total counts.
    job.setReducerClass(CountWordsReducer.class);

    JobClient.runJob(job);
}

And the map() function:

public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {

    // The key is the character offset within the file of the start of the line, ignored.
    // The value is a line from the file.

    //This is me trying to hard-code it. I would prefer an explanation on how to get interactive input!
    String inputString = "data";
    String line = value.toString();

    // Test the line once, before the loop. Testing inside the loop without
    // consuming a token never terminates on lines that lack the search string.
    if (!line.contains(inputString)) {
        return;
    }

    // Emit every word of a matching line with a count of 1.
    Scanner scanner = new Scanner(line);
    while (scanner.hasNext()) {
        String word = scanner.next();
        output.collect(new Text(word), new LongWritable(1));
    }
    scanner.close();
}

I am led to believe that I don't need a reducer stage for this problem. Any advice/explanations much appreciated!
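
One way to approach the question above is to read the search string once in the driver, store it in the job configuration, and have the mapper pick it up once per task rather than once per map() call. In the old org.apache.hadoop.mapred API this would typically mean calling job.set(name, value) on the JobConf in main() and overriding configure(JobConf) in the mapper to call job.get(name). The sketch below models only that pattern in plain Java so it runs without Hadoop: java.util.Properties stands in for JobConf, and the class name, the "search.string" key, and the helper methods are assumptions made for illustration, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Plain-Java model of the pattern; Properties stands in for Hadoop's JobConf.
class SearchStringSketch {

    // Hypothetical configuration key, chosen for this example only.
    static final String SEARCH_KEY = "search.string";

    // Stand-in for a mapper: configure() runs once per task, map() once per line.
    static class GrepMapper {
        private String searchString = "";

        void configure(Properties conf) {
            searchString = conf.getProperty(SEARCH_KEY, "");
        }

        // Emits the whole line if it contains the search string.
        void map(long offset, String line, List<String> output) {
            if (!searchString.isEmpty() && line.contains(searchString)) {
                output.add(line);
            }
        }
    }

    // Driver side: take the search string once (here from the command line)
    // and store it in the configuration that is handed to every mapper.
    static Properties buildConf(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: SearchStringSketch <search string>");
        }
        Properties conf = new Properties();
        conf.setProperty(SEARCH_KEY, args[0]);
        return conf;
    }

    // Simulates one map task over a list of input lines.
    static List<String> run(Properties conf, List<String> lines) {
        GrepMapper mapper = new GrepMapper();
        mapper.configure(conf); // once, before any map() call
        List<String> output = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset, line, output);
            offset += line.length() + 1; // +1 for the line terminator
        }
        return output;
    }

    public static void main(String[] args) {
        Properties conf = buildConf(args);
        List<String> lines = Arrays.asList("big data is big", "nothing here", "metadata too");
        for (String hit : run(conf, lines)) {
            System.out.println(hit);
        }
    }
}
```

Because the string travels with the configuration, nothing prompts for input inside map(); mappers run on cluster nodes with no console attached, so any prompting would have to happen in the driver before the job is submitted. If the job only filters lines and no aggregation is needed, calling job.setNumReduceTasks(0) on the JobConf should make it a map-only job.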
