Question

I've written an InvertedIndex Java program: given a word, it searches for that word in each page of a fixed, static array of strings, where each string is a URL to search. It returns a list of all the URLs whose pages contain the word.

Here's my relevant code:

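import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Members of the InvertedIndex class:
static final String[] URL_SEARCH_LIST = {
    "http://www.cnn.com", "http://www.daniel.com", "http://www.amazon.com"
};

// Returns every URL in URL_SEARCH_LIST whose page contains the query string.
private static List<String> search(String query) {
    List<String> urlList = new ArrayList<String>();
    for (String site : URL_SEARCH_LIST) {
        HttpURLConnection conn = null;
        try {
            URL url = new URL(site);
            conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String htmlContent;
                while ((htmlContent = br.readLine()) != null) {
                    if (htmlContent.contains(query)) {
                        urlList.add(site);
                        break; // one matching line is enough for this URL
                    }
                }
            }
        } catch (IOException e) {
            // Report an unreachable site and keep going instead of returning null
            System.out.println(site + ": " + e.getMessage());
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
    System.out.println("Search for: " + query + " is done!");
    return urlList;
}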

Now I would like to make this run on Amazon EMR, which means I need to convert my program to a Map-Reduce program which does the same thing.

Given this code, can someone please help me get started? I don't fully understand the concepts of map and reduce.

Thanks in advance


Solution

MapReduce is essentially divide and conquer plus a lot of infrastructure. Divide (map) over your URL_SEARCH_LIST array so that each mapper builds its own local urlList for the URLs it handles, then combine (reduce) all of those local urlLists into the final output.
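To make that split concrete, here is a minimal plain-Java simulation of it. It is not Hadoop or EMR code. `map` handles one URL and returns its local urlList, and `reduce` concatenates those lists. The class name `MapReduceSketch`, the `fetch` function, and the sample page contents are hypothetical stand-ins; `fetch` replaces the HTTP request from the question so the sketch runs offline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MapReduceSketch {

    // Map step: one call per input URL. Returns a local urlList that holds
    // the URL if its page contains the query, otherwise an empty list.
    static List<String> map(String site, String query, Function<String, String> fetch) {
        List<String> localList = new ArrayList<>();
        String page = fetch.apply(site); // hypothetical fetch; null means "no page"
        if (page != null && page.contains(query)) {
            localList.add(site);
        }
        return localList;
    }

    // Reduce step: combine all local lists into the final output.
    static List<String> reduce(List<List<String>> localLists) {
        List<String> result = new ArrayList<>();
        for (List<String> local : localLists) {
            result.addAll(local);
        }
        return result;
    }

    // Driver: runs map once per URL, then reduce once over all results.
    static List<String> search(String[] sites, String query, Function<String, String> fetch) {
        List<List<String>> localLists = new ArrayList<>();
        for (String site : sites) {
            // On a cluster these map calls would run in parallel on different nodes.
            localLists.add(map(site, query, fetch));
        }
        return reduce(localLists);
    }

    public static void main(String[] args) {
        // Hypothetical page contents standing in for real HTTP responses.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://www.cnn.com", "<html>breaking news</html>");
        pages.put("http://www.daniel.com", "<html>hello from daniel</html>");
        pages.put("http://www.amazon.com", "<html>news and deals</html>");
        String[] sites = pages.keySet().toArray(new String[0]);
        // prints "[http://www.cnn.com, http://www.amazon.com]"
        System.out.println(search(sites, "news", pages::get));
    }
}
```

On EMR, the Hadoop framework would take over the parts this sketch does by hand: splitting the input, running the map calls in parallel across nodes, and collecting mapper output for the reducer. With Hadoop's API you would express the same two steps as `Mapper` and `Reducer` subclasses.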

Licensed under: CC-BY-SA with attribution