Question

I have created the following code:

#!/usr/bin/env python
import mincemeat
import glob

all_files = glob.glob('textfiles/*.txt')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

# The data source can be any dictionary-like object
datasource = dict((file_name, file_contents(file_name))
                  for file_name in all_files)

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)

I am running this on my personal Mac, with the client on the same machine. If I run multiple clients on multiple machines, will the files be divided among them automatically? That is, will the mincemeat server assign files to clients for processing? Also, in the example above I am not specifying a key in the mapper function. How can I specify a key, e.g. a file name?

Solution

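Yes. The mincemeat server hands out the work automatically: each entry in the datasource (here, one file) becomes a map task, and the server assigns those tasks to whichever clients are connected. Distributing work this way is one of the central aims of MapReduce, although nothing guarantees a perfectly even split, since files differ in size and clients differ in speed. Note that in your script the server itself reads every file into the datasource dictionary, so the files only need to exist on the server machine; each file's contents are sent to a client together with its task.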

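In your map function, each yield produces a key and a value. In this example the key is the word currently being iterated over (w) and the value is 1. The k argument of mapfn is the datasource key, which in your script is the file name, so to use the file name as the key, yield k (or a tuple that contains k) instead of w.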
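
As a rough sketch of keying by file name, the mapper below emits a (file_name, word) tuple as the key, so the counts come out per file rather than across all files. The run_locally helper and the sample data are hypothetical: they are a single-process stand-in used only to show what the two functions compute, and they are not part of mincemeat and do not reflect how the server schedules work. The sketch also assumes tuple keys pass through mincemeat unchanged.

```python
from collections import defaultdict


def mapfn(k, v):
    # k is the datasource key (a file name); v is that file's text.
    for w in v.split():
        # Use (file_name, word) as the key to count words per file.
        yield (k, w), 1


def reducefn(k, vs):
    # vs holds every value emitted for key k.
    return sum(vs)


def run_locally(datasource, mapfn, reducefn):
    # Hypothetical helper: mimics map, group-by-key, and reduce in one
    # process. It is NOT mincemeat's API, only an illustration.
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return dict((k, reducefn(k, vs)) for k, vs in grouped.items())


# Made-up sample data standing in for the contents of textfiles/*.txt.
sample = {"a.txt": "spam eggs spam", "b.txt": "eggs"}
results = run_locally(sample, mapfn, reducefn)
```

With mincemeat itself, the same mapfn and reducefn would be assigned to s.mapfn and s.reducefn exactly as in the question. To get one total per file instead, yield k, 1 for each word.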

Licensed under: CC-BY-SA with attribution