Question

From the App Engine mapreduce console (myappid.appspot.com/mapreduce/status), I have a mapreduce job defined with input_reader: mapreduce.input_readers.BlobstoreLineInputReader. I have used it successfully with a regular Blobstore file, but it doesn't work with a blob key created from Cloud Storage with create_gs_key. When I run it, I get the error "BadReaderParamsError: Could not find blobinfo for key THEKEY". The input reader checks for the existence of a BlobInfo. Is there any workaround for this? Shouldn't BlobInfo.get(BLOBKEY FROM CS) return a BlobInfo?

To get a blob_key from a Google Cloud Storage file, I run this:

from google.appengine.ext import blobstore

# Cloud Storage paths passed to create_gs_key use the /gs/<bucket>/<object> form.
READ_PATH = '/gs/mybucket/myfile.json'
blob_key = blobstore.create_gs_key(READ_PATH)
print(blob_key)

Solution

A community member posted a LineInputReader for Cloud Storage in an issue on the appengine-mapreduce library: http://code.google.com/p/appengine-mapreduce/issues/detail?id=140

We've posted our modifications here: https://github.com/thinkjson/CloudStorageLineInputReader

We're using this to do MapReduce over about 4TB of data, and have been happy with it so far.

OTHER TIPS

Cloud Storage and Blobstore are two different storage systems, so you can't pass a Cloud Storage key where a Blobstore key is expected.
You will need to implement your own line reader over the Cloud Storage file.
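
As a rough illustration of what such a line reader has to do, here is a minimal sketch of the core loop: given a byte range (one shard), yield only the lines that begin inside it, so adjacent shards neither drop nor duplicate a line. The function name iter_lines_in_range is hypothetical, and io.BytesIO stands in for the Cloud Storage file handle; it assumes the real handle is a seekable binary stream with seek, tell and readline. This is not the code from the linked issue or repository, and it leaves out the mapreduce InputReader plumbing (splitting, serialization) entirely.

```python
import io


def iter_lines_in_range(stream, start, end):
    """Yield (offset, line) for every line that begins in [start, end).

    `stream` is assumed to be a seekable binary file-like object;
    here it is a stand-in for a Cloud Storage file handle.
    """
    if start > 0:
        # Step back one byte and discard through the next newline.
        # This drops a partial line owned by the previous shard, but
        # keeps a line that begins exactly at `start`.
        stream.seek(start - 1)
        stream.readline()
    else:
        stream.seek(0)

    while True:
        offset = stream.tell()
        if offset >= end:
            break  # the next line belongs to the following shard
        line = stream.readline()
        if not line:
            break  # end of file
        yield offset, line.rstrip(b"\n")


# Example: split a small in-memory "file" into two shards.
data = b"alpha\nbeta\ngamma\n"
first_shard = list(iter_lines_in_range(io.BytesIO(data), 0, 8))
second_shard = list(iter_lines_in_range(io.BytesIO(data), 8, len(data)))
```

The line that straddles the shard boundary ("beta" here) is read in full by the shard in which it starts, and the next shard skips its tail, so each line is processed once.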

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow