There are two possibilities: either you read the data in your map/reduce task from the distributed file system, or you add it to the distributed cache. I just googled the distributed cache size, and it can be controlled:
"The local.cache.size parameter controls the size of the DistributedCache. By default, it’s set to 10 GB."
So if you add the output of your first job to the distributed cache of the second, you should be fine, I think. Tens of thousands of entries are nowhere near the gigabyte range.
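For completeness, if you ever did need more headroom, the limit is just a configuration property. A minimal sketch, assuming the property name from the quote above, that the value is in bytes, and that conf is your job's Configuration:

    // raise the per-node cache limit from the 10 GB default to 20 GB (value in bytes)
    conf.setLong("local.cache.size", 20L * 1024 * 1024 * 1024);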
Adding a file to the distributed cache goes as follows:
TO READ in your mapper:
// getLocalCacheFiles returns the node-local paths of all cached files
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
String patternsFile = cacheFiles[0].toString();
BufferedReader in = new BufferedReader(new FileReader(patternsFile));
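In practice you want to do that read once per task, in the mapper's setup() method, not once per record. Here's a minimal sketch of what that could look like; PatternMapper and the one-entry-per-line format are my assumptions, not something from your question:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PatternMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Set<String> patterns = new HashSet<String>();

        @Override
        protected void setup(Context context) throws IOException {
            // read every line of the first cached file, once per task
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            BufferedReader in = new BufferedReader(new FileReader(cacheFiles[0].toString()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    patterns.add(line); // assumes one entry per line
                }
            } finally {
                in.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit only records that match one of the cached entries
            if (patterns.contains(value.toString())) {
                context.write(value, new IntWritable(1));
            }
        }
    }

Loading the entries into a HashSet gives you O(1) lookups inside map(), which is the whole point of pulling the file down to each node.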
TO ADD to the DistributedCache:
// the file must live on HDFS so every node can pull it down
DistributedCache.addCacheFile(new URI(file), job.getConfiguration());
while setting up your second job.
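Put together, the driver for the second job could look roughly like this. It's a sketch, assuming your first job wrote its output to /output/job1 and that you only need the single part-r-00000 file; swap in your real paths and classes:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SecondJobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job2 = new Job(conf, "second-job");
            job2.setJarByClass(SecondJobDriver.class);
            job2.setMapperClass(PatternMapper.class);
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(IntWritable.class);

            // cache the first job's output so every mapper of the second job can read it
            DistributedCache.addCacheFile(new URI("/output/job1/part-r-00000"),
                    job2.getConfiguration());

            FileInputFormat.addInputPath(job2, new Path("/input/job2"));
            FileOutputFormat.setOutputPath(job2, new Path("/output/job2"));
            System.exit(job2.waitForCompletion(true) ? 0 : 1);
        }
    }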
Let me know if this does the trick.