If you're using the grunt shell, the obvious way to do this is to run DUMP n; then wait for the job to finish and copy the value into your define bloom... call.
That's probably not a very satisfying answer, though, since you most likely want to run this in a script. Here is a very hacky way to do it. You'll need three files:
'n_start.txt', which contains:
n='
'n_end.txt', which contains the single character:
'
'bloom_build.pig', which contains:
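If you'd rather not create these by hand, a small shell sketch like the following writes all three into the current directory. It uses printf so that no trailing newline ends up in n_start.txt or n_end.txt (the tr step later strips newlines anyway, so either way should work), and it quotes the heredoc delimiter so the shell leaves $n alone for Pig to substitute.

```shell
#!/bin/sh
# Sketch: create the three helper files in the current directory.
set -e

# n_start.txt: the opening of the parameter assignment, n='
printf "%s" "n='" > n_start.txt

# n_end.txt: the single closing quote character, with no newline
printf "%s" "'" > n_end.txt

# bloom_build.pig: the define statement; the quoted 'EOF' keeps $n literal
cat > bloom_build.pig <<'EOF'
define bb BuildBloom('jenkins', '$n', '0.0001');
EOF
```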
define bb BuildBloom('jenkins', '$n', '0.0001');
Once you have those you can run this script:
records = LOAD '$input' using PigStorage();
records = FOREACH records GENERATE
(long) $0 AS value_fld:long,
(chararray)$1 AS filter_fld:chararray;
records_fltr = FILTER records by (filter_fld=='$filter_value')
AND (value_fld is not null);
records_grp = GROUP records_fltr all;
records_count = FOREACH records_grp GENERATE
(chararray) COUNT(records_fltr.value_fld) AS count:chararray;
n = FOREACH records_count GENERATE flatten(count);
--the new part
STORE records_count INTO 'n' USING PigStorage(',');
--this will copy what you just stored into a local directory
fs -copyToLocal n n
--this concatenates the two static files we created before running pig
--with the count we just generated, pipes the result through tr to
--strip out the newlines, and stores it in a file called 'n.txt', which we
--will use as a parameter file
sh cat -s n_start.txt n/part-r-00000 n_end.txt | tr -d '\n' > n.txt
--RUN makes pig execute one script from within another. Be forewarned: if pig reports
--an error on a certain line, that line number refers to the expanded script
RUN -param_file n.txt bloom_build.pig;
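To see what the cat/tr step hands to RUN without touching a cluster, you can reproduce it locally. In this sketch the count 12345 is a made-up stand-in for whatever the STORE step actually writes to n/part-r-00000, and everything happens in a throwaway temporary directory so a real local n directory isn't clobbered.

```shell
#!/bin/sh
# Sketch: reproduce the n.txt-building step locally, without Pig or Hadoop.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# The two static helper files
printf "%s" "n='" > n_start.txt
printf "%s" "'" > n_end.txt

# Stand-in for the part file that fs -copyToLocal brings back.
# 12345 is a made-up count; the real file holds one line ending in a newline.
mkdir n
printf '12345\n' > n/part-r-00000

# Same pipeline as the sh line in the Pig script
cat -s n_start.txt n/part-r-00000 n_end.txt | tr -d '\n' > n.txt

cat n.txt; echo   # prints n='12345'
```

The resulting single line, n='12345', is what -param_file feeds into the $n placeholder in bloom_build.pig.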
Once the script has run, you can call bb as you originally intended. It's ugly, and someone better versed in Unix could probably get rid of the n_start.txt and n_end.txt files.
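One way to drop n_start.txt and n_end.txt is to let printf supply the n=' prefix and the closing quote itself. This is only a sketch under the same assumptions as the script (the count is the only line in n/part-r-00000); 12345 is again a made-up value, and a temporary directory keeps the example self-contained. It writes exactly the same n='12345' line, with no trailing newline, as the cat/tr pipeline.

```shell
#!/bin/sh
# Sketch: build n.txt without the n_start.txt / n_end.txt helper files.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the copied part file; 12345 is a made-up count.
mkdir n
printf '12345\n' > n/part-r-00000

# Read the count, removing its newline with tr, then let printf add the
# n=' prefix and the closing quote that the helper files used to supply.
count=$(tr -d '\n' < n/part-r-00000)
printf "n='%s'" "$count" > n.txt

cat n.txt; echo   # prints n='12345'
```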
The other option, cleaner but more involved, is to write a new UDF that, like BuildBloom, extends BuildBloomBase but has an empty constructor and does all the work in its exec() method.