Check out the description in the ticket originally requesting this feature: https://issues.apache.org/jira/browse/PIG-1926
I haven't tested this, but it looks like this should work:
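-- Load the input relation.
raw = LOAD '/data_dir';
-- Compute the sampling fraction for about 1000 rows: 1000 / (total row count).
samplerate = FOREACH (GROUP raw ALL) GENERATE 1000.0/COUNT_STAR(raw) AS rate;
-- Sample raw using that fraction, referenced as the scalar samplerate.rate.
thousand = SAMPLE raw samplerate.rate;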
The important thing is to refer to your scalar by name (rate), not by position ($0).