There's probably more than one way to skin a cat here, but what comes to mind for me is a multi-table insert that uses rand() to do the split, where x is the percentage of rows you want in the test set:
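-- rand() is drawn once per row in the subquery, so each row lands in exactly one output.
-- Note that select * also writes the is_test_set flag column to both directories.
from (
select *, (rand() * 100 <= x) as is_test_set from my_table
) t
insert overwrite directory '/test_set' select * where is_test_set = true
insert overwrite directory '/training_set' select * where is_test_set = false;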
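Using a row hash would also work, and it has the advantage of giving the same split every time you run it. I would be wary of hashing or partitioning on any actual data column, though; if that column's values are unevenly distributed or correlated with what you're modeling, it may skew your sampling.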
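For what it's worth, here is a rough sketch of what the hash variant could look like. It assumes my_table has a unique, non-null surrogate key column, which I'm calling id (that name is hypothetical and not from the question), and that you're on Hive 1.3.0 or later so md5() is available. The 20 is just an example test percentage. I haven't run this against your data, so treat it as a starting point rather than a finished query.

```sql
-- Sketch only: `id` is a hypothetical unique, non-null key column; 20 is an example percentage.
-- md5() mixes the key, the first 6 hex digits are converted to an integer,
-- and pmod(..., 100) maps each row to a stable bucket from 0 to 99.
from my_table
insert overwrite directory '/test_set'
  select * where pmod(cast(conv(substr(md5(cast(id as string)), 1, 6), 16, 10) as int), 100) < 20
insert overwrite directory '/training_set'
  select * where pmod(cast(conv(substr(md5(cast(id as string)), 1, 6), 16, 10) as int), 100) >= 20;
```

Because the bucket depends only on id, rerunning the query puts every row in the same set as before. Rows with a null id would match neither predicate and be dropped, which is why the sketch assumes the key is never null.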