Whenever I had a similar requirement, I used almost the same approach being used by you as I couldn't find an efficiently working alternative.
However to make the process of Dynamic Partitioning
a bit fast, I tried setting few configuration parameters like:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions = 2000;
set hive.exec.max.dynamic.partitions.pernode = 10000;
I am sure you must be using the first two, and the last two you can set depending on your data size.
You can check out this Configuration Properties page and decide for yourself which parameters might help in making your process fast e.g. increasing number of reducers used.
I can not guarantee that using this approach will save your time but definitely you will make the most out of your cluster set up.