Bootstrap actions are just standard Unix scripts. Make sure the shebang line points to an interpreter that is installed on the machines in your cluster and you're good to go.
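As a rough illustration, a bootstrap action can be as small as the sketch below. It assumes `/bin/bash` exists on the cluster machines; the `DEMO_DIR` variable and the marker file are made-up names for the example, not anything the service requires.

```shell
#!/bin/bash
# The shebang above must name an interpreter that exists on every node.
set -euo pipefail

# Hypothetical location for illustration; override by exporting DEMO_DIR.
DEMO_DIR="${DEMO_DIR:-${TMPDIR:-/tmp}/bootstrap-demo}"

# Create the directory and leave a marker recording which node ran the script.
mkdir -p "$DEMO_DIR"
echo "bootstrapped on $(uname -n)" > "$DEMO_DIR/marker.txt"
```

You would typically upload a script like this to S3 and point the bootstrap action at it when you create the cluster. Because of `set -e`, any failing command makes the script exit non-zero, so a broken setup step does not pass silently.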
When you say source files, do you mean your Pig scripts? These can be run directly off S3. If you are talking about data, you should read it directly off S3 as well, unless you have a use case for copying it to the cluster first (like scanning the same data multiple times).