When Files are Added to the Datastore
Binaries are stored in the datastore very early, usually right after a binary property is set on a node. This happens even if the node has not been saved yet and the change exists only in the so-called "transient space". As a result, the file is added to the datastore before the transaction is committed.
Files stay in the datastore until garbage collection is run. Consequently, even if the transaction is rolled back, the files are kept.
To get rid of unreferenced files, you need to run datastore garbage collection.
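The following toy model illustrates this lifecycle in Java, the language Jackrabbit is written in. All class and method names (`Repo`, `setBinaryProperty`, `save`, `rollback`) are hypothetical and are not the Jackrabbit API. The sketch only assumes what is described above: the binary is written as soon as the property is set, and a rollback discards references, not datastore records.

```java
import java.util.HashMap;
import java.util.Map;

public class TransientBinaryDemo {

    // Hypothetical toy repository; NOT the Jackrabbit API.
    static class Repo {
        // Datastore: record id -> bytes. Records are written immediately and
        // are only ever removed by garbage collection (not modelled here).
        final Map<String, byte[]> dataStore = new HashMap<>();
        // Saved node properties: node path -> datastore record id.
        final Map<String, String> saved = new HashMap<>();
        // Unsaved changes of the current session (the "transient space").
        final Map<String, String> transientSpace = new HashMap<>();
        private int nextId = 0;

        // The bytes go to the datastore right away; only the reference to
        // the record waits in the transient space until save().
        String setBinaryProperty(String nodePath, byte[] content) {
            String recordId = "record-" + nextId++;
            dataStore.put(recordId, content.clone());
            transientSpace.put(nodePath, recordId);
            return recordId;
        }

        // Makes the pending references permanent.
        void save() {
            saved.putAll(transientSpace);
            transientSpace.clear();
        }

        // Discards the pending references; the datastore records stay.
        void rollback() {
            transientSpace.clear();
        }
    }

    public static void main(String[] args) {
        Repo repo = new Repo();
        String id = repo.setBinaryProperty("/docs/report", new byte[] {1, 2, 3});
        System.out.println("stored before save: " + repo.dataStore.containsKey(id));
        repo.rollback();
        System.out.println("referenced after rollback: " + repo.saved.containsValue(id));
        System.out.println("stored after rollback: " + repo.dataStore.containsKey(id));
    }
}
```

Running `main` prints `true`, `false`, `true`. The record is in the datastore before anything is saved, nothing references it after the rollback, and it is still stored: exactly the kind of unreferenced file that garbage collection later has to remove.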
How to Run Garbage Collection
As documented in the Jackrabbit wiki page about the datastore, "garbage collection is used to purge unused objects". This is a management task that you need to add to your application yourself. As a general rule, it is recommended to run garbage collection when the system is not busy, for example in the evening or at the weekend.
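A minimal sketch of such a management task, assuming plain `java.util.concurrent` scheduling and a weekly run on Saturday at 02:00 (an arbitrary example of a quiet time). `delayUntilNext` computes how long to wait, and `scheduleWeekly` re-arms itself after each run. The `gc` runnable is a hypothetical placeholder for whatever call actually starts datastore garbage collection in your setup.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class GcSchedule {

    // Time from `now` until the next `day` at `time` (strictly in the future).
    static Duration delayUntilNext(ZonedDateTime now, DayOfWeek day, LocalTime time) {
        ZonedDateTime next = now.with(TemporalAdjusters.nextOrSame(day)).with(time);
        if (!next.isAfter(now)) {
            // Today is `day` but the slot has already passed: use next week.
            next = next.plusWeeks(1);
        }
        return Duration.between(now, next);
    }

    // Runs `gc` at the next slot and re-arms itself after every run.
    // `gc` is a hypothetical placeholder for the real garbage collection call.
    static void scheduleWeekly(ScheduledExecutorService scheduler, ZoneId zone,
                               DayOfWeek day, LocalTime time, Runnable gc) {
        Duration delay = delayUntilNext(ZonedDateTime.now(zone), day, time);
        scheduler.schedule(() -> {
            try {
                gc.run();
            } finally {
                scheduleWeekly(scheduler, zone, day, time, gc);
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // A Wednesday morning; the next Saturday 02:00 is 64 hours away.
        ZonedDateTime wednesday = ZonedDateTime.of(2024, 5, 15, 10, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(delayUntilNext(wednesday, DayOfWeek.SATURDAY, LocalTime.of(2, 0)));
    }
}
```

In an application you would call `scheduleWeekly(Executors.newSingleThreadScheduledExecutor(), ZoneId.systemDefault(), DayOfWeek.SATURDAY, LocalTime.of(2, 0), gcTask)` once at startup, where `gcTask` is your own (hypothetical) task. Re-arming after each run, instead of using `scheduleAtFixedRate` with a 7-day period, keeps the job at the same local time across daylight saving changes.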
Garbage collection traverses the repository and marks all files that are still in use; at the very end, it removes the files that were not marked (mark & sweep).
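The mark & sweep idea can be sketched in a few lines. This is a hypothetical in-memory model (`Node`, `mark`, `sweep` and the record ids are invented for illustration), not Jackrabbit's collector. The mark phase walks the node tree and collects every referenced record id, and the sweep phase deletes every datastore record that was not collected. The sketch assumes the repository is idle; a real collector must also avoid deleting binaries that are added while it is running.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MarkAndSweepDemo {

    // Hypothetical repository node: the datastore record ids its binary
    // properties point to, plus its child nodes. Not the Jackrabbit API.
    static final class Node {
        final List<String> binaryRefs = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node ref(String recordId) { binaryRefs.add(recordId); return this; }
        Node child(Node node) { children.add(node); return this; }
    }

    // Mark phase: traverse the whole tree and collect every referenced record id.
    static Set<String> mark(Node root) {
        Set<String> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            marked.addAll(node.binaryRefs);
            node.children.forEach(pending::push);
        }
        return marked;
    }

    // Sweep phase: delete every record that was not marked; return the deleted ids.
    static List<String> sweep(Map<String, byte[]> dataStore, Set<String> marked) {
        List<String> removed = new ArrayList<>();
        for (String recordId : dataStore.keySet()) {
            if (!marked.contains(recordId)) {
                removed.add(recordId);
            }
        }
        dataStore.keySet().removeAll(removed);
        return removed;
    }

    public static void main(String[] args) {
        Map<String, byte[]> dataStore = new TreeMap<>();
        dataStore.put("rec-a", new byte[] {1});
        dataStore.put("rec-b", new byte[] {2});
        dataStore.put("rec-c", new byte[] {3}); // e.g. left behind by a rolled-back transaction

        Node root = new Node()
                .child(new Node().ref("rec-a"))
                .child(new Node().child(new Node().ref("rec-b")));

        List<String> removed = sweep(dataStore, mark(root));
        System.out.println("removed: " + removed);         // removed: [rec-c]
        System.out.println("kept: " + dataStore.keySet()); // kept: [rec-a, rec-b]
    }
}
```

Separating the two phases is what makes the approach safe for shared files: a record is only deleted after the complete traversal has shown that no node refers to it.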