Can I save & store a user's submission in a way that proves that the data has not been altered, and that the timestamp is accurate?

https://softwareengineering.stackexchange.com/questions/244723

04-10-2020

Question

There are many situations where the validity of the timestamp attached to a certain post (submission of information) might be invaluable for the post owner's legal usage. I'm not looking for a service to achieve this, as requested in this great question, but rather a method for the achievement of such a service.

For the legal authentication (in almost any legal system) of text content and its submission time, the owner of the content would need to prove:

that the timestamp itself has not been altered and was accurate to begin with.
that the text content linked to the timestamp has not been altered.

I'd like to know how to achieve this via programming (not a language-specific solution, but rather the methodology behind the solution).

Can a timestamp be validated to being accurate to the time that the content was really submitted?
Can data be stored in a form that it can be read, but not written to, in a proven way?

In other words, can I save & store a user's submission in a way that proves that the data has not been altered, and that the timestamp is accurate?

I can't think of any programming method that would make this possible, but I am not the most experienced programmer out there. Based on MidnightLightning's answer to the question I cited, this sort of thing is being done.

Clarification: I'm looking for a method (hashing, encryption, etc) that would allow an average guy like me to achieve the desired effect through programming.

I'm interested in this subject for the purpose of Defensive Publication.

I'd like to learn a method that allows an every-day programmer to pick up his computer, write a program, pass information through it, and say:

I created this text at this moment in time, and I can prove it.

This means the information should be protected from the programmer who writes the code as well. Perhaps a 3rd party API would be required. I'm ok with that.

Solution

Sure.

You can take the content, append the textual representation of the timestamp, and then generate a cryptographically secure hash of the result, which you store along with the data. In the future, you can take the content that you've stored and the timestamp that you've stored, and repeat the process. If the hash comes out the same, you can say with high confidence that the content is unchanged.
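As a minimal sketch of that process, assuming SHA-256 as the secure hash and an ISO 8601 string as the textual timestamp (both are my choices, not something the answer prescribes), it could look like the following. The names fingerprint and is_unchanged are hypothetical, and the length prefix is an addition of mine that keeps the boundary between content and timestamp unambiguous.

```python
import hashlib
import hmac


def fingerprint(content: str, timestamp: str) -> str:
    """Hash the content with its timestamp appended (SHA-256 is an assumption)."""
    data = content.encode("utf-8")
    h = hashlib.sha256()
    # Assumption: prefix the content length so that moving characters between
    # the content and the timestamp cannot produce the same byte stream.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)
    h.update(timestamp.encode("utf-8"))
    return h.hexdigest()


def is_unchanged(content: str, timestamp: str, stored_hash: str) -> bool:
    """Repeat the process later and compare against the stored hash."""
    return hmac.compare_digest(fingerprint(content, timestamp), stored_hash)


stored = fingerprint("I created this text.", "2020-10-04T12:00:00Z")
print(is_unchanged("I created this text.", "2020-10-04T12:00:00Z", stored))
print(is_unchanged("I created this text.", "2019-10-04T12:00:00Z", stored))
```

The first check should succeed and the second should fail, because either a changed timestamp or changed content leads to a different hash.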

Of course, this is only as secure as the hash algorithm that you choose. Any hash will have some risk of collisions, but a secure hash will make this risk incredibly small. Ideally, if a hash produces a 32-bit number, you'd expect the odds that some other given input collides with yours to be 1/2^32. A less secure hash will create the possibility that an attacker could change the timestamp and make a subtle change to the content so that the two, concatenated together, produce the same hash. Even that, though, would generally require a pretty sophisticated attacker: generating non-obvious hash collisions (i.e. ones that don't require the text to be completely garbled) is rather challenging even for relatively weak hashes like MD4.

Assuming a perfect hash algorithm, any given string is converted deterministically into a hash, every hash is equally likely given the universe of inputs, and the hash carries no recoverable information about the original input string. In a perfect world, if I have a 32-bit hash algorithm and know nothing about the input string, any of the 2^32 possible outputs would be equally likely. Changing 1 character of the input string would also make any of the 2^32 possible outputs equally likely. That makes it impractical, short of brute-force generating and trying on the order of 2^32 inputs, to find another input that produces a collision with any particular hash. Of course, real hash algorithms likely aren't perfect (just like real encryption algorithms likely aren't perfect), so even the best algorithms probably aren't as tough to break as their raw bit-length would predict. But they're likely good enough that no attacker is going to be able to produce a collision over the lifetime that you care about the information.
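To make the "changing 1 character" point concrete, here is a small sketch that counts how many output bits differ between the hashes of two nearly identical strings. It assumes SHA-256 (256-bit output) rather than the hypothetical 32-bit hash discussed above, and bit_difference is a name made up for this illustration.

```python
import hashlib


def bit_difference(a: str, b: str) -> int:
    """Number of bits that differ between the SHA-256 hashes of a and b."""
    da = int.from_bytes(hashlib.sha256(a.encode("utf-8")).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode("utf-8")).digest(), "big")
    return bin(da ^ db).count("1")


changed = bit_difference("I created this text.", "I created this text!")
print(changed, "of 256 bits differ")
```

For a well-behaved hash, the count is typically somewhere near half of the output bits, which is what "every output is equally likely" would predict.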

OTHER TIPS

Traditionally, you would use timestamping services for this sort of thing. A timestamp service is a well-known service that performs the following tasks:

You send a small bit of data to the service (such as a hash).
The service combines this data with the current timestamp.
The combined data is signed with the service's private key.
The service sends the combined data and signature back to the client.

This allows anybody to validate that the given hash was signed at the given time, by validating the signature using the service's public key.
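The four steps can be sketched in a few lines. Everything here is an illustration under loud assumptions: the key is a toy textbook-RSA key built from two small Mersenne primes with no padding, which is NOT secure, and issue_token and verify_token are hypothetical names. A real service would use a vetted signature scheme and a standard token format.

```python
import hashlib
import time
from typing import Optional

# Toy RSA key. NOT secure: the modulus is far too small and there is no
# padding. It only illustrates "sign with private key, verify with public key".
P = 2**127 - 1                       # Mersenne prime
Q = 2**89 - 1                        # Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def _digest_as_int(payload: bytes) -> int:
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def issue_token(client_hash: str, now: Optional[int] = None) -> dict:
    """Service side: combine the client's hash with the current time and sign both."""
    issued_at = int(time.time()) if now is None else now
    payload = f"{client_hash}|{issued_at}".encode("utf-8")
    signature = pow(_digest_as_int(payload), D, N)   # uses the private key
    return {"hash": client_hash, "issued_at": issued_at, "signature": signature}


def verify_token(token: dict) -> bool:
    """Anyone: check the signature using only the public values N and E."""
    payload = f"{token['hash']}|{token['issued_at']}".encode("utf-8")
    return pow(token["signature"], E, N) == _digest_as_int(payload)


doc_hash = hashlib.sha256(b"my submission").hexdigest()
token = issue_token(doc_hash, now=1601812800)
backdated = dict(token, issued_at=token["issued_at"] - 86400)
print(verify_token(token), verify_token(backdated))
```

Note that the client only sends a hash, so the service never sees the content itself; backdating the token changes the signed payload, so the original signature no longer verifies.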

In answer to your questions:

Can a timestamp be validated to being accurate to the time that the content was really submitted?

Yes, if everybody trusts the timestamp service (the timestamp service can be neutral and can sign timestamps submitted by friends and enemies alike). Consider for example a timestamp service offered by a national government (if you only care about transactions within a particular country), or by an entity such as a Swiss bank.

Can data be stored in a form that it can be read, but not written to, in a proven way?

Yes, because the signature of the timestamp service can be verified at any time using the service's public key.

NOTE: Both "Yes" above come with the usual caveats of properly implemented services. If the timestamp service is compromised in any way, signed timestamps may become suspect.

There are other solutions to this problem as well. The Wikipedia article on Trusted timestamping gives a good overview.

There is no way to do what you want with pure digital information theory. However, there is a way to use a trusted assistant to help you. The classic version of this involves posting information in a classified column on a particular day in a particular newspaper. It is exceedingly difficult to forge such a post after the fact, so it is generally treated as trustworthy.

Now the real challenge is deciding what information to post. Obviously posting all of your timestamped data to the classifieds would get old after a while. However, posting a hash of all of the timestamped data you collected (including hashing the timestamps themselves), and posting that to the newspaper daily would be reasonable. Then, if you have to prove the timestamp in a court of law, you merely reveal all timestamps in that block. Anyone verifying you simply needs to hash the data, then go look in the newspaper for that hash.
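A rough sketch of such a daily hash, assuming SHA-256 and a simple length-prefixed encoding of each (timestamp, content) pair, is shown below. Both choices, and the name daily_digest, are mine rather than part of the scheme described above.

```python
import hashlib


def daily_digest(records):
    """Hash a day's (timestamp, content) pairs, in order, into one short value."""
    h = hashlib.sha256()
    for timestamp, content in records:
        entry = f"{timestamp}\n{content}".encode("utf-8")
        # Length prefix keeps the boundaries between entries unambiguous.
        h.update(len(entry).to_bytes(8, "big"))
        h.update(entry)
    return h.hexdigest()


day = [
    ("2020-10-04T09:15:00Z", "first submission"),
    ("2020-10-04T17:42:00Z", "second submission"),
]
published = daily_digest(day)          # this short string goes in the newspaper

# Later, a verifier is handed the revealed records and recomputes the digest.
print(daily_digest(day) == published)
```

The verifier needs every record in the block, in the original order, since the digest covers all of them at once.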

This, of course, only provides one bound: you can't possibly have generated the timestamped data after the newspaper received the hash, but it could have been generated any time beforehand and delayed until you "needed it." For many legal situations, just an upper bound is sufficient. For others, more careful bounds are needed. A common solution, which happens to be used by bitcoin, is to chain the data. Store your timestamped data, and hash it. You then create a chain of hashes, combining this hash of timestamped data with the last entry in the chain. This does not prevent you from falsifying a time, but it does make it dramatically more difficult, because now you have to falsify every entry that follows the injected data.
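The chaining idea can be sketched as follows. This is a simplified illustration assuming SHA-256 and an all-zero starting value, not a description of how bitcoin itself lays out its blocks; link, build_chain and first_mismatch are hypothetical names.

```python
import hashlib

GENESIS = "0" * 64   # assumed starting value for an empty chain


def link(prev_hash, timestamp, content):
    """Combine the hash of one timestamped record with the previous chain entry."""
    record_hash = hashlib.sha256(f"{timestamp}\n{content}".encode("utf-8")).hexdigest()
    return hashlib.sha256((prev_hash + record_hash).encode("ascii")).hexdigest()


def build_chain(records):
    chain, prev = [], GENESIS
    for timestamp, content in records:
        prev = link(prev, timestamp, content)
        chain.append(prev)
    return chain


def first_mismatch(records, chain):
    """Index of the first record that no longer matches the chain, or None."""
    prev = GENESIS
    for i, ((timestamp, content), expected) in enumerate(zip(records, chain)):
        prev = link(prev, timestamp, content)
        if prev != expected:
            return i
    return None


records = [("09:00", "alpha"), ("09:05", "beta"), ("09:10", "gamma")]
chain = build_chain(records)

tampered = [records[0], ("08:00", "beta"), records[2]]
print(first_mismatch(records, chain), first_mismatch(tampered, chain))
```

Because each entry depends on the one before it, changing the second record alters every chain entry from that point on, which is why a forger would have to recompute (and re-publish) everything that follows.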

There are hundreds of variants of these verification schemes using tools like this, but they all fall into two steps:

Create a "proof" document which is small enough to be easy to manage (such as a hash)
Provide this "proof" document to a trusted third party who can reasonably vouch that it received the proof

Remember that legal requirements are usually to some level of doubt. This gives flexibility. For example, if you need to prove a log message's time to a minute, you might use a 2 step process. Daily, you post the hash of your entire log to a newspaper or other trusted entity. Every minute, you publish the hash of the logs on a public website. If nobody downloads those minute logs, they may try to discredit them, but they cannot discredit the newspaper data, and the mere fact that you constantly publish verifiable data may be sufficient to assuage doubt by demonstrating that you had a desire to be held to a strong standard.

I worked at a shop that stored images of legal documents in a database. In order to achieve security, all user documents were stored this way:

take a hashcode of the document
encrypt the document & store it
encrypt the hashcode & store it
take a hashcode of the encrypted document & store it
take a hashcode of the encrypted hashcode & store it

Now, it's still possible for someone who has all the encryption keys to change everything, but if any one of those gets out of sync, the tampering can be detected.

(Full disclosure: to this day, I'm unsure if that process was brilliant or just a convincing way to sound brilliant without actually accomplishing anything.)

I'm not certain that's a sufficient, or even good, solution, but essentially the answer to your question boils down to hashcodes, encryption, and saving different pieces in different places.

In your case, it might be sufficient to save the text and the hash of the text in separate data stores. On any retrieval, check to make sure they still match. However, someone with root access could still defeat that if they knew what they needed to do.
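A minimal sketch of that last suggestion, with two in-memory dictionaries standing in for the separate data stores (an assumption for illustration; in practice they would be separate databases or machines), might look like this. The names save and load are hypothetical.

```python
import hashlib

text_store = {}   # stand-in for the store that holds the text
hash_store = {}   # stand-in for a separate store that holds only hashes


def save(doc_id, text):
    text_store[doc_id] = text
    hash_store[doc_id] = hashlib.sha256(text.encode("utf-8")).hexdigest()


def load(doc_id):
    """Return the text, or raise if it no longer matches its stored hash."""
    text = text_store[doc_id]
    if hashlib.sha256(text.encode("utf-8")).hexdigest() != hash_store[doc_id]:
        raise ValueError(f"document {doc_id!r} does not match its stored hash")
    return text


save("post-1", "I created this text.")
print(load("post-1"))
```

As noted above, this only detects changes made to one store; anyone who can rewrite both the text and its hash can still defeat the check.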

Licensed under: CC-BY-SA with attribution
