The best way of handling a form post as an API type of submission

https://softwareengineering.stackexchange.com/questions/314681

13-12-2020

Question

I have recently been tasked with a project to create an API that handles a form post using PHP.

I have never done something like this before, and because this will be handling payments I would like to do it as well as I can. Basically, the objective of the project as a whole is to incorporate multiple payment APIs into one single API that people can integrate with using a form post like this:

<form method="POST" action="https://MyURL/index.php" id="aForm" name="aForm">
    <input type="hidden" id="Lite_Merchant_ApplicationID" name="Lite_Merchant_ApplicationID" value="Your Application Id">
    <!-- various other fields to be processed -->
</form>

The project will have to include documentation and the URL will contain version info, so once the system is up it is highly unlikely that it will change (changes will be made at a different URL).

People should be able to integrate using any language, as long as they can perform the form post, but as I said, I will have to develop in PHP.

I have looked at creating a RESTful API, but I am not sure if I could apply that to a form submission or if I can just handle the form as I normally would. Since this will be handling payments, I would also like to make it as secure as possible (the actual URL will also be "https", if that makes any difference).

I have no problem doing the validation, and the form will have to contain some login credentials etc. I am mainly concerned with receiving the actual form data in a manner that can handle high volumes and is not prone to breaking; from there I should be fine.

I am a little lost and unsure of how to proceed and what to Google. Any advice or guidance will be appreciated. Thank you in advance.

Solution

The main issue here is that data coming from the client cannot be trusted. So you need to be concerned about replay attacks and people tampering with your messages.

Tamper-protection

I have actually implemented a real-world client for a payment service very similar to what you have described, and for tamper protection it used a message authentication code (MAC) system. As that is the system I know the best, I'll describe it here briefly, and you can read more about the others from Wikipedia or other sources.

A MAC is used to make sure that a message comes from the stated sender and not just someone pretending to be them. It requires that you have a shared secret between the parties -- basically a password. The secret is then used as a "salt" for a hash over the entire message.

Consider a bank with a message format "username;command". If you wanted to make a one-time transfer to me, the command the system produced for you would look like this: "Dave;Transfer $10 to ZeroOne". Now, anyone could produce messages like that and seriously mess up that bank! OK, so add a password? "Dave;asd123;Transfer $10 to ZeroOne". A little better, but if anyone intercepted that message, or spied on your username and password, it's no better than before. The command could also be changed from "Transfer $10 to ZeroOne" to "Transfer $1000 to ZeroOne" and nobody would notice until after I had already spent that extra money. ;)

A MAC would protect the authenticity (it was sent by Dave) and the integrity (it wasn't changed on its way) of the message. Let's change the protocol to "username;command;MAC". You see I've removed the password -- it's now actually a part of the MAC. The MAC could be formed by calculating a hash over all the fields of the message and the password. So for the sake of simplicity let's take an MD5 hash (don't use it in real life, it's not considered cryptographically secure anymore!). You would calculate a hash of the string "Dave;Transfer $10 to ZeroOne;asd123" and end up with "ce55ff1f69399f2f09adaee03d51e3cb". Over the wire, you'd send the message "Dave;Transfer $10 to ZeroOne;ce55ff1f69399f2f09adaee03d51e3cb"! Now, I don't have the computing resources to reverse an MD5 hash to figure out that the secret is "asd123", so I cannot tamper with the message.

The server receiving the message, however, has a trivial task of looking up that Dave's secret is "asd123", calculating the expected hash, and noticing that it matches the received hash so the message must have been sent by Dave and it must not have been altered after Dave "signed" it. (Notice again that the password should be much longer than "asd123", say, 20 characters or more.)

So, a MAC prevents tampering with the message. If anyone now changes any value of any of the fields, the MAC doesn't match the one that the server expects and the message is discarded.
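As a rough illustration of the sign-and-verify flow described above, here is a minimal sketch in Python. Note that it swaps in HMAC-SHA256 from the standard library instead of the simplified "hash of the message plus the secret" used in the explanation; the function names `sign` and `verify`, the `;` separator, and the example secret are all assumptions made for this example. In PHP, `hash_hmac()` and `hash_equals()` play the same roles.

```python
import hashlib
import hmac


def sign(fields, secret):
    """Return a lowercase hex MAC over the fields, joined with ';' and encoded as UTF-8."""
    message = ";".join(fields).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(fields, received_mac, secret):
    """Recompute the MAC on the server side and compare it in constant time."""
    expected = sign(fields, secret)
    return hmac.compare_digest(
        expected.encode("utf-8"), received_mac.lower().encode("utf-8")
    )


# Hypothetical shared secret; a real one should be long and random.
secret = "a-much-longer-shared-secret-than-asd123"
mac = sign(["Dave", "Transfer $10 to ZeroOne"], secret)

print(verify(["Dave", "Transfer $10 to ZeroOne"], mac, secret))    # True: genuine message
print(verify(["Dave", "Transfer $1000 to ZeroOne"], mac, secret))  # False: tampered command
```

The client sends the fields plus `mac`; the server looks up the sender's secret and calls `verify` before acting on the command.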

Replay attack

Right now our banking protocol is "username;command;MAC", and the messages cannot be faked. However, if I could intercept that message "Dave;Transfer $10 to ZeroOne;ce55ff1f69399f2f09adaee03d51e3cb", I could replay it to the bank and empty your account $10 at a time! That, basically, is a replay attack.

The message could've been intercepted and replayed by a man-in-the-middle, or it could be sent again and again by the original sender, either maliciously or accidentally (think "Why isn't this page loading?? I'll click again!"). Either way, you just need to prevent duplicate messages from having an effect on your system.

So we need to make sure that each and every message is unique, and the receiving end should discard duplicate messages. The Wikipedia article about replay attacks lists (at least some of) the available countermeasures: session tokens, one-time passwords and nonces with message authentication codes (MAC).

A nonce is short for "number used only once". The server keeps a record of the nonces seen so far and discards duplicates. A nonce often includes a time-based component, such as the current Unix timestamp to millisecond precision. It would, however, be a good idea to combine it with some unique client id, which in your case could be the order reference id.

As I'm typing, the current millisecond-precision timestamp is 1460496283511. If my order reference id was 123456, the nonce could be 1460496283511123456, i.e. just the two values concatenated. It doesn't even need to be a number; it could be "123456-0496283511", i.e. the reference id, a dash, and the last 10 digits of the timestamp. Let's see what our protocol might look like now: "username;command;nonce;MAC". The resulting message, as well as the MAC, would be different every time, thanks to the ever-changing nonce. (Again, the server needs to keep track of the nonces that have been used so far.)
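The nonce bookkeeping could be sketched roughly as follows, in Python. The names `NonceStore` and `make_nonce` are made up for this example, and the in-memory set is only an assumption for illustration; a real server would need shared, persistent storage and some expiry policy.

```python
import time


class NonceStore:
    """In-memory record of nonces already seen (illustration only)."""

    def __init__(self):
        self._seen = set()

    def accept(self, nonce):
        """Return True the first time a nonce is seen, False for any repeat."""
        if nonce in self._seen:
            return False
        self._seen.add(nonce)
        return True


def make_nonce(order_reference_id, now_ms=None):
    """Concatenate a millisecond Unix timestamp with the order reference id."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{now_ms}{order_reference_id}"


store = NonceStore()
nonce = make_nonce("123456", now_ms=1460496283511)

print(nonce)                # 1460496283511123456
print(store.accept(nonce))  # True: first time this nonce is seen
print(store.accept(nonce))  # False: replayed message, discard it
```

The nonce has to be one of the fields covered by the MAC; otherwise an attacker could replay an old message with a fresh nonce attached.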

General observations

If you end up using a hash code, which is usually rendered as a string of hex digits, you should document its format carefully: whether it should be uppercase or lowercase, or if either one will do, and which character encoding (such as UTF-8) should be used for the fields when the hash is calculated over them. Believe me, I've had problems with both: a test environment correctly using UTF-8 but a production environment using ISO-8859-1, rendering the hashes for messages with certain characters (like ä or ö, very common in Finnish and Swedish) entirely different in the two environments. Also, I've seen a hash string being compared with a plain "equals" method, so that "0123456789abcdef" and "0123456789ABCDEF" were seen as different, non-matching hashes.
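Both pitfalls are easy to reproduce. The following Python sketch uses made-up helper names (`digest`, `same_hash`) and an arbitrary sample string; it shows that the same text hashes differently under UTF-8 and ISO-8859-1, and that hex digests are best normalized to one case before being compared.

```python
import hashlib
import hmac


def digest(text, encoding):
    """Hash the text after encoding it with an explicitly chosen character encoding."""
    return hashlib.sha512(text.encode(encoding)).hexdigest()


def same_hash(a, b):
    """Compare two hex digests case-insensitively and in constant time."""
    return hmac.compare_digest(a.lower().encode("utf-8"), b.lower().encode("utf-8"))


utf8_hash = digest("Määrä: 10", "utf-8")
latin1_hash = digest("Määrä: 10", "iso-8859-1")

print(utf8_hash == latin1_hash)                           # False: the encodings disagree on "ä"
print("0123456789abcdef" == "0123456789ABCDEF")           # False: a plain equals is case-sensitive
print(same_hash("0123456789abcdef", "0123456789ABCDEF"))  # True: normalized comparison
```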

About the API: I'm not sure you need to think about any RESTfulness at the moment. You can create any kind of API you wish, but just be careful that the HTTP response codes are actually meaningful. "200 OK" only when the post succeeds (or "303 See Other" if it's an immediate redirect as a response to POST), "401 Unauthorized" when authentication is required, "403 Forbidden" when authentication succeeds but the user still does not have the right to do whatever it was that they were trying to do, and, perhaps, "400 Bad Request" when the MAC is not what it is supposed to be.
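One possible way to keep that mapping in a single place is sketched below in Python. The function name `status_for`, its three boolean parameters, and the order in which the checks run are assumptions for this example, not something the protocol prescribes.

```python
from http import HTTPStatus


def status_for(has_credentials, mac_is_valid, is_permitted):
    """Map the outcome of the server-side checks to an HTTP status code."""
    if not has_credentials:
        return HTTPStatus.UNAUTHORIZED  # 401: authentication is required
    if not mac_is_valid:
        return HTTPStatus.BAD_REQUEST   # 400: the MAC is not what it should be
    if not is_permitted:
        return HTTPStatus.FORBIDDEN     # 403: authenticated, but not allowed to do this
    return HTTPStatus.OK                # 200: the post succeeded


print(int(status_for(True, True, True)))    # 200
print(int(status_for(False, True, True)))   # 401
print(int(status_for(True, False, True)))   # 400
print(int(status_for(True, True, False)))   # 403
```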

Finally, about the shared secrets. Instead of a hash function like SHA-512 you might think you'd like to use some fancy key derivation function, like PBKDF2. However, consider the fact that it's an intentionally slow function to perform. The more traffic your server receives, the more time it will spend on cryptography alone, and the CPU will soon become a bottleneck. SHA-512 and similar functions are fast. This also applies to the secrets: you might want to store them in encrypted form, which is nice, but then you must make sure the server caches the decrypted result. Otherwise, again, the CPU becomes a bottleneck when, for each and every request, the stored password is decrypted in order to be hashed and compared against the MAC. I have seen this happen too.
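The caching point can be illustrated with a small Python sketch. Everything here is hypothetical: `STORED`, `slow_decrypt` and `secret_for` are invented names, and Base64 merely stands in for whatever real (and expensive) decryption the server would perform. Base64 is not encryption.

```python
import base64
from functools import lru_cache

# Hypothetical store of secrets at rest; Base64 is only a placeholder for real encryption.
STORED = {"Dave": base64.b64encode(b"asd123")}

decrypt_calls = 0


def slow_decrypt(blob):
    """Stand-in for an expensive decryption step; counts how often it runs."""
    global decrypt_calls
    decrypt_calls += 1
    return base64.b64decode(blob)


@lru_cache(maxsize=1024)
def secret_for(username):
    """Decrypt a user's secret once and serve later requests from the cache."""
    return slow_decrypt(STORED[username])


for _ in range(1000):  # simulate 1000 incoming requests from the same sender
    secret_for("Dave")

print(decrypt_calls)  # 1: the expensive step ran once, not 1000 times
```

The trade-off is that decrypted secrets now live in process memory, and the cache has to be invalidated when a secret is rotated.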

Bottom line

You should care about preventing replay attacks and tampering with the messages. You should not trust any data sent from the client; everything must be validated again on the server side. Use HTTP status codes wisely, and choose the right algorithms for the right purposes.

Other tips

Yes, you can create a RESTful service to receive the form submission. If you specify that people submitting data to your RESTful PHP page do so by encoding the data as "application/x-www-form-urlencoded" content (the default anyway), PHP will be able to access the form data easily.

The beauty of RESTful services is that they process data like form submissions normally. You would then want to write out some JSON from your PHP page to tell the caller whether the submission was a success (and maybe some other information).
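To make the request and response shapes concrete, here is a minimal, framework-free sketch in Python (in PHP the decoded fields would simply be available in `$_POST`). The function name `handle_form_post` and the JSON keys `success`, `error` and `applicationId` are assumptions for this example; only the field name `Lite_Merchant_ApplicationID` comes from the question.

```python
import json
from urllib.parse import parse_qs


def handle_form_post(body):
    """Decode an application/x-www-form-urlencoded body and build a JSON reply."""
    # parse_qs returns a list per key; keep the first value of each field.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    app_id = fields.get("Lite_Merchant_ApplicationID")
    if not app_id:
        error = {"success": False, "error": "missing Lite_Merchant_ApplicationID"}
        return 400, json.dumps(error)
    return 200, json.dumps({"success": True, "applicationId": app_id})


status, reply = handle_form_post("Lite_Merchant_ApplicationID=abc123&amount=10")
print(status, reply)

status_missing, reply_missing = handle_form_post("amount=10")
print(status_missing, reply_missing)
```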

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange