Javascript - Sanitize Malicious code from file (string)

Question 1

Instead of having them include var IAmGoodData =, make them simply provide JSON (which is basically what the rest of the file is, or seems to be). Then you parse it as JSON, using JSON.parse(). If it fails, they either didn't follow the JSON format well, or have external code, and in either case you would ignore the response.

For example, you'd expect data from the external file like:

[
{ Name :'test', Type:'Test2' },
{ Name :'test1', Type:'Test21' },
{ Name :'test2', Type:'Test22' }
]

which needs to be properly serialized as JSON (double quotes instead of single quotes, and double quotes around the keys). In your code, you'd use:

var json;
try {
    json = JSON.parse(client.responseText);
catch (ex) {
    // Invalid JSON
}

if (json) {
    // Do something with the response
}

Then you could loop over json and access the Name and Type properties of each.

Random Note:

In your client.onreadystatechange callback, make sure you check client.readyState === 4 && client.status === 200, to know that the request was successful and is done.

Question 2

This is extremely difficult to do. There are no intrinsically malicious keywords or functions in JavaScript, there are malicious applications. You could be getting false positives for "malicious" activity and prevent a legitimate code with a real purpose from being executed. And at the same time, anyone with a little bit of imagination could bypass any "preventive" method you may implement.

I'd suggest you look for a different approach. This is one of those problems (like CAPTCHA) in which it's trivial for a human to solve while for a machine is practically impossible to do so. You could try having a moderator or some human evaluator to interpret the code and accept it.

Question 3

You should have them provide valid JSON rather than arbitrary Javascript.
You can then call JSON.parse() to read their data without any risk of code execution.

In short, data is not code, and should not be able to contain code.

Question 4

You shouldn't. The user should be allowed to type whatever they want, and it's your job to display it.

It all depends on where it is being put, of course:

Database: mysql_real_escape_string or equivalent for whatever engine you're using.
HTML: htmlspecialchars in PHP, createTextNode or .replace(/</g,"<") in JavaScript
JavaScript: json_encode in PHP, JSON.stringify in JavaScript.

At the end of the day, just don't be Yahoo