Question

I have a JavaScript data file that is dynamically added to a website via some custom code. The file comes from a third-party vendor, who could potentially add malicious code to it.

Before this file is added to the website, I would like to parse it and look for malicious code, such as redirects or alerts, that would execute as soon as the file is included in the project/website.

For example, my JS file could look like this:

alert ('i am malicious');
var IAmGoodData = 
[
{ Name :'test', Type:'Test2' },
{ Name :'test1', Type:'Test21' },
{ Name :'test2', Type:'Test22' }
]

I load this file into an object via an XMLHttpRequest call, and when the call returns, I can take the response (which is my file's text) and search it for words:

var client = new XMLHttpRequest();
client.open('GET', 'folder/fileName.js');

client.onreadystatechange = function() 
{
        ScanText(client.responseText);
}
client.send();

function ScanText(text)
{
        alert(text);
        var index = text.search('alert');  // Here I can search for keywords
}

The last line would return an index of 0, as the word alert is found at index 0 in the file.

Questions:

  1. Is there a more efficient way to search for keywords in the file?
  2. What specific keywords should I be searching for to prevent malicious code from being run (i.e. redirects, popups, sounds, etc.)?

Solution

Instead of having them include var IAmGoodData =, make them simply provide JSON (which is basically what the rest of the file is, or seems to be). Then you parse it as JSON, using JSON.parse(). If it fails, they either didn't follow the JSON format well, or have external code, and in either case you would ignore the response.

For example, you'd expect data from the external file like:

[
{ Name :'test', Type:'Test2' },
{ Name :'test1', Type:'Test21' },
{ Name :'test2', Type:'Test22' }
]

which needs to be properly serialized as JSON (double quotes instead of single quotes, and double quotes around the keys). In your code, you'd use:

var json;
try {
    json = JSON.parse(client.responseText);
} catch (ex) {
    // Invalid JSON
}

if (json) {
    // Do something with the response
}

Then you could loop over json and access the Name and Type properties of each.
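
As a rough sketch of that loop, the snippet below parses the text and also checks that the result has the shape you expect before using it. The helper name parseVendorData and the rule "an array of objects with string Name and Type" are my own assumptions about what the vendor data should look like, not something from your code:

```javascript
// Hypothetical helper: returns the parsed records, or null if the payload
// is not a JSON array of objects with string Name and Type properties.
function parseVendorData(text) {
    var data;
    try {
        data = JSON.parse(text);
    } catch (ex) {
        return null; // not valid JSON (any JavaScript statement ends up here)
    }
    if (!Array.isArray(data)) {
        return null;
    }
    var ok = data.every(function (item) {
        return item !== null && typeof item === 'object' &&
            typeof item.Name === 'string' && typeof item.Type === 'string';
    });
    return ok ? data : null;
}

// The vendor data, properly serialized as JSON
var good = '[{"Name":"test","Type":"Test2"},{"Name":"test1","Type":"Test21"}]';
// The file from the question, with code mixed in
var bad = "alert('i am malicious');\nvar IAmGoodData = []";

var records = parseVendorData(good);
records.forEach(function (record) {
    console.log(record.Name + ' / ' + record.Type);
});
console.log(parseVendorData(bad)); // null
```

The point is that the vendor's text is only ever treated as data: nothing in it runs, and anything that isn't plain JSON is rejected.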

Random Note:

In your client.onreadystatechange callback, make sure you check client.readyState === 4 && client.status === 200, to know that the request was successful and is done.
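
Something along these lines would do it. The makeStateChangeHandler name and the onSuccess/onFailure callbacks are made up for the example, and the fake client object at the bottom only stands in for a real XMLHttpRequest so the snippet can run outside a browser:

```javascript
// Hypothetical factory: `client` is an XMLHttpRequest (or any object with the
// same readyState, status and responseText fields).
function makeStateChangeHandler(client, onSuccess, onFailure) {
    return function () {
        if (client.readyState !== 4) {
            return; // request still in progress; the event fires several times
        }
        if (client.status === 200) {
            onSuccess(client.responseText);
        } else {
            onFailure(client.status);
        }
    };
}

// Simulated request, for illustration only
var fakeClient = { readyState: 3, status: 200, responseText: '[]' };
var handler = makeStateChangeHandler(
    fakeClient,
    function (text) { console.log('done, got: ' + text); },
    function (status) { console.log('failed with status ' + status); }
);
handler();                 // readyState 3: nothing happens yet
fakeClient.readyState = 4;
handler();                 // readyState 4 and status 200: success callback runs
```

In the browser you would assign it with client.onreadystatechange = makeStateChangeHandler(client, ...) before calling client.send().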

OTHER TIPS

This is extremely difficult to do. There are no intrinsically malicious keywords or functions in JavaScript, only malicious uses of them. You could get false positives for "malicious" activity and block legitimate code with a real purpose. At the same time, anyone with a little bit of imagination could bypass any "preventive" method you implement.

I'd suggest you look for a different approach. This is one of those problems (like CAPTCHA) that is trivial for a human to solve but practically impossible for a machine. You could have a moderator or some other human evaluator read the code and accept it.
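
To make that concrete, here is a small illustration of a naive keyword check failing in both directions. The containsKeyword function and the three sample strings are invented for the example:

```javascript
// Naive scanner: reports whether the keyword appears anywhere in the source text.
function containsKeyword(source, keyword) {
    return source.indexOf(keyword) !== -1;
}

var obvious = "alert('i am malicious');";
// Same effect in a browser, but the word "alert" never appears contiguously
var obfuscated = "window['al' + 'ert']('i am malicious');";
// Harmless data that happens to contain the word
var harmless = '[{ "Name": "alert level", "Type": "Test2" }]';

console.log(containsKeyword(obvious, 'alert'));    // true
console.log(containsKeyword(obfuscated, 'alert')); // false (missed)
console.log(containsKeyword(harmless, 'alert'));   // true (false positive)
```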

You should have them provide valid JSON rather than arbitrary JavaScript.
You can then call JSON.parse() to read their data without any risk of code execution.

In short, data is not code, and should not be able to contain code.

You shouldn't. The user should be allowed to type whatever they want, and it's your job to display it.

It all depends on where it is being put, of course:

  • Database: mysql_real_escape_string or equivalent for whatever engine you're using.
  • HTML: htmlspecialchars in PHP, createTextNode or .replace(/</g,"&lt;") in JavaScript
  • JavaScript: json_encode in PHP, JSON.stringify in JavaScript.
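
For the JavaScript side, a minimal sketch of the last two bullets might look like this. The escapeHtml helper is hypothetical, and it escapes a few more characters than the single replace shown above, on the assumption that the text may also end up inside an attribute value:

```javascript
// Hypothetical helper: makes arbitrary text safe to place in HTML markup.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must come first so later entities are not re-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

var userInput = '<script>alert("hi")</script>';

// HTML context: the markup is displayed as text instead of being interpreted
var asHtml = escapeHtml(userInput);
console.log(asHtml);

// JavaScript context: the value becomes a quoted, escaped string literal
var asJsLiteral = JSON.stringify(userInput);
console.log(asJsLiteral);
```

In the DOM itself, document.createTextNode(userInput) gets you the same result as escapeHtml without any manual escaping.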

At the end of the day, just don't be Yahoo

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow