Question

Users submit code (mainly Java) on my site to solve simple programming challenges, but sending the code to a server to compile and execute it can sometimes take more than 10 seconds.

To speed up this process, I plan to first check the submissions database to see if equivalent code has been submitted before. I realize this will cause Random methods to always return the same result, but that doesn't matter much. Is there any other potential problem that could be caused by not running the code?

To find matches, I remove comments and whitespace when comparing code. However, the same code can still be written in different ways, such as with different variable names. Is there a way to compare code that will find more equivalent code?


Solution 2

Variable names:
You can write code that matches the variable names in one file against the variable names in the other, then replaces both sets with a consistent set of names.

File 1: var1 += this(var1 - 1);

File 2: sum += this(sum - 1);

After you read File 1, you look for the variable name File 2 uses in the place of var1 (here, sum), then make the variable names the same across both files.
Note: if variables are used in similar ways, you may get incorrect substitutions. This is most likely where variables are declared. To help mitigate this, you can start searching for variable names at the bottom of the file and work up.
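
A minimal sketch of one way to get consistent names, assuming it is acceptable to rename every identifier (method names included) in order of first appearance rather than pairing names between two files. The class name IdentifierNormalizer, the v0, v1, ... naming scheme, and the short keyword list are all assumptions made for illustration; string literals are not protected, so a real tool would want a proper parser. Requires Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierNormalizer {
    // Deliberately incomplete keyword list (an assumption for this sketch).
    private static final Set<String> KEYWORDS = Set.of(
        "int", "long", "double", "boolean", "char", "void", "if", "else",
        "for", "while", "return", "new", "this", "class", "public",
        "private", "static", "true", "false", "null");

    // The leading \b keeps suffixes such as the L in 1L from matching.
    private static final Pattern IDENTIFIER =
        Pattern.compile("\\b[A-Za-z_][A-Za-z0-9_]*");

    /** Renames each non-keyword identifier to v0, v1, ... by first appearance. */
    static String normalize(String code) {
        Map<String, String> renames = new LinkedHashMap<>();
        Matcher m = IDENTIFIER.matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group();
            String replacement = name;
            if (!KEYWORDS.contains(name)) {
                if (!renames.containsKey(name)) {
                    renames.put(name, "v" + renames.size());
                }
                replacement = renames.get(name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, the File 1 and File 2 lines above normalize to the same text, so a plain string comparison afterwards would treat them as equal.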

Shorthand:
Force explicit {} braces and () parentheses into each if/else/for/while/etc.
Rewrite operations like "i+=..." as "i=i+..."
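
The compound-assignment rewrite could be sketched with a regular expression, assuming the shorthand appears as a simple standalone statement with a plain variable on the left. The class name ShorthandExpander is hypothetical. This sketch does not handle array elements, field accesses, or the update clause of a for loop, and forcing braces is left out because it realistically needs a parser.

```java
import java.util.regex.Pattern;

class ShorthandExpander {
    // Group 1: a plain identifier. Group 2: one of + - * / %.
    // Group 3: everything up to the terminating semicolon.
    private static final Pattern COMPOUND =
        Pattern.compile("\\b([A-Za-z_]\\w*)\\s*([-+*/%])=\\s*([^;]+);");

    /** Rewrites "x op= expr;" as "x = x op (expr);". */
    static String expand(String code) {
        // The right-hand side is parenthesized so precedence is preserved,
        // e.g. "x *= a + b;" must not become "x = x * a + b;".
        return COMPOUND.matcher(code).replaceAll("$1 = $1 $2 ($3);");
    }
}
```

For example, expand("i += 2;") yields "i = i + (2);", while comparisons such as "i <= 2" or "i == 2" are left untouched because the character before the "=" is not an arithmetic operator.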

Functions:
In cases where function order doesn't matter, you can make sure functions are equivalent and then ignore them.

Operator precedence:
"3 + (2 * 4)" is usually equivalent to "2 * 4 + 3"
A way around this could be to determine the precedence of each operation and then match it to an operation of the same precedence in the other piece of code. Once a set of operations has been matched, you can replace it with a variable that represents it.

Ex.

(2+4) * 3 + (2+6) * 5 == someotherequation
//substitute the highest-precedence groups: (2+4) and (2+6) become a and b
... a * 3 + b * 5
//substitute the next-highest groups: (a*3) and (b*5) become c and d
... c + d
//substitute the next-highest groups....
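
The first substitution step above (replacing parenthesized groups with placeholders) could be sketched as follows. This is only an illustration under stated assumptions: the class name GroupSubstituter and the g0, g1, ... placeholder names are made up, the input is assumed to be a bare arithmetic expression with no method calls, and the later steps (grouping multiplication before addition) would need a real precedence-aware parser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSubstituter {
    // Matches a parenthesized group that contains no nested parentheses.
    private static final Pattern INNERMOST = Pattern.compile("\\(([^()]*)\\)");

    /**
     * Repeatedly replaces innermost parenthesized groups with placeholders.
     * The table maps each group's text (whitespace removed) to its
     * placeholder, so identical groups share one placeholder.
     */
    static String substitute(String expr, Map<String, String> table) {
        String current = expr;
        boolean replaced = true;
        while (replaced) {
            replaced = false;
            Matcher m = INNERMOST.matcher(current);
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                replaced = true;
                String inner = m.group(1).replaceAll("\\s+", "");
                if (!table.containsKey(inner)) {
                    table.put(inner, "g" + table.size());
                }
                m.appendReplacement(out, table.get(inner));
            }
            m.appendTail(out);
            current = out.toString();
        }
        return current;
    }
}
```

Called with an empty LinkedHashMap, the expression from the example becomes "g0 * 3 + g1 * 5", with the table recording 2+4 as g0 and 2+6 as g1. Two submissions could then be compared both by their reduced form and by their tables.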

These are just a couple of ways I could think of. If you do it this way, it will end up being quite a big project, especially if you're working with multiple languages.

OTHER TIPS

You could store a SHA-1 hash of the code and compare it with the hashes of previous submissions. You are right that different variable names would give different hashes, so try running the code through a minifier or obfuscator first. That way, variables named cat and dog will both end up as something like a1, and you can then check whether the results match. The only other way would be to actually compile the code into bytecode and compare that, but by then you have already paid the compilation cost you were trying to avoid.
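
A minimal sketch of the hashing idea, assuming whitespace is stripped before hashing as the question describes. The class name SubmissionHasher and the method name fingerprint are hypothetical; comment removal and any minifying step are assumed to happen before this call.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SubmissionHasher {
    /** Returns the SHA-1 of the code with all whitespace removed, as lowercase hex. */
    static String fingerprint(String code) {
        String normalized = code.replaceAll("\\s+", "");
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}
```

Storing the 40-character fingerprint in an indexed database column would turn the "seen before?" check into a single lookup. Note that removing all whitespace also merges tokens (for example "int x" becomes "intx"), which is harmless for equality checks on code that compiles but is worth knowing about.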

Instead of analyzing the source code, why not speed up the compilation? Try keeping a servlet container running at all times with a custom ClassLoader, and use the JDK's tools.jar to compile on the fly. You could even submit the code via AJAX to a REST endpoint and get the results back the same way.
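
As a rough sketch of compiling on the fly inside a long-running process, the standard javax.tools API can be used; it exposes the same compiler without referencing tools.jar directly, which no longer ships as of JDK 9. This is a swapped-in approach rather than exactly what is described above. The class names InProcessCompiler and StringSource are hypothetical, and loading and sandboxing the resulting class are left out.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class InProcessCompiler {
    /** Holds source code in memory so no temporary .java file is needed. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one class into outputDir; returns true on success. */
    static boolean compile(String className, String code, Path outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler (running on a JRE?)");
        }
        List<String> options = List.of("-d", outputDir.toString());
        // Null arguments select the defaults: diagnostics go to System.err.
        return compiler.getTask(null, null, null, options, null,
                List.of(new StringSource(className, code))).call();
    }
}
```

Because the compiler runs inside an already-started JVM, the JVM startup cost of launching javac as a separate process for each submission is avoided. The compiled class would still need to be loaded through a dedicated ClassLoader and run under strict isolation.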

Consider how Eclipse compiles your files in the background.

Also, consider how http://ideone.com implements their online compiler.

FYI: allowing arbitrary code execution is a big security risk. You have to be very careful about hackers.

Licensed under: CC-BY-SA with attribution