Question

I have written a bunch of web apps and know how to protect against MySQL injections and such. I am writing a log storage system for a project in C, and I was advised to make sure it is hack-free, in the sense that a user cannot supply bad data like foo\b\b\b and try to hack into the OS with some rm -rf /* kind of crud. I looked online and found a similar question here: how to check for the "backspace" character in C

This is at least what I thought of, but I know there are probably other things I need to protect against. Can someone who has a bit more experience help me list out the things I need to validate when I am saving files onto a server using user input as part of the hierarchical file naming system?

Example file: /home/webapp/data/{User input}/{Machine-ID}/{hostname}/{tag} where all of these fields could be "faked" when submitted to our log storing system.

No correct solution

OTHER TIPS

Instead of checking for bad characters, turn the problem on its head and specify the good characters. E.g. require {User Input} be a single directory name made of [[:alnum:]_] characters; {Machine-ID} must be made of [[:xdigit:]] to your liking, etc. That gets rid of all the injection stuff quickly.
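As a rough sketch of that allow-list idea in C: the function names and the 64-character limit below are my own assumptions, not anything your project requires. The explicit ASCII check is there because isalnum() can accept non-ASCII bytes in some locales.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical length limit; pick whatever suits your naming scheme. */
#define MAX_FIELD_LEN 64

/* True if s is 1..MAX_FIELD_LEN bytes, all ASCII letters, digits, or '_'. */
bool is_valid_user_field(const char *s)
{
    size_t n;

    if (s == NULL)
        return false;
    for (n = 0; s[n] != '\0'; n++) {
        unsigned char c = (unsigned char)s[n];
        if (n >= MAX_FIELD_LEN)
            return false;               /* too long */
        if (!(c < 0x80 && (isalnum(c) || c == '_')))
            return false;               /* not on the allow-list */
    }
    return n > 0;                       /* reject the empty string */
}

/* True if s is 1..MAX_FIELD_LEN hexadecimal digits. */
bool is_valid_machine_id(const char *s)
{
    size_t n;

    if (s == NULL)
        return false;
    for (n = 0; s[n] != '\0'; n++) {
        if (n >= MAX_FIELD_LEN || !isxdigit((unsigned char)s[n]))
            return false;
    }
    return n > 0;
}
```

Anything that fails the check is simply rejected before it gets anywhere near a path, so inputs like foo\b\b\b or ../etc never need special handling.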

If you're only ever using these inputs as file names inside your program, and you're storing them on a native Linux filesystem, then the critical things to watch for are:

  • Absolutely proscribe any file name starting with ../, containing /../, or ending with /.. (or consisting of just ..). Such file names could allow the user to reach files outside the directory tree that you're working in.
  • Be wary of any file name containing /, as this allows the user to name subdirectories, possibly with unintended consequences.
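The two checks above could look something like this. It is only a sketch, and the function names are made up for illustration. Instead of matching the three string patterns separately, it walks the path one component at a time and flags any component that is exactly "..", which covers the leading, middle, and trailing cases at once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if any '/'-separated component of path is exactly "..". */
bool has_dotdot_component(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of the current component */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;
        p += len;
        if (*p == '/')
            p++;                        /* step over the separator */
    }
    return false;
}

/* True if name contains a directory separator at all. */
bool contains_slash(const char *name)
{
    return strchr(name, '/') != NULL;
}
```

For individual fields such as {tag}, rejecting every name for which contains_slash() is true is the simpler and stricter rule; has_dotdot_component() is more useful if you ever have to accept a relative path.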

Other things that could cause trouble include:

  • Non-ASCII characters that may have a different meaning if used in a different locale.
  • Some ASCII punctuation characters may have a special meaning in parts of your processing system or may be invalid in some filesystems.
  • Some parts of your system may be case-sensitive with other parts being case-insensitive. Consider normalizing the case.
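If you do normalize case, one locale-independent way is to fold only the ASCII letters yourself rather than relying on tolower(). A minimal sketch (the function name is hypothetical):

```c
/* Lowercase ASCII letters in place; all other bytes are left untouched. */
void ascii_lowercase(char *s)
{
    for (; *s != '\0'; s++) {
        if (*s >= 'A' && *s <= 'Z')
            *s = (char)(*s - 'A' + 'a');
    }
}
```

Run this before validating, so that the stored name and the validated name are the same string.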

If applicable, restrict each field to something that isn't going to cause any trouble. For example:

  • A machine ID should probably consist of only ASCII lowercase letters and digits (or only ASCII uppercase letters and digits).
  • A hostname should consist of only ASCII lowercase letters and digits, plus - but not in an initial position (use Punycode for non-ASCII host names). If these are fully qualified host names, as opposed to host names in a network, then . is also valid, but not in initial position.
  • No field should be empty, contain a /, or start with a . (a name starting with . could be . or .., see above, and would otherwise be a dot file, which ls doesn't show by default and which the shell pattern * doesn't match, so such names are best avoided).
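Here is a sketch of the hostname rule from the list above, covering only what is stated there: lowercase ASCII letters, digits, - and ., with neither - nor . in the first position. The function names are hypothetical, and the 253-byte cap is my own assumption based on the usual limit for a textual DNS name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap, taken from the usual textual DNS name limit. */
#define MAX_HOSTNAME_LEN 253

static bool is_lower_or_digit(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

/* True if s is non-empty, starts with a lowercase letter or digit, and
 * otherwise contains only lowercase letters, digits, '-' and '.'. */
bool is_valid_hostname(const char *s)
{
    size_t n;

    /* This also rejects "", ".", "..", dot files, and a leading '-'. */
    if (s == NULL || !is_lower_or_digit((unsigned char)s[0]))
        return false;
    for (n = 1; s[n] != '\0'; n++) {
        unsigned char c = (unsigned char)s[n];
        if (n >= MAX_HOSTNAME_LEN)
            return false;
        if (!(is_lower_or_digit(c) || c == '-' || c == '.'))
            return false;
    }
    return true;
}
```

Because / is not on the allow-list and the first character must be a letter or digit, the "not empty, no /, no leading ." rule falls out of the same check for free.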

While control characters such as backspace aren't directly harmful, they can be indirectly harmful in that if you're investigating an issue on the command line, they can cause you to make mistakes. Do not allow them.
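If you end up with a field where a strict allow-list isn't practical, you can at least refuse control characters explicitly. A small sketch with a hypothetical function name: it treats bytes 0x00-0x1F and 0x7F (DEL) as control characters, which is the ASCII definition and does not depend on the locale.

```c
#include <stdbool.h>

/* True if s contains any ASCII control character (0x01-0x1F or 0x7F). */
bool has_control_char(const char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x20 || c == 0x7F)
            return true;
    }
    return false;
}
```

With this, the foo\b\b\b example from the question is rejected because backspace is 0x08.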

Licensed under: CC-BY-SA with attribution