Question

I have a few HTML forms and am implementing server-side filtering of their fields (using Java Servlets). I'm wondering what I should allow, or perhaps what I should disallow. For e-mail addresses, I remove anything that matches this:

[^A-Za-z0-9._%@-]

What are some similar rules I could apply to the name, message, and phone number fields?

I'm assuming that < and > should be escaped as &lt; and &gt;. What else should I replace?

Along those lines, are there any recommendations for the maximum length allowed for such fields?

Solution

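You need to escape & to &amp; first, then < to &lt;. Contrary to popular belief, it is not necessary to escape > to &gt; in element content: there is no need to protect the bracket that closes an HTML tag if there is no way to open one. If the value is written inside a quoted attribute, the quote character that delimits it must also be escaped (for example " to &quot;).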
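To make the ordering concrete, here is a minimal Java sketch. The class and method names (`HtmlEscape`, `escapeText`, `escapeDoubleQuotedAttribute`) are hypothetical. It relies only on `String.replace(CharSequence, CharSequence)`, which replaces every literal occurrence. It illustrates the order of replacements rather than serving as a complete encoder.

```java
final class HtmlEscape {
    private HtmlEscape() {}

    // Text content: '&' must be replaced before '<'; otherwise the '&' in an
    // already-produced "&lt;" would be escaped again into "&amp;lt;".
    // '>' is deliberately left alone, since no '<' can survive to open a tag.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Double-quoted attribute value: additionally escape the delimiting quote.
    static String escapeDoubleQuotedAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }
}
```

For example, `escapeText("a < b && c")` yields `a &lt; b &amp;&amp; c`. Reversing the two `replace` calls would turn `<` into `&amp;lt;`, which the browser would display literally as `&lt;`.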

Whether to escape before writing to the database, or each time the value is read back out, is your call. Escaping on the input side is faster. Escaping on the output side is more secure, and it also makes exchanging data with other applications easier, because you never have to unescape values before sending them to another app. I personally would pay the performance price and escape on the output side; caching can help.

The rest of the validation you'll want to do depends on the type of data. For an e-mail address, check that it contains an @ followed by at least one '.'. Then, if you care whether the address actually works, send it a test e-mail. It is next to impossible to validate an e-mail address much further than that, and even a syntactically valid address is not necessarily deliverable. Similarly, accept almost anything as a URL and then try to retrieve it to see whether it's valid. For U.S. billing and shipping addresses, use the USPS Web service to validate them and get the data in the best format.
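The loose e-mail check described above might look like the following Java sketch. `EmailCheck.looksLikeEmail` is a hypothetical name. As an added assumption, it uses the *last* `@`, because the domain part cannot contain one, and it requires at least one character before it.

```java
final class EmailCheck {
    private EmailCheck() {}

    // Loose syntactic check only: something before the last '@' and at least
    // one '.' somewhere after it. Whether the address can actually receive
    // mail can only be confirmed by sending a test message.
    static boolean looksLikeEmail(String s) {
        int at = s.lastIndexOf('@');
        return at > 0 && s.indexOf('.', at + 1) != -1;
    }
}
```

This check is deliberately permissive. For instance, it accepts `user@example.` and rejects `user@localhost`. Anything stricter is better left to the test e-mail.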

Other tips

You should allow anything through for names. Consider "O'Malley" or "Hudson-Walker". Some languages (such as Salish) include numbers so you can have "Sqwxwu7mish". Then there are accented characters, Hebrew, Cyrillic, Greek, Chinese, Korean, and even the musician formerly known as Prince.

Message text should be similarly unconstrained. If messages can contain HTML, you'll have to parse the HTML (with a real HTML parser) and apply tag and attribute whitelists so that only the things you expect get through.

Phone numbers should be pretty close to free-form too. North American formats differ from European ones. Some people like to write "(555) 555-5555" while others prefer "555-555-5555", and some phone numbers have extensions while others don't.
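One way to accept free-form phone numbers while still being able to compare them is to store exactly what the user typed and derive a digits-only key next to it. The following Java sketch assumes that approach, and `PhoneNumbers.digitsOnly` is a hypothetical helper.

```java
final class PhoneNumbers {
    private PhoneNumbers() {}

    // Derives a comparison key by keeping only ASCII digits. The raw input is
    // assumed to be stored separately and displayed exactly as entered.
    static String digitsOnly(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c >= '0' && c <= '9') {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this helper, both "(555) 555-5555" and "555-555-5555" map to `5555555555`. The key drops a leading `+` and merges extension digits into the number, so it is only suitable for lookups, not for display or dialing.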

The only encoding you should worry about on input is making sure that everything is in UTF-8 (including your database). When talking to your database, don't try to encode anything yourself; use the database driver's quoting mechanism and placeholders.
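In JDBC, placeholders mean a `PreparedStatement` with `?` parameters. The driver then sends the values separately from the SQL, so no manual quoting or escaping is needed. In the following sketch, the `MessageStore` class and the `messages` table and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class MessageStore {
    private MessageStore() {}

    // Hypothetical table and column names; each '?' is bound by the driver.
    static final String INSERT_SQL =
            "INSERT INTO messages (name, phone, body) VALUES (?, ?, ?)";

    static void save(Connection conn, String name, String phone, String body)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            // Values are passed as-is: no hand-written quoting or escaping.
            ps.setString(1, name);
            ps.setString(2, phone);
            ps.setString(3, body);
            ps.executeUpdate();
        }
    }
}
```

Input such as `O'Malley` is then stored verbatim. The apostrophe never has to be doubled by hand.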

Length limits should generally be a lot bigger than you think they should be, so at least double your first guess at a reasonable maximum. For most applications, the storage difference between a 20-character name field and a 100-character one isn't going to matter, so be generous.

You shouldn't worry about HTML encoding until output, and then you should use whatever HTML and URL encoding tools your environment supports; do not try to build your own.
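For URL encoding on output, the JDK already provides `java.net.URLEncoder`. The `encode(String, Charset)` overload is available since Java 10. The following sketch encodes user input for a query-string value. The `LinkBuilder` class and the `/search?q=` path are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class LinkBuilder {
    private LinkBuilder() {}

    // Encodes user input for use as a single query-string value
    // (application/x-www-form-urlencoded rules: space becomes '+').
    static String searchUrl(String query) {
        return "/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

For example, `searchUrl("a b&c")` yields `/search?q=a+b%26c`. URL encoding and HTML encoding are separate steps: a URL placed in an `href` attribute still needs to be HTML-escaped for that context.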

Don't over-constrain your inputs; be as loose and forgiving as possible. Be very strict with your outputs, though.

Maximum length: I always apply a maximum length to my fields on both the client side and the server side, and the values match the maximums defined in the database.
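A server-side length check that mirrors the database limit could look like the following Java sketch. The `FieldLimits` class and the limit value are hypothetical and should be copied from the real column definitions. The sketch counts Unicode code points rather than UTF-16 `char`s, so a character outside the Basic Multilingual Plane counts once. Whether that matches your database's notion of a "character" depends on the database and its character set.

```java
final class FieldLimits {
    private FieldLimits() {}

    // Hypothetical limit; keep it in sync with the column, e.g. VARCHAR(100).
    static final int NAME_MAX = 100;

    // Counts code points, not UTF-16 units, so a surrogate pair counts as one.
    static boolean withinLimit(String value, int max) {
        return value.codePointCount(0, value.length()) <= max;
    }
}
```

The same constant can also be written into the form's `maxlength` attribute, so that the client-side and server-side limits come from one place.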

I agree with escaping < and > as &lt; and &gt;.

I think strict validation is a good habit. If I were working with name, message, and phone number fields, I would do the following.

For each text box, make it so that it won't accept invalid values at all:
Name: letters only (a-z, A-Z)
Message: letters (a-z, A-Z), digits (0-9), and punctuation such as '.', ',', ';', etc.
Phone number: digits (0-9) and '-' only, with no spaces. You can always parse the string on the server side.
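Client-side restrictions can be bypassed by anyone who posts the form directly, so the same whitelists need to be enforced on the server too. Here is a minimal Java sketch of the name and phone rules listed above. The `StrictFields` class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class StrictFields {
    private StrictFields() {}

    // Letters a-z and A-Z only.
    static final Pattern NAME = Pattern.compile("[A-Za-z]+");
    // Digits and '-' only; the trailing '-' in the class is a literal hyphen.
    static final Pattern PHONE = Pattern.compile("[0-9-]+");

    // matches() requires the whole string to fit the pattern.
    static boolean isValidName(String s) {
        return NAME.matcher(s).matches();
    }

    static boolean isValidPhone(String s) {
        return PHONE.matcher(s).matches();
    }
}
```

With these rules, "555-555-5555" passes, but "(555) 555-5555" does not. A name like "O'Malley" is also rejected, which is the trade-off the other tips warn about.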

License: CC-BY-SA with attribution
Not affiliated with StackOverflow