Question

Let's take a look at this scenario: you have a text box where the user can paste any kind of text (UTF-8, including Chinese or Arabic characters), and a Submit button that inserts that text into a MySQL DB.

Normally, I use URLEncoder.encode(text, "UTF-8") and my app runs very stably. I never worry about users inserting special characters: the text is encoded on the way in, so when I read it back I just decode it and it comes out exactly as it was.

But some people said that we can set UTF-8 in MySQL and the Tomcat server so that we don't need to encode. That solution requires configuration, and I hate configuration as it is not a very sound solution.

Besides, users can enter junk code to hack the DB.

So, in Java and MySQL, is it good practice to encode text when it is inserted into the DB?

Some people on another forum said it is very bad to store encoded text in the DB, but they didn't say why it is bad.

So this question is for people who have a lot of experience in Java and MySQL to answer!


Solution

The problem with putting URL-encoded or XML-encoded text into the database is that it makes life difficult for querying and doing other processing of that text.
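To make the querying problem concrete, here is a minimal sketch of what a URL-encoded value looks like once stored. The class and method names (EncodedStorageDemo, encodeForStorage, naiveSearch) are hypothetical, the substring check only stands in for something like a SQL LIKE search, and the Charset overloads of URLEncoder and URLDecoder assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows why URL-encoded values are awkward to search.
class EncodedStorageDemo {

    // What the question's approach would write to the database column.
    static String encodeForStorage(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Stand-in for a substring search such as SQL LIKE '%term%'.
    static boolean naiveSearch(String storedValue, String term) {
        return storedValue.contains(term);
    }

    public static void main(String[] args) {
        String raw = "hello world & caf\u00e9"; // \u00e9 is the letter e with an acute accent
        String stored = encodeForStorage(raw);

        System.out.println(stored);                             // hello+world+%26+caf%C3%A9
        System.out.println(naiveSearch(stored, "hello world")); // false: the space became '+'
        System.out.println(naiveSearch(raw, "hello world"));    // true when the raw text is stored
        System.out.println(URLDecoder.decode(stored, StandardCharsets.UTF_8).equals(raw)); // true
    }
}
```

The value does round-trip, but every search term, sort, and length check would have to be aware of the encoding, and the stored value is longer than the original.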

The other problem is that there are different types of escaping that are required in different contexts.

"... but this solution requires configuration and I hate configuration as it is not a very sound solution."

Ermm, asserting that configuration is "not a very sound solution" is not a rational argument. The vast majority of applications with a database component require some kind of database configuration.

"Besides, users can enter junk code to hack the DB."

The real solution to SQL injection is to use PreparedStatement with fixed SQL strings for your queries, inserts, updates and so on. Use placeholders for all of the query parameters and use the PreparedStatement setter methods (setString and so on) to supply their values. The JDBC driver then either escapes the values or sends them to the server separately from the SQL text, which removes the possibility of SQL injection through those parameters.
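As an illustration of that advice, the following is a minimal sketch rather than a complete data-access layer. The class name CommentDao, the table comments, and its columns author and body are all hypothetical, and obtaining the Connection (driver, URL, credentials) is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DAO: table and column names are assumptions for illustration only.
class CommentDao {

    // Fixed SQL text. User input never becomes part of this string.
    static final String INSERT_SQL =
            "INSERT INTO comments (author, body) VALUES (?, ?)";

    // Inserts one row and returns the number of rows affected.
    static int insertComment(Connection conn, String author, String body)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, author); // bound as data, not parsed as SQL
            ps.setString(2, body);   // stored as-is, with no URL encoding step
            return ps.executeUpdate();
        }
    }
}
```

Note that the text is passed to setString unchanged, so what comes back from a later SELECT is the original text, ready to be escaped for whatever output format needs it.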

The other thing you need to worry about is people using unescaped XML / HTML metacharacters (like <, > and quotes) to carry out XSS attacks against other users. The way to defeat that is to escape the text at the point where you create the HTML. For instance, in a JSP you can use the JSTL <c:out> tag to escape the text.

Finally, URL-encoded text can't be inserted directly into an HTML page. The URL encoding scheme (using %'s and +'s) is not the correct encoding scheme for text in an HTML page; there you need to use &...; character entities. A %xx in the text will appear as exactly that when the page is displayed in a browser. Try it and see!
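The difference between the two schemes can be seen side by side in a small sketch. The class HtmlOutput and its escapeHtml method are hypothetical and hand-written purely for illustration; in a real application you would more likely rely on <c:out> or an established escaping library. The Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a hand-rolled escaper, shown only to contrast the two encodings.
class HtmlOutput {

    // Replaces the HTML metacharacters with character entities.
    // Everything else, including non-ASCII text, passes through unchanged.
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Tom & Jerry <3";
        // URL encoding: a browser would show these characters literally.
        System.out.println(URLEncoder.encode(raw, StandardCharsets.UTF_8)); // Tom+%26+Jerry+%3C3
        // HTML escaping: a browser renders this as the original text.
        System.out.println(escapeHtml(raw)); // Tom &amp; Jerry &lt;3
    }
}
```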


Answering the questions in the comments:

"iamthepiguy said 'encode everything before putting it into the DB', but you said 'No'. Suppose I put HTML text into the DB; it has a lot of special characters and other stuff. How can we let the DB handle all of them? For example, if MySQL doesn't recognize a character, it turns it into '?', which means the text got corrupted and the user has lost that text. How does MySQL handle all kinds of special characters?"

If you use a PreparedStatement with SQL that has placeholders for all of the text parameters, then the JDBC driver takes care of the escaping automatically.

"Also, since there is a great diversity of UTF and special characters, how many other things do we need to worry about if we do not encode the text to make sure the system runs stably?"

Same answer.

"Encoded text makes the system run a bit slower, but we are headache-free."

There are no headaches if you use prepared statements and <c:out> (or the equivalent).

"You said 'The way to defeat that is to escape the text at the point you are creating the HTML.' So we have to use Java to encode, right?"

Yes, but you ONLY HTML encode the text when you output it for inclusion in a web page. If you output it as JSON, you encode using JSON escaping ... or more likely, you let the JSON serializer do it for you. If you send the text in other formats, or include it in other things, you encode it as required ... or not at all.

But the point is that you don't store it in the database in encoded form. If you do, then in nearly all cases (including HTML!!) you'd need to decode the URL-encoded text before encoding it in the correct way.

Other tips

It is somewhat better in terms of stability and configuration, as well as safety from XSS attacks, to encode everything before putting it in the database. The disadvantages are that it takes slightly longer and uses slightly more space in the DB. You could instead escape everything each time the output is created, but it's easier to encode everything once.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow