Question

An existing app passes XML to a sproc in SQL Server 2000; the input parameter's data type is TEXT. The XML is derived from DataSet.GetXml(), but I notice it doesn't specify an encoding.

So when the user sneaks an unexpected character into the dataset, specifically character code 146 (a curly apostrophe in Windows-1252, which is outside the ASCII range) instead of ASCII 39 (single quote), the sproc fails.

One approach is to prefix the result of GetXML with

<?xml version="1.0" encoding="ISO-8859-1"?>

It works in this case, but what would be a more correct approach to ensure the sproc does not crash (if other unforeseen characters pop up)?

PS. I suspect the user is typing text into MS-Word or similar editor, and copy & pasting into the input fields of the app; I would probably want to allow the user to continue working this way, just need to prevent the crashes.

EDIT: I am looking for answers that confirm or deny a few aspects, for example:
- As per the title, what's the default encoding if none is specified in the XML?
- Is ISO-8859-1 the right encoding to use?
- Is there a better encoding that would encompass more characters in the English-speaking world and thus be less likely to cause an error in the sproc?
- Would you filter at the app's UI level for standard ASCII (0 to 127 only), and not allow extended ASCII?
- any other pertinent details.


Solution

DataSet.GetXml() returns a string. In .NET, strings are internally encoded using UTF-16, but that is not really relevant here.

The string has no <?xml ... encoding=...?> declaration because that declaration is only needed to parse XML from a byte stream. A .NET string is not a byte stream; it is text with well-defined code point semantics (Unicode), so the declaration is not needed there.

If there is neither an XML encoding declaration nor a BOM, an XML parser must assume UTF-8. In your case, however, that is also irrelevant, since the problem is not with an XML parser (XML isn't parsed by SQL Server when it's stored in a TEXT column). The problem is that your XML contains some Unicode characters, and TEXT is a non-Unicode SQL type.

You can convert a string to bytes in any supported encoding using the Encoding.GetBytes() method, for example Encoding.UTF8.GetBytes(xml).
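A minimal sketch of that idea, to make the failure mode concrete. The class and method names are made up for illustration, and ISO-8859-1 is only an example target, not necessarily the code page your database uses. It asks for a strict encoder, so a character the target encoding cannot represent raises an error instead of being silently replaced:

```csharp
using System;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: returns true if every character in 'text'
    // can be represented in the named encoding.
    public static bool CanEncode(string text, string encodingName)
    {
        // Request exception fallbacks so that unencodable characters throw
        // instead of being replaced with '?' or a best-fit character.
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // U+2019 RIGHT SINGLE QUOTATION MARK, the curly apostrophe
        // that word processors tend to insert.
        string pasted = "it\u2019s";

        Console.WriteLine(CanEncode(pasted, "iso-8859-1")); // False
        Console.WriteLine(CanEncode(pasted, "utf-8"));      // True
        Console.WriteLine(CanEncode("it's", "iso-8859-1")); // True
    }
}
```

A check like this could run before the sproc call, so the app can decide whether to reject, replace, or escape the offending characters rather than letting the database call fail.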

OTHER TIPS

I believe your approach should be to use WriteXml instead of GetXml. That should allow you to specify the encoding.

However, note that you will have to write through an intermediate stream: if you output directly to a string, it will always use UTF-16. That string can then contain characters that a non-Unicode TEXT column cannot hold.
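The intermediate-stream approach could be sketched as follows. This is an illustration under assumptions, not a drop-in fix: the helper name, the sample DataSet, and the choice of ISO-8859-1 are all hypothetical. As far as I know, an XmlWriter configured with a restricted encoding writes characters that encoding cannot represent as numeric character references (such as &#x2019;), and should also emit a matching encoding declaration, but verify both against your own output.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

public static class DataSetXmlSketch
{
    // Hypothetical helper: serializes a DataSet as XML in the given encoding
    // and returns the result as a string. Assumes an encoding that does not
    // emit a byte-order mark (ISO-8859-1 does not).
    public static string ToEncodedXml(DataSet data, Encoding target)
    {
        var settings = new XmlWriterSettings { Encoding = target, Indent = true };
        using (var buffer = new MemoryStream())
        {
            // Writing to a stream rather than a StringWriter lets the
            // writer honor settings.Encoding.
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                data.WriteXml(writer);
            }
            // Decode with the same encoding to get the text back.
            return target.GetString(buffer.ToArray());
        }
    }

    public static void Main()
    {
        // Made-up sample data containing a curly apostrophe (U+2019).
        var data = new DataSet("Orders");
        DataTable table = data.Tables.Add("Order");
        table.Columns.Add("Note", typeof(string));
        table.Rows.Add("customer\u2019s request");

        string xml = ToEncodedXml(data, Encoding.GetEncoding("iso-8859-1"));
        Console.WriteLine(xml);
    }
}
```

Because the returned string was decoded from ISO-8859-1 bytes, it cannot contain any character above U+00FF, which is the property a non-Unicode parameter needs.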

Licensed under: CC-BY-SA with attribution