Question

This might have been asked before, but I can't find any such post. Is there a class for working with ASCII strings? The benefits are numerous:

  1. Comparison should be faster, since it's just byte-for-byte (instead of UTF-16 with its variable-length encoding)
  2. Memory efficient: it should use about half the memory for large strings (one byte per character instead of two)
  3. Faster versions of ToUpper()/ToLower() that use a language-invariant lookup table
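Points 1 and 3 can be sketched in Java, assuming ASCII text is held as a `byte[]` with every value in 0..127. The class `AsciiOps` and its methods are hypothetical names for illustration; `Arrays.compareUnsigned` is the standard Java 9+ lexicographic byte comparison. This is a sketch of the idea, not an existing library.

```java
import java.util.Arrays;

/** Hypothetical helpers for ASCII text held as byte[] (one byte per character). */
final class AsciiOps {
    // 128-entry lookup table mapping each ASCII code to its upper-case form.
    // Language-invariant by construction: only 'a'..'z' are changed.
    private static final byte[] UPPER = new byte[128];

    static {
        for (int i = 0; i < 128; i++) {
            UPPER[i] = (byte) ((i >= 'a' && i <= 'z') ? i - ('a' - 'A') : i);
        }
    }

    private AsciiOps() {}

    /** Returns an upper-cased copy; a byte outside 0..127 makes the lookup throw. */
    static byte[] toUpper(byte[] s) {
        byte[] out = new byte[s.length];
        for (int i = 0; i < s.length; i++) {
            out[i] = UPPER[s[i]];
        }
        return out;
    }

    /** Ordinal comparison: unsigned byte-by-byte, a shorter prefix sorts first. */
    static int compareOrdinal(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

The comparison never has to consider multi-unit characters or culture rules; it is a plain scan over bytes.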

Jon Skeet wrote a basic AsciiString implementation and proved #2, but I'm wondering whether anyone has taken this further and completed such a class. I'm sure there would be uses, although people typically avoid this route, since all the existing String methods would have to be reimplemented by hand, and conversions between String and AsciiString would be scattered everywhere, complicating an otherwise simple program.
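Jon Skeet's code is not reproduced here. As a rough illustration of what a basic byte-backed ASCII string involves, here is a minimal Java sketch; the class `AsciiString` and its methods `of`, `length` and `charAt` are hypothetical. Each character occupies one byte of the backing array rather than a two-byte UTF-16 char, which is where the memory saving in #2 comes from.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical minimal ASCII string: one byte per character. */
final class AsciiString {
    private final byte[] data; // every value is in 0..127

    private AsciiString(byte[] data) {
        this.data = data;
    }

    /** Converts from a native String, rejecting any char outside 0..127. */
    static AsciiString of(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("Non-ASCII char at index " + i);
            }
            bytes[i] = (byte) c;
        }
        return new AsciiString(bytes);
    }

    int length() {
        return data.length;
    }

    char charAt(int index) {
        return (char) data[index];
    }

    @Override
    public boolean equals(Object o) {
        // Equality is a straight byte-array comparison.
        return o instanceof AsciiString && Arrays.equals(data, ((AsciiString) o).data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }

    /** Converts back to a native (UTF-16) String. */
    @Override
    public String toString() {
        return new String(data, StandardCharsets.US_ASCII);
    }
}
```

`of` and `toString` are exactly the String/AsciiString conversions the question worries about having to scatter through a program.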

Is there such a class? Where?


Solution

I thought I would post the outcome of my effort to implement the system described above, with as much string support and compatibility as I could manage. It may not be perfect, but it should give you a decent base to improve on if needed.

The ASCIIChar struct and the ASCIIString type implicitly convert to their native counterparts (char and string) for ease of use.

The OP's suggested replacements for ToUpper/ToLower etc. have been implemented in a much quicker way than a lookup table, and all the operations are as fast and memory-friendly as I could make them.
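The answer doesn't say how its table-free case conversion works. One common technique, shown here only as a plausible example and not necessarily the author's method, relies on the fact that every ASCII letter pair differs in a single bit (0x20), so a range check plus one bit operation replaces the table lookup. The class `AsciiCase` is hypothetical, and whether this actually beats a lookup table depends on the runtime and hardware, so it is worth measuring.

```java
/** Hypothetical table-free ASCII case helpers (one possible technique). */
final class AsciiCase {
    // 'A' is 0x41 and 'a' is 0x61: every A-Z / a-z pair differs only in bit 0x20.
    private static final int CASE_BIT = 0x20;

    private AsciiCase() {}

    static byte toUpper(byte b) {
        // Clear the case bit for lower-case letters only; everything else passes through.
        return (b >= 'a' && b <= 'z') ? (byte) (b & ~CASE_BIT) : b;
    }

    static byte toLower(byte b) {
        // Set the case bit for upper-case letters only.
        return (b >= 'A' && b <= 'Z') ? (byte) (b | CASE_BIT) : b;
    }

    /** Upper-cases the array in place, avoiding a new allocation. */
    static void toUpperInPlace(byte[] s) {
        for (int i = 0; i < s.length; i++) {
            s[i] = toUpper(s[i]);
        }
    }
}
```

The range checks matter: without them, flipping bit 0x20 would also turn characters such as '[' into '{'.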

Sorry, I couldn't post the source; it was too long. See the links below.

  • ASCIIChar - Replaces char; stores the value in a single byte instead of a 16-bit char and provides support methods and compatibility with the string class. Implements virtually all methods and properties available for char.

  • ASCIIChars - Provides static properties for each of the valid ASCII characters for ease of use.

  • ASCIIString - Replaces string, stores characters in a byte array and implements virtually all methods and properties available for string.
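To illustrate the ASCIIChar idea described above, here is a small Java sketch of a byte-backed character type. The class `AsciiChar` and its methods are hypothetical and cover only a few char-like predicates. Java has no user-defined implicit conversions, so `of` and `toChar` stand in for the implicit conversions the answer mentions.

```java
/** Hypothetical byte-backed ASCII character. */
final class AsciiChar {
    private final byte value; // 0..127, one byte instead of a 16-bit char

    private AsciiChar(byte value) {
        this.value = value;
    }

    /** Explicit stand-in for an implicit char -> ASCII char conversion. */
    static AsciiChar of(char c) {
        if (c > 127) {
            throw new IllegalArgumentException("Not an ASCII character: U+" + Integer.toHexString(c));
        }
        return new AsciiChar((byte) c);
    }

    /** Explicit stand-in for the implicit conversion back to a native char. */
    char toChar() {
        return (char) value;
    }

    boolean isDigit() {
        return value >= '0' && value <= '9';
    }

    boolean isLetter() {
        return (value >= 'A' && value <= 'Z') || (value >= 'a' && value <= 'z');
    }

    /** Tab, LF, VT, FF, CR (0x09..0x0D) and space. */
    boolean isWhiteSpace() {
        return value == ' ' || (value >= '\t' && value <= '\r');
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AsciiChar && ((AsciiChar) o).value == value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    @Override
    public String toString() {
        return String.valueOf(toChar());
    }
}
```

Because the value is guaranteed to be in 0..127, each predicate is a couple of integer comparisons with no Unicode category lookup.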

Other tips

.NET has no direct ASCII string support. Strings are UTF-16 because the Windows API works only with 8-bit ANSI strings (one byte per char) or UTF-16. UTF-8 would arguably be the best solution, but .NET does not use it for its strings because Windows doesn't.

The Windows API can convert between charsets, but it only works with 1-byte or 2-byte chars, so if you used UTF-8 strings in .NET you would have to convert them on every call, which hurts performance. .NET can still read and write UTF-8 and other encodings via BinaryWriter/BinaryReader or a simple StreamWriter/StreamReader.
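The same pattern, keeping UTF-16 strings in memory and encoding only at the I/O boundary, looks like this in Java, where `OutputStreamWriter` with a `Charset` plays the role of a StreamWriter with an Encoding. `EncodingDemo` and its methods are hypothetical names; the JDK calls (`String.getBytes(Charset)`, `new String(byte[], Charset)`, `OutputStreamWriter`) are standard.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical demo: encoding in-memory UTF-16 strings only at the boundary. */
final class EncodingDemo {
    private EncodingDemo() {}

    /** Number of bytes the text occupies once encoded with the given charset. */
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    /** Writes text through a Writer configured for the charset. */
    static byte[] writeEncoded(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(text);
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail, but Writer declares IOException.
            throw new UncheckedIOException(e);
        }
        // Closing the writer at the end of the try block flushed the encoder.
        return bytes.toByteArray();
    }

    /** Decodes bytes back into a native (UTF-16) String. */
    static String decode(byte[] data, Charset cs) {
        return new String(data, cs);
    }
}
```

For example, "abc" takes 3 bytes in US-ASCII or UTF-8 but 6 in UTF-16LE, and `getBytes(StandardCharsets.US_ASCII)` silently replaces a non-ASCII character such as 'é' with '?'. Each encode/decode at the boundary is the conversion cost the answer refers to.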

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow