Question

A similar question has been asked before, but this one is different in that the string I'm trying to pull random characters from may contain multi-byte chars. I'm basically making a pseudo-"leet" generator: it takes a string and replaces each character with a randomly chosen, similar-looking character from the extended Unicode range, to give it a sort of "hacker" look. (It's for a game, and one section needs to use this style. Don't judge me.) So I've got an extension method:

private static Random rand = new Random();
public static char random(this string str)
{
    return str[rand.Next(str.Length)];
}

It works by looking at each char in a string, and it's called like this:

public static string leetify(this string str)
{
    StringBuilder sb = new StringBuilder();

    foreach (char c in str)
    {
        switch (char.ToLower(c))
        {
            case 'a':
                sb.Append("4ÀÁÂÃÄÅàáâãäåĀāĂ㥹ǎǍǺǻȀȁȂȃȦȧȺɅɐɑɒªΆѦѧᴀᾼ₳".random());
                break;
                // ...  More of the same for each letter

                //Okay, the letter 's' definitely has a failure case,
                //not the only one, but needed an example
            case 's':
                sb.Append("ŚśŜŝŞşŠšƧƨȘșȿʂϨϩЅѕᵴṠṡṢṣṤṥṦṧṨṩ$§".random());
                break;
                // ...
            default:
                sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

With similar code for the rest of the letters, of course. The final string is then displayed in a TextBox and possibly various other controls. I've checked that every character I chose displays correctly in a TextBox with the font I'm using: I can copy/paste them in and they render fine. But when I run this, a lot of error characters show up in the strings. I believe the failure point is that my random function doesn't understand that the string contains multi-byte characters. Is there some way to modify it so that it does?

Edit: Added the 's' set which definitely produces a failure.

Edit 2: Alternatively, if there were some way to easily tell which chars in my string were multi-byte, I could just remove them and have a smaller selection of chars to choose from. I'm not using the characters for their intended purpose, obviously, so I'd be fine with sacrificing a bit of variety for simplicity.
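In a .NET string every char is a 16-bit UTF-16 code unit, so "multi-byte" here effectively means a visible character that spans more than one char: a surrogate pair, or a base letter followed by a combining mark. A minimal sketch of the Edit 2 fallback, assuming those are the only cases to worry about, simply drops such chars from a set. `CharSetFilter` and `KeepStandaloneChars` are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CharSetFilter
{
    // Hypothetical helper: keeps only the chars that are safe to pick
    // one at a time with str[index].
    public static string KeepStandaloneChars(string set)
    {
        return new string(set.Where(IsStandalone).ToArray());
    }

    // A char counts as "standalone" here if it is not half of a surrogate
    // pair and is not a combining mark that attaches to the previous char.
    private static bool IsStandalone(char c)
    {
        if (char.IsSurrogate(c))
            return false;

        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.EnclosingMark:
                return false;
            default:
                return true;
        }
    }
}
```

For example, `KeepStandaloneChars("s\u0301\u015B\U0001D530$")` keeps `"s\u015B$"`: the combining accent and both halves of the surrogate pair for U+1D530 are dropped. Note that a decomposed "s\u0301" is reduced to a plain "s" by this filter, since only the accent is removed.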


Solution

The problem could lie in one of the other letter sets, with some combination of characters causing your issue. For instance, I can make the test case by @Harrison fail by adding a combining diacritical mark such as \u0301 to the string. So without seeing the other sets and the input test case you are using, it is hard to say.
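To make the combining-mark case concrete, here is a small sketch (the names `CharPickerDemo`, `RandomChar` and `PicksBareAccent` are hypothetical) that runs the question's char-based picker on a set containing "ś" twice: once precomposed (U+015B) and once as "s" plus U+0301. Because the picker returns single UTF-16 chars, it will sooner or later hand back the accent on its own.

```csharp
using System;

public static class CharPickerDemo
{
    private static readonly Random rand = new Random();

    // The question's char-based picker: returns one UTF-16 code unit.
    public static char RandomChar(this string str)
    {
        return str[rand.Next(str.Length)];
    }

    // Draws `draws` chars and reports whether any of them was a bare
    // U+0301 COMBINING ACUTE ACCENT, i.e. half of the decomposed "s\u0301".
    public static bool PicksBareAccent(string set, int draws)
    {
        for (int i = 0; i < draws; i++)
        {
            if (set.RandomChar() == '\u0301')
                return true;
        }
        return false;
    }
}
```

Appended to the output, such a lone mark attaches to whatever character precedes it or, depending on the font and control, shows up as a stray accent or placeholder glyph.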

That aside, the correct way to do this if you do have combining characters or surrogate pairs is to use StringInfo.GetTextElementEnumerator to iterate over the string's logical characters (text elements). Here is a badly performing example (it re-splits the set on every call) that would replace your current random implementation.

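using System;
using System.Collections.Generic;
using System.Globalization;

public static class Extensions
{
    // Fixed seed for reproducible results; use new Random() for varied output.
    private static Random rand = new Random(1);

    // Returns one random text element (a whole user-visible character,
    // including any combining marks or surrogate pairs) as a string.
    public static string Random(this string str)
    {
        var chars = new List<string>();
        var strElements = StringInfo.GetTextElementEnumerator(str);
        while (strElements.MoveNext())
        {
            chars.Add(strElements.GetTextElement());
        }
        return chars[rand.Next(chars.Count)];
    }
}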

This covers both cases. For instance, the letter "ś" can be written as its precomposed literal, which has a Length of 1, or as "s" followed by a combining acute accent, "s\u0301", which has a Length of 2. Both represent the same glyph when rendered.
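The difference is easy to check. The sketch below (hypothetical names `TextElementDemo`, `Lengths` and `ToPrecomposed`) compares `string.Length`, which counts UTF-16 chars, with `StringInfo.LengthInTextElements`, which counts text elements, for the two spellings of "ś" and for a character outside the Basic Multilingual Plane. It also shows that `string.Normalize(NormalizationForm.FormC)` folds "s\u0301" into the precomposed "ś".

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextElementDemo
{
    // Precomposed U+015B LATIN SMALL LETTER S WITH ACUTE.
    public const string Precomposed = "\u015B";

    // 's' followed by U+0301 COMBINING ACUTE ACCENT; renders the same.
    public const string Decomposed = "s\u0301";

    // U+1D530 MATHEMATICAL FRAKTUR SMALL S, stored as a surrogate pair.
    public const string Astral = "\U0001D530";

    // Number of UTF-16 code units vs. number of user-visible characters.
    public static (int CodeUnits, int TextElements) Lengths(string s) =>
        (s.Length, new StringInfo(s).LengthInTextElements);

    // Canonical composition: combines base + mark where a precomposed char exists.
    public static string ToPrecomposed(string s) =>
        s.Normalize(NormalizationForm.FormC);
}
```

Here `Lengths(Precomposed)` is (1, 1), while `Lengths(Decomposed)` and `Lengths(Astral)` are both (2, 1). Normalizing each set to Form C before picking chars would fix decomposed letters like this one, but not surrogate pairs or combining sequences that have no precomposed form; the text-element approach handles all of these.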

OTHER TIPS

There is no error in your function. The following test, which uses all 31 characters in your 's' string, passes:

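using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class Extensions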
{
    private static Random rand = new Random(1);

    public static char Random(this string str)
    {
        return str[rand.Next(str.Length)];
    }
}

[TestClass]
public class StackOverflow
{
    [TestMethod]
    public void MyTestMethod()
    {
        string s = "ŚśŜŝŞşŠšƧƨȘșȿʂϨϩЅѕᵴṠṡṢṣṤṥṦṧṨṩ$§";
        HashSet<char> expected = new HashSet<char>();
        HashSet<char> actual = new HashSet<char>();

        foreach (char c in s)
        {
            expected.Add(c);
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++)
        {
            sb.Append(s.Random());
        }

        string str = sb.ToString();

        foreach (char c in str)
        {
            actual.Add(c);
        }

        Assert.AreEqual(1000, str.Length);
        CollectionAssert.AreEquivalent(expected.ToList(), actual.ToList());
    }
}
Licensed under: CC-BY-SA with attribution