Question

I read some values from an MS SQL database and would like to perform some operations on the strings. Here is the code I am using to check whether a string starts with another string:

String input = "Основното jавно обвинителство денеска поднесе пријава против БМ (59) од Битола заради постоење основи на сомнение дека сторил кривични дела „тешки дела против безбедноста на луѓето и имотот во сообраќајот“ и „неукажување помош на лице повредено во сообраќајна незгода“";
String subString = "Основното јавно обвинителство";
if (input.StartsWith(subString))
{
    Response.Write("OK");
}

However, input.StartsWith(subString) does not return true. Does anybody have an idea why?


Solution

The difference is the character j at index 10 (zero-based): in the input its code is 106 (0x6A, the Latin small letter j), but in your substring it is 1112 (0x458, see demo).

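The j in your substring is not the Latin letter; it comes from the Unicode Cyrillic block (U+0400 to U+04FF):

Char  Decimal  Hex    UTF-8 bytes  Name
ј     1112     0x458  0xD1 0x98    CYRILLIC SMALL LETTER JE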

It looks the same, but has a different code.

Retyping the j in the substring, so that both strings use the same character, fixes the problem.
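If the mismatched text comes from the database and cannot simply be retyped, one option is to normalize the look-alike character before comparing. The following is only a minimal sketch, not a general solution: the class HomoglyphFix and the method NormalizeJe are hypothetical names, and it assumes the text is entirely Macedonian Cyrillic, so that every Latin j (U+006A) can safely be mapped to the Cyrillic je (U+0458). The strings are written with \u escapes so the two look-alike letters cannot be confused, and StringComparison.Ordinal is passed so the result does not depend on the current culture.

```csharp
using System;

public static class HomoglyphFix
{
    // Hypothetical helper: maps the Latin letters j/J to their Cyrillic look-alikes.
    // Assumption: the text contains no legitimate Latin words.
    public static string NormalizeJe(string text)
    {
        return text
            .Replace('\u006A', '\u0458')   // Latin small j   -> Cyrillic small je
            .Replace('\u004A', '\u0408');  // Latin capital J -> Cyrillic capital je
    }

    public static void Main()
    {
        string fromDb = "j\u0430\u0432\u043D\u043E"; // first letter is Latin j (U+006A)
        string prefix = "\u0458\u0430\u0432";        // first letter is Cyrillic je (U+0458)

        // Prints False: the first characters have different codes.
        Console.WriteLine(fromDb.StartsWith(prefix, StringComparison.Ordinal));

        // Prints True: after normalization both start with U+0458.
        Console.WriteLine(NormalizeJe(fromDb).StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

Whether such a mapping is appropriate depends on the data; if the database can also contain Latin text, fixing the stored values at the source is likely the safer route.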

OTHER TIPS

The second words in the input and the subString don't match. Paste both strings into Notepad++ and select one word at a time. The first and last words of the subString match the input, but the middle one does not.
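The same word-by-word check can be done in code instead of an editor. This is a rough sketch under simple assumptions: words are separated by single spaces, and the class WordDiff and the method FirstDifferentWord are hypothetical names introduced only for illustration.

```csharp
using System;

public static class WordDiff
{
    // Hypothetical helper: returns the zero-based index of the first word that
    // differs between the two strings, or -1 if all compared words are equal.
    // Only the words present in both strings are compared.
    public static int FirstDifferentWord(string a, string b)
    {
        string[] wordsA = a.Split(' ');
        string[] wordsB = b.Split(' ');
        int count = Math.Min(wordsA.Length, wordsB.Length);

        for (int i = 0; i < count; i++)
        {
            // Ordinal comparison looks at the raw character codes.
            if (!string.Equals(wordsA[i], wordsB[i], StringComparison.Ordinal))
            {
                return i;
            }
        }
        return -1;
    }

    public static void Main()
    {
        string a = "ab \u0458\u0430 cd"; // middle word starts with Cyrillic je
        string b = "ab j\u0430 cd";      // middle word starts with Latin j

        Console.WriteLine(FirstDifferentWord(a, b)); // prints 1
    }
}
```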

This sample demonstrates the problem:

void Main()
{
    var test = "Основното јавно обвинителство";
    var tost = "Основното jавно обвинителство";

    // Compare the two strings character by character (they have the same length).
    for (var i = 0; i < test.Length; i++)
    {
        Console.WriteLine("1: {0}, 2: {1}, Equal: {2}", test[i], tost[i], test[i] == tost[i]);

        // On a mismatch, also print the numeric code of each character.
        if (test[i] != tost[i])
        {
            Console.WriteLine("1: {0}, 2: {1}", (int)test[i], (int)tost[i]);
        }
    }

    Console.WriteLine (test == tost);
}

Relevant output:

1: ј, 2: j, Equal: False
1: 1112, 2: 106

The strings that you posted are not equal. Encode both as UTF-8 and compare the byte counts:

string s1 = "Основното јавно обвинителство";
string s2 = "Основното jавно обвинителство";
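var bt = Encoding.UTF8.GetBytes(s1);   // requires using System.Text;
var bt_1 = Encoding.UTF8.GetBytes(s2);
Console.WriteLine(bt.Length);   // UTF-8 byte count of s1
Console.WriteLine(bt_1.Length); // UTF-8 byte count of s2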

Output will look similar to the following:

56
55

The actual difference is as follows. The "j" in the first string is:

[19]    209 byte
[20]    152 byte

whereas the "j" in the second string is:

[19]    106 byte

The first is the two-byte UTF-8 encoding of the Cyrillic ј (0xD1 0x98); the second is the single byte of the Latin j (0x6A).
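To inspect these bytes directly in hexadecimal, something like the following sketch can be used. The class ByteDump and the method Utf8Hex are hypothetical names; BitConverter.ToString and Encoding.UTF8.GetBytes are standard .NET calls. The letters are written as escapes or plain ASCII so there is no doubt which character is being encoded.

```csharp
using System;
using System.Text;

public static class ByteDump
{
    // Hypothetical helper: returns the UTF-8 bytes of a string
    // as upper-case hex pairs separated by hyphens.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(Utf8Hex("\u0458")); // Cyrillic je: prints D1-98
        Console.WriteLine(Utf8Hex("j"));      // Latin j:     prints 6A
    }
}
```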

Licensed under: CC-BY-SA with attribution