File generated with protobuf-net for C# differs slightly from the same file generated in C++

https://stackoverflow.com/questions/16440286

14-04-2022

Question

I'm having a weird issue. I'm using protobuf-net for C# to generate a file based on a Google Protocol Buffers message, which I then upload to one of my company's servers.

I created a tool in C# that converts a .proto file to .cs. I then use the generated classes (from the .cs file) to fill in all the required fields of the message, and after that I call Serializer.Serialize(), which creates the requested file for me.

BUT, and this is the problem: I have another file (meant to be the same file) that was created by another tool written in C++ (using the same .proto file that I used). When I try to upload my file to our servers, I get an error saying that something is wrong.

I compared the 2 files with WinMerge and noticed very small differences in 3 lines (out of 7000+ lines in each file) between my file and the one generated by the C++ tool.

Here is an example of 2 lines captured from WinMerge (C++ on the left, C# on the right):

[Screenshot: WinMerge comparison of the two files]

Another example:

I noticed that the differences are in the rectangles (I don't understand what they mean) and in the bytes inside them...
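The rectangles are most likely how a text diff tool renders non-printable bytes, so a byte-level comparison is easier to read than a line-based one. Below is a minimal sketch, assuming both files fit in memory; the class name `ByteDiff` and its helpers are hypothetical and not part of protobuf-net.

```csharp
using System;
using System.IO;

static class ByteDiff
{
    // Index of the first byte that differs; if one array is a prefix of the
    // other, the length of the shorter one; -1 if the arrays are identical.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : common;
    }

    // Hex dump of up to 16 bytes, starting 4 bytes before the given offset.
    public static string HexAround(byte[] data, int offset)
    {
        int start = Math.Max(0, offset - 4);
        int count = Math.Min(16, data.Length - start);
        return count <= 0 ? "" : BitConverter.ToString(data, start, count);
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: ByteDiff <fileA> <fileB>");
            return;
        }
        byte[] a = File.ReadAllBytes(args[0]);
        byte[] b = File.ReadAllBytes(args[1]);
        int offset = FirstDifference(a, b);
        Console.WriteLine($"lengths: {a.Length} vs {b.Length}, first difference at: {offset}");
        if (offset >= 0)
        {
            Console.WriteLine("A: " + HexAround(a, offset));
            Console.WriteLine("B: " + HexAround(b, offset));
        }
    }
}
```

Only the first difference is reported because a single inserted byte shifts everything after it, so later offsets would all appear to differ.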

Here is the .proto file that I'm using:

message Package {

    message ArchDepend {

        message Arch {
            required string version = 1;
            required string description = 2;
        }

        message Firmware {
            required string version = 1;
            required string description = 2;
            required bytes file = 3;

            repeated string from_version = 4;
        }

        message Rsu {
            required string version = 1;
            required string description = 2;
            required bytes file = 3;
        }

        required Arch arch = 1;

        optional Firmware firmware = 2;
        optional Rsu rsu = 3;
    }

    message DefaultEeprom {
        required string version = 1;
        required string description = 2;
        required bytes file = 3;

        message Migration {
            required string from_version = 1;
            required bytes file = 2;
        }

        repeated Migration migrations = 4;
    }

    required string name = 1;
    optional ArchDepend archDepend = 2;
    optional DefaultEeprom defaultEeprom = 3;

}

The values that I insert through the .cs classes are strings and files (*.bin). Here are examples of the strings:

"PowerMaster-30"

"JS702394 K17.A20"

etc..

They are inserted into most of the string fields in the .proto file.

Into the file fields (.proto) I load binary files that my company uses (the same files that are loaded into the C++ tool).

Here is a screenshot of the binary file that I'm reading data from, opened in a program called "Falsher.exe"; the hex view is on the left and the ASCII view on the right:

[Screenshot: the binary file in hex view (left) and ASCII view (right)]

And here is the code that reads that binary file:

    private string[] FindPanelVersionInBinFile(string path)
    {
        string currentline;
        int flag = 0; // number of markers found so far
        string[] namesArray = new string[3]; // contains all the strings which I get from the BIN file.

        // Note: StreamReader decodes the binary file as text, line by line.
        using (StreamReader sr = new StreamReader(path))
        {
            while ((currentline = sr.ReadLine()) != null && flag < 3)
            {
                // Substring takes a fixed character count, so a value shorter than
                // that count also carries whatever follows it in the file
                // (for example a '\0' terminator).
                if (currentline.Contains("PRODUCT_FAMILY"))
                {
                    int index = currentline.IndexOf("PRODUCT_FAMILY");
                    namesArray[0] = currentline.Substring(index + 16, 14); // product family "PowerMaster-xx"
                    flag++;
                }
                if (currentline.Contains("SW_VERSION"))
                {
                    int index = currentline.IndexOf("SW_VERSION");
                    namesArray[1] = currentline.Substring(index + 12, 17); // software version "JSxxxxx Kxx.yyy"
                    flag++;
                }
                if (currentline.Contains("compatibility"))
                {
                    int index = currentline.IndexOf("compatibility");
                    namesArray[2] = currentline.Substring(index + 21, 7); // compatibility number "xx.yyy"
                    flag++;
                }
            }
        }
        return namesArray;
    }

After all that, I use this code to generate my file:

    byte[] data;
    using (var ms = new MemoryStream())
    {
        Serializer.Serialize(ms, package);
        data = ms.ToArray();
    }
    string packageFilePath = Path.Combine(savePath, package.Name);
    File.WriteAllBytes(packageFilePath, data);

Can someone please explain what exactly the differences are and why they occur?

Thank you!!

Orion.

Solution

It looks like the difference is simply zero-terminated strings.

I'm guessing (please correct me) that the data on the left is from protobuf-net, and the data on the right is from the C++ implementation. On the left we have " [10], some data, and then [1A] (I believe the [1A] is the next field-header: field 3, length-prefixed). Supporting this hypothesis, the "character" (loosely speaking) before the [10]/[11] is ", i.e. ASCII 34 (0x22), which in protobuf means "field 4, length-prefixed" (in the .proto above, that would fit repeated string from_version = 4 in Firmware). So I'm content to say that the " [10] is telling us "field 4, string of length 16", and the " [11] is telling us "field 4, string of length 17".
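The field-header reading above can be checked with a little bit arithmetic. This is a minimal sketch, assuming single-byte headers (field numbers 1 to 15); the class name `WireTag` is hypothetical and not part of protobuf-net. Note that 0x22 (the `"` character) decodes to field 4 with wire type 2, and 0x1A decodes to field 3 with wire type 2.

```csharp
using System;

static class WireTag
{
    // For field numbers 1 through 15 the header is one byte:
    // (fieldNumber << 3) | wireType. Wire type 2 means length-prefixed.
    public static int FieldNumber(byte tag) => tag >> 3;
    public static int WireType(byte tag) => tag & 0x07;

    static void Main()
    {
        foreach (byte tag in new byte[] { 0x12, 0x1A, 0x22 })
        {
            Console.WriteLine($"0x{tag:X2}: field {FieldNumber(tag)}, wire type {WireType(tag)}");
        }
    }
}
```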

As such, it feels logical that the string in C# is "JS702415 K17.020", and the string in C++ is the same but with a nul-terminator. Here it gets interesting: I do not believe that it should be including the nul-terminator. So: either the C++ API is making a mistake (which I doubt), or when passing the data to C++ you are accidentally telling it to (incorrectly) include the nul-terminator in the string.
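Whichever side adds the extra byte, it is easy to check a string for a trailing nul before assigning it to a message field. The following is a rough sketch, not a protobuf-net feature; the class `NulTerminatorCheck` and its helpers are hypothetical names.

```csharp
using System;
using System.Text;

static class NulTerminatorCheck
{
    // Number of payload bytes the string occupies once UTF-8 encoded.
    public static int Utf8Length(string s) => Encoding.UTF8.GetByteCount(s);

    // Char-based check; avoids culture-sensitive string comparison.
    public static bool EndsWithNul(string s) => s.Length > 0 && s[s.Length - 1] == '\0';

    // Removes any nul characters at the end of the string.
    public static string StripTrailingNuls(string s) => s.TrimEnd('\0');

    static void Main()
    {
        string clean = "JS702415 K17.020";
        string padded = clean + "\0";
        Console.WriteLine($"{Utf8Length(clean)} bytes, length byte 0x{Utf8Length(clean):X2}");
        Console.WriteLine($"{Utf8Length(padded)} bytes, length byte 0x{Utf8Length(padded):X2}");
        Console.WriteLine(EndsWithNul(padded));
        Console.WriteLine(StripTrailingNuls(padded) == clean);
    }
}
```

The two lengths correspond to the [10] and [11] bytes seen in the comparison.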

I'm sure that the encoding is not meant to include nul-terminators, because the protocol specification gives the example of the string "testing" (as field 2), which it encodes as:

12 07 74 65 73 74 69 6e 67

The 12 (hex) is the field-header (field 2, length-prefixed); the 07 is the length, and the next 7 bytes (74...67) are the payload, UTF-8 encoded. Note: no nul-terminator.
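To make the byte layout concrete, here is a hand-rolled sketch of that encoding. It is only an illustration and assumes field numbers 1 to 15 and payloads shorter than 128 bytes, so that header and length each fit in one byte; `StringFieldDemo.EncodeString` is a hypothetical helper, not the protobuf-net API.

```csharp
using System;
using System.Text;

static class StringFieldDemo
{
    // Encodes one length-prefixed string field: header, length, UTF-8 payload.
    public static byte[] EncodeString(int fieldNumber, string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);
        if (fieldNumber < 1 || fieldNumber > 15 || payload.Length > 127)
        {
            throw new ArgumentOutOfRangeException();
        }
        var result = new byte[payload.Length + 2];
        result[0] = (byte)((fieldNumber << 3) | 2); // wire type 2 = length-prefixed
        result[1] = (byte)payload.Length;
        payload.CopyTo(result, 2);
        return result;
    }

    static void Main()
    {
        // prints 12-07-74-65-73-74-69-6E-67
        Console.WriteLine(BitConverter.ToString(EncodeString(2, "testing")));
        // prints 12-08-74-65-73-74-69-6E-67-00
        Console.WriteLine(BitConverter.ToString(EncodeString(2, "testing\0")));
    }
}
```

A nul character that is part of the string is simply counted and written as one more payload byte, which is why it shows up as a longer length prefix rather than being dropped.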

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow