File generated with protobuf-net for C# differs slightly from the same file generated in C++

StackOverflow https://stackoverflow.com/questions/16440286

Question

I'm having a weird issue. I'm using protobuf-net for C# to generate a file based on a Google Protocol Buffers message, and then uploading it to one of my company's servers.

I created a tool in C# that converts a .proto file to .cs. I then use its classes (from the .cs file) to fill in all the required fields in the message, and after that I call Serializer.Serialize(), which creates the requested file for me.

BUT, and this is the problem: I have another file (the same file) that was created by another tool written in C++ (it uses the same .proto file that I used). When I try to upload my file to our servers, I get an error saying that something is wrong.

I compared the two files with WinMerge and noticed very small differences in 3 lines (out of 7000+ lines in each file) compared to the file generated by the C++ tool.

Here is an example of two lines captured from WinMerge (C++ on the left, C# on the right):

[Screenshot: WinMerge comparison, C++ output on the left, C# output on the right]

Another example:

I noticed that the differences are in the rectangles (I don't understand what they mean) and in the bytes inside them...

Here is the .proto file that I'm using:

message Package {

    message ArchDepend {

        message Arch {
            required string version = 1;
            required string description = 2;
        }

        message Firmware {
            required string version = 1;
            required string description = 2;
            required bytes file = 3;

            repeated string from_version = 4;
        }

        message Rsu {
            required string version = 1;
            required string description = 2;
            required bytes file = 3;
        }

        required Arch arch = 1;

        optional Firmware firmware = 2;
        optional Rsu rsu = 3;
    }

    message DefaultEeprom {
        required string version = 1;
        required string description = 2;
        required bytes file = 3;

        message Migration {
            required string from_version = 1;
            required bytes file = 2;
        }

        repeated Migration migrations = 4;
    }

    required string name = 1;
    optional ArchDepend archDepend = 2;
    optional DefaultEeprom defaultEeprom = 3;

}

The values that I put into the generated .cs classes are strings and files (*.bin). Here are examples of the strings:

"PowerMaster-30"

"JS702394 K17.A20"

etc..

They are inserted into most of the string fields in the .proto file.

Into the file fields (.proto) I load binary files that my company uses (the same files that are loaded into the C++ tool).

Here is a screenshot of the binary file that I'm reading data from, opened in a program called "Falsher.exe"; the left side shows the hex view and the right side shows the ASCII:

[Screenshot: the binary file in hex view (left) and ASCII view (right)]

And here is the code that reads that binary file:

    // Requires: using System.IO;
    private string[] FindPanelVersionInBinFile(string path)
    {
        string currentline;
        int flag = 0; // counts how many of the three values have been found
        string[] namesArray = new string[3]; // contains all the strings which I get from the BIN file.

        // StreamReader reads the BIN file as text, one line at a time.
        using (StreamReader sr = new StreamReader(path))
        {
            while ((currentline = sr.ReadLine()) != null && flag < 3)
            {
                if (currentline.Contains("PRODUCT_FAMILY"))
                {
                    int index = currentline.IndexOf("PRODUCT_FAMILY");
                    namesArray[0] = currentline.Substring(index + 16, 14); // product family "PowerMaster-xx" (14 characters)
                    flag++;
                }
                if (currentline.Contains("SW_VERSION"))
                {
                    int index = currentline.IndexOf("SW_VERSION");
                    // software version "JSxxxxx Kxx.yyy"; this takes 17 characters,
                    // while the sample value "JS702394 K17.A20" is 16 characters long
                    namesArray[1] = currentline.Substring(index + 12, 17);
                    flag++;
                }
                if (currentline.Contains("compatibility"))
                {
                    int index = currentline.IndexOf("compatibility");
                    namesArray[2] = currentline.Substring(index + 21, 7); // compatibility number "xx.yyy" (takes 7 characters)
                    flag++;
                }
            }
        }
        return namesArray;
    }

After all that, I use this code to generate my file:

    byte[] data;
    using (var ms = new MemoryStream())
    {
        Serializer.Serialize(ms, package);
        data = ms.ToArray();
    }
    string packageFilePath = Path.Combine(savePath, package.Name);
    File.WriteAllBytes(packageFilePath, data);

Can someone please explain to me what exactly the differences are and why they occur?

Thank you!!

Orion.


Solution

It looks like the difference is simply zero-terminated strings.

I'm guessing (please correct me) that the data on the left is from protobuf-net, and the data on the right is from the C++ implementation. On the left we have " [10], some data, and then [1A] (I believe the [1A] is the next field-header: field 3, length-prefixed). Supporting this hypothesis, the "character" (loosely speaking) before the [10]/[11] is ", i.e. ASCII 34 (hex 22), which in protobuf means "field 4, length-prefixed". So I'm content to say that the " [10] is telling us "field 4, string of length 16", and the " [11] is telling us "field 4, string of length 17".
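To check header bytes like these by hand, here is a minimal sketch, assuming the header fits in a single byte (field numbers 1 to 15). The names TagDemo and DecodeTag are made up for this illustration and are not part of protobuf-net. The low 3 bits of the byte are the wire type (2 means length-prefixed) and the remaining bits are the field number.

```csharp
using System;

static class TagDemo
{
    // Splits a single-byte protobuf field header into (field number, wire type).
    // Assumption: the header fits in one byte, i.e. field numbers 1 to 15.
    public static (int FieldNumber, int WireType) DecodeTag(byte tag)
    {
        return (tag >> 3, tag & 0x07);
    }

    static void Main()
    {
        // 0x12 is the header from the spec's "testing" example,
        // 0x1A is the [1A] byte and 0x22 is the '"' character (ASCII 34).
        foreach (byte tag in new byte[] { 0x12, 0x1A, 0x22 })
        {
            var (field, wire) = DecodeTag(tag);
            Console.WriteLine($"0x{tag:X2}: field {field}, wire type {wire}");
        }
    }
}
```

This should report field 2 for 0x12, field 3 for 0x1A and field 4 for 0x22, all with wire type 2 (length-prefixed).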

As such, it feels logical that the string in C# is "JS702415 K17.020", and the string in C++ is the same but with a nul-terminator. Here it gets interesting: I do not believe that it should be including the nul-terminator. So: either the C++ API is making a mistake (which I doubt), or when passing the data to C++ you are accidentally telling it to (incorrectly) include the nul-terminator in the string.
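To illustrate how a single trailing nul changes the length byte from [10] to [11], here is a minimal sketch. The class and method names (NulTerminatorDemo, Utf8Length, StripTrailingNul) are hypothetical, and the value with the terminator is constructed by hand purely for the demonstration; it is not taken from your files.

```csharp
using System;
using System.Text;

static class NulTerminatorDemo
{
    // Number of bytes that would be written as the payload of a string field.
    public static int Utf8Length(string value) => Encoding.UTF8.GetByteCount(value);

    // Removes any trailing nul characters before the value goes into a message.
    public static string StripTrailingNul(string value) => value.TrimEnd('\0');

    static void Main()
    {
        string clean = "JS702415 K17.020";
        string withNul = clean + "\0"; // hypothetical value that picked up a terminator

        Console.WriteLine(Utf8Length(clean));                     // 16 (hex 10)
        Console.WriteLine(Utf8Length(withNul));                   // 17 (hex 11)
        Console.WriteLine(Utf8Length(StripTrailingNul(withNul))); // 16 again
    }
}
```

Whichever side ends up with the extra character, trimming it before assigning the string to the message field should make the two length bytes agree.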

I'm sure that the encoding is not meant to include nul-terminators, because the protocol specification gives the example of the string "testing" (as field 2), which it encodes as:

12 07 74 65 73 74 69 6e 67

The 12 is the field-header (field 2, length-prefixed); the 07 is the length, and the next 7 bytes (74...67) are the payload, UTF-8 encoded. Note: no nul-terminator.
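The same bytes can be reproduced by hand with a small sketch. This is not how protobuf-net is implemented; StringFieldDemo and EncodeStringField are names invented for this example, and the code assumes a field number of 1 to 15 and a payload shorter than 128 bytes, so that the header and the length each fit in one byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StringFieldDemo
{
    // Encodes one string field as: header byte, length byte, UTF-8 payload.
    // Assumption: field number 1..15 and payload under 128 bytes (no multi-byte varints).
    public static byte[] EncodeStringField(int fieldNumber, string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);
        if (fieldNumber < 1 || fieldNumber > 15 || payload.Length > 127)
            throw new ArgumentException("This sketch only handles single-byte header and length.");

        var bytes = new List<byte>();
        bytes.Add((byte)((fieldNumber << 3) | 2)); // wire type 2 = length-prefixed
        bytes.Add((byte)payload.Length);           // length of the payload, no terminator
        bytes.AddRange(payload);
        return bytes.ToArray();
    }

    static void Main()
    {
        // Prints 12-07-74-65-73-74-69-6E-67
        Console.WriteLine(BitConverter.ToString(EncodeStringField(2, "testing")));
    }
}
```

Passing "testing\0" instead would change the length byte to 08 and append a 00 byte, which is the kind of one-byte difference seen in the comparison.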

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow