I am trying to calculate the frequency of symbols in a binary file in c#
, I have already done that in c++ and that works fine and i have switched to c# from c++ because i have to implement the same in c#.
Note: I don't have to use LUTs/Arrays only the linkedlist must be used.
By frequency
i mean number of repetitions of symbols and by symbols
i mean if we see binary file using xxd -b BinaryFile.bin
then we will get lot of 8 bits
of combinations of 0
and 1
. So how many times each symbol repeats is it's frequency.
Now, How i have tried to do so:
I achieve it in c#
by doing mono filename.exe BinaryFile.bin
at terminal with code written in filename inside notepad++.
Logic:
I read each symbol in the BinaryFile if it is not repeated then i add it at the tail of linked list and if it is repeated then i increase it's frequency. This i repeat for full binary file.
Code:
Code for c# (full): (which don't work properly, I have pointed the problem containing part in the code, I put full code because may be you will need it):
////Problem containing part starts here ////////
public Huffman(string[] args) //called from MyClass
{
Node tree = null;
int counter = 0;
using(var stream = new BinaryReader(System.IO.File.OpenRead(args[0])))
{
while (stream.BaseStream.Position < stream.BaseStream.Length)
{
int processingValue = stream.ReadByte();
Node ppt, pt, temp;
bool is_there = false;
ppt = pt = tree;
while (pt != null)
{
if (pt.symbol == processingValue)
{
pt.freq++;
is_there = true;
break;
}
ppt = pt;
pt = pt.next;
}
if (is_there == false)
{
temp = new Node();
temp.symbol = processingValue;
temp.freq = 1;
temp.left = null;
temp.right = null;
temp.next = null;
temp.id = (++total_nodes);
temp.is_processed = 0;
if (tree == null)
{
tree = temp;
}
else
{
ppt.next = temp;
}
//The same check/debugging which i was doing in c++ to know what symbol and freq contains but they contains different values.
//And the output of both c#/c++ are different where as it was supposed to be same.
Node chc = tree;
while (chc != null)
{
Console.WriteLine(" sym: " + chc.symbol);
Console.WriteLine(" freq: " + chc.freq);
chc = chc.next;
}
}
}
stream.Close();
}
}
////Problem containing part Ends here ////////
Difference between the output of the c++ and c#:
(1) When i display the output of the c++ it works correctly and when i see the code part which i have written in my code to debug/check the output on the terminal for that shows the correct execution of code. whereas the same debugging code in c# don't show the same output with c++. which it was supposed to be do because both of these code to print "symbol" and "freq" are kept at the same place in program.
(2) The output of c# makes me feel that while loop is executed less number of times in c# then in c++ because in c# output terminal don't show large amount repetition of freq and symbol. Please see the output of both :
c# output at terminal :
hp@ubuntu:~/Desktop/$ mono check1.exe out.bin
sym: 0
freq: 1
sym: 0
freq: 200
sym: 1
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 195
sym: 3
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 195
sym: 3
freq: 189
sym: 4
freq: 1
hp@ubuntu:~/Desktop/
whereas the output of c++
is : (here the value of counter actually started form "0" (Not "196") but is not able to show the full output because the file is larger so output is big terminal is not able to show all, it just shows the output at the end)
hp@ubuntu:~/Desktop/$ ./filename out.bin
//Counter starts from "0" but terminal is not able to show all.So doing from "196"
counter: 196
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 1
counter: 197
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 2
counter: 198
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 3
counter: 199
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 4
counter: 200
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 5
counter: 201
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 6
counter: 202
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 7
counter: 203
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 8
counter: 204
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 9
counter: 205
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 10
counter: 206
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 11
counter: 207
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 12
counter: 208
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 13
counter: 209
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 14
counter: 210
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 15
counter: 211
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 16
counter: 212
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 17
counter: 213
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 18
counter: 214
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 19
counter: 215
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 20
counter: 216
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 21
counter: 217
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 22
counter: 218
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 23
counter: 219
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 24
counter: 220
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 25
counter: 221
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 26
counter: 222
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 27
counter: 223
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 28
counter: 224
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 29
counter: 225
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 30
counter: 226
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 31
counter: 227
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 32
counter: 228
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 33
counter: 229
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 34
counter: 230
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 35
counter: 231
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 36
counter: 232
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 37
counter: 233
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 38
counter: 234
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 39
counter: 235
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 40
counter: 236
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 41
counter: 237
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 42
counter: 238
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 43
counter: 239
sym: 0
freq: 50
sym: 1
freq: 50
sym: 2
freq: 48
sym: 3
freq: 48
sym: 4
freq: 44
Questions:
(1) Why the output shown by c# code is different from c++ ? Even i have tasted on same BinaryFile (out.bin in my case).
(2) Could you please help me in coming out of this problem ? Big thanks