Question

I am trying to calculate the frequency of symbols in a binary file in c#, I have already done that in c++ and that works fine and i have switched to c# from c++ because i have to implement the same in c#.

Note: I don't have to use LUTs/Arrays only the linkedlist must be used.

By frequency i mean number of repetitions of symbols and by symbols i mean if we see binary file using xxd -b BinaryFile.bin then we will get lot of 8 bits of combinations of 0 and 1. So how many times each symbol repeats is it's frequency.

Now, How i have tried to do so: I achieve it in c# by doing mono filename.exe BinaryFile.bin at terminal with code written in filename inside notepad++.

Logic: I read each symbol in the BinaryFile if it is not repeated then i add it at the tail of linked list and if it is repeated then i increase it's frequency. This i repeat for full binary file.

Code:

Code for c# (full): (which don't work properly, I have pointed the problem containing part in the code, I put full code because may be you will need it):

  ////Problem containing part starts here ////////
        public Huffman(string[] args) //called from MyClass 
        {
            Node tree = null;
            int counter = 0;
            using(var stream = new BinaryReader(System.IO.File.OpenRead(args[0]))) 
            {
                while (stream.BaseStream.Position < stream.BaseStream.Length) 
                {
                    int processingValue = stream.ReadByte();
                    Node ppt, pt, temp;
                    bool is_there = false;
                    ppt = pt = tree;
                    while (pt != null) 
                    {
                        if (pt.symbol == processingValue) 
                        {
                            pt.freq++;
                            is_there = true;

                            break;
                        }
                        ppt = pt;
                        pt = pt.next;
                    }
                    if (is_there == false) 
                    {
                        temp = new Node();
                        temp.symbol = processingValue;
                        temp.freq = 1;
                        temp.left = null;
                        temp.right = null;
                        temp.next = null;
                        temp.id = (++total_nodes);
                        temp.is_processed = 0;
                        if (tree == null) 
                        {
                            tree = temp;
                        } 
                        else 
                        {
                            ppt.next = temp;
                        }
//The same check/debugging which i was doing in c++ to know what symbol and freq contains but they contains different values. 
//And the output of both c#/c++ are different where as it was supposed to be same.
                        Node chc = tree;
                        while (chc != null) 
                        {
                            Console.WriteLine("  sym: " + chc.symbol);
                            Console.WriteLine("  freq: " + chc.freq);
                            chc = chc.next;
                        }
                    }
                }
                stream.Close();
            }


        }
 ////Problem containing part Ends here ////////

Difference between the output of the c++ and c#:

(1) When i display the output of the c++ it works correctly and when i see the code part which i have written in my code to debug/check the output on the terminal for that shows the correct execution of code. whereas the same debugging code in c# don't show the same output with c++. which it was supposed to be do because both of these code to print "symbol" and "freq" are kept at the same place in program.

(2) The output of c# makes me feel that while loop is executed less number of times in c# then in c++ because in c# output terminal don't show large amount repetition of freq and symbol. Please see the output of both :

c# output at terminal :

hp@ubuntu:~/Desktop/$ mono check1.exe out.bin 
  sym: 0
  freq: 1
  sym: 0
  freq: 200
  sym: 1
  freq: 1
  sym: 0
  freq: 200
  sym: 1
  freq: 198
  sym: 2
  freq: 1
  sym: 0
  freq: 200
  sym: 1
  freq: 198
  sym: 2
  freq: 195
  sym: 3
  freq: 1
  sym: 0
  freq: 200
  sym: 1
  freq: 198
  sym: 2
  freq: 195
  sym: 3
  freq: 189
  sym: 4
  freq: 1
hp@ubuntu:~/Desktop/

whereas the output of c++ is : (here the value of counter actually started form "0" (Not "196") but is not able to show the full output because the file is larger so output is big terminal is not able to show all, it just shows the output at the end)

    hp@ubuntu:~/Desktop/$ ./filename out.bin

  //Counter starts from "0" but terminal is not able to show all.So doing from "196"
    counter: 196
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  1
    counter: 197
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  2
    counter: 198
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  3
    counter: 199
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  4
    counter: 200
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  5
    counter: 201
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  6
    counter: 202
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  7
    counter: 203
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  8
    counter: 204
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  9
    counter: 205
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  10
    counter: 206
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  11
    counter: 207
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  12
    counter: 208
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  13
    counter: 209
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  14
    counter: 210
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  15
    counter: 211
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  16
    counter: 212
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  17
    counter: 213
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  18
    counter: 214
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  19
    counter: 215
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  20
    counter: 216
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  21
    counter: 217
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  22
    counter: 218
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  23
    counter: 219
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  24
    counter: 220
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  25
    counter: 221
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  26
    counter: 222
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  27
    counter: 223
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  28
    counter: 224
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  29
    counter: 225
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  30
    counter: 226
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  31
    counter: 227
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  32
    counter: 228
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  33
    counter: 229
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  34
    counter: 230
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  35
    counter: 231
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  36
    counter: 232
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  37
    counter: 233
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  38
    counter: 234
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  39
    counter: 235
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  40
    counter: 236
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  41
    counter: 237
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  42
    counter: 238
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  43
    counter: 239
      sym:  0
      freq:  50
      sym:  1
      freq:  50
      sym:  2
      freq:  48
      sym:  3
      freq:  48
      sym:  4
      freq:  44

Questions:

(1) Why the output shown by c# code is different from c++ ? Even i have tasted on same BinaryFile (out.bin in my case).

(2) Could you please help me in coming out of this problem ? Big thanks

Was it helpful?

Solution

I might be totally misunderstanding this... Are you just wanting to know the number of occurrences of each byte value (0..255) within a specified file?

If so, a simple way to do it is like this:

var counts = new int[256];  // Assumes files aren't longer than 2GB.

string filename = "<Your filename goes here>";

foreach (byte b in File.ReadAllBytes(filename)) // Will run out of memory
    ++counts[b];                                // for very large files!

for (int i = 0; i < counts.Length; ++i)
    Console.WriteLine("Symbol {0} occurred {1} times.", i, counts[i]);

However, this is so much simpler than what you're doing that I feel I must be misunderstanding....


[EDIT]

I can't fix your original code, but here's a sample program that works that uses a linked-list to solve the problem:

using System;
using System.IO;

namespace ConsoleApp1
{
    public sealed class Node
    {
        public byte Symbol { get; set; }
        public int Count   { get; set; }
        public Node Next   { get; set; }
    }

    sealed class Program
    {
        private void run()
        {
            var linkedList = new Node();

            string filename = @"C:\Test\t.cs";

            foreach (byte symbol in File.ReadAllBytes(filename))
                addSymbol(symbol, linkedList);

            for (int symbol = 0; symbol < 256; ++symbol)
            {
                int count = countForSymbol((byte)symbol, linkedList);
                Console.WriteLine("Symbol {0} occurred {1} times.", symbol, count);
            }
        }

        private static void addSymbol(byte symbol, Node head)
        {
            Node last = head;

            while (head != null)
            {
                last = head;

                if (head.Symbol == symbol)
                {
                    ++head.Count;
                    return;
                }
                else
                {
                    head = head.Next;
                }
            }

            last.Next = new Node
            {
                Symbol = symbol, 
                Count = 1
            };
        }

        private int countForSymbol(byte symbol, Node head)
        {
            while (head != null)
            {
                if (head.Symbol == symbol)
                    return head.Count;
                else
                    head = head.Next;
            }

            return 0;
        }

        private static void Main()
        {
            new Program().run();
        }
    }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top