Question

I wrote this for my OS class:

#include <iostream>
#include <fstream>

//encodes a file using the (8,4) Hamming Code.
//usage : HammingEncode.out < inputFile > outputFile 
int main() {
    unsigned char const codebook[] = {0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78, 0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF};
    unsigned char in, nextByte;
    unsigned char const leftMask = 0xF0, rightMask = 0x0F;

    in = std::cin.get();
    while (!std::cin.eof()) {
        nextByte = (in & leftMask) >> 4;
        std::cout << codebook[nextByte];
        nextByte = in & rightMask;
        std::cout << codebook[nextByte];
        in = std::cin.get();
    }
}

I then decided to test its speed (just to see) on the Old Testament from the King James Bible. This was our standard test file for a Data Structures class, which was taught in Java; we could sort it and Huffman-encode it in basically no time, yet this takes quite a long time to encode. What is going on?


Solution

std::cin is opened in text mode, and as such it is constantly on the lookout for all kinds of things to watch for (like newlines, etc.).

Given the constant character sniffing by the std::cin input stream, I'm not surprised it takes longer, but it does seem a little excessive. The following, which bypasses iostream and uses FILE streams directly, will probably do what you were expecting:

#include <cstdlib>
#include <cstdio>

int main(int argc, char *argv[])
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    for (int c = std::fgetc(stdin); c != EOF; c = std::fgetc(stdin))
    {
        std::fputc(codebook[c >> 4], stdout);
        std::fputc(codebook[c & 0x0F], stdout);
    }

    return EXIT_SUCCESS;
}

I tested the exact code above on a 10 MB random file loaded with chars ranging from a to z, and the results were ridiculously long when using std::cin and std::cout. Using the FILE streams directly, the difference was enormous. All of the code in this answer was tested with Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn) using -O3 optimization.

Using FILE streams

time ./hamming < bigfile.txt > bigfile.ham 
real 0m1.855s
user 0m1.812s
sys  0m0.041s

Using std::cin and std::cout

time ./hamming < bigfile.txt > bigfile.ham
real 0m23.819s
user 0m7.416s
sys  0m16.377s

Using std::cin and std::cout with std::cout.sync_with_stdio(false);

time ./hamming < bigfile.txt > bigfile.ham
real 0m24.867s
user 0m7.705s
sys  0m17.118s

In summary, ouch. Of note is the time spent in system. If I get a chance to update this with the std::istream::get() and put() methods I will, but honestly I don't expect any miracles there. Unless there is some magic (to me, not to others) way of turning off I/O translation on std::cin, the FILE streams may be a reasonable alternative. I haven't yet investigated whether slurping std::cin's rdbuf() is a viable option either, but it may also have promise.
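
Just to sketch what that rdbuf() slurping might look like (my own untested sketch, not part of the timings above): drain std::cin's stream buffer into a std::ostringstream in one shot, then encode from the in-memory copy and emit the result with a single std::cout.write():

#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>

int main()
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    // Slurp everything from std::cin via its stream buffer in one expression.
    std::ostringstream ss;
    ss << std::cin.rdbuf();
    std::string const data = ss.str();

    // Encode from the in-memory copy and write it out in a single call.
    std::string out;
    out.reserve(data.size() * 2);
    for (unsigned char c : data)
    {
        out.push_back(static_cast<char>(codebook[c >> 4]));
        out.push_back(static_cast<char>(codebook[c & 0x0F]));
    }
    std::cout.write(out.data(), static_cast<std::streamsize>(out.size()));

    return EXIT_SUCCESS;
}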


Edit: Using std::istreambuf_iterator<char>

Using the streambuf iterator class yields a significant improvement, as it essentially bypasses all the inline translation junk, but it still isn't as efficient as the FILE streams:

#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstdlib>

int main(int argc, char *argv[])
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    std::istreambuf_iterator<char> cin_it(std::cin), cin_eof;
    std::for_each(cin_it, cin_eof, [](char c)
        {
          std::cout.put(static_cast<char>(codebook[static_cast<unsigned char>(c) >> 4]));
          std::cout.put(static_cast<char>(codebook[static_cast<unsigned char>(c) & 0x0F]));
        });

    return EXIT_SUCCESS;
}

Results:

time ./hamming < bigfile.txt > bigfile.ham

real 0m6.062s
user 0m5.795s
sys  0m0.053s

Note that system time is now comparable to the FILE stream results, but the overhead from the rest of the iostream templates in user time seems a sore point (though still better than the other iostream attempts). You win some, you lose some =P
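
As a hedged follow-up (a sketch I have not timed, so treat it as a guess rather than a result), the same stream-buffer trick can be applied on the output side as well, writing through std::cout.rdbuf() directly so both directions skip the formatted iostream layer:

#include <iostream>
#include <cstdlib>

int main()
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    // Read and write through the stream buffers directly, bypassing the
    // formatted iostream layer on both sides.
    std::streambuf *inbuf  = std::cin.rdbuf();
    std::streambuf *outbuf = std::cout.rdbuf();

    for (int c = inbuf->sbumpc();
         c != std::streambuf::traits_type::eof();
         c = inbuf->sbumpc())
    {
        unsigned char const uc = static_cast<unsigned char>(c);
        outbuf->sputc(static_cast<char>(codebook[uc >> 4]));
        outbuf->sputc(static_cast<char>(codebook[uc & 0x0F]));
    }

    return EXIT_SUCCESS;
}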


Edit: Unbuffered System IO

In an effort to be completely fair, bypassing all runtime buffering and relying solely on the system calls to do this madness, the following is also worth noting:

#include <cstdlib>
#include <cstdio>
#include <unistd.h>

int main(int argc, char *argv[])
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    unsigned char c;
    while (read(STDIN_FILENO, &c, 1) > 0)
    {
        unsigned char duo[2] =
        {
            codebook[ c >> 4 ],
            codebook[ c & 0x0F ]
        };
        write(STDOUT_FILENO, duo, sizeof(duo));
    }

    return EXIT_SUCCESS;
}

The results, as you would expect, were dreadful:

time ./hamming < bigfile.txt > bigfile.ham

real 0m26.509s
user 0m2.370s
sys  0m24.087s

OTHER TIPS

I got close to an order of magnitude improvement by making 2 small changes.

  1. Adding std::ios_base::sync_with_stdio(false) (no noticeable difference on its own, though the impact is often implementation specific)
  2. Buffering the output before writing (this made the most difference)

The updated code looks like this:

#include <iostream>
#include <sstream>

//encodes a file using the (8,4) Hamming Code.
//usage : HammingEncode.out < inputFile > outputFile
int main()
{
    unsigned char const codebook[] = { 0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78, 0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF };
    unsigned char in, nextByte;
    unsigned char const leftMask = 0xF0, rightMask = 0x0F;
    std::stringstream os;

    std::ios_base::sync_with_stdio(false);
    in = std::cin.get();
    while (std::cin) {
        nextByte = (in & leftMask) >> 4;
        os.put(codebook[nextByte]);
        nextByte = in & rightMask;
        os.put(codebook[nextByte]);
        in = std::cin.get();
    }
    std::cout << os.rdbuf();
}

Update

I tried one more implementation - using the underlying std::streambuf.

On my system the original code was taking around 14 seconds to process the full King James Bible - around 4.3 MB

The code in my original attempt took around 2.1 seconds to process.

This new implementation took 0.71 seconds to process the same document.

#include <iostream>
#include <sstream>
#include <cstdio>

//encodes a file using the (8,4) Hamming Code.
//usage : HammingEncode.out < inputFile > outputFile
int main()
{
    unsigned char const codebook[] = { 0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78, 0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF };
    unsigned char in, nextByte;
    unsigned char const leftMask = 0xF0, rightMask = 0x0F;
    std::stringstream os;

    std::ios_base::sync_with_stdio(false);

    // Work on std::cin's underlying stream buffer directly.
    std::streambuf * pbuf = std::cin.rdbuf();
    do {
        in = pbuf->sgetc();
        nextByte = (in & leftMask) >> 4;
        os << codebook[nextByte];
        nextByte = in & rightMask;
        os << codebook[nextByte];
    } while (pbuf->snextc() != EOF);
    std::cout << os.rdbuf();
}

C++ iostreams have a reputation for being rather inefficient, although various numbers suggest that this is mostly a quality-of-implementation issue rather than an inherent iostream disadvantage.

Anyway, just to be sure that it's not something like a slow HDD, you might compare the execution time to, e.g.,

cat file1 > file2

Of course cat will be a bit quicker, as it doesn't double the size of your data.

Then try to compare the efficiency of your code with the following:

#include <stdio.h>
#include <unistd.h>

int main()
{
    unsigned char buffer[1024*1024]; // 1MB buffer should be enough

    while (!eof(STDIN_FILENO)){
        size_t len = read(STDIN_FILENO, &buffer[0], 1024*1024);
        write(STDOUT_FILENO, &buffer[0], len);
        write(STDOUT_FILENO, &buffer[0], len);
    }

    return 0;
}

EDIT:

Sorry, my bad — there is no eof() function for a raw file descriptor. Try

#include <stdio.h>
#include <unistd.h>

int main()
{
    unsigned char buffer[1024*1024]; // 1MB buffer should be enough

    ssize_t len = read(STDIN_FILENO, &buffer[0], sizeof(buffer));

    while (len > 0) {
        // write the chunk twice to mimic the doubled output size of the encoder
        write(STDOUT_FILENO, &buffer[0], len);
        write(STDOUT_FILENO, &buffer[0], len);
        len = read(STDIN_FILENO, &buffer[0], sizeof(buffer));
    }

    return 0;
}

If I understand correctly, for each byte you read, you then write 2 bytes, so the output file will be double the size of the input. If your input is big enough, the total IO time (read + 2 × write) will be significant.

In Huffman coding this is not the case, since you usually write less than you read, and the total IO time will be much lower.

EDIT: As Blorgbeard said, it can be the difference in buffering. C++ also does buffering, but maybe the default buffers are much smaller than in Java. Also, the fact that the HDD head has to constantly jump between reading the file in one location and writing in another significantly impacts overall IO performance.

In any case, encoding should be done in chunks in order to ensure sequential reads and writes of large blocks.
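
As a rough illustration of that idea (my own sketch, not benchmarked), the encoding loop can be fused with block-sized read() and write() calls so that both the input and the doubled output move in large sequential chunks; the 64 KB chunk size here is an arbitrary choice:

#include <unistd.h>
#include <cstdlib>

int main()
{
    static unsigned char const codebook[] =
    {
        0x00, 0x1E, 0x2D, 0x33, 0x4B, 0x55, 0x66, 0x78,
        0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF
    };

    static unsigned char inbuf[64 * 1024];           // read in 64 KB chunks
    static unsigned char outbuf[2 * sizeof(inbuf)];  // output is twice the input size

    ssize_t len;
    while ((len = read(STDIN_FILENO, inbuf, sizeof(inbuf))) > 0)
    {
        // Encode the whole chunk into outbuf, then write it with one call.
        // (Error and partial-write handling omitted for brevity.)
        for (ssize_t i = 0; i < len; ++i)
        {
            outbuf[2 * i]     = codebook[inbuf[i] >> 4];
            outbuf[2 * i + 1] = codebook[inbuf[i] & 0x0F];
        }
        write(STDOUT_FILENO, outbuf, static_cast<size_t>(2 * len));
    }

    return EXIT_SUCCESS;
}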

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow