I've written a simple Huffman encoding in Ruby. As output I've got an array, for example:

["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

I need to write, and then read, it to and from a file. I tried several methods:

IO.binwrite("out.cake", array)

I get a simple text file and not binary.

Or:

File.open("out.cake", 'wb') do |output|
  array.each do |byte|
    output.print byte.chr
  end
end

This looks like it works, but then I can't read it back into an array.

Which encoding should I use?


Solution

I think you can just use Array#pack and String#unpack, as in the following code:

# Writing
a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
File.open("out.cake", 'wb' ) do |output|
  output.write [a.join].pack("B*")
end

# Reading
s = File.binread("out.cake")
bits = s.unpack("B*")[0] # "01011111010110111000111000010011"

I don't know your preferred format for the result of reading, and I know the above method is inefficient. Either way, you can take the "0" and "1" characters one at a time from the result of unpack to traverse your Huffman tree.
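One caveat: pack("B*") fills the last byte with zero bits when the bit count is not a multiple of 8, so the reader cannot tell padding from data. Below is a minimal sketch of one way around that, assuming you are free to choose the file layout. The 4-byte big-endian length header and the helper names write_bits and read_bits are my own assumptions, not anything the question prescribes.

```ruby
require "tmpdir"

# Hypothetical helper: write the bit count as a 4-byte big-endian header ("N"),
# followed by the bits packed eight per byte, most significant bit first ("B*").
def write_bits(path, bit_string)
  File.binwrite(path, [bit_string.length, bit_string].pack("NB*"))
end

# Hypothetical helper: read the header, unpack the bits, drop the zero padding.
def read_bits(path)
  bit_count, bits = File.binread(path).unpack("NB*")
  bits[0, bit_count]
end

codes = ["010", "1111", "10"]  # 9 bits, so the second data byte is mostly padding

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.cake")
  write_bits(path, codes.join)
  puts File.size(path)  # 4 header bytes + 2 data bytes
  puts read_bits(path)  # the original 9 bits, padding removed
end
```

Without the stored length, a file whose bit count is not a multiple of 8 would decode with extra trailing zeros, which a Huffman decoder could mistake for real symbols.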

Other tips

If you want bits, then you have to pack them into bytes when writing and unpack them when reading. You can do both manually, as described here; Ruby's Array#pack and String#unpack with the "B*" directive can also do it for you.

Your array contains strings that are groups of characters, but you need to build an array of bytes and write those bytes into the file.

From this: ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

you should build these bytes: 01011111 01011011 10001110 00010011

Since it's just four bytes, you can put them into a single 32-bit number 01011111010110111000111000010011 that is 5F5B8E13 hex.
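A quick way to check that arithmetic in Ruby (a small sketch; it only works as written because this example happens to be exactly 32 bits long):

```ruby
bits = "01011111010110111000111000010011"

value = bits.to_i(2)        # parse the string as a base-2 integer
puts value.to_s(16)         # prints 5f5b8e13

bytes = [value].pack("N")   # "N" = 32-bit unsigned integer, big-endian
puts bytes.bytesize         # prints 4
puts bytes.bytes.inspect    # prints [95, 91, 142, 19]
```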

The two samples of your code do different things. The first one writes a string representation of the Ruby array into the file. The second one writes 11 bytes, one per array element, each either 48 ('0') or 49 ('1'), because String#chr returns only the first character of each string.

If you want bits, then your output file size should be just four bytes.
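To compare the sizes without touching the disk, here is a small sketch that builds in memory what each approach would write. It assumes, as far as I know, that IO.binwrite converts a non-String argument with to_s. Note that String#chr returns only the first character of a string, so the second attempt yields one byte per array element.

```ruby
array = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

first_attempt  = array.to_s               # textual form of the array, brackets and quotes included
second_attempt = array.map(&:chr).join    # String#chr keeps only the first character of each string
packed         = [array.join].pack("B*")  # the bits packed eight per byte

puts first_attempt.bytesize
puts second_attempt.bytesize  # prints 11
puts packed.bytesize          # prints 4
```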

Read about bit operations to learn how to achieve that.


Here is a draft. I didn't test it. Something may be wrong.

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

# Join all the characters together. Add 7 zeros to the end.
bit_sequence = a.join + "0" * 7  # "010111110101101110001110000100110000000"

# Split into 8-digit chunks.
chunks = bit_sequence.scan(/.{8}/)  # ["01011111", "01011011", "10001110", "00010011"]

# Convert every chunk into character with the corresponding code.
bytes = chunks.map { |chunk| chunk.to_i(2).chr }  # ["_", "[", "\x8E", "\x13"]

File.open("my_huffman.bin", 'wb' ) do |output|
  bytes.each { |b| output.write b }
end

Note: seven zeros are added to handle the case when the total number of characters is not divisible by 8. Without those zeros, bit_sequence.scan(/.{8}/) would drop the remaining characters.
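For the reading side, here is a rough sketch of the reverse steps. It assumes the reader already knows the number of meaningful bits (32 here) and the set of code words; the question does not say how either is stored, so both are assumptions, and the helper names bytes_to_bits and split_codes are hypothetical.

```ruby
require "set"

# Hypothetical helper: turn raw bytes back into a string of "0"/"1" characters.
def bytes_to_bits(data, bit_count)
  bits = data.bytes.map { |b| b.to_s(2).rjust(8, "0") }.join
  bits[0, bit_count]  # drop any zero padding added when writing
end

# Hypothetical helper: split a bit string into the code words of a prefix-free code.
def split_codes(bits, code_words)
  result = []
  current = +""
  bits.each_char do |bit|
    current << bit
    if code_words.include?(current)
      result << current
      current = +""
    end
  end
  raise ArgumentError, "trailing bits do not form a code word" unless current.empty?
  result
end

# Assumed code table: the distinct strings that appear in the sample array.
code_words = Set["000", "001", "010", "011", "10", "110", "1110", "1111"]

data = "\x5F\x5B\x8E\x13".b  # the four bytes from the example above
bits = bytes_to_bits(data, 32)
p split_codes(bits, code_words)
```

This works because no code word in a Huffman code is a prefix of another, so the first match while scanning left to right is always the right one. In a real decoder you would walk the Huffman tree instead and emit symbols rather than code words.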

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow