Question

I want to use a lazy Bytestring to represent a stream of bits. I need to be able to take arbitrary slices of bits from this stream efficiently. For example, I might have a ByteString of length 10, and I'd like slice a new ByteString consisting of bits 24-36 from the original ByteString.

The problem is that ByteStrings are arrays of Word8s, so taking ranges that are not multiples of 8 is difficult. The best I've been able to come up with is this, using Data.Binary and Data.Binary.Bits. Note that get32BitRange is specifically for ranges <= 32.

get32BitRange :: Int -> Int -> ByteString -> ByteString
get32BitRange lo hi = runPut . putWord32be
                    . runGet (runBitGet . block $ word8 (8 - (lo `quot` 8)) *> word32be len)
                    . drop offset
    where len = hi - lo
          lo' = lo `div` 8
          offset = fromIntegral lo' - 1

The algorithm is:

  • find the index of the first Word8 containing the bits I want
  • drop from the ByteString up to that index
  • if the low end of the bit range is not a multiple of 8, there will be some excess bits at the beginning of the Word8, so skip those
  • get (hi - lo) bits, and store in a Word32
  • put that Word32 into a ByteString

It looks more than a little ugly, is there a more efficient way to grab arbitrary slices of bits from a ByteString?

EDIT: here is a more efficient version

get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = runGet get
    where get = runBitGet . block $ byteString byteOff *> word8 bitOff *> word32be len
          len = hi - lo
          (byteOff, bitOff) = lo `quotRem` 8
Was it helpful?

Solution 3

I'm going to mark this as resolved. This is what I ended up using:

get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = assert (lo < hi) $
    runGet (runBitGet bitGet)
    where bitGet = block $ byteString byteOff
                         *> word8 bitOff
                         *> word32be len
          len = hi - lo
          (byteOff, bitOff) = lo `quotRem` 8

OTHER TIPS

I think other solutions are way better but you can use the Internal module to get at the underlying structure: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/src/Data-ByteString-Internal.html#ByteString

data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length

Then you can use standard pointer tools to generate ByteStrings pointing exactly where you want, by manipulating the ForeignPtr directly...

You can't make this efficient with ByteString as your API type, because it doesn't carry the information that the "bits" you want really start at some offset into the first byte.

Best bet is to make a wrapper type:

data BitStream =
    BitStream {
        info :: ByteString,
        -- values from 0-7: ignore all bits in the first byte up to
        -- but not including this offset
        firstBitOffset :: !Int,to but not including this offset
        -- values from 0-7: ignore all bits in the last byte after
        -- but not including this offset
        lastBitOffset :: !Int
    }

Then you can design a bit-based API around this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top