Pretty file size in Ruby?

Question 1

How about the Filesize gem? It can convert from bytes (and other formats) into pretty-printed values.

Example:

Filesize.from("12502343 B").pretty      # => "11.92 MiB"

http://rubygems.org/gems/filesize

Question 2

If you are using Rails, what about the standard Rails number helper, number_to_human_size?

http://api.rubyonrails.org/classes/ActionView/Helpers/NumberHelper.html#method-i-number_to_human_size

number_to_human_size(number, options = {})

Question 3

I agree with @David that it's probably best to use an existing solution, but to answer your question about what you're doing wrong:

The primary error is dividing s by self rather than the other way around.
You really want to divide by the previous s, so divide s by 1024.
Doing integer arithmetic will give you confusing results, so convert to float.
Perhaps round the answer.

So:

class Integer
  def to_filesize
    {
      'B'  => 1024,
      'KB' => 1024 * 1024,
      'MB' => 1024 * 1024 * 1024,
      'GB' => 1024 * 1024 * 1024 * 1024,
      'TB' => 1024 * 1024 * 1024 * 1024 * 1024
    }.each_pair { |e, s| return "#{(self.to_f / (s / 1024)).round(2)}#{e}" if self < s }
  end
end

lets you:

1.to_filesize
# => "1.0B"
1020.to_filesize
# => "1020.0B" 
1024.to_filesize
# => "1.0KB" 
1048576.to_filesize
# => "1.0MB"

Again, I don't recommend actually doing that, but it seems worth correcting the bugs.
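
If you want the same corrections without reopening Integer, here is a minimal sketch as a plain method. The names FILESIZE_UNITS and to_filesize_sketch are made up for illustration. It divides by 1024 one unit at a time, works in floats, rounds, and stops at the largest unit instead of falling through:

```ruby
# Hypothetical names; not part of the original answer.
FILESIZE_UNITS = %w[B KB MB GB TB].freeze

def to_filesize_sketch(bytes)
  value = bytes.to_f            # float arithmetic avoids truncated results
  unit = FILESIZE_UNITS.first
  FILESIZE_UNITS.each do |u|
    unit = u
    # Stop once the value fits in this unit, or when no larger unit exists.
    break if value < 1024 || u == FILESIZE_UNITS.last
    value /= 1024
  end
  "#{value.round(2)}#{unit}"
end

puts to_filesize_sketch(1020)     # => "1020.0B"
puts to_filesize_sketch(1536)     # => "1.5KB"
puts to_filesize_sketch(1024**5)  # => "1024.0TB"
```

Values beyond the last unit are simply reported in TB rather than returning something that is not a string.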

Question 4

This is my solution:

def filesize(size)
  units = %w[B KiB MiB GiB TiB PiB EiB ZiB]

  return '0.0 B' if size == 0
  exp = (Math.log(size) / Math.log(1024)).to_i
  exp += 1 if (size.to_f / 1024 ** exp >= 1024 - 0.05)
  exp = units.size - 1 if exp > units.size - 1

  '%.1f %s' % [size.to_f / 1024 ** exp, units[exp]]
end

Compared to the other solutions it is simpler, faster, and produces more accurate output.

Format

All the other methods mishandle values such as 1023.95 GiB, which should roll over to the next unit once rounded. Moreover, to_filesize fails for very large numbers: each_pair falls through and the method returns the units hash instead of a string.

 -       method: [     filesize,     Filesize,  number_to_human,  to_filesize ]
 -          0 B: [        0.0 B,       0.00 B,          0 Bytes,         0.0B ]
 -          1 B: [        1.0 B,       1.00 B,           1 Byte,         1.0B ]
 -         10 B: [       10.0 B,      10.00 B,         10 Bytes,        10.0B ]
 -       1000 B: [     1000.0 B,    1000.00 B,       1000 Bytes,      1000.0B ]
 -        1 KiB: [      1.0 KiB,     1.00 KiB,             1 KB,        1.0KB ]
 -      1.5 KiB: [      1.5 KiB,     1.50 KiB,           1.5 KB,        1.5KB ]
 -       10 KiB: [     10.0 KiB,    10.00 KiB,            10 KB,       10.0KB ]
 -     1000 KiB: [   1000.0 KiB,  1000.00 KiB,          1000 KB,     1000.0KB ]
 -        1 MiB: [      1.0 MiB,     1.00 MiB,             1 MB,        1.0MB ]
 -        1 GiB: [      1.0 GiB,     1.00 GiB,             1 GB,        1.0GB ]
 -  1023.95 GiB: [      1.0 TiB,  1023.95 GiB,          1020 GB,    1023.95GB ]
 -        1 TiB: [      1.0 TiB,     1.00 TiB,             1 TB,        1.0TB ]
 -        1 EiB: [      1.0 EiB,     1.00 EiB,             1 EB,        ERROR ]
 -        1 ZiB: [      1.0 ZiB,     1.00 ZiB,          1020 EB,        ERROR ]
 -        1 YiB: [   1024.0 ZiB,  1024.00 ZiB,       1050000 EB,        ERROR ]
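
To see the boundary problem in isolation, here is a small sketch. naive_gib is a hypothetical helper written just for this comparison, and filesize is repeated so the snippet runs on its own. One byte short of 1 TiB, plain formatting rounds up to "1024.0 GiB", while the extra check in filesize moves on to the next unit:

```ruby
# Hypothetical helper: formats in GiB with no boundary check.
def naive_gib(size)
  '%.1f GiB' % (size.to_f / 1024**3)
end

# Copy of the method from this answer.
def filesize(size)
  units = %w[B KiB MiB GiB TiB PiB EiB ZiB]

  return '0.0 B' if size == 0
  exp = (Math.log(size) / Math.log(1024)).to_i
  # Bump the unit if the value would print as 1024.0 after rounding.
  exp += 1 if (size.to_f / 1024 ** exp >= 1024 - 0.05)
  exp = units.size - 1 if exp > units.size - 1

  '%.1f %s' % [size.to_f / 1024 ** exp, units[exp]]
end

almost_one_tib = 1024**4 - 1
puts naive_gib(almost_one_tib)  # => "1024.0 GiB"
puts filesize(almost_one_tib)   # => "1.0 TiB"
```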

Performance

Also, it has the best performance (seconds to process 1 million numbers):

 - filesize:           2.15
 - Filesize:          15.53
 - number_to_human:  139.63
 - to_filesize:        2.41

Question 5

Here is a method using log10:

def number_format(d)
  e = Math.log10(d).to_i / 3
  '%.3f' % (d / 1000 ** e) + ['', ' k', ' M', ' G'][e]
end

s = number_format(9012345678.0)
puts s == '9.012 G'

https://ruby-doc.org/core/Math.html#method-c-log10
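
Note that the method above assumes d is at least 1 and below 10^12; outside that range the suffix lookup misses. A possible variant (number_format_clamped is a made-up name, and the handling of zero is my own assumption) clamps the index to the suffixes that exist:

```ruby
# Hypothetical variant; clamps the exponent so every input gets a suffix.
def number_format_clamped(d)
  suffixes = ['', ' k', ' M', ' G']
  return '0.000' if d.zero?  # log10(0) is -Infinity, so handle it up front

  # Comparable#clamp needs Ruby 2.4 or later.
  e = (Math.log10(d.abs) / 3).floor.clamp(0, suffixes.size - 1)
  format('%.3f', d.to_f / 1000**e) + suffixes[e]
end

puts number_format_clamped(9012345678.0)  # => "9.012 G"
puts number_format_clamped(0.5)           # => "0.500"
puts number_format_clamped(5e12)          # => "5000.000 G"
```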

Question 6

You get points for adding a method to Integer, but this seems more File specific, so I would suggest monkeying around with File, say by adding a method to File called .prettysize().

But here is an alternative solution that uses iteration, and avoids printing single bytes as float :-)

def format_mb(size)
  units = %w[b kb mb gb tb pb eb]
  scale = 1024

  # Small values are printed as plain integers, without a fractional part.
  return "#{size} #{units[0]}" if size < 2 * scale

  size = size.to_f
  # Switch to a larger unit only once the value reaches 2 of that unit;
  # anything beyond the last unit is still reported in 'eb'.
  (1...units.size).each do |i|
    if size < 2 * scale**(i + 1) || i == units.size - 1
      return "#{'%.3f' % (size / scale**i)} #{units[i]}"
    end
  end
end

Question 7

@Darshan Computing's solution is only partial here. In Ruby 1.8, hash keys are not guaranteed to be ordered (Ruby 1.9 and later preserve insertion order), so on older versions this approach will not work reliably. You could fix this by doing something like this inside the to_filesize method:

 conv={
      1024=>'B',
      1024*1024=>'KB',
      ...
 }
 conv.keys.sort.each { |s|
     next if self >= s
     e=conv[s]
     return "#{(self.to_f / (s / 1024)).round(2)}#{e}"
 }

This is what I ended up doing for a similar method inside Float,

 class Float
   def to_human
     conv={
       1024=>'B',
       1024*1024=>'KB',
       1024*1024*1024=>'MB',
       1024*1024*1024*1024=>'GB',
       1024*1024*1024*1024*1024=>'TB',
       1024*1024*1024*1024*1024*1024=>'PB',
       1024*1024*1024*1024*1024*1024*1024=>'EB'
     }
     conv.keys.sort.each { |mult|
        next if self >= mult
        suffix=conv[mult]
        return "%.2f %s" % [ self / (mult / 1024), suffix ]
     }
   end
 end
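
Another way to sidestep the ordering question is to keep the limits in an array of pairs, which is ordered on every Ruby version. This is only a sketch: SIZE_LIMITS and to_human_sketch are made-up names, and falling back to the largest unit for huge values is my own choice, not something the code above does.

```ruby
# Hypothetical names. Each pair is [upper limit, suffix], in ascending order.
SIZE_LIMITS = %w[B KB MB GB TB PB EB].each_with_index.map do |suffix, i|
  [1024**(i + 1), suffix]
end.freeze

def to_human_sketch(value)
  # First limit the value is below; fall back to the largest unit otherwise.
  limit, suffix = SIZE_LIMITS.find { |max, _| value < max } || SIZE_LIMITS.last
  format('%.2f %s', value.to_f / (limit / 1024), suffix)
end

puts to_human_sketch(512.0)    # => "512.00 B"
puts to_human_sketch(1536.0)   # => "1.50 KB"
puts to_human_sketch(1024**7)  # => "1024.00 EB"
```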