Question

I've found lots of information about converting raw byte information into a human-readable format, but I need to do the opposite, i.e. convert the String "1.6 GB" into the long value 1717990000. Is there an in-built/well-defined way to do this, or will I pretty much have to roll my own?

[Edit]: Here is my first stab...

static class ByteFormat extends NumberFormat {
    @Override
    public StringBuffer format(double arg0, StringBuffer arg1, FieldPosition arg2) {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public StringBuffer format(long arg0, StringBuffer arg1, FieldPosition arg2) {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public Number parse(String arg0, ParsePosition arg1) {
        return parse (arg0);
    }

    @Override
    public Number parse(String arg0) {
        int spaceNdx = arg0.indexOf(" ");
        double ret = Double.parseDouble(arg0.substring(0, spaceNdx));
        String unit = arg0.substring(spaceNdx + 1);
        int factor = 0;
        if (unit.equals("GB")) {
            factor = 1073741824;
        }
        else if (unit.equals("MB")) {
            factor = 1048576;
        }
        else if (unit.equals("KB")) {
            factor = 1024;
        }

        return ret * factor;
    }
}

Solution 2

I'm not aware of any well-known library that implements this kind of text-parsing utility, but your solution looks close to a correct implementation.

There are only two things I would change in your code:

  1. Make the method Number parse(String arg0) static, since it is a utility method.

  2. Define the factor for each unit as a final static field.

That is, something like this:

private final static long KB_FACTOR = 1024;
private final static long MB_FACTOR = 1024 * KB_FACTOR;
private final static long GB_FACTOR = 1024 * MB_FACTOR;

public static double parse(String arg0) {
    int spaceNdx = arg0.indexOf(" ");
    double ret = Double.parseDouble(arg0.substring(0, spaceNdx));
    switch (arg0.substring(spaceNdx + 1)) {
        case "GB":
            return ret * GB_FACTOR;
        case "MB":
            return ret * MB_FACTOR;
        case "KB":
            return ret * KB_FACTOR;
    }
    return -1;
}
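
As a quick check of the arithmetic in the question: with binary multiples, 1.6 GB is 1.6 × 1073741824 = 1717986918.4 bytes, so the truncated long is 1717986918 (the question's 1717990000 looks like a rounded figure). A small sketch; the class name ByteParseDemo is made up for illustration, and the parse logic simply mirrors the method above:

```java
final class ByteParseDemo {
    private static final long KB_FACTOR = 1024;
    private static final long MB_FACTOR = 1024 * KB_FACTOR;
    private static final long GB_FACTOR = 1024 * MB_FACTOR;

    // Same approach as above: "<number> <unit>" separated by a single space.
    static double parse(String text) {
        int space = text.indexOf(' ');
        double value = Double.parseDouble(text.substring(0, space));
        switch (text.substring(space + 1)) {
            case "GB": return value * GB_FACTOR;
            case "MB": return value * MB_FACTOR;
            case "KB": return value * KB_FACTOR;
            default:   return -1; // unknown unit, as in the method above
        }
    }

    public static void main(String[] args) {
        // The cast to long drops the fractional byte.
        System.out.println((long) parse("1.6 GB")); // 1717986918
        System.out.println((long) parse("512 KB")); // 524288
    }
}
```

Since the method returns a double, the caller decides whether to truncate or round when converting to long.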

OTHER TIPS

Spring Framework added a DataSize class in version 5.1. It parses human-readable data sizes into bytes and can also format them back into their human-readable form. It lives in the org.springframework.util.unit package.

If you use Spring Framework, you can upgrade to >=5.1 and use this class. Otherwise, you can copy it and its related classes into your project (while complying with the license).

Then you can use it:

DataSize dataSize = DataSize.parse("16GB");
System.out.println(dataSize.toBytes());

will give the output:

17179869184

However, the pattern used to parse your input

  • Does not support decimals (so, you can use 1GB, 2GB, 1638MB, but not 1.6GB)
  • Does not support spaces (so, you can use 1GB but not 1 GB)

I would recommend sticking to the convention for compatibility and easier maintenance. If that does not suit your needs, you will need to copy and edit the file; it is a good place to start.
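
If decimals or a space are required and editing a copy of the Spring class is not appealing, a small regex-based parser is another route. The following is only a sketch: the class name LenientSizeParser is hypothetical (it is not part of Spring), the accepted units are an assumption, and it uses 1024-based multiples to match the DataSize output shown above.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientSizeParser {
    // digits with an optional fraction, optional whitespace, optional unit
    private static final Pattern SIZE = Pattern.compile(
            "\\s*(\\d+(?:\\.\\d+)?)\\s*(B|KB|MB|GB|TB)?\\s*",
            Pattern.CASE_INSENSITIVE);

    static long parse(String text) {
        Matcher m = SIZE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a data size: " + text);
        }
        BigDecimal value = new BigDecimal(m.group(1));
        // A missing unit is treated as plain bytes (an assumption of this sketch).
        String unit = m.group(2) == null ? "B" : m.group(2).toUpperCase(Locale.ROOT);
        long factor;
        switch (unit) {
            case "KB": factor = 1L << 10; break;
            case "MB": factor = 1L << 20; break;
            case "GB": factor = 1L << 30; break;
            case "TB": factor = 1L << 40; break;
            default:   factor = 1L;       break;
        }
        // toBigInteger() drops the fractional byte;
        // longValueExact() throws ArithmeticException if the result does not fit in a long.
        return value.multiply(BigDecimal.valueOf(factor)).toBigInteger().longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(parse("1.6 GB")); // 1717986918
        System.out.println(parse("16GB"));   // 17179869184
    }
}
```

Using BigDecimal keeps the decimal input exact before the multiplication, at the cost of a little more ceremony than Double.parseDouble.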

A revised version of Andremoniy's answer that properly distinguishes between kilo and kibi, etc.

private final static long KB_FACTOR = 1000;
private final static long KIB_FACTOR = 1024;
private final static long MB_FACTOR = 1000 * KB_FACTOR;
private final static long MIB_FACTOR = 1024 * KIB_FACTOR;
private final static long GB_FACTOR = 1000 * MB_FACTOR;
private final static long GIB_FACTOR = 1024 * MIB_FACTOR;

public static double parse(String arg0) {
    int spaceNdx = arg0.indexOf(" ");
    double ret = Double.parseDouble(arg0.substring(0, spaceNdx));
    switch (arg0.substring(spaceNdx + 1)) {
        case "GB":
            return ret * GB_FACTOR;
        case "GiB":
            return ret * GIB_FACTOR;
        case "MB":
            return ret * MB_FACTOR;
        case "MiB":
            return ret * MIB_FACTOR;
        case "KB":
            return ret * KB_FACTOR;
        case "KiB":
            return ret * KIB_FACTOR;
    }
    return -1;
}

All in one answer, parses to long:

public class SizeUtil {

    public static String units = "BKMGTPEZY";

    public static long parse(String arg0) {
        int spaceNdx = arg0.indexOf(" ");    
        double ret = Double.parseDouble(arg0.substring(0, spaceNdx));
        String unitString = arg0.substring(spaceNdx+1);
        int unitChar = unitString.charAt(0);
        int power = units.indexOf(unitChar);
        // An 'i' marks a binary unit (GiB = 1024^3); without it the unit is SI (GB = 1000^3)
        boolean isBinary = unitString.indexOf('i')!=-1;
        int factor = 1000;
        if (isBinary)
        {
            factor = 1024;
        }

        return (long) (ret * Math.pow(factor, power));
    }

    public static void main(String[] args) {
        System.out.println(parse("300.00 GiB")); // requires a space
        System.out.println(parse("300.00 GB"));
        System.out.println(parse("300.00 B"));
        System.out.println(parse("300 EB"));
    }
}
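
One thing to be aware of with this approach: converting a double to a long in Java saturates rather than fails. Any result above Long.MAX_VALUE (every Z or Y unit, and large E values such as 300 EB) silently comes back as 9223372036854775807. A hedged sketch of a guard follows; the class CheckedSize and the method toBytes are hypothetical names, not part of the answer above.

```java
final class CheckedSize {
    private static final String UNITS = "BKMGTPEZY";

    // Hypothetical helper: computes value * base^power, but throws instead of saturating.
    static long toBytes(double value, char unit, int base) {
        int power = UNITS.indexOf(unit);
        if (power < 0) {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        double bytes = value * Math.pow(base, power);
        // (double) Long.MAX_VALUE is 2^63; anything at or above it cannot fit in a long.
        if (bytes >= (double) Long.MAX_VALUE) {
            throw new ArithmeticException("Size does not fit in a long: " + bytes);
        }
        return (long) bytes;
    }

    public static void main(String[] args) {
        // Without the guard the cast just clamps to Long.MAX_VALUE.
        System.out.println((long) (300 * Math.pow(1024, 6)) == Long.MAX_VALUE); // true
        System.out.println(toBytes(300, 'G', 1024)); // 322122547200
        try {
            toBytes(300, 'E', 1024);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```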

I know this is much later, but I was looking for a similar function that also takes the SI prefix into account. I ended up writing one myself and thought it might be useful to other people.

public static String units = "KMGTPE";

/**
 * Converts from human readable to byte format
 * @param number The number value of the amount to convert
 * @param unit The unit: B, KB, MB, GB, TB, PB, EB
 * @param si true for SI (decimal, 1000-based) multiples, false for binary (1024-based) multiples
 * @return byte value
 */
public static double parse(double number, String unit, boolean si)
{
    String identifier = unit.substring(0, 1);
    int index = units.indexOf(identifier);
    //not already in bytes
    if (index!=-1)
    {
        for (int i = 0; i <= index; i++)
            number = number * (si ? 1000 : 1024);
    }
    return number;
}

I'm sure this is possible to do with recursion as well. It was too simple to bother...
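
For what it's worth, the recursive form mentioned above might look like the following. This is just a sketch: the class name RecursiveSize and the helper scale are made up for illustration, and the behavior is intended to match the loop version (one multiplication per prefix step, plain "B" left unchanged).

```java
final class RecursiveSize {
    static final String UNITS = "KMGTPE";

    static double parse(double number, String unit, boolean si) {
        // indexOf returns -1 for plain "B", which yields zero multiplication steps
        int index = UNITS.indexOf(unit.charAt(0));
        return scale(number, index + 1, si ? 1000 : 1024);
    }

    // Multiplies by base `steps` times, one recursion level per step.
    private static double scale(double number, int steps, int base) {
        return steps <= 0 ? number : scale(number * base, steps - 1, base);
    }

    public static void main(String[] args) {
        System.out.println(parse(1.5, "KB", true)); // 1500.0
        System.out.println(parse(2, "MB", false));  // 2097152.0
    }
}
```

The recursion depth is at most six here, so there is no practical difference from the loop; it is purely a matter of taste.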

The following approach can also be used. It is more generic and does not depend on a space character to parse the input.

Thanks to @RobAu for the hint above. Added a new method to get the index of first letter in the string, and changed the parse method to get index based on this new method. I have kept the original parse method and added a new parseAny method, so the results can be compared. Hope it helps someone.

Also, thanks to this answer for the indexOf method - https://stackoverflow.com/a/11214786/6385674.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConversionUtil {

    public static String units = "BKMGTPEZY";

    public static long parse(String arg0) {
        int spaceNdx = arg0.indexOf(" ");    
        double ret = Double.parseDouble(arg0.substring(0, spaceNdx));
        String unitString = arg0.substring(spaceNdx+1);
        int unitChar = unitString.charAt(0);
        int power = units.indexOf(unitChar);
        // An 'i' marks a binary unit (GiB = 1024^3); without it the unit is SI (GB = 1000^3)
        boolean isBinary = unitString.indexOf('i')!=-1;
        int factor = 1000;
        if (isBinary)
        {
            factor = 1024;
        }

        return (long) (ret * Math.pow(factor, power));
    }
    /** @return index of pattern in s or -1, if not found */
    public static int indexOf(Pattern pattern, String s) {
        Matcher matcher = pattern.matcher(s);
        return matcher.find() ? matcher.start() : -1;
    }
    public static long parseAny(String arg0)
    {
        // everything before the first letter is the number, the rest is the unit
        int index = indexOf(Pattern.compile("[A-Za-z]"), arg0);
        double ret = Double.parseDouble(arg0.substring(0, index));
        String unitString = arg0.substring(index);
        int unitChar = unitString.charAt(0);
        int power = units.indexOf(unitChar);
        // same rule as in parse: 'i' means binary, otherwise SI
        boolean isBinary = unitString.indexOf('i')!=-1;
        int factor = 1000;
        if (isBinary)
        {
            factor = 1024;
        }

        return (long) (ret * Math.pow(factor, power));

    }
    public static void main(String[] args) {
        System.out.println(parse("300.00 GiB")); // requires a space
        System.out.println(parse("300.00 GB"));
        System.out.println(parse("300.00 B"));        
        System.out.println(parse("300 EB"));
        System.out.println(parseAny("300.00 GiB"));
        System.out.println(parseAny("300M"));
    }
}

I wrote a human-readable file size utility as an enum class. Hope it helps!

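import java.text.DecimalFormat;
import java.text.ParseException;
import java.util.Arrays;
import java.util.regex.Pattern;

/**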
 * The file size human readable utility class, 
 * provide  mutual conversions from human readable size to byte size
 * 
 * The similar function in stackoverflow, linked:
 *  https://stackoverflow.com/questions/3758606/how-to-convert-byte-size-into-human-readable-format-in-java?r=SearchResults
 * 
 * Apache also provide similar function
 * @see org.apache.commons.io.FileUtils#byteCountToDisplaySize(long)
 * 
 * @author Ponfee
 */
public enum HumanReadables {

    SI    (1000, "B", "KB",  "MB",  "GB",  "TB",  "PB",  "EB" /*, "ZB",  "YB" */), // 

    BINARY(1024, "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"/*, "ZiB", "YiB"*/), // 
    ;

    private static final String FORMAT = "#,##0.##";
    private static final Pattern PATTERN = Pattern.compile(".*[0-9]+.*");

    private final int      base;
    private final String[] units;
    private final long[]   sizes;

    HumanReadables(int base, String... units) {
        this.base  = base;
        this.units = units;
        this.sizes = new long[this.units.length];

        this.sizes[0] = 1;
        for (int i = 1; i < this.sizes.length; i++) {
            this.sizes[i] = this.sizes[i - 1] * this.base; // Maths.pow(this.base, i);
        }
    }

    /**
     * Returns a string of bytes count human readable size
     * 
     * @param size the size
     * @return human readable size
     */
    public strictfp String human(long size) {
        if (size == 0) {
            return "0" + this.units[0];
        }

        String signed = "";
        if (size < 0) {
            signed = "-";
            size = size == Long.MIN_VALUE ? Long.MAX_VALUE : -size;
        }

        /*int unit = (int) Maths.log(size, this.base);
        return signed + format(size / Math.pow(this.base, unit)) + " " + this.units[unit];*/

        int unit = find(size);
        return new StringBuilder(13) // 13 max length like as "-1,023.45 GiB"
            .append(signed)
            .append(formatter().format(size / (double) this.sizes[unit]))
            .append(" ")
            .append(this.units[unit])
            .toString();
    }

    public strictfp long parse(String size) {
        return parse(size, false);
    }

    /**
     * Parse the readable byte count, allowed suffix units: "1", "1B", "1MB", "1MiB", "1M"
     * 
     * @param size   the size
     * @param strict the strict, if BINARY then verify whether contains "i"
     * @return a long value bytes count
     */
    public strictfp long parse(String size, boolean strict) {
        if (size == null || size.isEmpty()) {
            return 0L;
        }
        if (!PATTERN.matcher(size).matches()) {
            throw new IllegalArgumentException("Invalid format [" + size + "]");
        }

        String str = size = size.trim();
        long factor = this.sizes[0];
        switch (str.charAt(0)) {
            case '+': str = str.substring(1);               break;
            case '-': str = str.substring(1); factor = -1L; break;
        }

        int end = 0, lastPos = str.length() - 1;
        // last character isn't a digit
        char c = str.charAt(lastPos - end);
        if (c == 'i') {
            // last pos cannot end with "i"
            throw new IllegalArgumentException("Invalid format [" + size + "], cannot end with \"i\".");
        }

        if (c == 'B') {
            end++;
            c = str.charAt(lastPos - end);

            boolean flag = isBlank(c);
            while (isBlank(c) && end < lastPos) {
                end++;
                c = str.charAt(lastPos - end);
            }
            // if "B" head has space char, then the first head non space char must be a digit
            if (flag && !Character.isDigit(c)) {
                throw new IllegalArgumentException("Invalid format [" + size + "]: \"" + c + "\".");
            }
        }

        if (!Character.isDigit(c)) {
            // if not a digit character, then assume is a unit character
            if (c == 'i') {
                if (this == SI) {
                    // SI cannot contains "i"
                    throw new IllegalArgumentException("Invalid SI format [" + size + "], cannot contains \"i\".");
                }
                end++;
                c = str.charAt(lastPos - end);
            } else {
                if (this == BINARY && strict) {
                    // if strict, then BINARY must contains "i"
                    throw new IllegalArgumentException("Invalid BINARY format [" + size + "], miss character \"i\".");
                }
            }

            switch (c) {
                case 'K': factor *= this.sizes[1]; break;
                case 'M': factor *= this.sizes[2]; break;
                case 'G': factor *= this.sizes[3]; break;
                case 'T': factor *= this.sizes[4]; break;
                case 'P': factor *= this.sizes[5]; break;
                case 'E': factor *= this.sizes[6]; break;
                /*
                case 'Z': factor *= this.bytes[7]; break;
                case 'Y': factor *= this.bytes[8]; break;
                */
                default: throw new IllegalArgumentException("Invalid format [" + size + "]: \"" + c + "\".");
            }

            do {
                end++;
                c = str.charAt(lastPos - end);
            } while (isBlank(c) && end < lastPos);
        }

        str = str.substring(0, str.length() - end);
        try {
            return (long) (factor * formatter().parse(str).doubleValue());
        } catch (NumberFormatException | ParseException e) {
            throw new IllegalArgumentException("Failed to parse [" + size + "]: \"" + str + "\".");
        }
    }

    public int base() {
        return this.base;
    }

    public String[] units() {
        return Arrays.copyOf(this.units, this.units.length);
    }

    public long[] sizes() {
        return Arrays.copyOf(this.sizes, this.sizes.length);
    }

    private int find(long bytes) {
        int n = this.sizes.length;
        for (int i = 1; i < n; i++) {
            if (bytes < this.sizes[i]) {
                return i - 1;
            }
        }
        return n - 1;
    }

    private DecimalFormat formatter() {
        return new DecimalFormat(FORMAT);
    }

    private boolean isBlank(char c) {
        return c == ' ' || c == '\t';
    }

}

Another option, based on @gilbertpilz's code, uses a regex to extract the value and the factor. It is also case-insensitive.

    private final static long KB_FACTOR = 1000;
    private final static long KIB_FACTOR = 1024;
    private final static long MB_FACTOR = 1000 * KB_FACTOR;
    private final static long MIB_FACTOR = 1024 * KIB_FACTOR;
    private final static long GB_FACTOR = 1000 * MB_FACTOR;
    private final static long GIB_FACTOR = 1024 * MIB_FACTOR;

    private long parse(String arg0) throws ParseException {
        // CASE_INSENSITIVE so that "16gb" and "16GiB" both match
        Pattern pattern = Pattern.compile("([0-9]+)(([KMG])I?B)", Pattern.CASE_INSENSITIVE);
        Matcher match = pattern.matcher(arg0);

        if (!match.matches())
            throw new ParseException("Wrong format", 0);

        // group(0) is the whole match; the digits are in group 1
        long ret = Long.parseLong(match.group(1));
        switch (match.group(2).toUpperCase()) {
            case "GB":
                return ret * GB_FACTOR;
            case "GIB":
                return ret * GIB_FACTOR;
            case "MB":
                return ret * MB_FACTOR;
            case "MIB":
                return ret * MIB_FACTOR;
            case "KB":
                return ret * KB_FACTOR;
            case "KIB":
                return ret * KIB_FACTOR;
        }

        throw new ParseException("Wrong format", 0);
    }
Licensed under: CC-BY-SA with attribution