Question

I'm trying to determine the type of a file that is being received through a stream (in order to name it with the proper file extension). I've written a determineFormat(String str) method which is fed by a bytesToHex() method (the bytes come from the buffer). Unfortunately this doesn't work as expected: determineFormat() always returns the .aac extension even though an .mp3 is being received.

public String determineFormat(String str) {
    Pattern aacPattern = Pattern.compile("FFF1|FFF9");
    Pattern mp3Pattern = Pattern.compile("494433|FFFB");

    Matcher matcher = aacPattern.matcher(str);
    if (matcher.find()) {
        return "aac";
    }

    matcher = mp3Pattern.matcher(str);
    if (matcher.find()) {
        return "mp3";
    }

    return "unknown";
}

I feed my determineFormat() method using this:

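// Hex digit lookup table (assumed uppercase, to match the patterns)
private static final char[] hexArray = "0123456789ABCDEF".toCharArray();

public String bytesToHex(byte[] bytes) {
    char[] hexChars = new char[bytes.length * 2];
    int v;
    for (int j = 0; j < bytes.length; j++) {
        v = bytes[j] & 0xFF;                       // treat the byte as unsigned
        hexChars[j * 2] = hexArray[v >>> 4];       // high nibble
        hexChars[j * 2 + 1] = hexArray[v & 0x0F];  // low nibble
    }
    return new String(hexChars);
}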

Solution 2

The problem turned out to be simpler than it seemed. I was testing my app with an MPEG-2 Audio Layer 3 stream with ID3v2. I decided to look at the raw "HexToString" output:

0DCB1C992B37173740244875C143D50ACDBA0422CD01D73D3C78F05ED7BBC2B33F9D78A7FFF342C0241C6B56B11EC1867984C20F42A4FAC5B9C0
42220314C006D94E124673CD4CC27FC2FCE12215410F12086BE5A3EDFC6DB2BEB0EAEC6EAAA4BF997FFB3337F914AB1A89C808EA6D338912D72E
99CE11E899999D3AE1092590FB2B71D736DC544B0AFD1035A3FFF340C00E178B62E5BE48C46F04B8EFC106AE3F17DDE08B5FD48672EBEABB216A
8438B6FB3B33BF91D3F3EBFCE14184320532ABA37FFD59BFF6ABAD1AA9AADEE73220679D2C7DDBAB766433A99D8CA752B383067465691750A24A
00F32A5078E29258F6D87A620AFFF342C00A158B22E5BE5944BAE8BA2C54739BE486B719A76DF5FD984D5257DBEAC43B238598EFAB3592DE8DD5

The "real" file signature turned out to be FFF3. After that I found this site, which describes MPEG Layer 3 files: http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=687&strPageToDisplay=signatures . Finally I was able to get my code to work with the fixed patterns:

Pattern aacPattern = Pattern.compile("(FFF1|FFF9)");
Pattern mp3Pattern = Pattern.compile("(FFF3|FFFA|FFFB)");
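
The FFF3, FFFA and FFFB values can be read as MPEG audio frame headers rather than as arbitrary magic numbers. The sketch below decodes the second byte under the assumption that the header starts with 11 sync bits, followed by 2 version bits, 2 layer bits and 1 protection bit; that layout is my reading of the format and should be checked against the specification. The class and method names are hypothetical.

```java
final class MpegHeaderBits {
    // Returns a label such as "MPEG-2 Layer III", or null when the two bytes
    // do not look like an MPEG audio frame sync (assumed layout, see above).
    static String describe(int b0, int b1) {
        if (b0 != 0xFF || (b1 & 0xE0) != 0xE0) {
            return null; // the 11 sync bits are not all set
        }
        int version = (b1 >> 3) & 0x03; // 2 version bits
        int layer = (b1 >> 1) & 0x03;   // 2 layer bits
        String[] versions = {"MPEG-2.5", null, "MPEG-2", "MPEG-1"};
        String[] layers = {null, "Layer III", "Layer II", "Layer I"};
        if (versions[version] == null || layers[layer] == null) {
            return null; // reserved combination
        }
        return versions[version] + " " + layers[layer];
    }
}
```

Under that assumption, describe(0xFF, 0xF3) yields "MPEG-2 Layer III", which agrees with the test stream described above, while FFF1 and FFF9 have layer bits 00 and are not reported as MPEG audio layers.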

At first I was misled by the signature information I got from this site: http://www.garykessler.net/library/file_sigs.html
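
One caveat with searching the whole hex string: a regex can match at an odd character offset, straddling two bytes. For example, the bytes 0F FF 10 encode to "0FFF10", which contains "FFF1" even though no byte pair is FF F1. A minimal sketch of a byte-aligned scan, assuming the same two-byte signatures as the patterns above; the class name is hypothetical, and unlike the regex version it returns the format of whichever signature appears first in the buffer.

```java
final class SignatureScan {
    // Walks the buffer one byte at a time, so a match always starts on a byte boundary.
    static String determineFormat(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) != 0xFF) {
                continue; // every signature used here starts with FF
            }
            int second = data[i + 1] & 0xFF;
            switch (second) {
                case 0xF1:
                case 0xF9:
                    return "aac";
                case 0xF3:
                case 0xFA:
                case 0xFB:
                    return "mp3";
                default:
                    break; // not a known signature, keep scanning
            }
        }
        return "unknown";
    }
}
```

This also skips the hex conversion entirely, since the comparison is done on the raw bytes from the buffer.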

OTHER TIPS

I think it's because you match your pattern against the whole file, so a signature found anywhere in the data counts as a match. Anchor the patterns to the start of the string:

Pattern aacPattern = Pattern.compile("^(FFF1|FFF9)");
Pattern mp3Pattern = Pattern.compile("^(494433|FFFB)");

Then, of course, it's enough to pass in only the first couple of bytes. To get the bytes as hex you could do something simpler, like

StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
    sb.append(String.format("%02X", b));
}
String hex = sb.toString();
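
Putting the two suggestions from this answer together, a minimal sketch might hex-encode only the first three bytes and apply the anchored patterns to that short string. The class name and the choice of three bytes (the length of the longest signature, 494433) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class HeaderSniffer {
    private static final Pattern AAC = Pattern.compile("^(FFF1|FFF9)");
    private static final Pattern MP3 = Pattern.compile("^(494433|FFFB)");

    static String determineFormat(byte[] bytes) {
        // Only the leading bytes matter once the patterns are anchored.
        int n = Math.min(bytes.length, 3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(String.format("%02X", bytes[i]));
        }
        String head = sb.toString();
        if (AAC.matcher(head).find()) {
            return "aac";
        }
        if (MP3.matcher(head).find()) {
            return "mp3";
        }
        return "unknown";
    }
}
```

Anchoring assumes the buffer starts exactly at a tag or frame boundary. A buffer that begins mid-stream, such as the dump quoted in the solution (which starts with 0DCB1C), would be reported as "unknown" by this check.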
Licensed under: CC-BY-SA with attribution