How to convert IPv6 from binary for storage in MySQL

https://stackoverflow.com/questions/1120371

13-09-2019

Question

I am trying to store IPv6 addresses in MySQL 5.0 in an efficient way. I have read the other questions related to this, such as this one. The author of that question eventually chose two BIGINT fields. My searches have also turned up another commonly used mechanism: using a DECIMAL(39,0) to store the IPv6 address. I have two questions about that.

What are the advantages and disadvantages of using DECIMAL(39,0) over the other methods such as 2*BIGINT?
How do I convert (in PHP) from the binary format as returned by inet_pton() to a decimal string format usable by MySQL, and how do I convert back so I can pretty-print with inet_ntop()?

Solution 2

Here are the functions I now use to convert IP addresses to and from DECIMAL(39,0) format. They are named inet_ptod and inet_dtop, for "presentation-to-decimal" and "decimal-to-presentation". They require PHP with IPv6 and bcmath support.

/**
 * Convert an IP address from presentation to decimal(39,0) format suitable for storage in MySQL
 *
 * @param string $ip_address An IP address in IPv4, IPv6 or decimal notation
 * @return string The IP address in decimal notation
 */
function inet_ptod($ip_address)
{
    // IPv4 address
    if (strpos($ip_address, ':') === false && strpos($ip_address, '.') !== false) {
        $ip_address = '::' . $ip_address;
    }

    // IPv6 address
    if (strpos($ip_address, ':') !== false) {
        $network = inet_pton($ip_address);
        $parts = unpack('N*', $network);

        foreach ($parts as &$part) {
            if ($part < 0) {
                $part = bcadd((string) $part, '4294967296');
            }

            if (!is_string($part)) {
                $part = (string) $part;
            }
        }
        unset($part); // break the reference left by foreach-by-reference

        $decimal = $parts[4];
        $decimal = bcadd($decimal, bcmul($parts[3], '4294967296'));
        $decimal = bcadd($decimal, bcmul($parts[2], '18446744073709551616'));
        $decimal = bcadd($decimal, bcmul($parts[1], '79228162514264337593543950336'));

        return $decimal;
    }

    // Decimal address
    return $ip_address;
}

/**
 * Convert an IP address from decimal format to presentation format
 *
 * @param string $decimal An IP address in IPv4, IPv6 or decimal notation
 * @return string The IP address in presentation format
 */
function inet_dtop($decimal)
{
    // IPv4 or IPv6 format
    if (strpos($decimal, ':') !== false || strpos($decimal, '.') !== false) {
        return $decimal;
    }

    // Decimal format
    $parts = array();
    $parts[1] = bcdiv($decimal, '79228162514264337593543950336', 0);
    $decimal = bcsub($decimal, bcmul($parts[1], '79228162514264337593543950336'));
    $parts[2] = bcdiv($decimal, '18446744073709551616', 0);
    $decimal = bcsub($decimal, bcmul($parts[2], '18446744073709551616'));
    $parts[3] = bcdiv($decimal, '4294967296', 0);
    $decimal = bcsub($decimal, bcmul($parts[3], '4294967296'));
    $parts[4] = $decimal;

    foreach ($parts as &$part) {
        if (bccomp($part, '2147483647') == 1) {
            $part = bcsub($part, '4294967296');
        }

        $part = (int) $part;
    }
    unset($part); // break the reference left by foreach-by-reference

    $network = pack('N4', $parts[1], $parts[2], $parts[3], $parts[4]);
    $ip_address = inet_ntop($network);

    // Return an IPv4-compatible address (::a.b.c.d) in plain IPv4 form
    if (preg_match('/^::\d+\.\d+\.\d+\.\d+$/', $ip_address)) {
        return substr($ip_address, 2);
    }

    return $ip_address;
}
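For comparison, the same arithmetic can be sketched in Python with the standard ipaddress and struct modules. This sketch is not part of the original answer. The Python inet_ptod and inet_dtop below are hypothetical counterparts of the PHP functions. Like the bcmath code, they treat the address as four unsigned 32-bit words weighted by 2^96, 2^64, 2^32 and 1. The IPv4 check at the end is an assumption. The PHP version relies on inet_ntop() producing the dotted ::a.b.c.d form, and that output depends on the platform, so this sketch only approximates it.

```python
import ipaddress
import struct

# Weight between adjacent 32-bit words; the four words are weighted 2**96, 2**64, 2**32, 1.
WORD = 2 ** 32


def inet_ptod(ip_address: str) -> str:
    """Presentation -> decimal string, modelled on the PHP inet_ptod."""
    # A bare IPv4 address is embedded as ::a.b.c.d, as in the PHP code.
    if ':' not in ip_address and '.' in ip_address:
        ip_address = '::' + ip_address
    if ':' not in ip_address:
        return ip_address  # already decimal
    packed = ipaddress.IPv6Address(ip_address).packed  # 16 bytes, like inet_pton()
    decimal = 0
    for word in struct.unpack('!4I', packed):  # unsigned big-endian words, like unpack('N*')
        decimal = decimal * WORD + word
    return str(decimal)


def inet_dtop(decimal: str) -> str:
    """Decimal string -> presentation, modelled on the PHP inet_dtop."""
    if ':' in decimal or '.' in decimal:
        return decimal  # already in presentation format
    value = int(decimal)
    if not 0 <= value < 2 ** 128:
        raise ValueError('not a valid 128-bit address: ' + decimal)
    words = []
    for _ in range(4):  # peel off 32-bit words, least significant first
        value, word = divmod(value, WORD)
        words.append(word)
    packed = struct.pack('!4I', *reversed(words))
    # Assumption: approximate the "::a.b.c.d" check by treating addresses whose top
    # 96 bits are zero and whose seventh 16-bit group is non-zero as IPv4, so that
    # small values such as ::1 still come back in IPv6 form.
    if packed[:12] == bytes(12) and packed[12:14] != bytes(2):
        return str(ipaddress.IPv4Address(packed[12:]))
    return str(ipaddress.IPv6Address(packed))
```

Python integers have arbitrary precision, so no bcmath equivalent is needed. Note that '0.0.0.1' and '::1' map to the same decimal value. That ambiguity is inherent in storing IPv4 as ::a.b.c.d.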

OTHER TIPS

We went for a VARBINARY(16) column instead and use inet_pton() and inet_ntop() to do the conversions:

https://github.com/skion/mysql-udf-ipv6

The functions can be loaded into a running MySQL server and provide INET6_NTOP and INET6_PTON in SQL, just like the familiar INET_NTOA and INET_ATON functions for IPv4.

Edit: MySQL now has compatible built-in functions with different names: INET6_ATON() and INET6_NTOA(), available since MySQL 5.6.3. Only use the above if you are on MySQL older than 5.6 and want a convenient upgrade path.
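MySQL's INET6_ATON() returns 4 bytes for an IPv4 address and 16 bytes for an IPv6 address. The same packing can be reproduced on the client side. The sketch below uses Python's ipaddress module rather than the UDFs or PHP. The helper names to_varbinary and from_varbinary are hypothetical, and the text they produce for unusual forms (for example IPv4-embedded IPv6) may not match INET6_NTOA() exactly.

```python
import ipaddress


def to_varbinary(ip: str) -> bytes:
    """Pack like INET6_ATON(): 4 bytes for IPv4, 16 bytes for IPv6 (fits VARBINARY(16))."""
    return ipaddress.ip_address(ip).packed


def from_varbinary(raw: bytes) -> str:
    """Unpack 4- or 16-byte binary back to presentation format, like INET6_NTOA()."""
    return str(ipaddress.ip_address(raw))
```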

DECIMAL(39)

Pros:

Works with basic arithmetic operators (such as + and -).
Works with basic indexing (exact or range).
Format is display friendly.

Cons:

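Can accept out-of-range values for IPv6 (DECIMAL(39,0) holds up to 10^39 - 1, while IPv6 tops out at 2^128 - 1).
Is not a very efficient storage mechanism (MySQL needs 18 bytes for DECIMAL(39,0), versus 16 for BINARY(16)).
Can cause confusion about which mathematical operators or functions work and which don't.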
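The pros and cons of DECIMAL(39) can be seen on the client side. A network range becomes two plain integers for a range scan (e.g. BETWEEN). The column itself does not reject values above 2^128 - 1, so the application has to. This is a Python sketch with hypothetical helper names (network_bounds, is_valid_ipv6_value), not code from the original answers.

```python
import ipaddress

MAX_IPV6 = 2 ** 128 - 1  # largest IPv6 value; DECIMAL(39,0) itself allows up to 10**39 - 1


def network_bounds(cidr: str) -> tuple[int, int]:
    """Lowest and highest address of a network as integers, usable as range-scan bounds."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def is_valid_ipv6_value(value: int) -> bool:
    """DECIMAL(39,0) cannot enforce the 128-bit range by itself, so check before inserting."""
    return 0 <= value <= MAX_IPV6
```

Plain arithmetic works directly on these values. For example, the next address is simply value + 1.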

BINARY(16)...

Pros:

Most efficient format for exact representation.
Works with basic indexing (exact and range).
Works with prefix indexing for prefixes that are multiples of 8 bits.
Stores only valid IPv6 values (although does not guarantee valid addressing).
Later MySQL versions (5.6.3 and up) have functions that convert this format to and from IPv6 representations (but not 4in6).

Cons:

Not friendly for display.
Isn't friendly with operators or functions meant for numbers.
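Range and prefix lookups on BINARY(16) work because binary strings compare byte by byte. The big-endian 16-byte form therefore sorts in the same order as the 128-bit number. A prefix that is a multiple of 8 bits is just the leading bytes. This is a Python sketch with hypothetical helper names (packed16, byte_prefix).

```python
import ipaddress


def packed16(ip: str) -> bytes:
    """16-byte big-endian form, as stored in a BINARY(16) column."""
    return ipaddress.IPv6Address(ip).packed


def byte_prefix(ip: str, prefix_len: int) -> bytes:
    """Leading bytes covering a network prefix; only exact for multiples of 8 bits."""
    if prefix_len % 8:
        raise ValueError('prefix length must be a multiple of 8')
    return packed16(ip)[: prefix_len // 8]
```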

BINARY(39)...

This stores the full, uncompressed address (hexadecimal groups, even for 4in6). It can also be ASCII text rather than binary.

Pros:

Human readable (if you can call IPv6 that).
Supports basic indexing (exact and range).
Supports prefix indexing for multiples of 4 bits.
Directly IPv6 compatible. No conversion needed.

Cons:

Doesn't work well with any mathematical functions or operators.
Least efficient storage.
Can allow invalid representations.

Oddities:

Gets complex if you want features such as case-insensitive matching.
IPv6 has other display formats, but using them adds complexity: the same address can end up with two representations, or you lose range lookups. You may even need 45 characters (the longest form with an embedded IPv4 address) or a VARCHAR/VARBINARY column.
Variants of this can preserve the address exactly as originally received. That is rarely wanted, and when it is, you lose many of the benefits above.
Alternatively, drop the separators from the full format and store a plain hex string for fewer hassles and slightly better efficiency. You can take this a long way if prefix indexing is important (BINARY(128), i.e. one character per bit).
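The textual variants above can be generated with Python's ipaddress module. The exploded attribute gives the 39-character form with lowercase, zero-padded hex groups, including for IPv4-embedded input. Normalising to lowercase also sidesteps the case-sensitivity issue. The helper names are hypothetical. Reading BINARY(128) as one '0'/'1' character per bit is an assumption about what the answer intends.

```python
import ipaddress


def full_form(ip: str) -> str:
    """39 characters: eight zero-padded lowercase hex groups, even for 4in6 input."""
    return ipaddress.IPv6Address(ip).exploded


def hex_form(ip: str) -> str:
    """Full form without separators: 32 hex characters."""
    return ipaddress.IPv6Address(ip).packed.hex()


def bit_form(ip: str) -> str:
    """Assumed BINARY(128) layout: one '0'/'1' character per bit, most significant first."""
    return format(int(ipaddress.IPv6Address(ip)), '0128b')
```

Because every character position has a fixed meaning, string order matches numeric order. That is what keeps range lookups working on these columns.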

BIGINT UNSIGNED * 2

Pros:

Works with mathematical operators and functions, with the caveat that you must handle the split across two columns yourself.
Efficient, although the two-column split again adds some overhead.
Works with basic indexes (exact, range).
Works with prefix indexes when the prefix is 64 bits (index the high column alone).
Display friendly format.

Cons:

Using two columns makes the value non-atomic and doubles many operations on it.
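Splitting an address into two BIGINT UNSIGNED columns is a shift and a mask. The high half is exactly the /64 prefix. The doubled work shows up in operations such as incrementing, where a carry has to cross from the low column to the high one. This is a Python sketch with hypothetical helper names (split_hi_lo, join_hi_lo, add_one).

```python
import ipaddress

MASK64 = (1 << 64) - 1


def split_hi_lo(ip: str) -> tuple[int, int]:
    """Split into two unsigned 64-bit halves for two BIGINT UNSIGNED columns."""
    value = int(ipaddress.IPv6Address(ip))
    return value >> 64, value & MASK64


def join_hi_lo(hi: int, lo: int) -> str:
    """Recombine both halves into presentation format."""
    return str(ipaddress.IPv6Address((hi << 64) | lo))


def add_one(hi: int, lo: int) -> tuple[int, int]:
    """Increment across both columns: the low half wraps and carries into the high half."""
    lo = (lo + 1) & MASK64
    if lo == 0:
        hi = (hi + 1) & MASK64
    return hi, lo
```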

Oddities:

Many modern languages and systems provide 64-bit integers but not unsigned ones, and signed values are problematic: negative numbers compare lower than positive ones even though their bit patterns are higher. For this reason it is common to use 4 * INT UNSIGNED instead.
Similarly, people might split the address further for prefix indexing, going at least as far as 8-bit pieces (TINYINT UNSIGNED). Some might even use BIT(1) columns for full bit-level prefix indexing, assuming MySQL handles composite indexes on BIT columns properly.
With four columns, some operations that require carrying from one column to another are, ironically, easier because of the slack bits: intermediate values in calculations can still be 64-bit.
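Both oddities can be illustrated in a short Python sketch. Python integers are unbounded, so the fixed widths are modelled with struct and masks. The helper names as_signed64 and add_words are hypothetical. as_signed64 reinterprets the same 64 bits as a signed value, which flips the order of high bit patterns. add_words adds addresses held as four unsigned 32-bit words. Each partial sum stays below 2^33, so it fits easily in a signed 64-bit integer, and the carry is simply the bits above bit 31.

```python
import struct

MASK32 = 0xFFFFFFFF


def as_signed64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed (two's complement) one."""
    return struct.unpack('>q', struct.pack('>Q', u))[0]


def add_words(a: list[int], b: list[int]) -> list[int]:
    """Add two addresses stored as four unsigned 32-bit words, most significant first."""
    result = [0, 0, 0, 0]
    carry = 0
    for i in range(3, -1, -1):
        total = a[i] + b[i] + carry  # at most 2**33 - 1, so the slack bits hold the carry
        result[i] = total & MASK32
        carry = total >> 32
    return result  # a carry out of the top word is dropped (wraps modulo 2**128)
```

For example, 0x8000000000000000 has a higher bit pattern than 0x7FFFFFFFFFFFFFFF, but as_signed64 turns it into -2**63, so it sorts first.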

Summary

People will use different formats for different reasons. Backwards compatibility may be one reason and that depends on what was being done for IPv4. Others depend on how the addresses are being used and optimisations around that. You may see more than one approach being used.

BINARY(16) is a good default approach, since it is the most efficient and the least hassle.

If you want to do the conversions in PHP by hand, research:

gmp or bcmath
PHP's number handling and bitwise operators; be especially aware of the limits of int and float, and of functions that depend on them that might otherwise seem useful
The IPv6 formats
pack/unpack, bin2hex/hex2bin.
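Two of these points can be shown quickly, sketched here in Python rather than PHP. First, an IEEE 754 double has a 53-bit mantissa, so a 128-bit address cannot survive a round trip through a float. This is the trap that PHP's int-to-float overflow sets. Second, bytes.hex() and bytes.fromhex() play the role of bin2hex() and hex2bin(). The helper names are hypothetical.

```python
import ipaddress


def survives_float(value: int) -> bool:
    """True if an integer is unchanged after a round trip through a double."""
    return int(float(value)) == value


def to_hex(ip: str) -> str:
    """16-byte packed address as 32 hex characters (cf. bin2hex(inet_pton(...)))."""
    return ipaddress.IPv6Address(ip).packed.hex()


def from_hex(hex_str: str) -> str:
    """32 hex characters back to presentation format (cf. inet_ntop(hex2bin(...)))."""
    return str(ipaddress.IPv6Address(bytes.fromhex(hex_str)))
```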

However, I would recommend using a common library for dealing with IPv6's various display formats.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow