What is the possibility of CRC16 collisions on 20 bytes of data?

Question 1

The following simple script runs an infinite loop that fetches 2 random 20-byte sequences, calculates CRC16 and checks if there is a collision. Continuous evaluation of this loop de facto estimates collision percentage:

#!/usr/bin/env perl

use Digest::CRC qw(crc16);

open(my $f, '<', '/dev/urandom');
my $n = 0;
my $coll = 0;

while (1) {
    read $f, $randstr1, 20;
    read $f, $randstr2, 20;
    my $crc1 = crc16($randstr1);
    my $crc2 = crc16($randstr2);

    $n++;
    $coll++ if $crc1 == $crc2;

    printf "percent of collisions = %.6f%%\n", $coll * 100.0 / $n if ($n % 100000 == 0);
}

From what I get on my computer, collision percentage seems to be around 0.0016% (or 1e-5, or "1 in 100_000"), which is way worse than predicted estimates based on ideal hashing distribution of a 16-bit hash (such as 2^16 / 2^160).

UPDATE: I see you've clarified that 20 bytes are not just fully random bytes, but fall into range of [a-z0-9]. Here's the updated version that estimates collisions in that alphabet:

#!/usr/bin/env perl

use Digest::CRC qw(crc16);

my $n = 0;
my $coll = 0;
my @chars = ('a'..'z', '0'..'9');

sub randstr() {
    my $res;
    foreach (1..20) { $res .= $chars[rand @chars]; }
    return $res;
}

while (1) {
    my $crc1 = crc16(randstr());
    my $crc2 = crc16(randstr());

    $n++;
    $coll++ if $crc1 == $crc2;

    printf "percent of collisions = %.4f%%\n", $coll * 100.0 / $n if ($n % 100000 == 0);
}

It yields approximately the same result, about 0.0016%.

Question 2

A good 16-bit hash should have a probability of 2^-16 of collision given two distinct inputs. CRC16 isn't a very good hash, but unless you have an adversary picking the inputs, it should be good enough for your purposes.

Keep in mind the birthday paradox. You'll start to get collisions after you hash about 2^8 items.

Question 3

Whether or not you get a likely hash collision depends on the contents of the data, not its quantity. If it's not deliberately chosen to collide, then you should be pretty safe in a situation like this where the size of the data is 10x the size of hash. That said, it's still a 16 bit hash and the likelihood of collisions is quite high by modern standards.

Question 4

The probability of a hash collision does not depend on the length of the message, so long as the entropy (number of significant bits) of the message is greater than or equal to the number of bits in the hash, and that it is a good hash that well mixes the bits of the input into each hash.

In your case you have about 100 bits of entropy, so as long as you have a good hash of length 100 bits or less, then the collision probability will depend only on the number of bits in the hash and the number of opportunities you have for collisions. This answer shows how to calculate the probability of a collision.