Question

I have a tab-delimited file with 9 columns that looks like this:

1:21468 1   21468   2.8628817609765984  0.09640845515631684     0.05034710996552612     1.0     0.012377712911711025    54.0

However, column 5 sometimes contains numbers in scientific notation, like:

    8.159959468796783E-4
    8.465114165595303E-4
    8.703354859736187E-5
    9.05132870067004E-4

I need to have all numbers in column 5 in decimal notation. From the example above:

    0.0008159959468796783
    0.0008465114165595303
    0.00008703354859736187
    0.000905132870067004

And I need to change these numbers without changing the rest of the numbers in column 5 or the rest of the file.

I know there is a similar post, Convert scientific notation to decimal in multiple fields. But in that case there was an if statement unrelated to the type of number present in the field, and the conversion applied to all numbers in that column, so I'm having trouble adapting that answer to my specific case. Can someone help me figure this out?

Thank you!


Solution 2

As Jim already proposed, one way to do this is to simply treat the number as a string and do the translation yourself. This way you're able to fully maintain your significant digits.

The following demonstrates a function for doing just that. It takes in a number that's potentially in scientific notation, and it returns the decimal representation. Works with both positive and negative exponents:

use warnings;
use strict;

while (<DATA>) {
    my ($num, $expected) = split;
    my $dec = sn_to_dec($num);
    print $dec . ' - ' . ($dec eq $expected ? 'good' : 'bad') . "\n";
}

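sub sn_to_dec {
    my $num = shift;

    # Split into sign, integer digits, optional point, fractional digits, exponent.
    if ($num =~ /^([+-]?)(\d*)(\.?)(\d*)[Ee]([-+]?\d+)$/) {
        my ($sign, $int, $period, $dec, $exp) = ($1, $2, $3, $4, $5);

        if ($exp < 0) {
            # Negative exponent: left-pad the integer part with zeros, then
            # insert the point |$exp| digits from its right-hand end.
            my $len = 1 - $exp;
            $int = ('0' x ($len - length $int)) . $int if $len > length $int;
            substr $int, $exp, 0, '.';
            return $sign.$int.$dec;

        } elsif ($exp > 0) {
            # Positive exponent: right-pad the fraction with zeros if needed,
            # then insert the point $exp digits in (omitted for whole numbers).
            $dec .= '0' x ($exp - length $dec) if $exp > length $dec;
            substr $dec, $exp, 0, '.' if $exp < length $dec;
            return $sign.$int.$dec;

        } else {
            # Exponent of zero: just drop the exponent.
            return $sign.$int.$period.$dec;
        }
    }

    # Not in scientific notation: return the input unchanged.
    return $num;
}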


__DATA__
8.159959468796783E-4    0.0008159959468796783
8.465114165595303E-4    0.0008465114165595303
8.703354859736187E-5    0.00008703354859736187
9.05132870067004E-4     0.000905132870067004
9.05132870067004E+4     90513.2870067004
9.05132870067004E+16    90513287006700400
9.05132870067004E+0     9.05132870067004

OTHER TIPS

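The easiest (and fastest) way to convert a number from scientific notation to regular notation in Perl is to force it into numeric context. Note that this only helps when the value fits Perl's default number formatting (about 15 significant digits on a typical build): very small values such as 8.703354859736187E-5 are still printed with an exponent, and any extra digits are rounded away: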

my $num = '0.12345678E5';
$num *= 1;
print "$num\n";

If you do this the simple way, by parsing as floating point and then using printf to force it to print as a decimal, you may end up with slightly different results because you're at the upper limit of precision available in double-precision format.
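
To make the precision concern concrete, here is a minimal sketch of the floating-point route. The choice of 22 decimal places is arbitrary and only for illustration; the point is that the trailing digits come from the nearest double, so they are not guaranteed to match the original text.

```perl
use strict;
use warnings;

# One of the values from the question, kept as text.
my $text = '8.703354859736187E-5';

# Numeric route: Perl parses the string as a double, and sprintf prints
# it with a fixed number of decimals (22 is an arbitrary choice here).
my $via_float = sprintf '%.22f', $text;

print "original : $text\n";
print "sprintf  : $via_float\n";
```

Comparing the two printed lines shows whether the last digits survived the round trip on your platform. If they must be preserved exactly, a string-based conversion avoids the problem.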

What you should do is split each line into fields, then examine field 5 (index 4 in a zero-based array) with something like this:

my ($u, $d, $exp) = $field[4] =~ /^(\d)\.(\d+)[Ee]([-+]?\d+)$/;

If $field[4] is in scientific notation, this will give you

$u    the digit before the decimal
$d    the digits after the decimal
$exp  the exponent

(If it's not, the match fails, the variables stay undefined, and you can skip the reformatting step.)

Using that information you can reassemble the digits yourself. For a negative exponent, write "0." followed by the right number of leading zeros and then the digits. For a positive exponent, shift the decimal point to the right instead, padding with trailing zeros if the exponent is larger than the number of digits after the point.

Once you've reformatted the value the way you want, reassemble the entire line (using, say, join) and write it out.
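
The steps above can be sketched as follows. This is one possible reading of the approach, not the answerer's own code: the helper names expand_sci and convert_line are hypothetical, the regex is extended with an optional leading sign, and column 5 is assumed to be index 4 after a split on tabs.

```perl
use strict;
use warnings;

# Hypothetical helper: expand d.dddE[+-]n into plain decimal notation by
# moving the decimal point within the digit string (no floating point).
sub expand_sci {
    my ($value) = @_;

    # Optional sign, one leading digit, fractional digits, exponent.
    # Anything that does not match is returned untouched.
    my ($sign, $u, $d, $exp) =
        $value =~ /^([-+]?)(\d)\.(\d+)[Ee]([-+]?\d+)$/
        or return $value;

    my $digits = $u . $d;    # all significant digits, point removed
    my $point  = 1 + $exp;   # where the point belongs within $digits

    if ($point <= 0) {
        # Negative exponent: "0." plus leading zeros, then the digits.
        return $sign . '0.' . ('0' x -$point) . $digits;
    }
    if ($point >= length($digits)) {
        # Large positive exponent: trailing zeros, no decimal point.
        return $sign . $digits . ('0' x ($point - length($digits)));
    }
    # Otherwise the point lands inside the digit string.
    return $sign . substr($digits, 0, $point) . '.' . substr($digits, $point);
}

# Hypothetical helper: convert only column 5 of one tab-delimited line.
sub convert_line {
    my ($line) = @_;
    chomp $line;
    my @field = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    $field[4] = expand_sci($field[4]) if defined $field[4];
    return join "\t", @field;
}

# Sample line modelled on the question, with a scientific value in column 5.
my $sample = join "\t", qw(1:21468 1 21468 2.8628817609765984
    8.703354859736187E-5 0.05034710996552612 1.0 0.012377712911711025 54.0);

# Column 5 is printed as 0.00008703354859736187; the rest is unchanged.
print convert_line($sample), "\n";
```

For a real file, the same function could be applied line by line, for example with while (my $line = <>) { print convert_line($line), "\n" }. Because the digits are only moved around as text, every significant digit in the input is preserved.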

Licensed under: CC-BY-SA with attribution