Question

I have a file (tmp1) which is a list of numbers which are in the following format:

4373610497
4416339969
4426498049
4435738625

Each 64-bit number here is actually made up of multiple 16-bit fields, which hold the numbers I'm interested in.

For example (showing only the 48 bits that matter here):

4435738625 = 0000000000000001 0000100001100100 0000000000000001

And the numbers I want are:

a= 0000000000000001 = 1
b= 0000100001100100 = 2148
c= 0000000000000001 = 1

This is the code I'm using to do this right now - but it's painfully slow. The input file contains between 500K and 1 million lines, so I'm trying to look for ways to do this faster or more efficiently.

while read line; do
  a=$((((line >> 32)) & 65535));
  b=$((((line >> 16)) & 65535));
  c=$((line & 65535));
  printf "$a $b $c\n" >>tmp2
done <tmp1

I need to run this on a FreeBSD machine - so I can't use gawk. And awk does not seem to allow bit-wise operations.


Solution 2

You don't really need bit operations to do that. For example:

awk '{val = $1;
      c = val%65536; val = (val-c)/65536;
      b = val%65536; val = (val-b)/65536;
      a = val%65536;
      print a, b, c}'

However, awk values are not 64-bit integers; they are doubles, which only have 53 bits of precision. So that will only work if none of your numbers are greater than 9007199254740992 (2^53).
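To make the 2^53 limit concrete, here is a small illustrative sketch in Python rather than awk. It assumes Python floats are IEEE 754 doubles (true on common platforms) and therefore behave like awk's numbers; the name survives_double is made up for this example.

```python
# Illustration only: Python floats are assumed to be IEEE 754 doubles,
# the same kind of number the answer says awk uses for its values.
LIMIT = 2 ** 53  # 9007199254740992


def survives_double(n: int) -> bool:
    """Return True if integer n is unchanged by a round trip through a double."""
    return int(float(n)) == n


if __name__ == "__main__":
    # One sample value from the question, the limit itself, and limit + 1.
    for n in (4435738625, LIMIT, LIMIT + 1):
        print(n, survives_double(n))
```

The sample value and 2^53 itself survive the round trip, while 2^53 + 1 does not, which is why the modulo arithmetic above stops being trustworthy past that point.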

You might consider using a different tool, like bc or python. If you have GNU bc available (which is possible, even on a BSD system), the following very similar program should work:

bc <(echo 'define s(val) {
             c = val%65536; val = (val-c)/65536
             b = val%65536; val = (val-b)/65536
             a = val%65536;
             print a," ",b," ",c,"\n";
           }
           while(1){v=read(); if (v==0) break; v=s(v);}') \
   < datafile

Note well: the bc read function does not check for end of file, so you need to put some kind of explicit terminator in. I used 0 in the above script, but that might be a valid input in your case. You might want to change it to -1 or some other special value. Either way, make sure that your datafile is actually terminated with that value.
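For the python option mentioned above, a minimal sketch might look like the following. It uses the same shifts and masks as the loop in the question; the names split_fields and convert are hypothetical, and it assumes one decimal number per line with the layout described in the question.

```python
from typing import Iterable, Iterator, Tuple

MASK = 0xFFFF  # 65535, selects one 16-bit field


def split_fields(value: int) -> Tuple[int, int, int]:
    """Split a number into the three 16-bit fields a, b, c (bits 47..0)."""
    a = (value >> 32) & MASK
    b = (value >> 16) & MASK
    c = value & MASK
    return a, b, c


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'a b c' for every non-blank input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield "%d %d %d" % split_fields(int(line))


if __name__ == "__main__":
    # Sample values taken from the question.
    sample = ["4373610497", "4416339969", "4426498049", "4435738625"]
    for out in convert(sample):
        print(out)
```

To process the real file, convert can be given an open file object for tmp1 instead of the sample list, with the results written to tmp2. Python integers have arbitrary precision, so the 2^53 limit does not apply here and no terminator value is needed.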

OTHER TIPS

Here is a solution that should meet your needs, though I can't tell whether it is faster than yours; you will have to test that.

Here I test with just one number from your example; you can wrap it in a loop.

kent$  printf "%064s\n" "$(bc <<< "obase=2;4435738625")"|sed -r 's/.{16}/ibase=2;&\n/g'|bc
1
2148
1

Use bc with obase set, relying on its documented behaviour:

For bases greater than 16, bc uses a multi-character digit method of printing the numbers where each higher base digit is printed as a base 10 number. The multi-character digits are separated by spaces.

$ bc -q <(echo "obase=65536") tmp1 <(echo "halt")
00001 01200 00001
00001 01852 00001
00001 02007 00001
00001 02148 00001

and pipe that into awk if you need to finesse the output a little, e.g. drop the leading zeroes, or deal with variable number of columns (4 columns if ≥ 2^48, 3 if ≥ 2^32 etc.):

| nawk '{printf("%i %i %i\n",(NF>2)?$(NF-2):0,(NF>1)?$(NF-1):0,$NF)}'

The <(echo ...) parts allow bc to read the echo output as a file, a quick alternative to adding those lines to the top and bottom of every input file.
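As a rough model of what this pipeline computes, the sketch below (in Python, not bc) breaks a number into base-65536 digits and then keeps the last three, padding with zeroes when there are fewer, much as the nawk step does. The function names are invented for this illustration.

```python
from typing import List, Tuple

BASE = 65536


def digits_base_65536(n: int) -> List[int]:
    """Return the base-65536 digits of n, most significant first."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    digits = [n % BASE]
    n //= BASE
    while n:
        digits.append(n % BASE)
        n //= BASE
    digits.reverse()
    return digits


def last_three(digits: List[int]) -> Tuple[int, int, int]:
    """Keep the three least significant digits, zero-padded on the left."""
    padded = [0, 0] + digits
    a, b, c = padded[-3:]
    return a, b, c


if __name__ == "__main__":
    print(digits_base_65536(4435738625))
    print(last_three(digits_base_65536(2 ** 48 + 7)))
```

The digit list has three entries for the sample values in the question and four once a value reaches 2^48, which is the variable-column case the awk step handles.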

Licensed under: CC-BY-SA with attribution