Here is the code, on Python 3.3:

import sys, re, math
str1 = str(sys.stdin.readlines())
Data = re.findall(r'\b\d+\b', str1)

for i in reversed(Data):
    print('%.4f' % math.sqrt(float(i)))

As you can see, this program grabs the data (a multi-line random string) from input and searches for every number the string contains. It then prints the square root of every number it finds.

Well, the algorithm works, but not fast enough, and I have no idea how to optimize it. Please help me with that. What do I need to do to optimize the code above?


Solution

This is a negative result. I tried using a couple of tricks to make it faster, but it's only a little bit faster.

import sys, re, math

def find_numbers(f):
    for line in f:
        for word in line.split():
            if word.isdigit():
                yield float(word)

lst = list(find_numbers(sys.stdin))
lst.reverse()
for x in lst:
    print('%.4f' % math.sqrt(x))

I thought reversing the list might be making it slow, but it didn't really make much difference when I just printed the numbers without reversing.
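
For reference, the non-reversing variant would look roughly like this (a sketch reusing find_numbers() from the listing above, not necessarily the exact code I timed):

import sys, math

# Streaming variant: print each square root as soon as it is produced,
# without building or reversing a list.
for x in find_numbers(sys.stdin):
    print('%.4f' % math.sqrt(x))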

The fastest solution for Python would be to just run the above code in PyPy.

This is not a very difficult problem, and if you need speed, you might want to write a solution in C. C will be about as fast as you can get for this problem.

Other tips

You can try loading and processing the file with NumPy:

import sys
import numpy as np

for i in reversed(np.fromfile(sys.stdin, sep=' ')**0.5):
    print(i)

Since NumPy is a high-performance numeric library for Python, I'd expect this to be the fastest solution available to you.

You asked for Python, but this can be done pretty well in C. This C program does not reverse the numbers, but you can simply pipe the output through the tac program, which is like cat but reverses the lines.

In my tests this is about 3x the speed of the NumPy solution, and about 6x the speed of my Python solution or the original solution.

#include <ctype.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>   /* for atoi() */

int
main()
{
    char buf[1024];
    int ch;           /* int, not char, so that EOF is detected reliably */
    float f;
    int i, n;

    for (i = 0;;)
    {
        ch = getchar();

        if (i >= (int) sizeof(buf) - 1)
        {
            fprintf(stderr, "number too long!\n");
            return 1;
        }

        if (isspace(ch) || EOF == ch)
        {
            /* end of a token: convert and print what was collected */
            if (i > 0)
            {
                buf[i] = '\0';
                n = atoi(buf);
                f = sqrtf(n);
                printf("%0.4f\n", f);
                i = 0;
            }

            if (EOF == ch)
                return 0;

            continue;
        }

        buf[i++] = ch;
    }
}
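
On a typical Linux system you could compile and run it with something like cc sqrt.c -o sqrt -lm and then ./sqrt < input.txt | tac to get the reversed output described above (the file names here are just placeholders).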

Update: Apologies for posting a duplicate of steveha's much earlier answer. Speaks volumes about my reading skills. Still leaving this answer online for now, just because of my musings about the i/o/buffering/runtime effects.

Original post:

I cannot believe that it takes Python longer to apply one regular expression and calculate one square root than it takes to read one line from standard input and write a result to standard output (or any I/O, for that matter).

Since the I/O will at some point come from a hard drive, and will either go to another hard drive or to a user's eye, it should be the limiting factor.

I/O is usually buffered for speed. Typically a buffer is filled in bursts, and then the CPU idles while waiting for the device to provide more data.

That leads to a generator for your application. Write a generator that reads the input line by line and immediately provides a square root on demand. I doubt that this will be any slower than the overall I/O speed on any reasonably modern hardware. (If you're on a special device, like an embedded board, a microcontroller, a Raspberry Pi, etc., let us know.)

The one optimization you can do is to precompile the regular expression. As you're using the same regexp for every line, let's parse the regexp only once. Your example in the question is fine as it is, because you call re.findall() only once; I'm just elaborating for other readers.

import sys, re, math

pattern = re.compile(r'\b\d+\b')

def fh_numbers_to_sqrt(fh):
    for line in fh:
        for i in re.findall(pattern, line):
            yield math.sqrt(float(i))

numbers_g = fh_numbers_to_sqrt(sys.stdin)
for num in numbers_g:
    print('%.4f' % num)

This allows all the regexp and math operations to interleave with the I/O times.

Now, the one thing we simply cannot optimize and integrate into the generator is the reverse. The algorithm has to wait for the last element before it can reverse anything.

So we could change the calling code to:

numbers_g = fh_numbers_to_sqrt(sys.stdin)
for num in reversed(list(numbers_g)):
    print('%.4f' % num)

And hope that this is faster than what you originally had. Again, the only reason this should be faster is that we've hidden the runtime of the regexp parsing and the calculation within the wall-clock time it takes to read data from standard input. This should still be I/O limited. Actually, the reverse might not really add to the overall runtime, because it may just interleave with the I/O happening on standard output. Looking at a wall clock, this algorithm might use up no time at all. :-)

To prove or disprove my whole post, you could measure with time.time() how long it takes from the start of your script to just before the line Data = re.findall, and from then to the end. If I'm correct, the data reading will take most of the time. If not, it's also worth measuring the time required for all the regular expression searches. Let us know. I'm curious...
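
A minimal sketch of that measurement (timings written to stderr so they don't mix with the program's output):

import sys, re, math, time

t0 = time.time()
str1 = str(sys.stdin.readlines())        # reading phase
t1 = time.time()
Data = re.findall(r'\b\d+\b', str1)      # regexp + sqrt phase starts here
for i in reversed(Data):
    print('%.4f' % math.sqrt(float(i)))
t2 = time.time()

print('reading: %.3fs  processing: %.3fs' % (t1 - t0, t2 - t1), file=sys.stderr)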

import sys, re, math
str1 = str(sys.stdin.readlines())
Data = re.findall(r'\b\d+\b', str1)

d2 = [round(math.sqrt(float(i)), 4) for i in reversed(Data)]

for i in d2:
    print(i)