Question

I'm using numpy savetxt() to save the elements of a matrix to file as a single row (I need to print lots of them in order). This is the method I've found:

import numpy as np

mat = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])

with open('myfile.dat','a') as handle:
    np.savetxt(handle, mat.reshape(1,mat.size), fmt='%+.8e')

There are 2 questions:

1) Is savetxt() the best option? I need to print 1e5 to 1e7 of these things... and I don't want i/o bottlenecking the actual computation. I'm guessing reopening the file each iteration is a bad plan, speed-wise.

2) Ideally I would have some context data printed to start each row so my output might look like:

(N foo mat):

...
6 -2.309 +1.000 +2.000 ...
7 -4.273 +1.000 +2.000 ...
8 -3.664 +1.000 +2.000 ...
...

I could do this using np.append(), but then the first number won't print as an int. Is this sort of thing doable directly in savetxt()? Or do I need a C-like fprintf() anyway?


Solution 2

OK. My original code for printing as an array only works if you want to print once. mat.reshape() doesn't give you an independent copy of the data: the array it returns is a view that shares memory with mat. In my loop that meant mat was getting mangled, so the next time through any linalg routines would fail.
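A quick way to see the sharing is np.shares_memory (just an illustration, not part of the loop):

import numpy as np

mat = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])

row = mat.reshape(1, mat.size)  # a view onto mat's buffer, not a copy
print(np.shares_memory(mat, row))                              # True
print(np.shares_memory(mat, mat.copy().reshape(1, mat.size)))  # False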

To avoid this we need to reshape a copy() of mat. I've also added a tmp variable for clarity.

import numpy as np

mat = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]]) # initialize mat to see format

handle = open('myfile.dat', 'ab')
for n in range(N):
    # perform linalg calculations on mat ...
    meta = foo # based on current mat

    tmp = np.hstack(([[n]], [[meta]], mat.copy().reshape(1, mat.size)))
    np.savetxt(handle, tmp, fmt='%+.8e')

handle.close()

This writes the context data (n and meta in this case) at the start of each row. I can live with n being saved as a float.
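For what it's worth, savetxt() also accepts a sequence of per-column format strings, so n could stay an int. A sketch for the tmp rows above (one format per column: n, meta, then the mat.size matrix entries):

fmts = ['%d'] + ['%+.8e'] * (1 + mat.size)  # 1 + 1 + mat.size columns in tmp
np.savetxt(handle, tmp, fmt=fmts)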

I did some benchmarking to check the I/O cost. I set N=100,000 for the loop and averaged the runtime over 6 runs:

  • no i/o, just computations: 9.1 sec
  • as coded above: 17.2 sec
  • open 'myfile.dat' to append each iteration: 30.6 sec

So the I/O roughly doubles the runtime and, as expected, constantly opening and closing the file is a bad plan.
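If the I/O cost still hurts at N=1e7, one further option is to buffer rows and call savetxt() once per chunk instead of once per iteration. A minimal sketch, reusing the placeholders from the loop above (CHUNK is an arbitrary tuning knob, not benchmarked here):

CHUNK = 1000  # arbitrary buffer size; tune for your workload
buf = []

with open('myfile.dat', 'ab') as handle:
    for n in range(N):
        # perform linalg calculations on mat ...
        meta = foo  # based on current mat
        buf.append(np.hstack(([[n]], [[meta]], mat.copy().reshape(1, mat.size))))
        if len(buf) == CHUNK:
            np.savetxt(handle, np.vstack(buf), fmt='%+.8e')
            buf = []
    if buf:  # flush whatever is left after the loop
        np.savetxt(handle, np.vstack(buf), fmt='%+.8e')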

OTHER TIPS

Pandas has a good to_csv method:

import pandas as pd
import numpy as np

mat = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])
df = pd.DataFrame(data=mat.astype(float))  # note: np.float was removed in NumPy 1.24; use the builtin float
df.to_csv('myfile.dat', sep=' ', float_format='%+.8e', header=False)

By default it'll add the index (index=True), as in the output below; if you wanted different context data you could add it to your DataFrame as ordinary columns and set index=False instead (see the sketch after the output).

$ cat myfile.dat 
0 +1.00000000e+00 +2.00000000e+00 +3.00000000e+00
1 +4.00000000e+00 +5.00000000e+00 +6.00000000e+00
2 +7.00000000e+00 +8.00000000e+00 +9.00000000e+00
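If you'd rather have context data than the bare index, one way is to prepend it as ordinary columns and suppress the index. A sketch using the example values from the question (the n and meta columns are hypothetical stand-ins for your per-row values):

import pandas as pd
import numpy as np

mat = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])

df = pd.DataFrame(data=mat.astype(float))
df.insert(0, 'meta', [-2.309, -4.273, -3.664])  # hypothetical per-row metadata
df.insert(0, 'n', [6, 7, 8])                    # hypothetical row counters
df.to_csv('myfile.dat', sep=' ', float_format='%+.8e', header=False, index=False)

Since float_format only applies to float columns, the integer n column prints as an int.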