Question

I want to shorten the way I import multiple files with loadtxt. Currently I do the following:

rc1    =loadtxt("20120701_Gp_xr_5m.txt", skiprows=19)
rc2    =loadtxt("20120702_Gp_xr_5m.txt", skiprows=19)
rc3    =loadtxt("20120703_Gp_xr_5m.txt", skiprows=19)
rc4    =loadtxt("20120704_Gp_xr_5m.txt", skiprows=19)
rc5    =loadtxt("20120705_Gp_xr_5m.txt", skiprows=19)
rc6    =loadtxt("20120706_Gp_xr_5m.txt", skiprows=19)
rc7    =loadtxt("20120707_Gp_xr_5m.txt", skiprows=19)
rc8    =loadtxt("20120708_Gp_xr_5m.txt", skiprows=19)
rc9    =loadtxt("20120709_Gp_xr_5m.txt", skiprows=19)
rc10   =loadtxt("20120710_Gp_xr_5m.txt", skiprows=19)

Then I concatenate them using:

GOES   =concatenate((rc1,rc2,rc3,rc4,rc5,rc6,rc7,rc8,rc9,
                     rc10),axis=0)

My question is: can I shorten all of this, maybe with a for loop or something similar, since the file names are a sequence of dates (strings)?

I was thinking of doing something like this:

day = ...  # I don't know how to define a string going from 01 to 31, for example

data="201207"+day+"_Gp_xr_5m.txt"

Then do this, but I think it is not correct:

GOES=loadtxt(data, skiprows=19)

Solution

Yes, you can easily get your sub-arrays with a for-loop, or with an equivalent list comprehension. Use the glob module to get the desired file names:

import numpy as np  # you probably don't need this line
from glob import glob

fnames = glob('path/to/dir/*')  # the pattern needs a wildcard to match the files inside the directory
arrays = [np.loadtxt(f, skiprows=19) for f in fnames]
final_array = np.concatenate(arrays)

If memory use becomes a problem, you can also iterate over all files line by line by chaining them and feeding that generator to np.loadtxt.
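
A minimal sketch of that chaining idea, assuming every file has the same 19-line header as in the question. The helper name data_lines is hypothetical. Because skiprows would only apply once to a chained stream, the header is skipped per file inside the generator instead. np.loadtxt accepts a generator of strings as its first argument.

```python
import itertools
import numpy as np

def data_lines(fnames, skiprows=19):
    """Yield the data lines of each file in turn, dropping each file's header.

    `data_lines` is a hypothetical helper; 19 header lines per file is an
    assumption taken from the question.
    """
    for fname in fnames:
        with open(fname) as f:
            # islice skips the first `skiprows` lines of this file only
            yield from itertools.islice(f, skiprows, None)

def load_chained(fnames, skiprows=19):
    """Parse all files as one stream instead of building a list of arrays."""
    return np.loadtxt(data_lines(fnames, skiprows))
```

With a sorted list of file names, final_array = load_chained(fnames) would then replace the list comprehension and the concatenate call.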


Edit after OP's comment

My example with glob wasn't very clear.

You can use the wildcard * to match files, e.g. glob('*') returns a list of all files in the current directory. Part of the code above could therefore be written better as:

fnames = glob('path/to/dir/201207*_Gp_xr_5m.txt')

Or if your program already runs from the right directory:

fnames = glob('201207*_Gp_xr_5m.txt')

I forgot this earlier, but you should also sort the list of filenames, because the list of filenames from glob is not guaranteed to be sorted.

fnames.sort()
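
Putting the pieces together, a sketch might look like the following. The function name load_matching is hypothetical, and ndmin=2 is an added precaution (not in the original answer) so that a file with a single data row still concatenates cleanly. Sorting by name gives chronological order here only because the dates in the file names are zero-padded.

```python
from glob import glob
import numpy as np

def load_matching(pattern, skiprows=19):
    """Load every file matching `pattern`, in sorted name order, into one array.

    `load_matching` is a hypothetical helper name used for illustration.
    """
    fnames = sorted(glob(pattern))  # glob gives no ordering guarantee
    if not fnames:
        raise FileNotFoundError("no files match %r" % pattern)
    # ndmin=2 keeps single-row files two-dimensional (an added safeguard)
    arrays = [np.loadtxt(f, skiprows=skiprows, ndmin=2) for f in fnames]
    return np.concatenate(arrays, axis=0)
```

For the files in the question this would be called as GOES = load_matching('201207*_Gp_xr_5m.txt').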

A slightly different approach, closer to what you were thinking of, is the following. When the variable day contains the day number as an integer, you can put it in the filename like so:

daystr = str(day).zfill(2)
fname = '201207' + daystr + '_Gp_xr_5m.txt'

Or using a format specifier that zero-pads the number to two digits:

fname = '201207{:02}_Gp_xr_5m.txt'.format(day)

Or the "old" way:

fname = '201207%02i_Gp_xr_5m.txt' % day

Then simply use this in a for-loop:

arrays = []
for day in range(1, 32):
    daystr = str(day).zfill(2)
    fname = '201207' + daystr + '_Gp_xr_5m.txt'
    a = np.loadtxt(fname, skiprows=19)
    arrays.append(a)

final_array = np.concatenate(arrays)
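
Note that the loop above fails as soon as a file for some day does not exist, which would happen with the ten files listed in the question. A hedged variant that skips missing days could look like this. The function name load_days and the template argument are hypothetical, and ndmin=2 is an added precaution for single-row files.

```python
import os
import numpy as np

def load_days(days, template="201207{:02d}_Gp_xr_5m.txt", skiprows=19):
    """Load the file for each day in `days`, skipping days without a file.

    `load_days` and `template` are hypothetical names used for illustration.
    """
    arrays = []
    for day in days:
        fname = template.format(day)  # e.g. day 7 -> '20120707_Gp_xr_5m.txt'
        if not os.path.exists(fname):
            continue  # no file for this day
        arrays.append(np.loadtxt(fname, skiprows=skiprows, ndmin=2))
    # np.concatenate raises ValueError if no file was found at all
    return np.concatenate(arrays, axis=0)
```

Calling final_array = load_days(range(1, 32)) would then cover the whole month while tolerating gaps.
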
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow