Question

I have a text file containing simulation data (60 columns, 100k rows):

a  b   c  
1  11 111
2  22 222
3  33 333
4  44 444

... where the first row holds the variable names and each column beneath it holds the corresponding data (floats).

I need to use all of these variables and their data in Python for further calculations. For example, when I write:

print(b)

I need to receive the values from the second column.

I know how to import data:

import numpy as np
data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

And how to assign the variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I'm having trouble getting the variable names:

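import csv

rows = []  # don't shadow the built-in name `list`
with open("1.txt", "rt") as f:
    reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
    for row in reader:
        rows.append(row)
variables = rows[0]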

How can I change this code to get all the variable names from the first row and assign them to the imported arrays?


Solution

Instead of trying to assign names, you might think about using an associative array, which is known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())


  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, data['a'] gives the list ['1', '2', '3', '4'] for the input in your question. Note that the values are still strings; convert them with float() before doing any arithmetic.
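A more compact sketch of the same dict idea, assuming the whitespace-separated layout from the question. It uses `zip(*reader)` to transpose the rows into columns and converts every value to float, so the lists are ready for arithmetic. `io.StringIO` stands in for the real file here; with real data you would use `open("1.txt")` instead.

```python
import csv
import io

# Stand-in for open("1.txt"): the sample data from the question.
text = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

with io.StringIO(text) as f:
    reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
    cols = next(reader)        # header row: ['a', 'b', 'c']
    columns = zip(*reader)     # transpose: one tuple of strings per column
    # Build {name: [floats]} while the reader is still open.
    data = {name: [float(v) for v in values] for name, values in zip(cols, columns)}

print(data["a"])  # [1.0, 2.0, 3.0, 4.0]
```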

I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).

OTHER TIPS

The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose: the data structure you actually want is going to be something like:

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

... which you can then easily access using e.g. data["a"].
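A sketch of building exactly that dict with NumPy, combining the `np.genfromtxt(..., unpack=True)` call from the question with a header read via `readline().split()`. It assumes the header is the only non-numeric line; `io.StringIO` again stands in for `open("1.txt")`, and the values end up as NumPy float arrays rather than lists.

```python
import io
import numpy as np

# Stand-in for open("1.txt"): the sample data from the question.
text = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

with io.StringIO(text) as f:
    names = f.readline().split()             # header: ['a', 'b', 'c']
    columns = np.genfromtxt(f, unpack=True)  # reads the remaining lines; one row per file column
data = dict(zip(names, columns))

print(data["b"])  # [11. 22. 33. 44.]
```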

It's possible to do what you want, but the usual way is a hack which relies on the fact that Python uses (drumroll) a dict internally to store variables - and since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.
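To illustrate the point, here is a sketch of that hack at module level, with the column data hard-coded as an assumption. Writing into the dict returned by `globals()` does create variables, but because the code only learns the names at runtime, it has to read them back through the same dict anyway. This only works for module-level names; it is not a reliable way to create local variables inside a function.

```python
names = ["a", "b", "c"]
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [11.0, 22.0, 33.0, 44.0],
    [111.0, 222.0, 333.0, 444.0],
]

# The hack: globals() is the module's variable dict, so adding a key creates a variable.
for name, col in zip(names, columns):
    globals()[name] = col

# Code that doesn't know the names in advance still needs dict access to read them.
for name in names:
    print(name, globals()[name])
```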

It's worth pointing out that this is deliberately made difficult in Python, because if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.

In case you aren't convinced yet, here's a good article on this subject:

Stupid Python Ideas: Why you don't want to dynamically create variables

Thanks to @andyg0808 and @Zero Piraeus, I have found another solution. For me, the most appropriate one is the pandas data analysis library.

import pandas as pd

data = pd.read_csv("1.txt", sep=r"\s+")  # split columns on runs of whitespace

result = data["a"] * data["b"] * 3
print(result)

0     33
1    132
2    297
3    528
dtype: int64

...where 0,1,2,3 are the row index.

Here is a simple way to convert a .txt file of variable names and data to NumPy arrays.

D = np.genfromtxt('1.txt',dtype='str')    # load the data in as strings
D_data = np.asarray(D[1::,:],dtype=float) # convert the data to floats
D_names = D[0,:]                          # save a list of the variable names

for i in range(len(D_names)):
    key = D_names[i]                      # define the key for this variable 
    val = D_data[:,i]                     # set the value for this variable 
    exec(key + '=val')                    # create a variable named after the key

I like this method because it is easy to follow and simple to maintain. We can compact this code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable 

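Both snippets do the same thing: they create NumPy arrays named a, b, and c holding their associated data. This relies on running exec() at module level; inside a function, exec() cannot create local variables that the rest of the function can use.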
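An exec-free alternative, offered as a sketch rather than the poster's method: `np.genfromtxt` can read the header row itself with `names=True`, returning a structured array whose columns are indexed by name, much like a dict. `io.StringIO` stands in for the file, and all values are assumed to be numeric.

```python
import io
import numpy as np

# Stand-in for "1.txt": the sample data from the question.
text = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

# names=True takes the field names from the first line; dtype defaults to float.
data = np.genfromtxt(io.StringIO(text), names=True)

print(data.dtype.names)  # ('a', 'b', 'c')
print(data["b"])         # [11. 22. 33. 44.]
```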

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow