Question

When I program, I often use external software to do the heavy computations and then analyse the results in Python. This external software is often written in Fortran, C or C++ and is driven by input files. An input file can be a small one telling the program which mode to run certain calculations in, or a large data file it has to process. These files often follow a fixed format (a set number of spaces between data columns). An example of a data file I currently use is given below.

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9

My question is whether a Python library exists for creating such input files by reading a template (given by a coworker or taken from the documentation of the external software).

Usually I have all the columns as NumPy arrays and want to pass them to a function that creates an input file, using the template as an example. I'm not looking for a brute-force method, which can get ugly very quickly.

I am not sure what to search for here, and any help is appreciated.


Solution

I can basically replicate your sample with np.savetxt. Its fmt parameter gives me the same sort of formatting control that FORTRAN code uses for reading and writing files, and it preserves spacing the same way FORTRAN and C formatted output do.

import numpy as np

example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
...
"""

lines = example.split('\n')[1:]
header = lines[0]
data = []
for line in lines[1:]:
  if len(line):
    data.append([float(x) for x in line.split()])
data = np.array(data)

fmt = '%10.3f %9.1f %9.2f %9.3f %20.1f'  # similar to a FORTRAN format statement
filename = 'stack21865757.txt'

with open(filename,'w') as f:
  np.savetxt(f, data, fmt, header=header)

with open(filename) as f:
  print(f.read())

producing (savetxt prefixes the header with "# " by default; pass comments='' to write it unchanged):

# This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                 11.2
  7353.510      26.0      4.73    -1.570                  3.5
...

EDIT

Here's a crude script that converts an example line into a format:

import re
tmplt = '  7352.103      26.0      2.61    -8.397                         11.2'
def fmt_from_template(tmplt):
    pat = r'( *-?\d+\.(\d+))'  # one number, with its leading spaces and its decimals
    fmt = []
    while tmplt:
        match = re.match(pat, tmplt)  # the field must start at the current position
        if not match:
            break  # stop at anything that is not a number (avoids an infinite loop)
        x = len(match.group(1))  # width of the whole field
        d = len(match.group(2))  # number of decimals
        fmt.append('%%%d.%df' % (x, d))
        tmplt = tmplt[x:]
    return ''.join(fmt)
print(fmt_from_template(tmplt))
# %10.3f%10.1f%10.2f%10.3f%29.1f
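
Because the regex only recognises plain fixed-point numbers, it may be worth checking that a derived format really reproduces the template line before relying on it. Below is a minimal sketch of such a round-trip check. The names derive_format and reproduces_template are hypothetical, and the sketch assumes every column is a decimal number (no integer or exponent fields).

```python
import re

# Hypothetical helpers for illustration; only plain fixed-point numbers are handled.
NUMBER = re.compile(r" *-?\d+\.(\d+)")

def derive_format(template_line):
    """Build a %-style format string from one example data line."""
    return "".join(
        "%{}.{}f".format(len(m.group(0)), len(m.group(1)))
        for m in NUMBER.finditer(template_line)
    )

def reproduces_template(template_line):
    """Return True if formatting the parsed values gives the line back exactly."""
    fmt = derive_format(template_line)
    try:
        values = tuple(float(tok) for tok in template_line.split())
        return fmt % values == template_line
    except (ValueError, TypeError):
        # Non-numeric tokens, or a field the regex did not recognise.
        return False

# The sample line from the question, with the 25-space gap written explicitly.
template = "  7352.103      26.0      2.61    -8.397" + " " * 25 + "11.2"
print(derive_format(template))        # %10.3f%10.1f%10.2f%10.3f%29.1f
print(reproduces_template(template))  # True
```

A False result means the template contains something the simple pattern does not cover, in which case the format string is better written by hand.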

Other tips

Adapting hpaulj's answer to extract the fmt for savetxt automatically:

from __future__ import print_function
import numpy as np
import re
example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
"""
def extract_format(line):
  # each match is one field: leading whitespace plus a fixed-point number
  def field_formats():
    for match in re.finditer(r"\s+-?\d+\.(\d+)", line):
      yield "%{}.{}f".format(len(match.group(0)), len(match.group(1)))
  return "".join(field_formats())

lines = example.split('\n')[1:]
header = lines[0]
data = []
for line in lines[1:]:
  if len(line):
    data.append([float(x) for x in line.split()])
data = np.array(data)

fmt = extract_format(lines[1])  # similar to a FORTRAN format statement

filename = 'stack21865757.txt'

with open(filename,'w') as f:
  print(header,file=f)
  np.savetxt(f, data, fmt)

with open(filename) as f:
  print(f.read())

producing

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
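
The question mentions having each column as a separate array rather than one 2-D array. Below is a minimal sketch of a writer for that case. It uses plain %-formatting so the header is written unchanged. The name write_fixed_width is hypothetical, the format string is the one derived from the sample line, and the columns are assumed to be equal-length lists or one-dimensional NumPy arrays.

```python
import os
import tempfile

def write_fixed_width(path, header, fmt, *columns):
    """Write equal-length columns as fixed-width text under a header line."""
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all columns must have the same length")
    with open(path, "w") as f:
        f.write(header + "\n")
        for row in zip(*columns):  # one tuple of values per output line
            f.write(fmt % row + "\n")

# First two rows of the sample data, one sequence per column.
col1 = [7352.103, 7353.510]
col2 = [26.0, 26.0]
col3 = [2.61, 4.73]
col4 = [-8.397, -1.570]
col5 = [11.2, 3.5]

fmt = "%10.3f%10.1f%10.2f%10.3f%29.1f"
path = os.path.join(tempfile.mkdtemp(), "input.dat")
write_fixed_width(path, "This is a header. The first line is always a header...",
                  fmt, col1, col2, col3, col4, col5)

with open(path) as f:
    text = f.read()
print(text)
```

Formatting row by row avoids building an intermediate 2-D array, at the cost of being slower than np.savetxt for very large files.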

If your header is always the same, you could look into pandas. It lets you reorder columns easily by referring to them by the names given in the header. Even if the header varies, as long as you can read the column names from the template you can still rearrange the columns to match it.

If I have misunderstood the question, then I am sorry, but more concrete data or a longer example might be nice for more help.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow