Reading an array with missing data and spaces in the first column

https://stackoverflow.com/questions/21712204

10-10-2022

Question

I have a .txt file that I want to read using Python. The file is an array containing data on comets. Here are 3 of its 3000 rows:

P/2011 U1 PANSTARRS               1.54 0.5   14.21 145.294 352.628 6098.07
P/2011 VJ5 Lemmon                 4.12 0.5    2.45 139.978 315.127 5904.20 *
149P/Mueller 4                    3.67 0.1    5.32  85.280  27.963 6064.72

I am reading the array using the following code:

import numpy as np
list_comet = np.genfromtxt('jfc_master.txt', dtype=None)

I am facing 2 different problems:

First, in row 1 the name of the comet is P/2011 U1 PANSTARRS, but if I type list_comet[0][1] the result is P/2011. How do I tell Python how to read the name of each comet? The longest name is 31 characters, so what is the command to tell Python that column 1 is 31 characters wide?

Second, in row 2 the value of the last column is *. When I read the file I get the following error:

Line #2941 (got 41 columns instead of 40)

(Note that the data above is not complete; my original data has 38 columns in total.) I guess I am getting this error because of the * found in certain rows. How can I fix this problem?

Solution

You didn't mention what data structure you're looking for, i.e. what operations you intend to perform on the parsed data. In the simplest case, you could massage the file into a list of 8-tuples, the last element being either '*' or an empty string. Splitting each line from the right keeps the spaces inside the comet name intact. That is as simple as

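def tokenize(s):
    # Split from the right so spaces inside the comet name are preserved.
    if s.endswith('*'):
        # Flagged row: name, 6 numeric fields, and the trailing '*'.
        return s.rsplit(None, 7)
    else:
        # Unflagged row: pad with '' so every row has 8 elements.
        return s.rsplit(None, 6) + ['']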

tokens = (tokenize(line.rstrip()) for line in open('so21712204.txt'))

To be fair, this doesn't make tokens a list of 8-tuples but rather a generator (which is more space efficient) of lists, each with 8 elements. Every element is still a string, and the generator can be consumed only once.
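If a real list of 8-tuples with numeric values is wanted, the generator can be materialized and converted. The following is a minimal sketch, assuming the 7-column layout of the three sample rows from the question (the real file reportedly has more columns, so the split counts would need adjusting). The names parse_line, sample and comets are hypothetical, and the rows are held in memory here instead of being read from the file.

```python
def tokenize(s):
    """Split one row into [name, six numeric strings, flag]."""
    if s.endswith('*'):
        return s.rsplit(None, 7)
    return s.rsplit(None, 6) + ['']


def parse_line(line):
    """Return an 8-tuple: (name, six floats, flagged)."""
    fields = tokenize(line.rstrip())
    name, numbers, flag = fields[0], fields[1:7], fields[7]
    # Convert the numeric strings to floats and the '*' marker to a bool.
    return (name, *map(float, numbers), flag == '*')


# The three sample rows from the question (assumed layout).
sample = [
    "P/2011 U1 PANSTARRS               1.54 0.5   14.21 145.294 352.628 6098.07",
    "P/2011 VJ5 Lemmon                 4.12 0.5    2.45 139.978 315.127 5904.20 *",
    "149P/Mueller 4                    3.67 0.1    5.32  85.280  27.963 6064.72",
]

# Skip blank lines, which would otherwise produce malformed rows.
comets = [parse_line(line) for line in sample if line.strip()]
```

With this, comets[0][0] holds the full name of the first comet, spaces included, and the last element of each tuple says whether the row carried a trailing '*'.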

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow