How do I change the field separator of a file using Python?

StackOverflow https://stackoverflow.com/questions/6040711

  •  14-11-2019

Question

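I'm new to Python, coming from the R world, and I'm working on large text files made up of columns of data (this is LiDAR data, so typically 600,000+ records).

Is it possible to change the field separator of a large file (e.g., from tab-delimited to comma-delimited) without reading the file in and running a for loop over its lines?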


Solution

No.

  • Read the file in
  • Change separators for each line
  • Write each line back

This is easily doable with just a few lines of Python (not tested, but the general approach works):

# Python - it's so readable, the code basically just writes itself ;-)
#
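with open('infile') as infile:
  with open('outfile', 'w') as outfile:
    for line in infile:
      # the trailing newline stays attached to the last field,
      # so joining with ',' still yields one record per line
      fields = line.split('\t')
      outfile.write(','.join(fields))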

I'm not familiar with R, but if it has a library function for this, it's probably doing exactly the same thing.

Note that this code reads only one line at a time from the file, so the file can be larger than physical RAM; it is never loaded into memory all at once.
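For comparison, Python's standard-library csv module streams the file row by row in the same way, and it also quotes any field that happens to contain a comma, which a plain split/join does not. A minimal sketch, assuming the input either has no double quotes or follows CSV-style quoting; the function name tsv_to_csv is hypothetical:

```python
import csv

def tsv_to_csv(src_path, dst_path):
    """Hypothetical helper: stream a tab-separated file into a comma-separated one."""
    # newline="" is what the csv docs recommend for files handed to csv
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        # lineterminator="\n" keeps Unix line endings (the default is "\r\n")
        writer = csv.writer(dst, lineterminator="\n")
        # writerows() iterates the reader, so only one row is in memory at a time
        writer.writerows(reader)
```

A tab-separated line such as `x<TAB>y,z` comes out as `x,"y,z"`, so the embedded comma is not mistaken for a separator.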

Other tips

You can use the Linux tr command to replace any character with any other character, for example every tab with a comma.
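Inside Python, the closest analogue of tr is str.maketrans together with str.translate, which maps characters one for one. Like tr, it knows nothing about quoting, so this sketch assumes no field already contains a comma. The helper name tr_tabs_to_commas and the 1 MiB chunk size are illustrative choices, and the walrus operator needs Python 3.8+:

```python
# one-for-one character table, the Python counterpart of tr '\t' ','
TAB_TO_COMMA = str.maketrans("\t", ",")

def tr_tabs_to_commas(src_path, dst_path, chunk_size=1 << 20):
    """Hypothetical helper: copy src_path to dst_path, replacing every tab with a comma."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # fixed-size chunks bound memory use even if a single line is huge;
        # a single-character mapping is safe across chunk boundaries
        while chunk := src.read(chunk_size):
            dst.write(chunk.translate(TAB_TO_COMMA))
```

Reading fixed-size chunks rather than lines mirrors how tr itself works and avoids splitting and re-joining every record.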

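Actually, let's say yes: you can do it without an explicit loop, e.g.:

with open('in') as infile:
    with open('out', 'w') as outfile:
        # map() is lazy in Python 3, so hand it to writelines() to consume it;
        # split on '\t' (not '\n') so that tabs become commas
        outfile.writelines(map(lambda line: ','.join(line.split('\t')), infile))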

You can't, but I strongly advise you to look into generators.

The point is that you can write a faster, better-structured program without having to store all the data in memory in order to process it.

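For instance:

# open the input for reading ("w" would truncate it); the output path is a placeholder
with open("bigfile") as file, open("some_other_file", "w") as some_other_file:
    j = (i.split("\t") for i in file)   # generator: split each line lazily
    s = (",".join(i) for i in j)        # generator: re-join the fields with commas
    # and now magic happens: lines are pulled through the pipeline one at a time
    for i in s:
        some_other_file.write(i)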

This code only needs enough memory to hold a single line at a time.
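To see that laziness directly, here is a small sketch that wraps the line source in a hypothetical traced() helper which records each line as it is pulled; a list of strings stands in for the open file:

```python
def traced(lines, log):
    """Hypothetical helper: yield lines unchanged, recording each one as it is pulled."""
    for line in lines:
        log.append(line)
        yield line

log = []
source = ["a\tb\n", "c\td\n", "e\tf\n"]  # stand-in for an open file object
fields = (line.split("\t") for line in traced(source, log))
rows = (",".join(f) for f in fields)

# building the pipeline reads nothing: log is still empty here
first = next(rows)  # pulls exactly one record through every stage
# now first == "a,b\n" and len(log) == 1
```

Each stage asks the previous one for a single item only when it needs it, which is why the memory cost stays at one line no matter how large the file is.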

License: CC-BY-SA with attribution
Not affiliated with StackOverflow