Search and replace a line in a file in Python

https://stackoverflow.com/questions/39086

file
python

09-06-2019

Question

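I want to loop over the contents of a text file, do a search and replace on some lines, and write the result back to the file. I could first load the whole file into memory and then write it back, but that probably is not the best way to do it.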

What is the best way to do this within the following code?

with open(file) as f:
    for line in f:
        # str has no .contains() method; use the "in" operator instead
        if 'foo' in line:
            newline = line.replace('foo', 'bar')
            # how to write this newline back to the file

Solution

I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new one:

from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
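A variant of the same approach, as a sketch rather than a drop-in replacement: it assumes you want the temporary file created next to the original (so the final rename stays on one filesystem), uses os.replace to swap it in over the original in one step, and calls shutil.copymode so the new file keeps the original's permission bits. The name replace_in_file is illustrative.

```python
import os
import shutil
import tempfile


def replace_in_file(file_path, pattern, subst):
    """Replace every occurrence of the plain-text `pattern` with `subst`."""
    directory = os.path.dirname(os.path.abspath(file_path))
    # Create the temp file in the same directory as the original
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as new_file, open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
        # mkstemp creates the file with mode 0600; copy the original's mode
        shutil.copymode(file_path, tmp_path)
        # Overwrite the original with the rewritten copy
        os.replace(tmp_path, file_path)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed
        os.remove(tmp_path)
        raise
```

Usage: replace_in_file("test.txt", "foo", "bar").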

Other tips

The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in place:

import fileinput

for line in fileinput.input("test.txt", inplace=True):
    # line already ends with "\n", so suppress print()'s own newline
    print("%d: %s" % (fileinput.filelineno(), line), end="")

What happens here is:

The original file is moved to a backup file.
Standard output is redirected to the original file within the loop.
Thus any print statements write back into the original file.

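fileinput has more bells and whistles. For example, it can be used to automatically operate on all the files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2, it also provides a convenient context manager for use in a with statement.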
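For example, the line-numbering script above could be written with the context-manager form. This is a sketch: number_lines is an illustrative name, and the optional backup=".bak" argument makes fileinput keep the original as a .bak file next to it instead of deleting the backup when it is done.

```python
import fileinput


def number_lines(path):
    # inplace=True redirects print() output into the file being read;
    # backup=".bak" keeps a copy of the original as path + ".bak"
    with fileinput.input(files=(path,), inplace=True, backup=".bak") as f:
        for line in f:
            # line already ends with "\n", so suppress print()'s newline
            print("%d: %s" % (f.filelineno(), line), end="")
```

Usage: number_lines("test.txt") rewrites test.txt and leaves test.txt.bak behind.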

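While fileinput is great for throwaway scripts, I would be wary of using it in real code because, admittedly, it is not very readable or familiar. In real (production) code it's worth spending just a few more lines of code to make the process explicit and thus make the code readable.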

There are two options:

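The file is not overly large, and you can just read it wholly into memory. Then close the file, reopen it in write mode, and write the modified contents back.
The file is too large to be held in memory. You can move it over to a temporary file and open that, reading it line by line and writing back into the original file. Note that this requires twice the storage.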
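The two options can be sketched as follows, assuming plain-text replacement with str.replace (the function names are illustrative). One deliberate deviation: the second function copies the original to a temporary file rather than moving it, so the original file keeps its permissions; that temporary copy is where the doubled storage comes from.

```python
import os
import shutil
import tempfile


def replace_in_memory(path, old, new):
    """Option 1: the whole file fits in memory."""
    with open(path) as f:
        contents = f.read()
    # Reopening in "w" mode truncates the file before writing
    with open(path, "w") as f:
        f.write(contents.replace(old, new))


def replace_via_temp_copy(path, old, new):
    """Option 2: stream line by line through a temporary copy."""
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    # Copy (rather than move) so the original keeps its permissions
    shutil.copyfile(path, tmp_path)
    with open(tmp_path) as src, open(path, "w") as dst:
        for line in src:
            dst.write(line.replace(old, new))
    # Only delete the copy once the rewrite has finished
    os.remove(tmp_path)
```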

Here is another tested example that replaces a search string with a replacement string:

import fileinput
import sys

def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

Usage example:

replaceAll("/fooBar.txt", "Hello World!", "Goodbye World.")
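replaceAll matches searchExp as a literal substring, because it uses str.replace. If you want regular-expression matching instead, a sketch of a variant (replace_all_regex is a hypothetical name) compiles the pattern once and applies re.sub to each line, keeping the same fileinput in-place mechanism:

```python
import fileinput
import re
import sys


def replace_all_regex(path, search_regex, replacement):
    pattern = re.compile(search_regex)
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            # Inside the loop, sys.stdout writes to the file being rewritten
            sys.stdout.write(pattern.sub(replacement, line))
```

Usage: replace_all_regex("/fooBar.txt", r"Hello\sWorld!$", "Goodbye World."). Use raw strings for patterns so backslashes reach the regex engine unchanged; $ matches just before a line's trailing newline, so the newline is preserved.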

This works (in-place editing):

import fileinput

files = ["a.txt", "b.txt"]  # hypothetical list of file names
# Processes a list of files and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace=True):
    print(line.replace("foo", "bar"), end="")

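This builds on Thomas Watnedal's answer. However, it does not exactly answer the line-by-line part of the original question, although the function can still be used to replace on a line-by-line basis.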

This implementation replaces the file contents without using temporary files; as a result, file permissions remain unchanged.

Also, using re.sub instead of replace allows regular-expression replacement instead of plain-text replacement only.

Reading the file as a single string instead of line by line allows multiline matching and replacement.

import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()

    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))

    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()
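To illustrate the multiline case, here is a minimal self-contained sketch along the same lines (replace_multiline is an illustrative name, and the flags parameter is an addition): because the whole file is one string, a pattern can contain \n and match across line boundaries, and flags such as re.MULTILINE can be passed through to re.sub.

```python
import re


def replace_multiline(path, pattern, subst, flags=0):
    # Read the whole file as one string so patterns can span lines
    with open(path) as f:
        text = f.read()
    text = re.sub(pattern, subst, text, flags=flags)
    # Mode "w" truncates the file before the new contents are written
    with open(path, "w") as f:
        f.write(text)
```

For example, replace_multiline(path, r"foo\nbar", "foobar") joins a "foo" line with a following "bar" line, and replace_multiline(path, r"^#.*\n", "", flags=re.MULTILINE) deletes every line that starts with #.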

Create a new file as lassevk suggests; here is some example code:

fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()

If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes:

import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )
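One detail worth knowing about the function above: re.escape only makes the search text literal. The replacement subs is still processed by sub(), so backslash sequences in it such as \1 or \n are treated as group references or escapes. If the replacement should also be inserted verbatim, one option is to pass a function instead of a string, since sub() uses a callable's return value as-is. A sketch with that single change (replace_literal is an illustrative name):

```python
import re


def replace_literal(file_path, text, subs, flags=0):
    with open(file_path, "r+") as file:
        contents = file.read()
        pattern = re.compile(re.escape(text), flags)
        # A callable replacement is inserted as-is, with no escape processing
        contents = pattern.sub(lambda match: subs, contents)
        # Rewind and truncate before writing the new contents
        file.seek(0)
        file.truncate()
        file.write(contents)
```

With a plain string, replacing "X" by the Windows path C:\new would turn \n into a newline; with the callable, the backslash survives.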

A more Pythonic way would be to use context managers, as in the code below:

from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # Wrap mkstemp's descriptor so the context manager closes it
    with fdopen(fh, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

You can find the full snippet here.

Create a new file, copy the lines from the old file to the new one, and do the replacing before you write each line to the new file.

Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support reading and writing UTF-8:

import codecs

from tempfile import mkstemp
from shutil import move
from os import close, remove


def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # codecs.open reopens the file by path, so close mkstemp's descriptor
    close(fh)

    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
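In Python 3 the built-in open() accepts an encoding argument directly, so the codecs module is not needed for this. A sketch of the same function under that assumption (replace_utf8 is an illustrative name); it also passes mkstemp's descriptor straight to open(), so the descriptor is closed together with the file object:

```python
import os
from shutil import move
from tempfile import mkstemp


def replace_utf8(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # open() accepts an int descriptor and closes it when the block exits
    with open(fh, "w", encoding="utf-8") as target_file:
        with open(source_file_path, "r", encoding="utf-8") as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    os.remove(source_file_path)
    move(target_file_path, source_file_path)
```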

Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace the matches with an empty string.

import re

# Compile the pattern once, outside the loop
p = re.compile(r'[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')

fin = open("in.txt", 'r')    # input file
fout = open("out.txt", 'w')  # output file
for line in fin:
    newline = p.sub('', line)  # replace matching strings with empty string
    print(newline, end='')
    fout.write(newline)
fin.close()
fout.close()
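To see what that pattern removes: the first alternative matches a minus sign, digits, a decimal point, digits and a comma (such as -1.5,), and the second matches a minus sign, digits and a comma (such as -20,). A quick check on an in-memory string, using the pattern exactly as written above (the sample text is made up):

```python
import re

# Same pattern as in the answer above, written as a raw string
p = re.compile(r'[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')

sample = "a-1.5,b-20,c"
print(p.sub('', sample))  # prints abc
```

Negative numbers that are not followed by a comma, as in "x-7y", are left untouched.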

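The same function can also be written without the nested with blocks (removing one level of indentation) by opening and closing the files explicitly; it still searches and replaces line by line throughout the file. See the example below: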

from os import close, remove
from shutil import move
from tempfile import mkstemp

def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print(fh, abs_path)
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

For Linux users:

import os, shlex
# Quote the path for the shell; "g" replaces every match on a line (GNU sed)
os.system("sed -i 's/foo/bar/g' " + shlex.quote(file_path))

License: CC-BY-SA with attribution
