Python - extract and modify part of a specific line of text with a function for all files in folder

StackOverflow https://stackoverflow.com/questions/22253547

  •  11-06-2023

Question

I'm looking to extract and modify a specific line of text in many files within a folder but I am having some trouble.

For instance, the first file might read:

To: Bob
From: Bill
<Message> The eagle flies at midnight. <End Message>

The second message is different, but in the same format, and so on. I'd like to extract the third line, pass 'The eagle flies at midnight.' through a function (like base64), and then put it back on the line between 'Message' and 'End Message', so that the final output would read:

To: Bob
From: Bill
<Message> VGhlIGVhZ2xlIGZsaWVzIGF0IG1pZG5pZ2h0Lg== <End Message>

This is what I am trying (and adjusting) so far.

import base64
import os
import io

#ask user where his stuff is / is going
directory = raw_input("INPUT Folder:")
output = raw_input("OUTPUT Folder:")

#get that stuff
myfilepath = os.path.join(directory, '*.txt')
with open('*.txt', 'r') as file:
data = file.readlines()

#Go to line 3 and take out non encoded text.
data[3] = X
X.strip("<Message>")
X.strip("<End Message>")
coded_string = X

#Encode line 3
base64.b64encode(coded_string)
data[3] = '<Message> %s <End Message>' % (coded_string)

# and write everything back
with open('*.txt', 'w') as file:
file.writelines(data)

I'm sure there are numerous problems, particularly with how I am opening and writing back. Bonus points: 99% of the messages in this folder are in this exact format, but 1% are junk messages (they don't need to be encoded, and line 3 for them is something different). I'm not too worried about them, but if they could be left unharmed in the process, that'd be nifty. Maybe line 3 should be line 2 if the count starts at 0 ...

Edit: Trying

import re, base64
import os

folder = 'C:/Users/xxx/Desktop/input'
matcher = re.compile("<Message>(?P<text>[^<]*)<End Message>")

for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
    if not os.path.isfile(infilename): continue

    base, extension = os.path.splitext(filename)
    filein = open(infilename, 'r')
    fileout = open(os.path.join(folder, '{}_edit.{}'.format(base, extension)), 'w')
for line in filein:
    match = matcher.search(line)
    if match:
        fileout.write("<message> " + base64.b64encode(match.group('text').strip()) + " <End message>\n")
    else:
        fileout.write(line)

filein.close()
fileout.close()

Ultimately this gives me a bunch of blank files except for the last one which is translated properly.


Solution

You can use a regular expression to make this easier:

import re, base64

filein = open("examplein.txt", 'r')
fileout = open("exampleout.txt", 'w')

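# Capture the text between the <Message> and <End Message> tags
matcher = re.compile("<Message>(?P<text>[^<]*)<End Message>")

for line in filein:
    match = matcher.search(line)
    if match:
        # Python 2, as in the question: b64encode accepts a str here
        # (Python 3 would need bytes). Write the tags back with their original case.
        fileout.write("<Message> " + base64.b64encode(match.group('text').strip()) + " <End Message>\n")
    else:
        # Lines without the tags are copied through unchanged
        fileout.write(line)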

filein.close()
fileout.close()
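
To see what the pattern does on its own, here is a minimal sketch that applies it to the sample line from the question. It assumes Python 3, where base64.b64encode takes and returns bytes, so the text is encoded to UTF-8 first and the result is decoded back to a string. The helper name encode_line is made up for this illustration.

```python
import base64
import re

# Same pattern as in the answer: capture everything between the two tags.
matcher = re.compile(r"<Message>(?P<text>[^<]*)<End Message>")

def encode_line(line):
    """Return the line with its message base64-encoded, or unchanged if the tags are absent."""
    match = matcher.search(line)
    if not match:
        return line  # junk lines pass through untouched
    text = match.group("text").strip()
    # Python 3 assumption: b64encode works on bytes, not str.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "<Message> " + encoded + " <End Message>\n"

print(encode_line("<Message> The eagle flies at midnight. <End Message>\n"), end="")
print(encode_line("To: Bob\n"), end="")
```

Because the capture group is [^<]*, a message that itself contains a '<' character will not match and is left as it was.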

This code works for just one file. You should adapt it to work with all the files in your directory:

import re, base64
import os

folder = '/home/user/Public'
matcher = re.compile("<Message>(?P<text>[^<]*)<End Message>")

for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
    if not os.path.isfile(infilename): continue

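    # splitext keeps the leading dot in extension, e.g. ('msg', '.txt')
    base, extension = os.path.splitext(filename)
    filein = open(infilename, 'r')
    fileout = open(os.path.join(folder, '{}_edit{}'.format(base, extension)), 'w')
    for line in filein:
        match = matcher.search(line)
        if match:
            # Encode the message and write the tags back with their original case
            fileout.write("<Message> " + base64.b64encode(match.group('text').strip()) + " <End Message>\n")
        else:
            # Lines without the tags are copied through unchanged
            fileout.write(line)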

    filein.close()
    fileout.close()

This code works on my PC.
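
The question also asked for separate input and output folders. A minimal Python 3 sketch of that variant follows. The function names encode_message and encode_folder, and the choice to process only .txt files, are assumptions made for illustration and are not part of the code above.

```python
import base64
import os
import re

MESSAGE_RE = re.compile(r"<Message>(?P<text>[^<]*)<End Message>")

def encode_message(match):
    """Replacement callback for re.sub: base64-encode the captured text."""
    text = match.group("text").strip()
    # Python 3 assumption: b64encode needs bytes and returns bytes.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "<Message> " + encoded + " <End Message>"

def encode_folder(in_dir, out_dir):
    """Copy every .txt file from in_dir to out_dir, encoding tagged messages."""
    os.makedirs(out_dir, exist_ok=True)
    for filename in sorted(os.listdir(in_dir)):
        src = os.path.join(in_dir, filename)
        if not (os.path.isfile(src) and filename.endswith(".txt")):
            continue
        # Read the whole file first so the input handle is closed before writing.
        with open(src, "r", encoding="utf-8") as filein:
            lines = filein.readlines()
        dst = os.path.join(out_dir, filename)
        with open(dst, "w", encoding="utf-8") as fileout:
            # sub() returns lines without the tags unchanged.
            fileout.writelines(MESSAGE_RE.sub(encode_message, line) for line in lines)
```

Calling encode_folder("input", "output") would leave the originals in place and write the encoded copies under the same file names in the output folder. Using re.sub with a callback keeps the rest of each line, including its line ending, exactly as it was.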

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow