Question

I'm using this code to parse a text file and format it so that every sentence goes on a new line:

import re

# open the file to be formatted
with open('inputfile.txt', 'r') as filename:
    f = filename.read()

# put every sentence in a new line
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'
lines = re.sub(pat, '.\n', f)
print(lines)

# write the formatted text
# into a new txt file
with open("outputfile.txt", "w") as filename:
    filename.write(lines)
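To see what the pattern in the code above does, here is a quick check on an invented sample string (the text is only for illustration). The two negative lookbehinds stop a split after "Dr." and "Esq.", so only the other sentence ends are replaced with '.\n'.

```python
import re

# The pattern from the question: a period, one or more spaces, then an
# uppercase letter, unless the period comes right after "Dr" or "Esq".
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'

# Invented sample text, only for illustration.
text = "I met Dr. Smith today. He was late. Then we left."
formatted = re.sub(pat, '.\n', text)
print(formatted)
# I met Dr. Smith today.
# He was late.
# Then we left.
```

Other abbreviations, such as "Mr.", are not covered by the two lookbehinds and would still start a new line.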

What I actually need is to break sentences at 110 characters. When a sentence on a line is longer than 110 characters, it should be cut, '...' added at the end of the first part, and a new line started with '...' followed by the next part of the sentence, and so on.

Any suggestions on how to do that? I'm a bit lost.


Solution

# open inputfile/read/auto-close
with open('inputfile.txt') as f:
    lines = f.read().splitlines()  # list of lines without the trailing '\n'

MAX_LEN = 110
output = []

for line in lines:
    prefix = ''  # the first piece of a sentence has no leading '...'
    # keep cutting while the piece (with its prefix) is still too long
    while len(prefix + line) > MAX_LEN:
        cut = MAX_LEN - len(prefix) - 3  # leave room for the trailing '...'
        output.append(prefix + line[:cut] + '...')
        line = line[cut:]
        prefix = '...'  # continuation pieces start with '...'
    output.append(prefix + line)

# open outputfile/write/auto-close
with open('outputfile.txt', 'w') as f:
    for line in output:
        f.write(line + '\n')  # one piece per line
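If you would rather skip the intermediate file, the same cutting loop can be applied directly to the string that the question's re.sub produces. The sketch below wraps both steps in a hypothetical helper, format_sentences (the name and the max_len parameter are illustrative, not part of the answer), which also makes the rule easy to test with a small limit.

```python
import re

# Sentence-splitting pattern copied from the question.
SENTENCE_END = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'


def format_sentences(text, max_len=110):
    """Hypothetical helper: one sentence per line, with long sentences cut
    into pieces of at most max_len characters joined by '...' markers."""
    if max_len < 7:
        # a continuation piece needs '...' + at least 1 char + '...'
        raise ValueError("max_len must be at least 7")
    output = []
    for line in re.sub(SENTENCE_END, '.\n', text).splitlines():
        prefix = ''  # the first piece of a sentence has no leading '...'
        while len(prefix + line) > max_len:
            cut = max_len - len(prefix) - 3  # leave room for the trailing '...'
            output.append(prefix + line[:cut] + '...')
            line = line[cut:]
            prefix = '...'  # continuation pieces start with '...'
        output.append(prefix + line)
    return output
```

For example, format_sentences("Hi there. Abcdefghijklmnopqrstuvwxyz.", max_len=12) returns ['Hi there.', 'Abcdefghi...', '...jklmno...', '...pqrstu...', '...vwxyz.']. With the default max_len of 110, every returned piece is at most 110 characters long.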

OTHER TIPS

I don't know exactly what "lines" contains, but if it is a single string rather than a list of lines (which is what re.sub returns), you first need to split it into a list, for example with lines.splitlines().

Once you have a list of strings, you can check how many characters each one has, and if it is more than 110, keep the first 107 and put '...' at the end (the rest of the line is dropped). Like this:

for i, string_line in enumerate(lines):
    if len(string_line) > 110:
        # keep the first 107 characters and mark the cut with '...'
        lines[i] = "{0}...".format(string_line[:107])
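In the question, re.sub returns one string, not a list. A minimal sketch of the split-then-truncate steps, using an invented sample string in place of the real file contents:

```python
# Invented sample standing in for the string returned by re.sub in the question.
lines = "A short sentence.\n" + "B" * 120 + "."

# re.sub returns a single string, so split it into a list of lines first.
line_list = lines.splitlines()

# Truncate anything longer than 110 characters to 107 characters plus '...'.
# This drops the rest of the line; it does not create continuation lines.
truncated = [s[:107] + "..." if len(s) > 110 else s for s in line_list]

for s in truncated:
    print(s)
```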

Explanation:

If you do this:

string = "Hello"
print(len(string))

the result will be: 5

print(string[:3])

the result will be: Hel
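The same slicing, with a start index that moves forward by a fixed step, is what cuts a long string into consecutive pieces. A small sketch with an invented string and a piece size of 4:

```python
string = "Hello world"
n = 4

# string[i:i + n] takes up to n characters starting at index i; stepping i
# by n walks through the string in consecutive, non-overlapping pieces.
pieces = [string[i:i + n] for i in range(0, len(string), n)]
print(pieces)  # ['Hell', 'o wo', 'rld']
```

The last piece is simply shorter when the length is not a multiple of n.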

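Python can't insert text into the middle of an existing file in place, so you have to write the result to a new file and then replace the original. Something like this will do what you describe.

WARNING: make a backup of the file first, as the existing file will be replaced.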

from shutil import move

insert = " #####blabla#### "
insert_position = 110

targetfile = "full/path/to/target_file"
tmpfile = "/full/path/to/tmp.txt"

# write the modified lines to a temporary file, then replace the original
with open(targetfile, "r") as myfile, open(tmpfile, "w") as output:
    for line in myfile:
        # compare the length without the trailing newline
        if len(line.rstrip("\n")) > insert_position:
            # first 110 characters, the marker, a line break, then the rest
            line = line[:insert_position] + insert + "\n" + line[insert_position:]
        output.write(line)

move(tmpfile, targetfile)
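A variant of the same write-to-a-temporary-file-then-replace idea, assuming Python 3.3 or later (for os.replace). insert_marker is a hypothetical helper name. The temporary file is created with tempfile.mkstemp in the target's own directory so that the final os.replace is a rename on the same filesystem, the original file's permission bits are copied over, and the temporary file is removed again if anything fails. The backup warning above still applies.

```python
import os
import shutil
import tempfile


def insert_marker(targetfile, insert=" #####blabla#### ", insert_position=110):
    """Hypothetical helper: rewrite targetfile, splitting each line longer
    than insert_position characters right after that position."""
    directory = os.path.dirname(os.path.abspath(targetfile))
    # Create the temporary file next to the target so os.replace is a rename
    # within the same filesystem.
    fd, tmpfile = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(targetfile, "r") as src:
            for line in src:
                if len(line.rstrip("\n")) > insert_position:
                    line = line[:insert_position] + insert + "\n" + line[insert_position:]
                dst.write(line)
        shutil.copymode(targetfile, tmpfile)  # keep the original permissions
        os.replace(tmpfile, targetfile)  # swap the new file in for the old one
    except BaseException:
        os.remove(tmpfile)  # don't leave the temporary file behind on failure
        raise
```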
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow