Question

I'm trying to split a variable-length string across several lines, each with its own predefined maximum length. I've thrown together the code below, which fails with KeyError: 6 when I run it in Python Tutor (I don't have access to a proper Python IDE right now). I guess this means my while loop isn't working properly and keeps incrementing lineNum, but I'm not sure why. Is there a better way to do this, or is this easily fixable?

The code:

import re

#Dictionary containing the line number as key and the max line length
lineLengths = {
        1:9,
        2:11,
        3:12,
        4:14,
        5:14
               }

inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"        #Test string, should be split on the spaces and around the "X"

splitted = re.split("(?:\s|((?<=\d)X(?=\d)))",inputStr)     #splits inputStr on white space and where X is surrounded by numbers eg. dimensions

lineNum = 1                         #initialises the line number at 1

lineStr1 = ""                           #initialises each line as a string
lineStr2 = ""
lineStr3 = ""
lineStr4 = ""
lineStr5 = ""

#Dictionary creating dynamic line variables
lineNumDict = {
        1:lineStr1,
        2:lineStr2,
        3:lineStr3,
        4:lineStr4,
        5:lineStr5
        }

if len(inputStr) > 40:
    print "The short description is longer than 40 characters"
else:
    while lineNum <= 5:
        for word in splitted:
            if word != None:
                if len(lineNumDict[lineNum]+word) <= lineLengths[lineNum]:
                    lineNumDict[lineNum] += word
                else:
                    lineNum += 1
            else:
                if len(lineNumDict[lineNum])+1 <= lineLengths[lineNum]:
                    lineNumDict[lineNum] += " "
                else:
                    lineNum += 1

lineOut1 = lineStr1.strip()
lineOut2 = lineStr2.strip()
lineOut3 = lineStr3.strip()
lineOut4 = lineStr4.strip()
lineOut5 = lineStr5.strip()

I've taken a look at this answer, but I don't have any real understanding of C#: "Split large text string into variable length strings without breaking words and keeping linebreaks and spaces"


Solution

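It doesn't work because the for word in splitted loop sits inside the while lineNum <= 5 loop. The while condition is only checked between complete passes over splitted, so lineNum can reach 6 in the middle of a pass, and lineNumDict[6] then raises the KeyError. Iterate over the words once and check lineNum inside the loop instead: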

    if len(inputStr) > 40:
        print("The short description is longer than 40 characters")
    else:
        for word in splitted:
            if lineNum > 5:          # all five lines are used up
                break
            if word is not None:     # a word, or the "X" between two digits
                if len(lineNumDict[lineNum]+word) <= lineLengths[lineNum]:
                    lineNumDict[lineNum] += word
                else:
                    lineNum += 1     # note: the word that did not fit is skipped
            else:                    # None marks a split on whitespace
                if len(lineNumDict[lineNum])+1 <= lineLengths[lineNum]:
                    lineNumDict[lineNum] += " "
                else:
                    lineNum += 1

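Also, lineStr1, lineStr2 and so on are never changed: strings are immutable, so lineNumDict[lineNum] += word stores a new string in the dict and leaves the original variables untouched. Read the results from the dict directly. I tried it and got this working: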

    print("Lines: %s" % lineNumDict) 

Gives:

    Lines: {1: 'THIS IS A', 2: 'LONG DESC 7', 3: '7 NEEDS ', 4: '', 5: ''}
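Note that the X and SPLITTING are missing from the output above: when a token does not fit, the loop moves on to the next line but never stores that token. A minimal sketch of one way to carry the token over is shown here. The function name fill_lines is hypothetical, the sketch assumes the dict keys are consecutive integers starting at 1, and raising ValueError when the text does not fit is a design choice of this sketch, not something from the original code.

```python
import re

LINE_LENGTHS = {1: 9, 2: 11, 3: 12, 4: 14, 5: 14}
INPUT_STR = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"


def fill_lines(tokens, line_lengths):
    """Greedy fill: a token that does not fit is placed on the next line.

    Assumes line_lengths maps consecutive line numbers (1, 2, ...) to
    their maximum length. None entries in tokens stand for a space.
    """
    lines = {num: "" for num in line_lengths}
    line_num = 1
    for token in tokens:
        piece = " " if token is None else token
        current_is_full = (
            line_num in line_lengths
            and len(lines[line_num]) + len(piece) > line_lengths[line_num]
        )
        if current_is_full:
            line_num += 1
            if piece == " ":
                continue  # the line break stands in for the space
        if line_num not in line_lengths or len(piece) > line_lengths[line_num]:
            raise ValueError("text does not fit into the available lines")
        lines[line_num] += piece
    return {num: text.strip() for num, text in lines.items()}


# Same pattern as in the question, written as a raw string.
tokens = re.split(r"(?:\s|((?<=\d)X(?=\d)))", INPUT_STR)
result = fill_lines(tokens, LINE_LENGTHS)
print(result)
# {1: 'THIS IS A', 2: 'LONG DESC 7', 3: 'X7 NEEDS', 4: 'SPLITTING', 5: ''}
```

Returning a new dict also sidesteps the lineStr1 ... lineStr5 variables entirely, since they never receive the results anyway.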

OTHER TIPS

for word in splitted:
    ...
    lineNum += 1

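Your code can increment lineNum several times within a single pass of the for loop (in the worst case once per item in splitted), while the while condition is only checked between passes.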
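To see what the for loop actually walks over, it can help to print the list that re.split returns for the pattern in the question. Because the pattern contains a capturing group, each split point contributes one extra item: None for a whitespace split and "X" for a split between digits. This is only an inspection snippet; the variable names are illustrative.

```python
import re

input_str = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# Same pattern as in the question, written as a raw string.
tokens = re.split(r"(?:\s|((?<=\d)X(?=\d)))", input_str)

print(tokens)
# ['THIS', None, 'IS', None, 'A', None, 'LONG', None, 'DESC', None,
#  '7', 'X', '7', None, 'NEEDS', None, 'SPLITTING']
print(len(tokens))
# 17
```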

I wonder if a properly commented regular expression wouldn't be easier to understand?

import re

lineLengths = {1:9,2:11,3:12,4:14,5:14}
inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# raw string, so the backslash escapes reach the regex engine unchanged
pat = r"""
(?:                     # non-capturing group around the line, as we want to drop leading spaces
    \s*                 # drop leading spaces
    (.{{1,{max_len}}})  # up to max_len characters, filled in through 'format'
    (?=[\sX]|$)         # the line must be followed by whitespace, an X or the end of the string
                        # (lookahead only, so the X is not consumed and starts the next match)
)?                      # the whole line is optional, in case not all lines are needed
"""

# build a pattern matching up to 5 lines with the corresponding max lengths
pattern = ''.join(pat.format(max_len=lineLengths[k]) for k in sorted(lineLengths))

re.match(pattern, inputStr, re.VERBOSE).groups()
#  Out: ('THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING', None)

Also, there is no real point in using a dict for lineLengths; a list would do nicely.
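As a rough sketch of the list-based variant, the pattern can be built directly from a list whose order gives the line number. The names line_lengths, LINE_TEMPLATE and split_fixed are hypothetical. Two details differ from the pattern above and are assumptions of this sketch: the X only counts as a break when it sits between two digits (as in the question's own regex), and leftover text raises a ValueError instead of being dropped silently.

```python
import re

# Hypothetical replacement for the lineLengths dict: position = line number.
line_lengths = [9, 11, 12, 14, 14]

LINE_TEMPLATE = r"""
(?:
    \s*                         # drop leading spaces
    (.{{1,{max_len}}})          # up to max_len characters
    (?:
        (?=\s|$)                # followed by whitespace or the end of the string
      | (?<=\d)(?=X\d)          # or sitting between a digit and "X<digit>"
    )
)?                              # a line may be missing entirely
"""


def split_fixed(text, lengths):
    """Split text into at most len(lengths) lines with the given max lengths."""
    pattern = "".join(LINE_TEMPLATE.format(max_len=n) for n in lengths)
    # Every group is optional, so re.match always returns a match object.
    match = re.match(pattern, text, re.VERBOSE)
    if text[match.end():].strip():
        raise ValueError("text does not fit into the available lines")
    return [line for line in match.groups() if line is not None]


lines = split_fixed("THIS IS A LONG DESC 7X7 NEEDS SPLITTING", line_lengths)
print(lines)
# ['THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING']
```

A list also keeps the line order explicit, whereas relying on the iteration order of a dict's values() is not guaranteed in older Python versions.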

Licensed under: CC-BY-SA with attribution