Question

I'm using pyparsing to process the output of a hex-to-text converter. It prints out 16 characters per line, separated by spaces. If the hex value is an ASCII-printable character, that character is printed; otherwise the converter outputs a period (.).

Mostly the output looks like this:

. a . v a l i d . s t r i n g .
. a n o t h e r . s t r i n g .
. e t c . . . . . . . . . . . .

My pyparsing code to describe this line is:

dump_line = 16 * Word(printables, exact=1)

This works fine, until the hex-to-text converter hits a hex value of 0x20, which causes it to output a space.

l i n e . w . a .   s p a c e .

In that case, pyparsing skips the emitted space as ordinary whitespace and consumes characters from the following line to make up the "quota" of 16 characters.

Can someone please suggest how I can tell pyparsing to expect 16 characters, each separated by a space, where a space can also be a valid character?

Thanks in advance. J

Solution

Since this has significant whitespace, you'll need to tell your character expression to leave leading whitespace alone. See how this is done below in the definition of dumpchar:

hexdump = """\
. a . v a l i d . s t r i n g . 
. a n o t h e r . s t r i n g . 
. e t c . . . . . . . . . . . . 
l i n e . w . a .   s p a c e . 
. e t c . . . . . . . . . . . . 
"""

from pyparsing import oneOf, printables, delimitedList, White, LineEnd

# expression for a single char or space
dumpchar = oneOf(list(printables)+[' ']).leaveWhitespace()

# convert '.'s to something else, if you like; in this example, '_'
dumpchar.setParseAction(lambda t:'_' if t[0]=='.' else None)

# expression for a whole line of dump chars - intervening spaces will
# be discarded by delimitedList
dumpline = delimitedList(dumpchar, delim=White(' ',exact=1)) + LineEnd().suppress()

# if you want the intervening spaces, use this form instead
#dumpline = delimitedList(dumpchar, delim=White(' ',exact=1), combine=True) + LineEnd().suppress()

# read dumped lines from hexdump
for t in dumpline.searchString(hexdump):
    print(''.join(t))

Prints:

_a_valid_string_
_another_string_
_etc____________
line_w_a_ space_
_etc____________
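
Note that the grammar above accepts any number of characters per line, while the question asks for exactly 16. As a rough sketch of one way to enforce the count, here is an alternative that swaps pyparsing for a plain regular expression from the standard library. It is not part of the original answer; the names DUMP_LINE and parse_dump_line are hypothetical, and it assumes every line holds 16 printable-ASCII-or-space characters separated by single spaces, with an optional trailing space.

```python
import re

# Assumption: 15 "character + separator" pairs, a 16th character, and an
# optional trailing space. [ -~] covers ASCII 0x20 (space) through 0x7e (~).
DUMP_LINE = re.compile(r"^((?:[ -~] ){15}[ -~]) ?$")


def parse_dump_line(line):
    """Return the 16 data characters of one dump line, or None if malformed."""
    match = DUMP_LINE.match(line)
    if match is None:
        return None
    # Data characters sit at even offsets; odd offsets hold the separators.
    return match.group(1)[::2]
```

A line that is too short, too long, or missing a separator returns None instead of borrowing characters from the next line.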

OTHER TIPS

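Consider another way to remove the separator spaces. Because every second character in a line is a separator, a slice with a step of 2 keeps only the data characters, including any genuine space: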

>>> s=". a . v a l i d . s t r i n g ."
>>> s=s[::2]
>>> s
'.a.valid.string.'
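
The same slice can be applied to every line of a dump. This is only a minimal sketch: the helper name strip_separators is hypothetical, and it assumes each data character sits at an even offset in its line, so it does no validation of the input.

```python
def strip_separators(dump):
    """Keep the characters at even offsets of each line of a dump."""
    # Assumption: data characters sit at offsets 0, 2, 4, ... so a step of 2
    # drops every separator, including a trailing one at the end of a line.
    return [line[::2] for line in dump.splitlines()]


dump = ". a . v a l i d . s t r i n g .\nl i n e . w . a .   s p a c e .\n"
for text in strip_separators(dump):
    print(text)
```

The character produced by a 0x20 byte survives because it sits at an even offset like any other data character.
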
Licensed under: CC-BY-SA with attribution