Pattern finding LZW python

https://stackoverflow.com/questions/22363942

13-06-2023

Question

The LZW algorithm is used to find patterns among input symbols. But can it find patterns among words? That is, the alphabet would consist of words rather than single symbols. For example, for the input:

'abcd', 'abcd', 'fasf', 'asda', 'abcd', 'fasf' ...

I would like an output like:

'abcd', '1', 'fasf', 'asda', '1', '2' ...

Or is there another compression algorithm that does the trick?

Solution

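keys = []  # dictionary of known words, shared across calls

def lzw(text):
    tokens = text.split()
    # Unique words in first-seen order (dicts keep insertion order in Python 3.7+).
    new_keys = dict.fromkeys(tokens)
    keys.extend([key for key in new_keys if key not in keys])
    # Encode every word as its index in the dictionary.
    encoded = [str(keys.index(tok)) for tok in tokens]
    # Put the word itself back at the first occurrence of each index.
    for i, key in enumerate(keys):
        try:
            encoded[encoded.index(str(i))] = key
        except ValueError:
            pass  # this word does not occur in the current text
    return " ".join(encoded)

print(lzw("abcd abcd fasf asda abcd fasf"))
# outputs: abcd 0 fasf asda 0 1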

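This is a fairly simple implementation. Strictly speaking it is a word-level dictionary substitution rather than LZW proper: each word is written out in full the first time it appears and as its dictionary index afterwards.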
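For comparison, the classic LZW scheme can also be run over word tokens instead of characters. The following is only a minimal sketch, not the answer author's method: the function names `lzw_encode_words` and `lzw_decode_words` are hypothetical, and it assumes the decoder receives the initial word list (the alphabet) together with the codes.

```python
def lzw_encode_words(text):
    """LZW over whitespace-separated words (hypothetical helper)."""
    tokens = text.split()
    # Initial dictionary: every distinct word, in first-seen order.
    alphabet = list(dict.fromkeys(tokens))
    table = {(word,): i for i, word in enumerate(alphabet)}
    codes = []
    current = ()  # longest word sequence matched so far
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new word sequence
            current = (tok,)
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decode_words(alphabet, codes):
    """Inverse of lzw_encode_words (hypothetical helper)."""
    if not codes:
        return ""
    table = [(word,) for word in alphabet]
    previous = table[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The code refers to the entry that is still being built.
            entry = previous + (previous[0],)
        out.extend(entry)
        table.append(previous + (entry[0],))
        previous = entry
    return " ".join(out)


alphabet, codes = lzw_encode_words("abcd abcd fasf asda abcd fasf")
print(alphabet)  # ['abcd', 'fasf', 'asda']
print(codes)     # [0, 0, 1, 2, 4]
print(lzw_decode_words(alphabet, codes))
```

Here the final code `4` stands for the two-word sequence `abcd fasf`, which the encoder learned earlier in the same input. That is the kind of pattern across words the question asks about.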

OTHER TIPS

You can use the following code to search a string for a pattern. You do need to know in advance which pattern you are looking for, though.

import re

## Search for pattern 'iii' in string 'piiig'.
## All of the pattern must match, but it may appear anywhere.
## On success, match.group() is matched text.
match = re.search(r'iii', 'piiig')  # found, match.group() == "iii"
match = re.search(r'igs', 'piiig')  # not found, match is None

Have a read of this website: https://developers.google.com/edu/python/regular-expressions?hl=iw
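Applied to the input from the question, the same regular-expression approach might look like the sketch below. It assumes the word to look for (`abcd` here) is already known, and the variable names are illustrative only.

```python
import re

text = "abcd abcd fasf asda abcd fasf"
word = "abcd"  # the pattern has to be known in advance

# \b anchors the match at word boundaries; re.escape guards against
# regex metacharacters inside the word.
pattern = re.compile(r"\b" + re.escape(word) + r"\b")

# Character offsets at which the word starts.
positions = [m.start() for m in pattern.finditer(text)]
print(positions)  # [0, 5, 20]
```

This only locates occurrences of a given word. It does not discover repeated words on its own, which is what the question asks for.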

Licensed under: CC-BY-SA with attribution
