Question

Logstash's grok is a string parsing tool built on top of regular expressions. It provides many predefined patterns that make string parsing much easier, and I fell in love with it the first time I used it. Unfortunately, it is written in Ruby, which makes it impossible to use in my Python projects. Is there a Python implementation of grok, or a Python alternative that simplifies string parsing the way grok does?

Solution 2

I'm not aware of any Python ports of grok, but this functionality seems pretty straightforward to implement:

import re

# Grok-style type names mapped to the regex fragments they stand for.
types = {
    'WORD': r'\w+',
    'NUMBER': r'\d+',
    # todo: extend me
}


def grok_to_regex(pat):
    # Turn each %{TYPE:name} into a named group, e.g. (?P<name>\w+).
    return re.sub(r'%\{(\w+):(\w+)\}',
        lambda m: "(?P<" + m.group(2) + ">" + types[m.group(1)] + ")", pat)


rr = grok_to_regex("%{WORD:method} %{NUMBER:bytes} %{NUMBER:duration}")

print(re.search(rr, "hello 123 456").groupdict())
# {'method': 'hello', 'bytes': '123', 'duration': '456'}
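
The substitution idea above could plausibly be pushed a little further. The following is only a sketch under stated assumptions, not grok itself: it allows unnamed references such as %{REQUEST}, patterns that refer to other patterns, and an optional :int or :float suffix for type conversion (grok supports a similar suffix). The names PATTERNS, TOKEN, CASTS, grok_compile and grok_match are hypothetical, and the regex fragments are simplified stand-ins rather than grok's real definitions.

```python
import re

# Hypothetical pattern table: names mimic grok's, the regexes are simplified.
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    # A pattern may itself reference other patterns.
    "REQUEST": r"%{WORD:method} %{NUMBER:bytes:int} %{NUMBER:duration:float}",
}

# Matches %{TYPE}, %{TYPE:name} and %{TYPE:name:int|float}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(int|float))?\}")
CASTS = {"int": int, "float": float}


def grok_compile(pattern):
    """Return (compiled regex, {field name: cast function})."""
    casts = {}

    def expand(match):
        kind, name, cast = match.groups()
        # Recursively expand references nested inside the pattern body.
        body = TOKEN.sub(expand, PATTERNS[kind])
        if name is None:
            return "(?:" + body + ")"  # unnamed: non-capturing group
        if cast:
            casts[name] = CASTS[cast]
        return "(?P<" + name + ">" + body + ")"

    return re.compile(TOKEN.sub(expand, pattern)), casts


def grok_match(pattern, text):
    """Return a dict of extracted fields, or None if nothing matches."""
    regex, casts = grok_compile(pattern)
    m = regex.search(text)
    if m is None:
        return None
    return {
        key: value if value is None else casts.get(key, str)(value)
        for key, value in m.groupdict().items()
    }


result = grok_match("%{IP:client} %{REQUEST}", "10.0.0.1 GET 123 0.5")
print(result)
```

An unknown pattern name raises KeyError here, and a self-referencing pattern would recurse without end; a more careful version would check for both.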

OTHER TIPS

I built a project on GitHub called pygrok, based on @georg's answer, to meet my own log pattern parsing needs in Python code. I think pygrok may be helpful for you, so let me introduce it briefly:

pygrok

A Python library to parse strings and extract information from structured/unstructured data

What can I use Grok for?

  • parsing and matching patterns in a string (log, message, etc.)
  • relieving you of writing complex regular expressions
  • extracting information from structured/unstructured data

You can find it here.

Licensed under: CC-BY-SA with attribution