Question

I have been given an assignment in which I need to instrument a given application, generate a trace file, and then generate a sequence diagram from that trace file. The application is written in Python. It was instrumented at the points where each method starts and exits.
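For readers who want to reproduce a trace like the sample below, entry/exit instrumentation can be sketched with a Python decorator. This is a minimal sketch, not the tool used in the assignment: traced, TRACE_LINES, the tab separator, and logging parameter names (rather than values) in the argument column are assumptions read off the sample. The object-id column does appear to be id(self): 56663624 is 0x3609E48, the address printed in the get_instance return value.

```python
import functools
import inspect
from datetime import datetime

# Hypothetical sink for trace lines; a real tool would write to a file.
TRACE_LINES = []


def _timestamp():
    # Same layout as the sample, e.g. 10:25:30:743000
    return datetime.now().strftime('%H:%M:%S:%f')


def traced(func):
    """Log an Entering line before and an Exited line after each call."""
    params = list(inspect.signature(func).parameters)
    is_method = bool(params) and params[0] == 'self'
    # Assumption: the argument column lists parameter names, not values.
    arg_names = params[1:] if is_method else params

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if is_method and args:
            cls_name, obj_id = type(args[0]).__name__, str(id(args[0]))
        else:
            cls_name, obj_id = 'None', 'None'
        TRACE_LINES.append('\t'.join(
            ['Entering', func.__name__, cls_name, str(arg_names), obj_id, _timestamp()]))
        result = func(*args, **kwargs)  # an exception here skips the Exited line
        TRACE_LINES.append('\t'.join(
            ['Exited', func.__name__, cls_name, str(result), obj_id, _timestamp()]))
        return result

    return wrapper


# Usage with a stand-in class (not the assignment's real ConfigHandler).
class ConfigHandler:
    @traced
    def __init__(self, config_filepath):
        self.loaded = self._load_config(config_filepath)

    @traced
    def _load_config(self, path):
        return True


ConfigHandler('app.cfg')
print('\n'.join(TRACE_LINES))
```

In this sketch an exception raised inside a traced method skips its Exited line; wrapping the call in try/finally would be one way to always log the exit.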

My main goal is to find repetitive patterns in the trace file.

The following is a sample of the trace file:

Entering    get_instance    None    []  None    10:25:30:743000
Entering    __init__    ConfigHandler   ['config_filepath'] 56663624    10:25:30:743000
Entering    _load_config    ConfigHandler   ['path']    56663624    10:25:30:744000
Exited  _load_config    ConfigHandler   True    56663624    10:25:30:746000
Exited  __init__    ConfigHandler   None    56663624    10:25:30:747000
Exited  get_instance    None    <commons.ConfigHandler.ConfigHandler object at 0x0000000003609E48>  None    10:25:30:747000
Entering    __init__    ColumnConverter []  56963312    10:25:30:769000
Exited  __init__    ColumnConverter None    56963312    10:25:30:769000
Entering    __init__    PredicatesFactory   []  56963424    10:25:30:769000
Exited  __init__    PredicatesFactory   None    56963424    10:25:30:769000
Entering    __init__    LogFileConverter    []  56963536    10:25:30:769000
Exited  __init__    LogFileConverter    None    56963536    10:25:30:769000
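Before looking for patterns, it helps to turn each line into fields. A minimal parsing sketch, assuming whitespace-separated columns in the order shown (event, method, class, arguments or return value, object id, timestamp); the field names and the TraceEvent, parse_trace and with_depth helpers are hypothetical. The return-value column can contain spaces, so only the first three and last two tokens are treated as fixed.

```python
from collections import namedtuple

# Field names are guesses based on the sample trace.
TraceEvent = namedtuple('TraceEvent', 'event method cls payload obj_id timestamp')


def parse_line(line):
    """Split one trace line into a TraceEvent, or return None if malformed."""
    tokens = line.split()
    if len(tokens) < 5:
        return None  # blank or truncated line
    event, method, cls = tokens[:3]
    obj_id, timestamp = tokens[-2], tokens[-1]
    # Arguments or return value; may contain spaces, e.g. '<... object at 0x...>'.
    payload = ' '.join(tokens[3:-2])
    return TraceEvent(event, method, cls, payload, obj_id, timestamp)


def parse_trace(lines):
    """Yield a TraceEvent for every well-formed line."""
    for line in lines:
        ev = parse_line(line)
        if ev is not None:
            yield ev


def with_depth(events):
    """Pair each event with its call depth; Entering pushes, Exited pops."""
    depth = 0
    for ev in events:
        if ev.event == 'Exited':
            depth -= 1
        yield depth, ev
        if ev.event == 'Entering':
            depth += 1
```

Because every Entering line has a matching Exited line, a matching pair gets the same depth, which is the nesting a sequence diagram needs for its activation bars.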

How can I find repeated patterns in a trace file?



Solution

You can use the PrefixSpan algorithm to mine frequent sequential patterns.

The paper:

http://www.cs.uiuc.edu/~hanj/pdf/span01.pdf

This site has open-source Java code (the SPMF library) that you can draw inspiration from:

http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#examplePrefixSpan
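To give an idea of how PrefixSpan works, here is a compact pure-Python sketch for sequences of single items (the paper also handles itemsets, which this omits). It grows each frequent prefix by one item and recurses on the projected database, i.e. the suffixes that follow the first occurrence of that item. How to cut the trace into a sequence database is an assumption left to you, for example one sequence of Class.method labels per top-level call; the toy data is illustrative only.

```python
from collections import Counter


def prefixspan(db, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    db is a list of sequences of hashable items. Support is the number of
    sequences containing the pattern as a subsequence (gaps allowed).
    """
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Projected database: what follows the first occurrence of item.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine([], db)
    return results


# Toy sequence database (illustrative, not taken from the trace).
db = [['a', 'b', 'c'], ['a', 'c'], ['b', 'c']]
for pattern, support in sorted(prefixspan(db, 2), key=lambda r: (-r[1], r[0])):
    print(support, ' -> '.join(pattern))
```

Support here counts the sequences that contain a pattern, not its total number of occurrences; for back-to-back repetitions inside one long trace, counting contiguous runs may be more direct.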

OTHER TIPS

Consider using regular expressions to find patterns.

For example, to match lines like this:

Exited  __init__    LogFileConverter    None    56963536    10:25:30:769000

You can use the following regex pattern, where text holds the contents of the trace file:

>>> import re
>>> pattern = re.compile(r'Exited\s+__init__\s+(\w+)\s+(.*?)\s+(\d+)\s+(\d+:\d+:\d+:\d+)')
>>> matches = pattern.findall(text)

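With a bit of modification (for example, also capturing the event and method names and counting how often each combination matches), you should be able to find the repetitive patterns.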
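As one such modification, here is a sketch that generalises the pattern to every Entering/Exited line and counts how often each (class, method) pair is entered. LINE_RE and count_calls are hypothetical names, and the column layout is assumed from the sample; [ \t]+ is used instead of \s+ so a match cannot run across line breaks.

```python
import re
from collections import Counter

# Groups: event, method, class, arguments/return value, object id, timestamp.
LINE_RE = re.compile(
    r'^(Entering|Exited)[ \t]+(\w+)[ \t]+(\w+)[ \t]+(.*?)[ \t]+(\d+|None)'
    r'[ \t]+(\d+:\d+:\d+:\d+)[ \t]*$',
    re.MULTILINE,
)


def count_calls(text):
    """Count how many times each (class, method) pair is entered."""
    counts = Counter()
    for event, method, cls, _payload, _obj_id, _ts in LINE_RE.findall(text):
        if event == 'Entering':
            counts[(cls, method)] += 1
    return counts
```

Pairs with a count above one are repeated calls; counts.most_common() lists them from most to least frequent.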

If you want to find repetitions in the first two fields, you can use them as a dictionary key and map each key to a list of all matching lines. Once you have processed the entire file, the dictionary entries whose lists contain more than one element are repetitions.

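#!/usr/bin/env python3

import fileinput

def read(line, d):
    """Append line to the list stored under its first two fields."""
    tokens = line.split()
    if len(tokens) < 2:
        return d  # skip blank or malformed lines
    key = ' '.join(tokens[0:2])
    d.setdefault(key, []).append(line)
    return d

def main():
    d = dict()
    # fileinput reads the files named on the command line, or stdin
    for line in fileinput.input():
        read(line, d)
    for k, v in d.items():
        if len(v) > 1:
            # print("### %s => %s" % (k, v))   # for debugging
            for l in v:
                print(l, end='')  # each line already ends with '\n'

if __name__ == '__main__':
    main()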

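Sample output, with the debug print enabled so you can see why these lines are printed (the order of the groups follows dictionary ordering, so on Python 3.7+ the "Entering __init__" group is printed first):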

### Exited __init__ => ['Exited  __init__    ConfigHandler   None    56663624    10:25:30:747000\n', 'Exited  __init__    ColumnConverter None    56963312    10:25:30:769000\n', 'Exited  __init__    PredicatesFactory   None    56963424    10:25:30:769000\n', 'Exited  __init__    LogFileConverter    None    56963536    10:25:30:769000\n']
Exited  __init__    ConfigHandler   None    56663624    10:25:30:747000
Exited  __init__    ColumnConverter None    56963312    10:25:30:769000
Exited  __init__    PredicatesFactory   None    56963424    10:25:30:769000
Exited  __init__    LogFileConverter    None    56963536    10:25:30:769000
### Entering __init__ => ["Entering    __init__    ConfigHandler   ['config_filepath'] 56663624    10:25:30:743000\n", 'Entering    __init__    ColumnConverter []  56963312    10:25:30:769000\n', 'Entering    __init__    PredicatesFactory   []  56963424    10:25:30:769000\n', 'Entering    __init__    LogFileConverter    []  56963536    10:25:30:769000\n']
Entering    __init__    ConfigHandler   ['config_filepath'] 56663624    10:25:30:743000
Entering    __init__    ColumnConverter []  56963312    10:25:30:769000
Entering    __init__    PredicatesFactory   []  56963424    10:25:30:769000
Entering    __init__    LogFileConverter    []  56963536    10:25:30:769000
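The same dictionary-key idea extends from single lines to runs of consecutive calls: using a tuple of n consecutive labels as the key finds repeated call sequences rather than repeated lines. This is contiguous n-gram counting, a swapped-in technique rather than part of the answer above; call_labels and repeated_ngrams are hypothetical helpers.

```python
from collections import Counter


def call_labels(lines, fields=2):
    """Label each non-blank trace line by its first `fields` tokens."""
    return [' '.join(line.split()[:fields]) for line in lines if line.strip()]


def repeated_ngrams(labels, n, min_count=2):
    """Return {run_of_n_labels: count} for runs seen at least min_count times.

    Runs are contiguous and may overlap, so ['a', 'a', 'a'] holds ('a', 'a') twice.
    """
    counts = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

For example, repeated_ngrams(call_labels(open('trace.txt')), 2) would count repeated pairs of consecutive events, where trace.txt is a placeholder file name. On the sample trace, the pair ('Entering __init__', 'Exited __init__') appears three times.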
Licensed under: CC-BY-SA with attribution