First of all, happy Independence Day to everyone it applies to!

I'm analyzing Ab Initio graphs. For that, I need to obtain the name of each component, the one the developer used to describe its functionality, which I can extract from the following line.

name ='}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'

I tried to use a regex to extract the name of the component, which here is: RFMT: Generate Labels Header.

Here is the problem:

My delimiter is |Ab Initio Software, which means I need to apply the regex from right to left. Is there any way to accomplish that in Python?

The most efficient solution I have come up with is to reverse everything.

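import re

name = line[::-1]  # reverse the raw line
# raw string so that \| is passed to the regex engine unchanged
name = re.search(r'erawtfoS oitinI bA\|(.*?)\|', name, re.IGNORECASE).group(1)
name = name[::-1]  # restore the original character order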

All I want is to make it more efficient, because it is going to be used on hundreds of graphs and most of those files are quite large.


Solution

You could just match non-| characters and use lookarounds to make sure it's the element right before |Ab Initio Software:

re.search(r'(?<=[|])[^|]*(?=[|]Ab Initio Software)', name, re.IGNORECASE).group()

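Even without the lookarounds, if you just change (.*?) to the more explicit [^|]*, you'd get the right result: [^|]* cannot run across a | delimiter, whereas a left-to-right .*? would start at the first | in the line and expand over every field up to the marker. The lookaround version might be more efficient, but here is the capturing-group version anyway: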

re.search(r'[|]([^|]*)[|]Ab Initio Software', name, re.IGNORECASE).group(1)
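
Since this will run over many large files, here is a minimal sketch of how the pattern could be compiled once and reused, next to a regex-free variant based on str.find and str.rfind. The function names and the MARKER constant are made up for illustration, and the string-method variant is a different technique from the regex above: it is case-sensitive (no equivalent of re.IGNORECASE) and assumes the marker text appears exactly as written. Whether it is actually faster on your data is something you would have to measure.

```python
import re

# Sample line taken from the question.
line = ("}}@0|@207000|80000|227000|100000|152000|126000|11654|"
        "RFMT: Generate Labels Header|Ab Initio Software|Built-in|"
        "1|100|0||6||32769|1|{1|0|}}}")

# Compile once, reuse for every line (hypothetical module-level name).
COMPONENT_RE = re.compile(r"[|]([^|]*)[|]Ab Initio Software", re.IGNORECASE)


def component_name_regex(text):
    """Return the field right before '|Ab Initio Software', or None."""
    match = COMPONENT_RE.search(text)
    return match.group(1) if match else None


# Assumption: the marker always appears with exactly this casing.
MARKER = "|Ab Initio Software"


def component_name_find(text):
    """Same idea without regex: locate the marker, then scan backwards."""
    end = text.find(MARKER)
    if end == -1:
        return None
    # Last '|' strictly before the marker's own leading '|'.
    start = text.rfind("|", 0, end)
    if start == -1:
        return None
    return text[start + 1:end]


print(component_name_regex(line))
print(component_name_find(line))
```

Both functions return None instead of raising when the marker is missing, which avoids the AttributeError you would get from calling .group() on a failed re.search.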
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow