Question

In Python/Django I have a string from which I extract "the title" by matching the characters before the ':' character, like:

some_string = "This is my Title: This is some text"

So I'm using this code to extract the title:

result = regex.search('(.*):', some_string)
result.group(1)
>>> 'This is my Title'

This causes a problem when a user puts only a URL in the string, like:

some_string = 'http://vimeo.com/49742318'
result = regex.search('(.*):', some_string)
result.group(1)
>>> 'http'

I would prefer to just get an empty string back. I've tried using a negative lookahead, (?!...):

result = regex.search('(.*(?!http)):', some_string)

But it still returns 'http' instead of an empty string. How should the pattern be written?


Solution

The problem is that at the point where you've put the negative lookahead, the next character is also constrained to be a colon. The negative lookahead therefore succeeds trivially: the text that follows starts with ':', so it can never start with 'http'.
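A small check makes this concrete. This is only a sketch: it assumes the `regex` name in the question refers to Python's standard `re` module (or something compatible), and the variable names are illustrative.

```python
import re

url = 'http://vimeo.com/49742318'
m = re.search(r'(.*(?!http)):', url)

# The group still captures 'http'.
print(m.group(1))

# The lookahead is tested where the group ends, i.e. right before the colon.
rest = url[m.end(1):]
print(rest)                     # the text the lookahead actually inspects
print(rest.startswith('http'))  # False, so (?!http) succeeds
```

The lookahead never gets to see the 'http' at the start of the string, because by the time it is evaluated the engine is already positioned at the colon.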

What you probably actually want is to put the negative lookahead after the colon so that the next character is not a /:

(.*):(?!/)
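As a quick check of that pattern against the two strings from the question (again assuming the standard `re` module stands in for `regex`):

```python
import re

pattern = re.compile(r'(.*):(?!/)')

title_match = pattern.search('This is my Title: This is some text')
url_match = pattern.search('http://vimeo.com/49742318')

print(title_match.group(1))  # This is my Title
print(url_match)             # None: the only colon is followed by '/'
```

Note that the URL case yields no match at all rather than an empty string, so calling `.group(1)` on it directly would raise an `AttributeError`.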

But at that point you might as well use a positive lookahead and stop using a capturing group at all. You should also not allow colons to be captured or the RE would be able to consume much more than you might expect:

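result = regex.search(r'[^:]*(?=:[^/])', some_string)
# search() returns None when nothing matches (e.g. a bare URL), so fall back to ''
title = result.group() if result else ''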
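Putting this together, one possible way to wrap it is shown below. It is a sketch under a few assumptions: `re` is used in place of `regex`, and `TITLE_RE` and `extract_title` are hypothetical names introduced here, not part of the original code. It also shows why excluding colons from the match matters when the text contains more than one.

```python
import re

# Text without colons, followed by a colon that is not followed by '/'.
TITLE_RE = re.compile(r'[^:]*(?=:[^/])')

def extract_title(text):
    """Return the text before the first 'title' colon, or '' if there is none."""
    match = TITLE_RE.search(text)
    return match.group() if match else ''

print(extract_title('This is my Title: This is some text'))  # This is my Title
print(repr(extract_title('http://vimeo.com/49742318')))      # ''

# With several colons, [^:]* stops at the first one ...
print(extract_title('A: b: c'))                               # A
# ... whereas the greedy (.*) runs on to the last one.
print(re.search(r'(.*):(?!/)', 'A: b: c').group(1))           # A: b
```

One caveat: because `[^/]` has to match an actual character, a string that ends with a colon (such as 'Title:') produces no match. If that case matters, a lookahead like `(?=:(?!/))` may be a better fit.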
Licensed under: CC-BY-SA with attribution