Question

I'm trying to use a regular expression to match a particular character only when it isn't immediately followed by certain other characters. (This is for an eBook I'm editing in Calibre.)

Specifically, I want to match all ” characters that aren't at the end of a sentence, which means they will sit between regular characters rather than being followed by an angle bracket or a space. I thought ”[^<] would work, but that selects both the quotation mark and the next character, not just the quotation mark itself. I'm also not sure how to do an OR to check for a space. I assumed it would be something like ”[^<]|[^ ], but that's not right either.
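A small sketch of why the `|` attempt misbehaves, using Python's `re` module and a made-up sample string. Alternation has the lowest precedence of any regex operator, so `”[^<]|[^ ]` is read as "`”[^<]` OR any single non-space character", not as "`”` followed by something other than `<` or a space":

```python
import re

# Parsed as (”[^<]) | ([^ ]): the second branch matches almost anything.
broken = re.compile(r"”[^<]|[^ ]")

sample = "ab ”c"  # hypothetical input, just to show the parse
matches = broken.findall(sample)
print(matches)  # ['a', 'b', '”c']
```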

Here's an example of what I would like to match:

Beside angle bracket: <p class="calibre1">“I”m tired!”</p>

Beside space: <p class="calibre1">“I”m tired!” he said</p>

In both cases, only the quotation mark within I”m should be selected, and only the quotation mark itself.
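To make the first problem concrete, here is a sketch in Python's `re` (the lines are the two examples above). `[^<]` is an ordinary character class, so it consumes the character after the `”` and that character becomes part of the match; it also lets the closing quote before a space through:

```python
import re

line1 = '<p class="calibre1">“I”m tired!”</p>'
line2 = '<p class="calibre1">“I”m tired!” he said</p>'

# [^<] consumes one character, so each match is two characters long.
consuming = re.compile(r"”[^<]")

print(consuming.findall(line1))  # ['”m']
print(consuming.findall(line2))  # ['”m', '” ']
```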

Sorry if there's an obvious answer to this, but I've been reading through Python's regex documentation and I can't figure it out. :(


Solution

You can use a negative lookahead, (?! ... ), like this:

”(?!<)

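This matches ” unless it is followed by <. Because a lookahead only checks the next character without consuming it, the match is just the quotation mark itself.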
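A quick check of `”(?!<)` in Python's `re` against the two example lines. Each match is a single `”`, but on the second line the closing quote is followed by a space rather than `<`, so it is still selected:

```python
import re

line1 = '<p class="calibre1">“I”m tired!”</p>'
line2 = '<p class="calibre1">“I”m tired!” he said</p>'

# The lookahead (?!<) peeks at the next character without consuming it.
lookahead_lt = re.compile(r"”(?!<)")

print(lookahead_lt.findall(line1))  # ['”']
print(lookahead_lt.findall(line2))  # ['”', '”']
```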

To exclude a space as well, put both characters in a character class inside the lookahead:

”(?![< ])

This one matches ” unless it is followed by < or a space.
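A sketch of `”(?![< ])` on both example lines with Python's `re`. The substitution at the end is only an assumption about what you might want to do in the eBook (turn the stray `”` in `I”m` into an apostrophe `’`); the question itself only asks about matching:

```python
import re

line1 = '<p class="calibre1">“I”m tired!”</p>'
line2 = '<p class="calibre1">“I”m tired!” he said</p>'

# [< ] lists both characters that must not follow the ”.
inner_quote = re.compile(r"”(?![< ])")

print(inner_quote.findall(line1))  # ['”']
print(inner_quote.findall(line2))  # ['”']

# Hypothetical use: replace only the inner ” with an apostrophe.
fixed = inner_quote.sub("’", line2)
print(fixed)  # <p class="calibre1">“I’m tired!” he said</p>
```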

OTHER TIPS

Using a negative lookahead:

regex = r'”(?!<|\s)'

| means "or" (alternation)
\s means any whitespace character

You don't need to capture anything, since you know you're only matching a ”.
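One practical difference from the `[< ]` version is that, for `str` patterns in Python's `re`, `\s` matches any Unicode whitespace, including newlines and the non-breaking space U+00A0. The sample strings below are made up to show this:

```python
import re

space_only = re.compile(r"”(?![< ])")  # excludes only < and a literal space
any_space = re.compile(r"”(?!<|\s)")   # excludes < and any whitespace

# Closing quote followed by a newline, then by a non-breaking space.
samples = ["tired!”\nhe said", "tired!”\u00a0he said"]

for s in samples:
    print(repr(s), space_only.findall(s), any_space.findall(s))
```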

Alternatively, you could use a character class instead of the alternation, i.e. ”(?![<\s]).
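A small sketch comparing the two spellings in Python's `re` on a few sample strings (the last three are hypothetical). They select the same positions. Note one edge case shared by both: a negative lookahead succeeds when nothing follows, so a `”` at the very end of the text is also matched:

```python
import re

alternation = re.compile(r"”(?!<|\s)")
char_class = re.compile(r"”(?![<\s])")

samples = [
    '<p class="calibre1">“I”m tired!”</p>',
    '<p class="calibre1">“I”m tired!” he said</p>',
    "tired!”\nhe said",
    "I”m",
    "ends with”",  # nothing follows the ”, so the lookahead succeeds
]

for s in samples:
    spans_a = [m.span() for m in alternation.finditer(s)]
    spans_c = [m.span() for m in char_class.finditer(s)]
    assert spans_a == spans_c  # both forms pick the same positions
    print(repr(s), spans_a)
```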

Licensed under: CC-BY-SA with attribution