Isolate the first number after a letter with regular expressions

https://stackoverflow.com/questions/18164804

24-06-2022

Question

I am trying to parse a chemical formula that is given to me in unicode in the format C7H19N3

I wish to isolate the position of the first digit after each letter, i.e. 7 is at index 1 and 1 is at index 3. With this I want to insert "sub" in front of the digits.

My first couple of attempts had me looping through the string, trying to isolate the position of only the first digits, but to no avail.

I think that regular expressions can accomplish this, though I'm quite lost with them.

My end goal is to output the formula Csub7Hsub19Nsub3 so that my text editor can properly format it.

Solution

How about this?

>>> import re
>>> re.sub(r'(\d+)', r'sub\g<1>', "C7H19N3")
'Csub7Hsub19Nsub3'

(\d+) is a capturing group that matches 1 or more digits. \g<1> is a way of referring to the saved group in the substitute string.
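
As a self-contained version of the same idea, the substitution can be wrapped in a small helper. This is only a sketch: the function name mark_subscripts is made up for illustration and is not part of the original answer.

```python
import re

def mark_subscripts(formula):
    # Hypothetical helper name. Prefixes every run of digits with the
    # literal text "sub"; \g<1> refers back to the captured digits.
    return re.sub(r'(\d+)', r'sub\g<1>', formula)

print(mark_subscripts("C7H19N3"))  # Csub7Hsub19Nsub3
```

The \g<1> form is mainly useful when a group reference is followed directly by a literal digit, where \1 plus that digit would be read as a different group number.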

OTHER TIPS

Something like this with lookahead and lookbehind:

>>> strs = 'C7H19N3'
>>> re.sub(r'(?<!\d)(?=\d)','sub',strs)
'Csub7Hsub19Nsub3'

This matches the following positions in the string:

C^7H^19N^3   # ^ represents the positions matched by the regex.
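
Since the question also asks for the positions themselves, the same zero-width pattern can be passed to re.finditer to list the indices. A minimal sketch, assuming the indices are wanted as a plain list:

```python
import re

formula = "C7H19N3"

# Zero-width match: the position is not preceded by a digit
# and is immediately followed by a digit.
positions = [m.start() for m in re.finditer(r'(?<!\d)(?=\d)', formula)]

print(positions)  # [1, 3, 6]
```

Each index is where the first digit of a number sits, which is also where "sub" gets inserted.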

Here is one which literally matches the first digit after a letter:

>>> re.sub(r'([A-Z])(\d)', r'\1sub\2', "C7H19N3")
'Csub7Hsub19Nsub3'

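It gives the same result for this input but is perhaps more expressive of the intent. Note that it only matches a digit that directly follows an uppercase letter, so it is not equivalent for every formula. \1 is a shorter version of \g<1>, and I also used raw string literals (r'\1sub\2' instead of '\1sub\2'); without the r prefix, Python reads \1 and \2 as octal character escapes rather than group references.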
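
To see where the approaches differ, here is a hedged comparison on a formula containing a two-letter element symbol. The function names and the "CaCl2" input are illustrative assumptions, not part of the original answers.

```python
import re

def sub_after_capital(formula):
    # Hypothetical name: matches one digit directly after an uppercase letter.
    return re.sub(r'([A-Z])(\d)', r'\1sub\2', formula)

def sub_before_digits(formula):
    # Hypothetical name: zero-width match in front of each run of digits.
    return re.sub(r'(?<!\d)(?=\d)', 'sub', formula)

print(sub_after_capital("C7H19N3"))  # Csub7Hsub19Nsub3
print(sub_before_digits("C7H19N3"))  # Csub7Hsub19Nsub3

# "Cl" ends in a lowercase letter, so [A-Z] does not match before the 2.
print(sub_after_capital("CaCl2"))    # CaCl2
print(sub_before_digits("CaCl2"))    # CaClsub2

# Without the r prefix, \1 and \2 become the control characters \x01 and \x02.
print('\1sub\2' == '\x01sub\x02')    # True
```

If formulas with lowercase letters are expected, widening the character class to [A-Za-z] or using the lookaround version would be two possible options.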

Licensed under: CC-BY-SA with attribution