I'm not sure this approach will work in general for parsing Roman numerals. For example, this code fails to properly parse VIII, but that's only because V isn't in the list of tokens. Still, here's a simple recursive function that looks for one of the tokens at the beginning of the input string and assembles a list:
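
    tokens = ['IX', 'C', 'D', 'XL', 'I', 'XC', 'M', 'L', 'CD', 'X', 'IV', 'CM']

    def rn_split(numeral, results_so_far=None):
        # Use None instead of a mutable default list, which would be shared across calls
        if results_so_far is None:
            results_so_far = []
        if len(numeral) == 0:
            return results_so_far  # Break the recursion
        # Try longer tokens first so that 'IV' is not consumed as a bare 'I'
        for token in sorted(tokens, key=len, reverse=True):
            if numeral.startswith(token):
                results_so_far.append(token)
                recurse_numeral = numeral[len(token):]
                return rn_split(recurse_numeral, results_so_far)
        # Remainder of numeral didn't match. Bail out
        results_so_far.append(numeral)
        return results_so_far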
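
For comparison, here is a rough iterative sketch of the same idea, which avoids recursion for long inputs. The name `rn_split_iter` and the `TOKENS` list are my own for illustration; `TOKENS` is assumed to be the list above plus the missing 'V'. Longer tokens are tried first so that two-letter tokens such as 'IV' win over 'I'.

```python
# Assumed token set: the original list plus 'V', which it was missing.
TOKENS = ['M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I']

def rn_split_iter(numeral, tokens=TOKENS):
    """Split a numeral into tokens, trying longer tokens first."""
    ordered = sorted(tokens, key=len, reverse=True)
    results = []
    pos = 0
    while pos < len(numeral):
        for token in ordered:
            if numeral.startswith(token, pos):
                results.append(token)
                pos += len(token)
                break
        else:
            # No token matched: keep the unparsed remainder and stop
            results.append(numeral[pos:])
            break
    return results

print(rn_split_iter('VIII'))     # ['V', 'I', 'I', 'I']
print(rn_split_iter('MCMXCIV'))  # ['M', 'CM', 'XC', 'IV']
print(rn_split_iter('XIZ'))      # ['X', 'I', 'Z']
```

Note that this only splits the string into tokens. It does not check that the result is a well-formed numeral, so an input like IIII is still split without complaint.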