Question

Background

I program in Python, and now and then I run into a situation where I have to use regex.

Typically I try to learn a bit about it and look at examples of things similar to what I'm trying to achieve.

Problem

The problems I have with regex:

  • Difficult (for me) to learn and get used to. Honestly, I use it so seldom that I will likely forget important parts before the next time
  • The backslash problem (Python-specific) - reference (I've had problems when using raw strings too)
  • It's not clear to me what the regex I've written will match, which is of extra concern during (auto) testing

Question

Is there an alternative to regex in Python? Hopefully something that addresses one or several of the issues above!

(It's not clear to me in what context a possible alternative will be used, except for auto-testing)


Solution

"I have to use regex" and "What alternatives are there?" sound contradictory...

As with any special-purpose tool, there are certainly alternatives to regular expressions. The case for using them becomes stronger the more your problem resembles the archetypal use case: concisely describing a small language of low to moderate complexity.

If your problem is extremely small ("The customer number must start with an A"), sure, avoid them. Python has startswith() and friends instead, so this is the way to go.
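For instance, a check of that size might look like the sketch below. The rule itself (an "A" followed only by digits) and the function name are made up for illustration; only startswith() and isdigit() are real string methods.

```python
def is_valid_customer_number(customer_number: str) -> bool:
    """Hypothetical rule: starts with 'A' and the rest consists of digits."""
    # startswith() replaces the regex anchor "^A"; isdigit() checks the remainder.
    return customer_number.startswith("A") and customer_number[1:].isdigit()


print(is_valid_customer_number("A10234"))  # True
print(is_valid_customer_number("B10234"))  # False
print(is_valid_customer_number("A"))       # False (nothing after the 'A')
```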

If your problem is large ("the order can be in one of these four formats, with various optional and recursive parts"), just bite the bullet and write a proper parser. There are many libraries that assist you with this, and while they can have gotchas of their own, I assure you they are as nothing compared to the gotchas you would encounter porting such a spec to your language's regex engine.
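To give a feel for what "a proper parser" means, here is a minimal hand-written recursive-descent sketch. The input format (integers and bracketed, comma-separated lists that may nest) is a made-up stand-in for a real spec, and no parsing library is used so that nothing about a particular library's API is assumed. The point is only that recursion, which regexes handle badly, is natural here.

```python
def parse_nested(text: str):
    """Parse a hypothetical format such as "[1, [2, 3], []]" into Python lists."""
    pos = 0  # current read position in text

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def parse_value():
        nonlocal pos
        skip_spaces()
        if pos < len(text) and text[pos] == "[":
            return parse_list()  # recursive part: a value may itself be a list
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a number or '[' at position {pos}")
        return int(text[start:pos])

    def parse_list():
        nonlocal pos
        pos += 1  # consume '['
        items = []
        skip_spaces()
        if pos < len(text) and text[pos] == "]":
            pos += 1  # empty list
            return items
        while True:
            items.append(parse_value())
            skip_spaces()
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == "]":
                pos += 1
                return items
            else:
                raise ValueError(f"expected ',' or ']' at position {pos}")

    result = parse_value()
    skip_spaces()
    if pos != len(text):
        raise ValueError(f"unexpected trailing input at position {pos}")
    return result


print(parse_nested("[1, [2, 3], []]"))  # [1, [2, 3], []]
```

Malformed input raises ValueError with a position, which is usually easier to debug than a regex that silently fails to match.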

If your problem is intermediate ("detect legal California license plate numbers"), there is really no strong reason to avoid a well-domesticated regular expression. They are a sublanguage like any other computer language, and as with any language, whether you bother to learn more than the bare minimum should depend on whether or not you regularly encounter problems for which it is well-suited.
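As a rough sketch of what "well-domesticated" can look like: compile the pattern once, use re.VERBOSE so every part carries a comment, and use fullmatch() so nothing before or after the pattern slips through. The plate format below (one digit, three uppercase letters, three digits) is an assumption for illustration; real plate rules have more variants.

```python
import re

# Assumed format: one digit, three uppercase letters, three digits, e.g. "7ABC123".
PLATE_RE = re.compile(
    r"""
    [0-9]      # leading digit
    [A-Z]{3}   # three uppercase letters
    [0-9]{3}   # three trailing digits
    """,
    re.VERBOSE,
)


def looks_like_plate(text: str) -> bool:
    # fullmatch() requires the whole string to match, not just a prefix.
    return PLATE_RE.fullmatch(text) is not None


print(looks_like_plate("7ABC123"))   # True
print(looks_like_plate("7abc123"))   # False
print(looks_like_plate("7ABC1234"))  # False
```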

As a final thought, never think a good solution to your problem must be either-or. In most of the tasks that I solve professionally, combining RE matching with normal Python logic is the most efficient and elegant way to go.
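One way to picture that combination, as a sketch with a made-up task: let the regex check only the shape of a date string, and leave the calendar rules (month lengths, leap years) to the standard library instead of encoding them in the pattern. The function name parse_iso_date is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# The regex only checks the shape YYYY-MM-DD, nothing about valid ranges.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_iso_date(text: str) -> Optional[date]:
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # Plain Python logic handles what would be painful in a regex.
        return date(year, month, day)
    except ValueError:
        return None


print(parse_iso_date("2024-02-29"))  # 2024-02-29
print(parse_iso_date("2023-02-29"))  # None (not a leap year)
print(parse_iso_date("2024-2-29"))   # None (wrong shape)
```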

OTHER TIPS

In theory - as mentioned by others - you can substitute Regex with basic string operations or different operations with various libraries. The question remains why you would want to do this as it is way more code and way more effort.
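To make the "way more code" point concrete, here is a small comparison on a made-up task: pulling every run of digits out of a string. Both function names are hypothetical; the second one is a single re.findall call.

```python
import re


def digits_with_string_ops(text: str) -> list:
    """Collect runs of ASCII digits using only basic string operations."""
    runs, current = [], ""
    for ch in text:
        if ch in "0123456789":
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:  # a run that reaches the end of the string
        runs.append(current)
    return runs


def digits_with_regex(text: str) -> list:
    """The same result with one regex call."""
    return re.findall(r"[0-9]+", text)


sample = "order 12, item 7 and 305"
print(digits_with_string_ops(sample))  # ['12', '7', '305']
print(digits_with_regex(sample))       # ['12', '7', '305']
```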

I use Regex a lot - and I keep forgetting what does what basically all the time.

Usually I go to RegexOne for a quick reminder.

Next I usually go to Pythex - there you can quickly test your Python Regex live with a sample string you provide and there is even a quick cheat sheet provided.

So all in all you can keep a file with 3-4 lines of code snippets like:

import re

# dataframe ds from pandas-library
ds['COLUMN_NAME'].str.replace('REGEX HERE', 'REPLACEMENT HERE', regex=True) # returns the column with every match replaced

re.sub('REPLACE THIS REGEX PATTERN', 'WITH THIS REPLACEMENT STRING', input_string) # simple replace
re.match('SEARCH FOR THIS REGEX PATTERN', 'IN THIS STRING') # returns a match if a match is found at BEGINNING of string
re.search('SEARCH FOR THIS REGEX PATTERN', 'IN THIS STRING') # returns a match if pattern is found ANYWHERE in string

(This is a code snippet file I use a lot.)

If you keep something like this, adjusted to your needs, and use Pythex, regex in Python is really just a copy-paste job for the rest of your life (for the most part).

Edit: I don't think I explained well enough how this addresses this problem of yours:

It's not clear to me what the regex i've written will match, which is of extra concern during (auto) testing

I recommended Pythex because you can paste in any sample string you like and see right away what the regex matches, which is quite convenient. You could of course turn the "snippet file" I suggested into a "testing file" by adding sample input text to the code, so that you always have a script for testing your regex quickly.
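A "testing file" along those lines could look roughly like this. The pattern and the sample strings are placeholders made up for illustration; the idea is just to keep a list of inputs next to the result you expect, print what actually matched, and fail loudly on a surprise.

```python
import re

# Hypothetical pattern under test: letters, a hyphen, then digits.
PATTERN = re.compile(r"[A-Za-z]+-\d+")

# (sample text, should the pattern be found somewhere in it?)
SAMPLES = [
    ("item-42", True),
    ("item-", False),
    ("42-item", False),
    ("see item-42 here", True),
]


def check_samples(pattern, samples):
    """Return the samples whose search result differs from the expectation."""
    failures = []
    for text, should_match in samples:
        matched = pattern.search(text) is not None
        if matched != should_match:
            failures.append((text, should_match, matched))
    return failures


if __name__ == "__main__":
    # Show exactly what was matched, similar to what Pythex highlights.
    for text, _ in SAMPLES:
        found = PATTERN.search(text)
        print(repr(text), "->", found.group(0) if found else None)
    assert check_samples(PATTERN, SAMPLES) == []
```

Swapping in your own PATTERN and SAMPLES is all that is needed, and the same check_samples() call can be dropped into an automated test.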

Maybe you could elaborate more on:

The backslash problem (Python-specific) - reference (i've had problems when using raw strings too)

Because I don't really see how this is a problem - the article you linked even points out the solution, which is to use raw strings, I guess.
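For reference, this is a small sketch of what raw strings change, with made-up sample text. One limitation that may explain trouble with raw strings: a raw string literal cannot end in an odd number of backslashes, so r"abc\" is a syntax error.

```python
import re

text = r"C:\temp\new"  # raw string: the backslashes stay literal backslashes

# To match ONE literal backslash, the regex engine must receive two of them.
pattern_plain = "\\\\temp"  # normal literal: four backslashes become two
pattern_raw = r"\\temp"     # raw literal: written exactly as the regex sees it

assert pattern_plain == pattern_raw
print(re.search(pattern_raw, text).group(0))  # \temp

# Word boundaries: in a normal literal "\b" is a backspace character,
# so the second pattern looks for backspaces and finds nothing.
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
print(re.findall("\bcat\b", "cat concat cat"))   # []
```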

Hope it helps.

Licensed under: CC-BY-SA with attribution