Question

I'm trying to write a regular expressions that will match a set of characters without regard to order. For example:

str = "act" 
str.scan(/Insert expression here/)

would match:

cat
act
tca
atc
tac
cta

but would not match ca, ac or cata.

I read through a lot of similar questions and answers here on StackOverflow, but have not found one that matches my objectives exactly.

To clarify a bit, I'm using ruby and do not want to allow repeat characters.

Was it helpful?

Solution

Here is your solution

^(?:([act])(?!.*\1)){3}$

See it here on Regexr

^                  # matches the start of the string
    (?:            # open a non capturing group 
        ([act])    # The characters that are allowed and a capturing group
        (?!.*\1)   # That character is matched only if it does not occur once more, Lookahead assertion
    ){3}           # Defines the amount of characters
$

The only special think is the lookahead assertion, to ensure the character is not repeated.

^ and $ are anchors to match the start and the end of the string.

OTHER TIPS

[act]{3} or ^[act]{3}$ will do it in most regular expression dialects. If you can narrow down the system you're using, that will help you get a more specific answer.

Edit: as mentioned by @georgydyer in the comments below, it's unclear from your question whether or not repeated characters are allowed. If not, you can adapt the answer from this question and get:

^(?=[act]{3}$)(?!.*(.).*\1).*$

That is, a positive lookahead to check a match, and then a negative lookahead with a backreference to exclude repeated characters.

Here's how I'd go about it:

regex = /\b(?:#{ Regexp.union(str.split('').permutation.map{ |a| a.join }).source })\b/
# => /(?:act|atc|cat|cta|tac|tca)/

%w[
  cat act tca atc tac cta
  ca ac cata
].each do |w|
  puts '"%s" %s' % [w, w[regex] ? 'matches' : "doesn't match"]
end

That outputs:

"cat" matches
"act" matches
"tca" matches
"atc" matches
"tac" matches
"cta" matches
"ca" doesn't match
"ac" doesn't match
"cata" doesn't match

I use the technique of passing an array into Regexp.union for a lot of things; I works especially well with the keys of a hash, and passing the hash into gsub for rapid search/replace on text templates. This is the example from the gsub documentation:

'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"

Regexp.union creates a regex, and it's important to use source instead of to_s when extracting the actual pattern being generated:

puts regex.to_s
=> (?-mix:\b(?:act|atc|cat|cta|tac|tca)\b)

puts regex.source
=> \b(?:act|atc|cat|cta|tac|tca)\b

Notice how to_s embeds the pattern's flags inside the string. If you don't expect them you can accidentally embed that pattern into another, which won't behave as you expect. Been there, done that and have the dented helmet as proof.

If you really want to have fun, look into the Perl Regexp::Assemble module available on CPAN. Using that, plus List::Permutor, lets us generate more complex patterns. On a simple string like this it won't save much space, but on long strings or large arrays of desired hits it can make a huge difference. Unfortunately, Ruby has nothing like this, but it is possible to write a simple Perl script with the word or array of words, and have it generate the regex and pass it back:

use List::Permutor;
use Regexp::Assemble;

my $regex_assembler = Regexp::Assemble->new;
my $perm = new List::Permutor split('', 'act');
while (my @set = $perm->next) {
    $regex_assembler->add(join('', @set));
}
print $regex_assembler->re, "\n";
(?-xism:(?:a(?:ct|tc)|c(?:at|ta)|t(?:ac|ca)))

See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for more information about using Regexp::Assemble with Ruby.

I will assume several things here: - You are looking for permutations of given characters - You are using ruby

str = "act"
permutations = str.split(//).permutation.map{|p| p.join("")}

# and for the actual test
permutations.include?("cat")

It is no regex though.

No doubt - the regex that uses positive/negative lookaheads and backreferences is slick, but if you're only dealing with three characters, I'd err on the side of verbosity by explicitly enumerating the character permutations like @scones suggested.

"act".split('').permutation.map(&:join)
=> ["act", "atc", "cat", "cta", "tac", "tca"]

And if you really need a regex out of it for scanning a larger string, you can always:

Regexp.union "act".split('').permutation.map(&:join)
=> /\b(act|atc|cat|cta|tac|tca)\b/

Obviously, this strategy doesn't scale if your search string grows, but it's much easier to observe the intent of code like this in my opinion.

EDIT: Added word boundaries for false positive on cata based on @theTinMan's feedback.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top