Question

What is the optimal way to couple multiple regular expressions within a Clojure function? I believe the function would start out as such:

(defn foo [x]
  (re-seq #"some means to combine multiple regex" x))

but I am not clear whether this will work, or how efficient such a function would be. As an example of combining regexes, consider a function that searches for both domain names and IP addresses. For domain names I'd use a regex such as:

(re-seq #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b" x)

and for IP:

(re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" x)

Solution

Regexes already allow for alternation with the | operator.

user=> (re-seq #"\d+" "123 foo 345 bar")
("123" "345")
user=> (re-seq #"[a-zA-Z]+" "123 foo 345 bar")
("foo" "bar")
user=> (re-seq #"\d+|[a-zA-Z]+" "123 foo 345 bar")
("123" "foo" "345" "bar")
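One thing to watch when writing the alternation by hand: | has the lowest precedence of the regex operators, so anchors and neighbouring text attach to a single branch unless the alternatives are grouped. The following is a minimal sketch of that behaviour; the names loose and grouped are made up for illustration.

```clojure
;; | binds more loosely than anchors or concatenation, so here ^ applies
;; only to \d+ and $ applies only to [a-z]+.
(def loose #"^\d+|[a-z]+$")

;; Wrapping the alternatives in a non-capturing group makes both anchors
;; apply to whichever branch matches.
(def grouped #"^(?:\d+|[a-z]+)$")

(re-seq loose "123abc")    ;; => ("123" "abc")
(re-find grouped "123abc") ;; => nil
(re-find grouped "abc")    ;; => "abc"
```

A non-capturing group (?:...) is used rather than a plain (...) so that re-seq and re-find keep returning plain strings instead of vectors of groups.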

You can programmatically union the regex patterns if desired by interposing the | operator.

(defn union-re-patterns [& patterns] 
    (re-pattern (apply str (interpose "|" (map #(str "(?:" % ")") patterns)))))

user=> (union-re-patterns #"\d+" #"[a-zA-Z]+")
#"(?:\d+)|(?:[a-zA-Z]+)"
user=> (re-seq (union-re-patterns #"\d+" #"[a-zA-Z]+") "123 foo 345 bar")
("123" "foo" "345" "bar")
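Applied to the domain and IP patterns from the question, the same idea might look like the sketch below. It makes a few assumptions: domain-re, ip-re, host-re and find-hosts are hypothetical names, and the capturing groups in the question's domain regex have been rewritten as non-capturing ones, because re-seq returns a vector of groups per match (rather than a plain string) whenever the pattern contains any capturing group.

```clojure
(require '[clojure.string :as str])

(defn union-re-patterns
  "Joins the patterns with |, wrapping each one in a non-capturing group."
  [& patterns]
  (re-pattern (str/join "|" (map #(str "(?:" % ")") patterns))))

;; The question's domain regex, with its groups made non-capturing.
(def domain-re
  #"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b")

;; The question's IP regex, unchanged.
(def ip-re
  #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")

;; Build the combined pattern once, not on every call.
(def host-re (union-re-patterns ip-re domain-re))

(defn find-hosts
  "Returns every IP address or domain name found in s, in order of appearance."
  [s]
  (re-seq host-re s))

(find-hosts "ping 192.168.0.1 or example.com")
;; => ("192.168.0.1" "example.com")
```

Defining host-re at the top level means the combined pattern is compiled a single time, so each call to find-hosts makes one pass over the input instead of one pass per pattern.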

OTHER TIPS

Depending on your use case, frak may be what you're looking for; frak transforms collections of strings into regular expressions for matching those strings:

(frak/pattern ["foo" "bar" "baz" "quux"])
;; => #"(?:ba[rz]|foo|quux)"
Licensed under: CC-BY-SA with attribution