Question

I have a number of references I would like to have replaced with links to the anchors further down in the text. The links have a very regular form, so it should be quite doable - at least with a script:

A reference "[44]" should be replaced with the following HTML code: [<a href="ref44">44</a>].

That one is easy enough: a simple replacement with a backreference. But is there a regex (Vim dialect, Python, or ... Perl, if it must be. The horror!) that can convert a list such as [44,45,77,91] into similar links? That is, one link per number, with the whole group of links surrounded by a single pair of square brackets.

Since this involves (theoretically unbounded) memory, it does not map 1:1 to an FSM, and as such should rather be handled by some kind of pushdown automaton, not a regex. But some dialects are a lot more powerful, so ...


Solution

You could re-run this regex replacement until a pass makes no more replacements. Each pass links the next bare number in a list.

Regex: (\[(?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>,)*)(\d+)([,\]])

Replace with: $1<a href="ref$2">$2</a>$3


The portion captured as group 1 will match even the most complex anchor tags.
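The "re-run until nothing changes" loop could be sketched in Python roughly as follows. This is only an illustration, not the answerer's own code: the function name `link_refs_iteratively` is made up here, and the replacement string is rewritten in Python's `\g<n>` syntax because Python's `re` module does not use `$1`-style references. The pattern itself is copied unchanged from the answer.

```python
import re

# The answer's pattern, unchanged. Group 1: "[" plus any links already
# inserted; group 2: the next bare number; group 3: the "," or "]" after it.
PATTERN = re.compile(
    r'''(\[(?:<a(?=\s|>)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/a>,)*)(\d+)([,\]])'''
)

# Python spells the answer's $1, $2, $3 as \g<1>, \g<2>, \g<3>.
REPLACEMENT = r'\g<1><a href="ref\g<2>">\g<2></a>\g<3>'


def link_refs_iteratively(text):
    """Apply the substitution repeatedly until a pass changes nothing."""
    while True:
        # subn returns the new string and the number of replacements made.
        text, count = PATTERN.subn(REPLACEMENT, text)
        if count == 0:
            return text


print(link_refs_iteratively("[22][44,45,77,91]"))
```

`re.subn` is used instead of `re.sub` because it also reports how many replacements a pass made, which gives the loop its stopping condition.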

Examples

Sample Text

[22][44,45,77,91]

After Replacement

First time:

[<a href="ref22">22</a>][<a href="ref44">44</a>,45,77,91]

Second time:

[<a href="ref22">22</a>][<a href="ref44">44</a>,<a href="ref45">45</a>,77,91]

Third time:

[<a href="ref22">22</a>][<a href="ref44">44</a>,<a href="ref45">45</a>,<a href="ref77">77</a>,91]

Fourth time:

[<a href="ref22">22</a>][<a href="ref44">44</a>,<a href="ref45">45</a>,<a href="ref77">77</a>,<a href="ref91">91</a>]
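If a language with replacement callbacks is acceptable (Python was one of the options in the question), the same result can probably be reached in a single pass: match the whole bracketed list once and build the links in a function. The sketch below is an alternative to the approach above, not part of the original answer. The names `REF_LIST` and `link_refs` are invented for illustration, and it assumes the lists contain only digits and commas with no spaces, exactly as in the sample text.

```python
import re

# One bracketed, comma-separated list of numbers, e.g. "[44,45,77,91]".
# Assumption: no whitespace inside the brackets, as in the sample text.
REF_LIST = re.compile(r"\[(\d+(?:,\d+)*)\]")


def link_refs(text):
    """Replace every bracketed number list with one link per number."""
    def expand(match):
        # Split the captured "44,45,77,91" and wrap each number in a link.
        numbers = match.group(1).split(",")
        links = [f'<a href="ref{n}">{n}</a>' for n in numbers]
        return "[" + ",".join(links) + "]"

    return REF_LIST.sub(expand, text)


print(link_refs("[22][44,45,77,91]"))
```

Note that this simpler pattern would also rewrite any other bracketed numbers in the text, such as an array index like `[0]`, so it only suits documents where that form is reserved for references.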
Licensed under: CC-BY-SA with attribution