Perl regex without variable length lookbehind?

Question 1

As I see it, your program will have three states:

In a headline.
In a paragraph directly after a headline.
In other paragraphs.

Because this is roughly a regular language, it could be parsed with regexes. But why would we want to do that, considering we would need 400 passes over the text?

It is probably easier to read the file one paragraph at a time. When we hit a headline, we record all links that can point to it. Then, in the next paragraph, we substitute all keywords except the forbidden ones. For example:

my %substitutions = ...;  # lowercase keyword => link, e.g. keyword => '#heading-to-jump-to'
my $kw_regex = ...;       # alternation of all keywords (case-insensitive, to match lc below)
my %forbidden;            # holds state: links that must not be used in this paragraph

local $/ = ""; # paragraph mode
while (<>) {
  if (/^#/) {
    # it's a headline
    @forbidden{ slugify($_) } = ();  # extract forbidden link(s); slugify is yours to write
  } else {
    # a paragraph
    s{($kw_regex)}{
      my $keyword = $1;
      my $link = $substitutions{lc $keyword};
      exists $forbidden{$link} ? $keyword : "($keyword)[$link]";
    }eg;
    %forbidden = (); # forbidden links only in 1st paragraph after headline
  }
  print;
}
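For a version you can run as is, here is a minimal sketch that fills in the placeholders with assumptions: the %substitutions data, the slugify rule (lowercase, dashes for non-alphanumerics, prefixed with #) and the way the keyword alternation is built are all hypothetical. It reads from a string instead of <> so it is easy to test.

```perl
use strict;
use warnings;

# Hypothetical sample data: lowercase keyword => link target.
my %substitutions = (
    keyword => '#heading-to-jump-to',
    foo     => '#foobar',
);

# One alternation of all keywords, longest first, metacharacters quoted.
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %substitutions;
my $kw_regex = qr/\b(?:$alt)\b/i;

# Hypothetical slug rule: "# Heading To Jump To" -> "#heading-to-jump-to".
sub slugify {
    my ($headline) = @_;
    (my $slug = lc $headline) =~ s/^#+\s*//;   # drop leading #s and spaces
    $slug =~ s/\s+$//;                          # trim trailing whitespace/newlines
    $slug =~ s/[^a-z0-9]+/-/g;                  # other characters become dashes
    return "#$slug";
}

# Paragraph-mode pass: keywords in the first paragraph after a headline
# are not linked to that headline.
sub link_keywords {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode
    my %forbidden;
    my $out = '';
    while (my $para = <$in>) {
        if ($para =~ /^#/) {
            $forbidden{ slugify($para) } = 1;   # this headline's own link
        } else {
            $para =~ s{($kw_regex)}{
                my $keyword = $1;
                my $link    = $substitutions{ lc $keyword };
                exists $forbidden{$link} ? $keyword : "($keyword)[$link]";
            }eg;
            %forbidden = ();   # only the first paragraph is restricted
        }
        $out .= $para;
    }
    close $in;
    return $out;
}
```

Something like `print link_keywords(do { local $/; <STDIN> });` would apply it to standard input. Note that paragraph mode collapses runs of blank lines into one.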

If headlines are not guaranteed to be separated from their paragraphs by an empty line, paragraph mode will not work, and you'll have to roll your own paragraph splitting.
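Rolling your own could look like the following sketch, which walks the text line by line and tracks the three states explicitly. It assumes that a headline is any line starting with #, that a paragraph ends at a blank line or at the next headline, and it uses a hypothetical slugify rule and sample data; none of these come from your actual format.

```perl
use strict;
use warnings;

# Hypothetical sample data: lowercase keyword => link target.
my %substitutions = ( keyword => '#heading-to-jump-to' );
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %substitutions;
my $kw_regex = qr/\b(?:$alt)\b/i;

# Hypothetical slug rule: "# Heading To Jump To" -> "#heading-to-jump-to".
sub slugify {
    my ($line) = @_;
    (my $slug = lc $line) =~ s/^#+\s*//;
    $slug =~ s/\s+$//;
    $slug =~ s/[^a-z0-9]+/-/g;
    return "#$slug";
}

# Line-based state machine: a headline does not need a blank line after it.
sub link_lines {
    my (@lines) = @_;
    my $state = 'other';   # 'headline', 'first_para' or 'other'
    my %forbidden;         # links not allowed in the current paragraph
    my @out;
    for my $line (@lines) {
        if ($line =~ /^#/) {
            # Headline: consecutive headlines accumulate forbidden links.
            %forbidden = () if $state ne 'headline';
            $forbidden{ slugify($line) } = 1;
            $state = 'headline';
        } elsif ($line =~ /^\s*$/) {
            # A blank line ends the first paragraph after a headline.
            if ($state eq 'first_para') {
                $state     = 'other';
                %forbidden = ();
            }
        } else {
            $state = 'first_para' if $state eq 'headline';
            $line =~ s{($kw_regex)}{
                my $kw   = $1;
                my $link = $substitutions{ lc $kw };
                $forbidden{$link} ? $kw : "($kw)[$link]";
            }eg;
        }
        push @out, $line;
    }
    return @out;
}
```

Because it works on a list of lines, `print link_lines(<STDIN>);` would process a whole file.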

Regexes are awesome, but they are not always an adequate tool.

Question 2

That is one horrible regex. I would not want to be the poor sucker who is stuck with maintaining it. Also, how did you generate it from your replacement template?

I would suggest something considerably simpler: use a hash to store the replacements, word boundaries to prevent partial matches, the /i modifier to match case-insensitively, and plain loop logic to skip commented lines.

use strict;
use warnings;

my @kw = "keyword::(keyword)[#heading-to-jump-to]";
my %rep = map { /([^:]+)::(.+)/ } @kw;
while (<DATA>) {
    next if /^#/;
    for my $kw (keys %rep) {
        s/\b\Q$kw\E\b/$rep{$kw}/ig;
    }
} continue {
    print;
}

__DATA__
This is a text with keywords. Only the keyword 'keyword' should be replaced.
# Dont replace keyword when in a comment

Output:

This is a text with keywords. Only the (keyword)[#heading-to-jump-to] '(keyword)[#heading-to-jump-to]' should be replaced.
# Dont replace keyword when in a comment

Explanation:

Create the hash of replacement keywords with a map statement, which returns a two-element list (keyword, replacement) for each keyword::replacement string.
For lines that begin with #, skip straight to the continue block, which prints them.
For each keyword in the hash, perform a global (/g), case-insensitive (/i) substitution on each line. Use word boundaries \b to prevent partial matches, and quote metacharacters with \Q ... \E. Substitute with the hash value for that keyword.
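The loop makes one pass over each line per keyword, which adds up with hundreds of keywords. A variant of the code above is to combine all keywords into one alternation and look each match up in the hash, so every line is scanned once. A sketch, where the extra foo and foo bar entries are hypothetical:

```perl
use strict;
use warnings;

# keyword::replacement strings in the format used above (extra entries are hypothetical).
my @kw = (
    'keyword::(keyword)[#heading-to-jump-to]',
    'foo::(foo)[#foobar]',
    'foo bar::(foo bar)[#foo-bar]',
);
my %rep = map { /([^:]+)::(.+)/ } @kw;

# Lowercase keys so a case-insensitive match can be looked up directly.
my %rep_lc = map { lc($_) => $rep{$_} } keys %rep;

# One alternation, longest keyword first so "foo bar" wins over "foo".
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %rep_lc;
my $re  = qr/\b($alt)\b/i;

sub link_line {
    my ($line) = @_;
    return $line if $line =~ /^#/;       # leave commented lines alone
    $line =~ s/$re/$rep_lc{lc $1}/ge;    # one pass, hash lookup per match
    return $line;
}
```

Sorting longest first matters because Perl's alternation takes the first alternative that matches, not the longest one. As a side effect, s///g never rescans replaced text, so a replacement containing another keyword is not substituted again.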

As with all language processing, this will have some caveats and edge cases that need handling. For example, since - is not a word character, \b matches between foo and -, so foo in foo-bar will be replaced. As for how to control what not to replace under which heading, you would first have to tell me how to identify a heading.
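If hyphenated words should be left alone, one option is to replace \b with a negative lookbehind and lookahead that also reject hyphens. This sketch assumes that - should count as part of a word; adjust the character class to your data. Each lookaround is exactly one character wide, so no variable-length lookbehind is needed.

```perl
use strict;
use warnings;

# Replace $kw only when it does not touch a word character or a hyphen.
# (?<![\w-]) and (?![\w-]) are fixed-width (one character) lookarounds.
sub replace_keyword {
    my ($text, $kw, $rep) = @_;
    $text =~ s/(?<![\w-])\Q$kw\E(?![\w-])/$rep/gi;
    return $text;
}

print replace_keyword("foo and foo-bar and Foo.", "foo", "(foo)[#foobar]"), "\n";
```

With \bfoo\b the foo in foo-bar would be replaced as well; here it is skipped.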

Update:

If I understand you correctly, what you mean by skipping keywords inside paragraphs with their own heading is something like this:

#heading-to-jump-to
Here is 'keyword' not replaced

Look up the string #heading-to-jump-to and remove keyword from the replacement list.

You might use a lookup hash keyed on the heading references, and build it together with the first hash. In this case, though, I would be concerned that you can have multiple keywords for each link: e.g. both foo and bar point to #foobar, so #foobar should exclude both keywords foo and bar.

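my %rep;
my %heading;   # heading reference => list of keywords that link to it

for my $str (@kw) {
    chomp $str;                               # only needed if @kw was read from a file
    my ($kw, $rep) = split /::/, $str, 2;     # split into 2 fields
    $rep{$kw} = $rep;
    my ($heading) = $rep =~ /\[([^]]+)\]/;    # e.g. "#heading-to-jump-to"
    next unless defined $heading;             # replacement without a [link] part
    push @{ $heading{$heading} }, $kw;
}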

And then, instead of simply skipping a line with next, do something like:

my @kws = keys %rep;   # default list
while (<DATA>) {
    if (/^(#.+)/) {    # inside heading
        my %exclude = map { $_ => 1 } @{ $heading{$1} };
        @kws = grep { ! $exclude{$_} } @kws;
    } else {
        # not in a heading
        # ...
    }
}

Note that this is just a demonstration of the principle and not intended as working code. As you can see, the tricky part here is knowing when to reset the limited list of @kws and when to use it. You will have to make those decisions, since I do not know your data.
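As one example of such a decision, here is a sketch that assumes a heading line looks like #heading-to-jump-to and that its exclusions last until the next blank line. It keeps an exclusion hash per paragraph instead of shrinking @kws, so nothing has to be restored afterwards. The keyword list, with foo and bar both pointing to #foobar, is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical keyword list: foo and bar both point to #foobar.
my @kw = (
    'keyword::(keyword)[#heading-to-jump-to]',
    'foo::(foo)[#foobar]',
    'bar::(bar)[#foobar]',
);

my (%rep, %heading);
for my $str (@kw) {
    my ($kw, $rep) = split /::/, $str, 2;
    $rep{$kw} = $rep;
    my ($target) = $rep =~ /\[([^]]+)\]/;
    push @{ $heading{$target} }, $kw if defined $target;
}

# Assumed reset policy: a heading's exclusions last until the next blank line.
sub link_with_exclusions {
    my (@lines) = @_;
    my %exclude;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^(#\S+)/) {
            %exclude = map { $_ => 1 } @{ $heading{$1} || [] };
        } elsif ($line =~ /^\s*$/) {
            %exclude = ();
        } else {
            for my $kw (grep { !$exclude{$_} } keys %rep) {
                $line =~ s/\b\Q$kw\E\b/$rep{$kw}/ig;
            }
        }
        push @out, $line;
    }
    return @out;
}
```

Whether a blank line is the right place to reset is exactly the part that depends on your data.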