Question

I have a string of text that contains HTML with several types of links (relative, absolute, root-relative). I need a regex that can be run through PHP's preg_replace to replace all relative links with root-relative links, without touching any of the other links. I already have the root path.

Replaced links:

<tag ... href="path/to_file.ext" ... >   --->   <tag ... href="/basepath/path/to_file.ext" ... >
<tag ... href="path/to_file.ext" ... />   --->   <tag ... href="/basepath/path/to_file.ext" ... />

Untouched links:

<tag ... href="/any/path" ... >
<tag ... href="/any/path" ... />
<tag ... href="protocol://domain.com/any/path" ... >
<tag ... href="protocol://domain.com/any/path" ... />

Solution

If you just want to change the base URI, you can try the BASE element:

<base href="/basepath/">

But note that changing the base URI affects all relative URIs and not just relative URI paths.

Otherwise, if you really want to use a regular expression, note that a relative path of the kind you want must be of the type path-noscheme (see RFC 3986):

path-noscheme = segment-nz-nc *( "/" segment )
segment       = *pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So the beginning of the URI must match:

^([a-zA-Z0-9-._~!$&'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)
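As a quick sanity check, the pattern can be wrapped in a small predicate and tried against the kinds of href values from the question. This is only a sketch: is_relative_path() is a made-up helper name, and the hyphen in the character class is escaped purely for readability.

```php
<?php
// Hypothetical helper: returns true when $href begins with a
// path-noscheme first segment (RFC 3986), i.e. a relative path.
function is_relative_path(string $href): bool
{
    // Same pattern as above, with "#" as delimiter and the hyphen escaped.
    $pattern = '#^([a-zA-Z0-9\-._~!$&\'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)#';
    return preg_match($pattern, $href) === 1;
}
```

With this helper, "path/to_file.ext" is accepted, while "/any/path" and "protocol://domain.com/any/path" are rejected, because "/" and ":" are not allowed in the first segment. Fragment-only or query-only references such as "#top" are rejected as well.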

But please use a proper HTML parser to parse the HTML and build a DOM out of it. Then you can query the DOM for the href attributes and test each value against the regular expression above.
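A minimal sketch of that DOM approach, assuming PHP's bundled DOM extension is available. The function name make_root_relative() and its parameters are hypothetical. Note that DOMDocument::loadHTML() wraps a fragment in html/body elements and adds a doctype, and it may need an explicit charset for non-Latin-1 input, so treat this as a starting point rather than a finished solution.

```php
<?php
// Hypothetical helper: prefix every relative href path with $basePath,
// leaving root-relative and absolute URIs alone.
function make_root_relative(string $html, string $basePath): string
{
    // First-segment test for path-noscheme, as derived above.
    $pattern = '#^([a-zA-Z0-9\-._~!$&\'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)#';

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them
    // for imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element that carries an href attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href]') as $element) {
        $href = $element->getAttribute('href');
        if (preg_match($pattern, $href) === 1) {
            $element->setAttribute('href', $basePath . $href);
        }
    }

    return $doc->saveHTML();
}
```

Calling make_root_relative($html, '/basepath/') would then rewrite href="path/to_file.ext" to href="/basepath/path/to_file.ext" and leave the other link types as they are.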

OTHER TIPS

I came up with this:

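// Keep the href attribute and its quote character; only prefix the path.
$html = preg_replace('#href=(["\'])([^/\'":][^\'":]*)\1#', 'href=${1}'.$root_path.'${2}${1}', $html);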

It might be a little too simplistic. The obvious flaw I see is that it will also match href="something" when it is outside of a tag, but hopefully it can get you started.
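One way to reduce that flaw, sketched here under a few assumptions, is to match whole opening tags first and only rewrite href attributes inside them. The function name prefix_relative_hrefs() is hypothetical, and the tag pattern still breaks on attribute values that contain ">", so this remains a regex-level workaround rather than real HTML parsing. It also still prefixes fragment-only links such as href="#top".

```php
<?php
// Hypothetical helper: rewrite relative hrefs, but only inside opening tags.
function prefix_relative_hrefs(string $html, string $rootPath): string
{
    // Outer pass: one opening tag at a time (naive: assumes no ">" in attributes).
    return preg_replace_callback('#<[a-zA-Z][^<>]*>#', function (array $tag) use ($rootPath) {
        // Inner pass: href values that do not start with "/" and contain no ":".
        return preg_replace_callback(
            '#(\shref=)(["\'])([^/\'":][^\'":]*)\2#',
            function (array $m) use ($rootPath) {
                // Rebuild the attribute with its original quote character.
                return $m[1] . $m[2] . $rootPath . $m[3] . $m[2];
            },
            $tag[0]
        );
    }, $html);
}
```

Using callbacks instead of a replacement string also avoids surprises if the root path happens to contain "$" or backslash sequences.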

Licensed under: CC-BY-SA with attribution