Determining Exact URL from a link inside wiki text
16-06-2021
Question
In a Wikipedia article's text, a link might be written as [[Category:A B C]], but the actual wiki URL ends with Category:A_B_C. Where can I find the rules that the wiki uses to derive the URL from a link in its text (e.g. converting spaces to underscores, capitalizing the first letter, dealing with non-ASCII characters, etc.)?
Solution
Roughly the following:
- Normalize namespace, e.g. category: --> Category:
- Uppercase the first letter of title proper, e.g. Category:foo --> Category:Foo. Note: this depends on wiki settings and titles are never uppercased on Wiktionary, for example.
- Replace spaces with underscores, e.g. Foo bar --> Foo_bar
- Percent-encode all the usual characters with PHP's standard function urlencode(), except for the following ones: ;:@$!*(),/
For full technical details, see the functions getLocalUrl() and wfUrlencode() in the MediaWiki source code.
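The four steps above can be sketched in Python as follows. This is only an approximation under stated assumptions, not MediaWiki's actual code: the function name link_to_path, the NAMESPACES table (a tiny illustrative subset) and the capitalize_first flag are hypothetical, and Python's quote() stands in for PHP's urlencode(), which differs in a few edge cases such as "~".

```python
from urllib.parse import quote

# Assumption: a small illustrative subset; a real wiki defines many more
# namespaces, and the set differs from wiki to wiki.
NAMESPACES = {
    "category": "Category",
    "file": "File",
    "template": "Template",
    "user talk": "User talk",
}

# Characters left unescaped, taken from the list in the answer above.
UNESCAPED = ";:@$!*(),/"


def link_to_path(link_text, capitalize_first=True):
    """Approximate the title part of the URL for a wiki link target."""
    text = link_text.strip().replace("_", " ")
    prefix, sep, rest = text.partition(":")

    # Step 1: normalize the namespace, but only if the prefix is a known one
    # (a colon can also appear inside an ordinary title).
    canonical = NAMESPACES.get(prefix.strip().lower()) if sep else None
    if canonical is not None:
        namespace, title = canonical, rest.strip()
    else:
        namespace, title = None, text

    # Step 2: uppercase the first letter (a per-wiki setting, hence the flag).
    if capitalize_first:
        title = title[:1].upper() + title[1:]

    full = f"{namespace}:{title}" if namespace else title

    # Step 3: spaces become underscores. Step 4: percent-encode the rest.
    return quote(full.replace(" ", "_"), safe=UNESCAPED)


print(link_to_path("category:A B C"))
print(link_to_path("café"))
```

Passing capitalize_first=False mimics a wiki such as Wiktionary, where the first letter is left as written.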
OTHER TIPS
There is no “etc.”, you already mentioned all the rules:
- spaces are converted to underscores
- the first letter of the article title is capitalized (the first letter of the namespace is capitalized too, if there is any)
- the whole link is percent-encoded
Note that rules #1 and #2 are not mandatory: if you create your own URL that doesn't follow them, Wikipedia will still show the page correctly.
Things get more complicated if you include namespace aliases (WP:WikiProject Computing → Wikipedia:WikiProject_Computing) and interwiki links (wikia:gameofthrones:Westeros → http://www.wikia.com/wiki/c:gameofthrones:Westeros).
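A rough Python sketch of how such prefixes might be resolved is shown here. Everything in it is an assumption made for illustration: the function name resolve_prefix and both lookup tables are hypothetical, and they contain only the two examples mentioned above. A real wiki keeps its own, much larger, alias list and interwiki map.

```python
# Assumption: one-entry tables built only from the two examples above.
NAMESPACE_ALIASES = {"wp": "Wikipedia"}
INTERWIKI = {"wikia": "http://www.wikia.com/wiki/c:$1"}  # $1 marks the target


def resolve_prefix(link_text):
    """Expand a namespace alias or an interwiki prefix, if one is present."""
    prefix, sep, rest = link_text.partition(":")
    key = prefix.strip().lower()

    if sep and key in INTERWIKI:
        # Interwiki link: substitute the remainder into the URL pattern.
        return INTERWIKI[key].replace("$1", rest.strip().replace(" ", "_"))

    if sep and key in NAMESPACE_ALIASES:
        # Namespace alias: swap in the canonical namespace name.
        return f"{NAMESPACE_ALIASES[key]}:{rest.strip()}".replace(" ", "_")

    # No known prefix: only the space-to-underscore rule applies here.
    return link_text.strip().replace(" ", "_")


print(resolve_prefix("WP:WikiProject Computing"))
print(resolve_prefix("wikia:gameofthrones:Westeros"))
```

Only the first colon-separated prefix is examined, so the second colon in the interwiki example stays part of the target.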