Question

In a Wikipedia article's text, a link might be written as [Category:A B C], but the actual wiki URL ends with a suffix like Category:A_B_C. Where can I find the rules the wiki uses to derive the URL from a link in its text (e.g. converting spaces to underscores, capitalizing the first letter, handling non-ASCII characters, etc.)?

Solution

Roughly the following:

  • Normalize namespace, e.g. category: --> Category:.
  • Uppercase the first letter of the title proper, e.g. Category:foo --> Category:Foo. Note: this depends on the wiki's settings; on Wiktionary, for example, titles are never uppercased.
  • Replace spaces with underscores, e.g. Foo bar --> Foo_bar.
  • Percent-encode the result as PHP's standard function urlencode() would, except that the following characters are left unencoded: ; : @ $ ! * ( ) , /

For full technical details, see the functions getLocalUrl() and wfUrlencode() in the MediaWiki source code.
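
The steps listed above can be sketched in Python roughly as follows. This is only an illustration, not MediaWiki's actual code: the NAMESPACES table and the title_to_path() helper are hypothetical names invented for this example, and Python's urllib.parse.quote() stands in for PHP's urlencode(), so the result may differ for a few characters such as ~.

```python
from urllib.parse import quote

# Hypothetical sample of canonical namespace names, keyed by lowercase form.
# A real wiki defines many more, including localized ones.
NAMESPACES = {"category": "Category", "talk": "Talk", "wikipedia": "Wikipedia"}

# Characters left unencoded, taken from the list in the answer above.
SAFE = ";:@$!*(),/"


def title_to_path(link_text, capitalize_first=True):
    """Turn link text such as 'category:a B C' into a URL path suffix."""
    title = link_text.strip()

    # Step 1: normalize the namespace prefix, if it is a known one.
    prefix, sep, rest = title.partition(":")
    key = prefix.strip().lower()
    if sep and key in NAMESPACES:
        namespace = NAMESPACES[key]
        title = rest.strip()
    else:
        namespace = ""

    # Step 2: uppercase the first letter of the title proper
    # (assumption: disabled on wikis such as Wiktionary).
    if capitalize_first and title:
        title = title[0].upper() + title[1:]

    if namespace:
        title = namespace + ":" + title

    # Step 3: replace spaces with underscores.
    title = title.replace(" ", "_")

    # Step 4: percent-encode everything except the safe characters.
    return quote(title, safe=SAFE)


print(title_to_path("Category:A B C"))
print(title_to_path("category:foo bar"))
print(title_to_path("Café"))
```

Passing capitalize_first=False approximates a wiki configured like Wiktionary, where the first letter is left unchanged.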

OTHER TIPS

There is no “etc.”; you already mentioned all the rules:

  1. spaces are converted to underscores
  2. the first letter of the article title is capitalized (the first letter of the namespace is capitalized too, if there is any)
  3. the whole link is percent-encoded

Note that rules #1 and #2 are not mandatory: if you create your own URL that doesn't follow them, Wikipedia will still show the page correctly.

Things get more complicated if you include namespace aliases (WP:WikiProject Computing --> Wikipedia:WikiProject_Computing) and interwiki links (wikia:gameofthrones:Westeros --> http://www.wikia.com/wiki/c:gameofthrones:Westeros).

Licensed under: CC-BY-SA with attribution