What is the best way to clean a string for placement in a URL, like the question name on SO?

StackOverflow https://stackoverflow.com/questions/539920

  •  22-08-2019

Question

I'm looking to create a URL string like the one SO uses for its question links. I am not looking to rewrite the URL (mod_rewrite); I want to generate the link on the page.

Example: The question name is:

Is it better to use ob_get_contents() or $text .= ‘test’;

The URL ends up being:

http://stackoverflow.com/questions/292068/is-it-better-to-use-obgetcontents-or-text-test

The part I'm interested in is:

is-it-better-to-use-obgetcontents-or-text-test

So basically I'm looking to clean out anything that is not alphanumeric while still keeping the URL readable. I have the following created, but I'm not sure if it's the best way or if it covers all the possibilities:

$str = urlencode(
    strtolower(
    str_replace('--', '-', 
    preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'), 
    trim($urlPart)))));

So basically:

  1. trim
  2. remove every character that is neither alphanumeric nor a space
  3. then replace everything not alphanumeric with a dash
  4. replace -- with -.
  5. strtolower()
  6. urlencode() -- probably not needed, but just for good measure.

Solution

As you already pointed out, urlencode() is not needed in this case, and neither is trim(). If I understand correctly, step 4 is meant to avoid multiple dashes in a row, but a single str_replace() pass does not collapse runs of three or more dashes (three dashes become two). On the other hand, dashes connecting two words (as in "large-scale") are removed by your solution, while they seem to be preserved on SO.
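
Both observations can be checked with a short script. This is only a sketch: the function name original_slug() is hypothetical and simply wraps the code from the question, unchanged.

```php
<?php
// Hypothetical wrapper around the code from the question (logic unchanged).
function original_slug(string $urlPart): string
{
    return urlencode(
        strtolower(
        str_replace('--', '-',
        preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'),
        trim($urlPart)))));
}

// Three spaces become three dashes; one str_replace() pass only reduces them to two.
echo original_slug('too   many spaces'), "\n";   // too--many-spaces

// The dash in "large-scale" is stripped by the first pattern.
echo original_slug('A large-scale test'), "\n";  // a-largescale-test
```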

I'm not sure that this is really the best way to do it, but here's my suggestion:

$str = strtolower( 
  preg_replace( array('/[^a-z0-9\- ]/i', '/[ \-]+/'), array('', '-'), 
  $urlPart ) );

So:

  1. remove any character that is neither space, dash, nor alphanumeric
  2. replace every run of consecutive spaces and/or dashes with a single dash
  3. strtolower()
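
The three steps above could be wrapped in a small helper, sketched below. The name slugify() is an assumption, and the final trim($slug, '-') is an addition that is not part of the suggestion: it assumes that dashes left at either end (from surrounding spaces) are unwanted, and it makes a separate whitespace trim() unnecessary.

```php
<?php
// Hypothetical helper; the name slugify() is not from the original answer.
function slugify(string $urlPart): string
{
    $slug = strtolower(
        preg_replace(
            // 1. drop everything except letters, digits, dashes and spaces
            // 2. collapse each run of spaces/dashes into one dash
            array('/[^a-z0-9\- ]/i', '/[ \-]+/'),
            array('', '-'),
            $urlPart
        )
    );

    // Assumption (not in the original suggestion): strip dashes at either end.
    return trim($slug, '-');
}

echo slugify("Is it better to use ob_get_contents() or \$text .= 'test';"), "\n";
// is-it-better-to-use-obgetcontents-or-text-test

echo slugify('  A large-scale   test!  '), "\n";
// a-large-scale-test
```

Note that, like the original, this drops non-ASCII letters entirely; handling them would need a transliteration step that is outside the scope of this answer.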
Licensed under: CC-BY-SA with attribution