Question

I'm having trouble with (German) special characters in URIs and want to try to resolve it with a Regex route and the PCRE pattern modifier for UTF-8, u.

'router' => array(
    'routes' => array(
        // ...
        'city' => array(
            'type'  => 'regex',
            'options' => array(
                'regex' => '/catalog/(?<city>[a-zA-Z0-9_-äöüÄÖÜß]*)\/u',
                'defaults' => array(
                    'controller' => 'Catalog\Controller\Catalog',
                    'action'     => 'list-sports',
                ),
                'spec'  => '/catalog/%city%',
            ),
            'may_terminate' => true,
        ),
    ),
),

But when I set it, the route stops working entirely (error 404), both for URIs with special characters and for ones without.

How do I set the modifier correctly?
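A likely explanation for the 404 is that the trailing \/u is never read as a modifier. The sketch below assumes, as the match() override in the solution suggests, that the stock Regex route wraps the configured pattern as '(^' . $regex . '$)' with no modifiers. Under that assumption "\/u" becomes literal text the path must end with, so /catalog/Berlin no longer matches. (The stock route also appears to match against the raw, percent-encoded path, so "ü" would arrive as %C3%BC.)

```php
<?php
// Assumption: the stock Regex route wraps the configured regex like this,
// with no modifiers appended after the closing ")" delimiter.
$regex   = '/catalog/(?<city>[a-zA-Z0-9_-äöüÄÖÜß]*)\/u';
$pattern = '(^' . $regex . '$)';

// "\/u" is now part of the pattern: the path must end in "/u".
var_dump(preg_match($pattern, '/catalog/Berlin'));   // int(0)
var_dump(preg_match($pattern, '/catalog/Berlin/u')); // int(1)
```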


Solution

Since I already had this open, here's a custom route class that solves the problem.

<?php
namespace Application\Mvc\Router\Http;

use Zend\Mvc\Router\Http\Regex;
use Zend\Mvc\Router\Http\RouteMatch;
use Zend\Stdlib\RequestInterface as Request;

/**
 * Regex route that decodes the URI path and matches it with the PCRE
 * UTF-8 modifier (u), so patterns may contain non-ASCII characters.
 */
class UnicodeRegex extends Regex
{
    /**
     * match(): defined by RouteInterface interface.
     *
     * @param  Request      $request
     * @param  integer|null $pathOffset
     * @return RouteMatch|null
     */
    public function match(Request $request, $pathOffset = null)
    {
        if (!method_exists($request, 'getUri')) {
            return null;
        }

        $uri  = $request->getUri();
        // Decode the path before matching, so "N%C3%BCrnberg" becomes "Nürnberg".
        // Caveat: $pathOffset and the matched length then refer to the decoded
        // path, which is shorter than the raw path whenever it contains
        // percent-encoded characters.
        $path = rawurldecode($uri->getPath());

        // Wrap the configured regex ("(" and ")" act as delimiters) and
        // append the u modifier.
        if ($pathOffset !== null) {
            $result = preg_match('(\G' . $this->regex . ')u', $path, $matches, 0, $pathOffset);
        } else {
            $result = preg_match('(^' . $this->regex . '$)u', $path, $matches);
        }

        // 0 = no match; false = PCRE error, e.g. the decoded path is not valid UTF-8.
        if (!$result) {
            return null;
        }

        $matchedLength = strlen($matches[0]);

        // Keep only non-empty named groups as route parameters.
        foreach ($matches as $key => $value) {
            if (is_int($key) || $value === '') {
                unset($matches[$key]);
            }
        }

        return new RouteMatch(array_merge($this->defaults, $matches), $matchedLength);
    }
}
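To check the matching logic of UnicodeRegex::match() without a Zend Framework installation, here is a minimal standalone sketch. The function name unicode_regex_match() is hypothetical; it only mirrors the decode, match-with-u, and filter-named-groups steps of the class (top-level case, no $pathOffset).

```php
<?php
// Hypothetical helper mirroring UnicodeRegex::match() without Zend classes.
function unicode_regex_match(string $regex, string $rawPath, array $defaults = []): ?array
{
    // Decode first, so percent-encoded UTF-8 becomes real characters.
    $path = rawurldecode($rawPath);

    // Same "(^...$)u" wrapping as the class uses when $pathOffset is null.
    $result = preg_match('(^' . $regex . '$)u', $path, $matches);
    if ($result !== 1) {
        return null; // no match (0) or PCRE error (false), e.g. invalid UTF-8
    }

    // Drop numeric keys and empty captures, keeping named groups only.
    foreach ($matches as $key => $value) {
        if (is_int($key) || $value === '') {
            unset($matches[$key]);
        }
    }

    return array_merge($defaults, $matches);
}

$params = unicode_regex_match(
    '/catalog/(?<city>[\p{L}]+)',
    '/catalog/N%C3%BCrnberg',
    ['action' => 'list-sports']
);
print_r($params); // action => list-sports, city => Nürnberg
```

A path such as /catalog/%FC decodes to a lone byte that is not valid UTF-8, so preg_match() returns false and the helper returns null instead of matching.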

Assuming you place the file at Application/Mvc/Router/Http/UnicodeRegex.php, your route definition should look like this:

'router' => array(
    'routes' => array(
        // ...
        'city' => array(
            'type'  => 'Application\Mvc\Router\Http\UnicodeRegex',
            'options' => array(
                'regex' => '/catalog/(?<city>[\p{L}]+)',
                // or, if you prefer, your original character class works too, minus the
                // trailing "\/u" and with the hyphen moved to the end so it is literal:
                // 'regex' => '/catalog/(?<city>[a-zA-Z0-9_äöüÄÖÜß-]*)',
                'defaults' => array(
                    'controller' => 'Catalog\Controller\Catalog',
                    'action'     => 'list-sports',
                ),
                'spec'  => '/catalog/%city%',
            ),
            'may_terminate' => true,
        ),
    ),
),
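One pitfall in the question's character class [a-zA-Z0-9_-äöüÄÖÜß]: PCRE reads "_-ä" as a range from U+005F to U+00E4 rather than as three literal characters. That range admits characters such as "~", while a real hyphen is not allowed at all. The sketch below wraps both variants the way UnicodeRegex does ('(^' . $regex . '$)u') and compares them.

```php
<?php
// "_-ä" is a range (U+005F..U+00E4); a hyphen placed last is a literal "-".
$rangeBug = '(^/catalog/(?<city>[a-zA-Z0-9_-äöüÄÖÜß]*)$)u';
$fixed    = '(^/catalog/(?<city>[a-zA-Z0-9_äöüÄÖÜß-]*)$)u';

foreach (['/catalog/Nürnberg', '/catalog/a~b', '/catalog/Baden-Baden'] as $path) {
    echo $path, ' bug=', preg_match($rangeBug, $path), ' fixed=', preg_match($fixed, $path), "\n";
}
// /catalog/Nürnberg bug=1 fixed=1
// /catalog/a~b bug=1 fixed=0
// /catalog/Baden-Baden bug=0 fixed=1
```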

OTHER TIPS

You can probably solve this the same way many others with the same problem have. Take a look at some of them:

UTF-8 in * regular expressions

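Those answers use the escape sequences \s (whitespace) and \p{L} (any Unicode letter) together with the u (UTF-8) pattern modifier, which goes after the closing delimiter. I hope this solves it! Good luck.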
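The effect of the u modifier on \p{L} can be seen on a UTF-8 string. In this sketch, without u PCRE treats each byte as a separate character; the second byte of "ü" (0xBC) is then tested as U+00BC ("¼"), which is not a letter, so the whole-string match fails. With u, "ü" is a single code point and matches.

```php
<?php
$subject = 'Nürnberg'; // UTF-8: "ü" is the two bytes 0xC3 0xBC

// Without u: byte-wise matching, 0xBC is not a letter.
var_dump(preg_match('/^\p{L}+$/', $subject));  // int(0)

// With u: "ü" is one code point (U+00FC), matched by \p{L}.
var_dump(preg_match('/^\p{L}+$/u', $subject)); // int(1)
```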

Edit

See my own test:

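<?php

// Convert the UTF-8 string to ISO-8859-1 so that "ü" is a single byte (0xFC).
// Without the u modifier PCRE works on bytes, and \p{L} treats byte 0xFC as
// U+00FC, which is a letter. (The m modifier has no effect on this pattern.)
// Note: utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.
$toss_the_dice = utf8_decode("etc/catalog/Nürnberg");
preg_match('/\/catalog\/([\\s\\p{L}]*)/m', $toss_the_dice, $dice);
echo utf8_encode($dice[1]); // convert the capture back to UTF-8

// Now it prints
// Nürnberg

?>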

Do you see how it works?

Edit 2

This version is even simpler: keep the string in UTF-8 and use the u modifier instead of converting back and forth.

<?php
    $toss_the_dice = "etc/catalog/Nürnberg";
    preg_match ('/\/catalog\/([\\s\\p{L}]*)/u', $toss_the_dice, $dice);
    echo $dice[1];

// Now it also prints
// Nürnberg

?>
Licensed under: CC-BY-SA with attribution