Merging two regular expressions to truncate words in strings

https://stackoverflow.com/questions/2682861

30-09-2019

Question

I've come up with the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):

function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_...  (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

They both work as is. However, if I drop the second preg_replace(), the first pattern does not match the second test string at all, so the string is left untouched and I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....

I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() ATM. I've made several attempts to merge the second regex into the first one, but without success.
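To illustrate the byte-level problem, here is a small sketch. It assumes the script and the string are UTF-8 encoded with precomposed characters; utf8_length is a hypothetical helper name, not something from the original post.

```php
<?php
// Hypothetical helper (not from the original post): counts code points by
// matching one character at a time in PCRE's UTF-8 mode (/u).
function utf8_length(string $s): int
{
    return (int) preg_match_all('~.~su', $s);
}

$word = 'Iñtërnâtiônàlizætiøn';

echo strlen($word), "\n";       // bytes: 27
echo utf8_length($word), "\n";  // characters: 20

// substr() counts bytes, so this cuts through the two-byte "ñ"
// and leaves an invalid UTF-8 sequence behind.
$cut = substr($word, 0, 2);

// In /u mode preg_match() returns false when the subject is not valid UTF-8.
$valid = preg_match('~.~u', $cut) !== false;
var_dump($valid);               // bool(false)
```

This is why the function above measures length with strlen(utf8_decode($string)) and truncates with a /u regex instead of substr().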

Please help S.M.S., I've been struggling with this for almost an hour.

EDIT: I'm sorry, I've been awake for 40 hours and I shamelessly missed this:

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if someone has a more optimized regex (or one that drops the trailing space), please share:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

EDIT 2: I still can't get rid of the trailing whitespace. Can someone help me out?

EDIT 3: Okay, none of my edits actually work; I was being fooled by RegexBuddy. I should probably leave this for another day and get some sleep now. Off for today.

Solution

Perhaps I can give you a happy morning after a long night of RegExp nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Boiling it down:

^         # start of string
(         # begin capture group 1
 .{1,x}   # match 1 to x characters
 (?<=\S)  # lookbehind: the match must end with a non-whitespace character
 (?=\s)   # lookahead: the next character must be whitespace
 |        # otherwise try this:
 .{x}     # take exactly x characters anyway
)         # end capture group 1
.*        # match the rest of the string (since you were using replace)

You could always add |$ to the end of (?=\s), but since your code was already checking that the string was longer than $limit, I didn't feel that case was necessary.
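Plugged into the function from the question, the merged pattern might be used roughly as follows. This is a minimal sketch, not the poster's final code: the name text_truncate_merged is hypothetical, the character count uses preg_match_all() instead of strlen(utf8_decode()), and it assumes $limit is at least 1 and the input is valid UTF-8.

```php
<?php
// Hypothetical name; a sketch of the question's function using the single
// merged pattern instead of two preg_replace() calls.
function text_truncate_merged(string $string, int $limit, string $more = '...'): string
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    // Count characters (code points), not bytes. This is an assumption of
    // this sketch; the original used strlen(utf8_decode($string)).
    if (preg_match_all('~.~su', $string) > $limit) {
        // First alternative: up to $limit characters ending on a word boundary
        // (non-whitespace followed by whitespace).
        // Second alternative: exactly $limit characters when no such boundary exists.
        $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
        $string = preg_replace($pattern, '$1', $string) . $more;
    }

    return htmlentities($string, ENT_QUOTES, 'UTF-8', true);
}
```

With the two test strings from the question and a limit of 50, this should cut after "fox" (49 characters) in the first case and after "fox_" (50 characters) in the second, with no trailing space before the ellipsis.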

Other tips

Have you considered using wordwrap? ( http://us3.php.net/wordwrap )
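A rough sketch of what that could look like is below. The helper name truncate_with_wordwrap is hypothetical, and note the caveat: wordwrap() counts bytes, not characters, so with multi-byte UTF-8 text the limit is effectively shorter and a forced break may split a character. It also assumes the input contains no newlines of its own.

```php
<?php
// Hypothetical helper; illustrates the wordwrap() suggestion only.
// Byte-based: suitable for single-byte text, not for the UTF-8 case in the question.
function truncate_with_wordwrap(string $string, int $limit, string $more = '...'): string
{
    if (strlen($string) <= $limit) {
        return $string;
    }

    // Wrap at $limit bytes, force-breaking words longer than $limit,
    // then keep only the first line.
    $lines = explode("\n", wordwrap($string, $limit, "\n", true));

    return $lines[0] . $more;
}

echo truncate_with_wordwrap('The quick brown fox jumped over the lazy dog', 20), "\n";
// The quick brown fox...
```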

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow