Question

I am slowly refining a punctuation-fixing function in PHP that is used to clean user input. The function currently adds spaces after punctuation, removes spaces before punctuation, and capitalizes the first word of each sentence. I have seen a few people looking for a similar function, so I am happy to share what I have so far. It is pretty close to where I want it; however, when it adds a space after a comma, it should not do so when the comma is inside a number such as 1,000. Can anyone suggest the quickest way to modify my code to ignore commas inside numbers? Are there also ways to shorten what I have while still getting the same result? Thanks for your time.

function format_punc($string){
    $punctuation = ',.;:';
    $string = str_replace(' ?', '?', str_replace(' .', '.', str_replace(' ,', ',', preg_replace('/(['.$punctuation.'])[\s]*/', '\1 ', $string))));
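    // Capitalize the first letter of each sentence, then collapse whitespace and trim
    $string = preg_replace_callback('/([\.!\?]\s+|\A)(\w)/', function ($m) { return $m[1] . strtoupper($m[2]); }, $string);
    $string = trim(preg_replace('/[[:space:]]+/', ' ', $string));
    if(substr($string, -1) === ','){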
        $string = substr($string, 0, -1).'.';
    }
    return $string;
}
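To see the problem concretely, here is the spacing step from the function above run on its own. This is a sketch: the pattern and `$punctuation` are copied from `format_punc()`, and the sample sentence is made up.

```php
<?php
// The spacing step from format_punc(), isolated: add one space after , . ; :
$punctuation = ',.;:';
$input = 'I have 1,000 cats.';
$spaced = preg_replace('/([' . $punctuation . '])[\s]*/', '\1 ', $input);

// The comma inside the number gets a space too, which is the unwanted behaviour.
var_dump($spaced); // string(20) "I have 1, 000 cats. "
```

The trailing space after the final period is removed later by `trim()`; the space inside `1, 000` is the part that needs a digit check.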

Solution

Here is my updated PHP punctuation-fixing function. It seems to be working correctly now. I am sure there are ways to condense it, but it does the following to a string:

Reduce duplicate punctuation such as !! to !
Reduce multiple spaces to single spaces
Remove any spaces before ? . ,
Add spaces after ; :
Add spaces after commas but not when they are part of a number
Add spaces after periods but not when they are part of a number or abbreviation
Remove whitespace from beginning and end of string
Capitalize first word of sentences
Change last character to a period if it is a comma

function format_punc($string){
    $punctuation = ';:';
    $spaced_punc = array(' ?', ' .', ' ,');
    $un_spaced_punc = array('?', '.', ',');
    $string = preg_replace("/([.,!?;:])+/iS","$1",$string);
    $string = preg_replace('/[[:space:]]+/', ' ', $string);
    $string = str_replace($spaced_punc, $un_spaced_punc, $string);
    $string = preg_replace('/(['.$punctuation.'])[\s]*/', '\1 ', $string);
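    // Add one space after a comma (replacing any spaces already there) unless it sits between a digit and three more digits, as in 3,425
    $string = preg_replace('/(?<!\d),\s*|,(?!\d{3})\s*/', ', ', $string);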
    $string = preg_replace('/(\.)([[:alpha:]]{2,})/', '$1 $2', $string);
    $string = trim($string);
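    // Capitalize the first letter at the start and after . ! or ? followed by whitespace
    $string = preg_replace_callback('/([\.!\?]\s+|\A)(\w)/', function ($m) {
        return $m[1] . strtoupper($m[2]);
    }, $string);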
    if(substr($string, -1) === ','){
        $string = substr($string, 0, -1).'.';
    }
    return $string;
}
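The comma rule can be checked in isolation. This sketch uses a hypothetical helper, `space_commas()`, built on the same lookbehind/lookahead idea as the function above, with `\s*` added so a space that already follows the comma is replaced rather than doubled:

```php
<?php
// Hypothetical helper: space out commas, except a comma between a digit and
// exactly three following digits (a thousands separator such as 3,425).
function space_commas(string $s): string
{
    // Alternative 1: a comma not preceded by a digit.
    // Alternative 2: a comma not followed by three digits.
    // Either way, \s* swallows existing spaces so the result has exactly one.
    return preg_replace('/(?<!\d),\s*|,(?!\d{3})\s*/', ', ', $s);
}

foreach (['Hello,world', 'Hello, world', 'I have 3,425 cats', 'Pick 1,2 or 3'] as $s) {
    echo space_commas($s), PHP_EOL;
}
```

Note the last case: `1,2` is not followed by three digits, so it is treated as a list separator and becomes `1, 2`.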

If you take the time to condense this code and create something that still returns the same results, please share! Thank you and enjoy!

Other answers

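I think the regexp should only match punctuation that has no digit on either side. Lookarounds such as (?<![0-9]) and (?![0-9]) check the neighbouring characters without including them in the match, so the '\1 ' replacement does not shift them around:

preg_replace('/(?<![0-9])(['.$punctuation.'])(?![0-9])[\s]*/', '\1 ', $string)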

Link to regexp test
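A runnable sketch of the digit check, written with lookarounds so the characters next to the punctuation are tested but not consumed. `$punctuation` is assumed to be `',.;:'` as in the question, and the sample sentence is made up:

```php
<?php
$punctuation = ',.;:';
$input = 'Hello,world. I have 1,000 cats;really.';

// Space out punctuation only when neither neighbour is a digit;
// \s* swallows any spaces that were already there, and rtrim()
// drops the space added after the final period.
$output = rtrim(preg_replace('/(?<![0-9])([' . $punctuation . '])(?![0-9])\s*/', '$1 ', $input));

echo $output, PHP_EOL; // Hello, world. I have 1,000 cats; really.
```

One trade-off of this rule: a comma that touches a digit on only one side, as in `3,cats`, is also left without a space.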

This is a bit complex but it should get you in the right direction:

<?php

// The following finds all commas in $string and identifies which commas are preceded and followed by a digit

$string = 'Hello, my name, is John,Doe. I have 3,425 cats.';

function strpos_r($haystack, $needle)
{
    if(strlen($needle) > strlen($haystack))
        trigger_error(sprintf("%s: length of argument 2 must be <= argument 1", __FUNCTION__), E_USER_WARNING);

    $seeks = array();
    // strrpos() returns 0 for a match at the very start, so compare with false explicitly
    while(($seek = strrpos($haystack, $needle)) !== false)
    {
        array_push($seeks, $seek);
        $haystack = substr($haystack, 0, $seek);
    }
    return $seeks;
}

var_dump($commas = strpos_r($string, ',')); // gives you the location of all commas

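for ($i = 0; $i < count($commas); $i++)
{
    $pos = $commas[$i];
    // check the characters on either side of the comma, not the offsets themselves
    if ($pos > 0 && $pos < strlen($string) - 1 && ctype_digit($string[$pos - 1]) && ctype_digit($string[$pos + 1]))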
    {
      // this means the characters before and after a given comma are numeric
      // don't add space (or delete the space) here

    }
}
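A shorter way to locate the same commas, swapping the `strpos_r()` loop for `preg_match_all()` with the `PREG_OFFSET_CAPTURE` flag. This is a different technique from the answer above, sketched for comparison; the offsets are byte positions, so multibyte input would need extra care.

```php
<?php
$string = 'Hello, my name, is John,Doe. I have 3,425 cats.';

// Offsets of every comma that has a digit immediately on both sides.
// With PREG_OFFSET_CAPTURE each match is a [matched text, byte offset] pair.
preg_match_all('/(?<=\d),(?=\d)/', $string, $matches, PREG_OFFSET_CAPTURE);
$numericCommas = array_column($matches[0], 1);

// Offsets of all commas, in left-to-right order
// (strpos_r() returns the same positions from last to first).
preg_match_all('/,/', $string, $all, PREG_OFFSET_CAPTURE);
$allCommas = array_column($all[0], 1);

print_r($numericCommas);
print_r($allCommas);
```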
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow