Question

Hi, I still can't work this out. I'm using preg_replace and have searched without finding a solution. I need to remove unknown characters from a string while preserving its new lines.

    $summary = "ASDASDASDASDSASD 
        [BS][BS][BS] hello
        this is a new line
    [BS][BS][BS] 
    this is another new line"; 
    // [BS] stands for an unknown character; this is how Notepad++ displays it.
    // See the screenshot, taken from Notepad++.
    // In the browser the output shows up as a series of whitespace characters.
    // I can't paste the unknown symbol here.

    echo preg_replace('/[\x00-\x1F\x80-\xFF]/','', $summary); 

    // Output: ASDASDASDASDSASD hello this is a new line this is another new line

    // Expected output:
    // ASDASDASDASDSASD 
    //     hello
    //     this is a new line
    // this is another new line

I'd appreciate any help I can get.

This is the BS I am talking about


Solution

Looking at http://www.asciitable.com/, I think the regex should be something like this:

/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F-\xFF]/

The character class is in effect a blacklist. It deliberately leaves out the ASCII tab (\x09), new line (\x0A) and carriage return (\x0D) characters, which you probably want to keep, so they are not removed.

PS: BS is how Notepad++ displays the backspace character (ASCII 0x08).
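
To see the pattern in action, here is a minimal sketch. The sample string is an assumption: it rebuilds the question's input with explicit "\x08" escapes standing in for the BS characters, and the variable name $clean is only illustrative.

```php
<?php
// Assumed sample input: "\x08" is the backspace (BS) character,
// "\n" is the line feed that should survive the cleanup.
$summary = "ASDASDASDASDSASD \n\x08\x08\x08 hello\nthis is a new line\n\x08\x08\x08 \nthis is another new line";

// Strip control bytes except tab (\x09), LF (\x0A) and CR (\x0D),
// plus DEL (\x7F) and every byte from \x80 to \xFF.
$clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\xFF]/', '', $summary);

echo $clean;
```

The line feeds stay in the result, so the text keeps its line structure. Two caveats: a browser collapses raw new lines when rendering HTML, so they only become visible inside a pre element or after something like nl2br(); and because the pattern works on bytes, the \x7F-\xFF range also deletes every byte of multi-byte UTF-8 characters, which makes it suitable only for text that should end up as plain ASCII.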

OTHER TIPS

Use this:

echo preg_replace('/[\x00-\x09\x0B-\x0C\x0E-\x1F\x80-\xFF]/','', $summary);

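\x0D and \x0A (CR and LF) both fall inside \x00-\x1F, so your original pattern strips the line breaks as well. You need to leave them out of the class, which means splitting it into several ranges. Note that this version still removes the tab (\x09) and keeps DEL (\x7F).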
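
The two suggested patterns differ only in how they treat tab and DEL. A small sketch that compares them on a made-up input (the string and the names $keepTab and $dropTab are assumptions for illustration):

```php
<?php
// Assumed test input: a, TAB, b, BS, c, CR, LF, d, DEL, e
$input = "a\tb\x08c\r\nd\x7Fe";

// Pattern from the accepted solution: keeps tab, removes DEL.
$keepTab = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\xFF]/', '', $input);

// Pattern from this tip: removes tab, keeps DEL.
$dropTab = preg_replace('/[\x00-\x09\x0B-\x0C\x0E-\x1F\x80-\xFF]/', '', $input);

var_dump($keepTab === "a\tbc\r\nde");   // bool(true)
var_dump($dropTab === "abc\r\nd\x7Fe"); // bool(true)
```

Both keep CR and LF; which one fits depends on whether tabs in the input are meaningful.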

Licensed under: CC-BY-SA with attribution