Question

I have a CSV file I need to clean up. This is a one-time thing so I'd like to do it in Notepad++ if possible.

The CSV file has two fields, one of which is wrapped in quotes. I'd like to remove any Carriage Returns from within the quoted field. I was trying to use this pattern but can't get it quite right...

(.*)\"(.*)\n(.*)\"(.*)

Also correct me if I am wrong, but I presume the "replace with" value would be something along the lines of:

\1\2\3\4

Thanks in advance.

I'm also open to alternate solutions such as a quick and dirty PERL script.

Was it helpful?

Solution

After much hunting around StackOverflow I found a regex pattern suggested for a similar problem that I only had to modify slightly from single quote to double quote. I ran it in PERL. It worked great! Unfortunately I can't find the original post to give credit where credit is due.

Anyway, this is what I finally used. Thanks everyone for your help!

$string123 =~ s/((?:^[^"]*"|(?!^))[^"]*?(?:"[^"]*"[^"]*?)*?)(\n{1,})/$1/g; 

OTHER TIPS

In case anyone else finds this, and wants a real legit answer for this problem in notepad++, consider:

Use the paragraph symbol/show invisible characters command to display CR and LF's.

Now, note that the true line feeds at the end of each csv record are usually (depends on what created the file) ONLY LFs. The real record separator DOES NOT HAVE A CR IN IT. And now notice that the carriage returns that are imbedded within a field and wrapped in quotes are usually CR/LFs (two non-print characters side-by-side, a CR and a LF)!

So now, IT'S EASY. Highlight a CRLF combination, pull up f&r, and your 'find what:' entry should be two empty boxes for the non-printable CRLF combination. Leave the replace field empty, and run it.

TADA! No more embedded carriage returns within quotes, and all for-real line-feeds are left intact.

Barry has the solution and it appears to be the best solution I can find anywhere, and I looked at length because I found, at first, that this solution wasn't working for me in Perl.

What I found is that a slight adjustment to that regex worked. I'm not sure the cause of this nuance, but Perl isn't able to find carriage returns when it looks for \n but does find carriage returns denoted in hex form /x0D.


So, instead of:

$string123 =~ s/((?:^[^"]"|(?!^))[^"]?(?:"[^"]"[^"]?)?)(\n{1,})/$1/g;


This worked for me:

$string123 =~ s/((?:^[^"]"|(?!^))[^"]?(?:"[^"]"[^"]?)?)(\x0D{1,})/$1/g;

Thanks Barry, big help!

I have encountered that problem and have prevailed using Notepad++. Shreyas answer is either wrong or outdated, because now using \r\n in regexp search and replace works. That said I used the following:

[^"]"(([^"]*)\r\n([^"]*))+"

The way it works, is that it matches:

[somethin0]"[somethin1]NEWLINE[somethin2]"

Where somethin1 and somethin2 are \2 and \3 (and \1 is the entire inner part) and somethin0 is delimiting character (comma most likely). To get what we want we substitute with:

[somethin0]"\2 \3"

And get the expected result! Well, mostly. Single replace like that removes single newline character within quotes. However, that should only be a minor inconvenience of spamming replaceAll button a few times (where "a few times" is the maximum amount of newlines present between quotes)

The problem with Notepad++ is that it doesnt allow you to run a regex with special characters like \n or \t. Regex has to be pure regex and for special characters there is the extended search mode.

I would suggest you run your regex on Eclipse if possible.

If thats not possible, here's a quick and dirty solution

Replace all \n with a special character say # or & which will not occur in your file otherwise.

Next run your regex to look for this special character within quotes and delete it

Finally replace the remaining special characters with \n

Click the π button to show the hidden chars. Then select the one carriage return char. Press Ctr+H and replace with an empty string

Hope it works for you.

Well... my RegEx is terrible and I can't answer your question. However, here is a nice little JS function that you can use that should be able to do what you want.

function removeNewLines(str){
    var quotedStrings = str.split(/["'](.*)?["']/g),
        i = 0;

    for( ; i < quotedStrings.length; i++){
        str = str.replace(quotedStrings[i], quotedStrings[i].replace(/[\r\n]/g,""));
    }
    return str;
}
removeNewLines("\"asdf\r\nas\"asdf\'as\nd\'asdf\"asdf\r\nasf\r\n\"") === "\"asdfas\"asdf'asd'asdf\"asdfasf\"";

Here is an answer specifically for Notepad++

Menu: TextFX > TextFX Edit > Delete Blank Lines

Within that menu item there is also a Delete Surplus Blank Lines option.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top