Question

For part 1, see this SO post

I have a CSV in which certain fields are enclosed in the " symbol, used as a text qualifier.

See the example below. Note that each integer (e.g. 1, 2, 3) is supposed to be a string. The qualified strings are surrounded by the " symbol.

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2""

Notice how the last qualified string has a " symbol as part of the string.

User @mjolinor suggested this PowerShell script, which fixes the scenario above but not the "Part 2" scenario below.

(get-content file.txt -ReadCount 0) -replace '([^,]")"','$1' |
 set-content newfile.txt

Here is part 2 of the question. I need a solution for this: The extra " symbol can appear randomly in the string. Here's another example:

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"

Can you suggest an elegant way to automate the cleaning of the CSV to eliminate redundant " qualifiers?

Answer

You just need a different regex:

(get-content file.txt -ReadCount 0) -replace '(?<!,)"(?!,|$)',''|
 set-content newfile.txt

That pattern removes any double quote that is neither immediately preceded by a comma nor immediately followed by a comma or the end of the line.

$text = '1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"'
$text -replace '(?<!,)"(?!,|$)',''

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2"
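
One caveat, offered as a hedged sketch rather than part of the original answer: if the first field on a line can also be qualified, the pattern above would strip its opening quote, because that quote is neither preceded by a comma nor followed by one. Assuming that case can occur in your file (the sample line below is hypothetical, not taken from the question), adding ^ to the lookbehind keeps the opening quote:

```powershell
# Hypothetical sample line: the FIRST field is qualified too (not from the question).
$text = '"qualifiedS"tring1",2,3,"qualifiedS"tring2"'

# Same idea as above, with ^ added to the lookbehind so that a quote
# at the very start of the line is treated as a legitimate opening qualifier.
$cleaned = $text -replace '(?<!^|,)"(?!,|$)', ''

$cleaned
# "qualifiedString1",2,3,"qualifiedString2"
```

Neither pattern can tell a stray quote from a real qualifier when the stray quote sits directly next to a comma inside a field, so it is worth spot-checking the cleaned file.
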
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow