문제

For part 1, see this SO post

I have a CSV that has certain fields separated by the " symbol as a TextQualifier.

See below for example. Note that each integer (eg. 1,2,3 etc) is supposed to be a string. the qualified strings are surrounded by the " symbol.

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2""

Notice how the last qualified string has a " symbol as part of the string.

User @mjolinor suggested this powershell script, which works to fix the above scenario, but it does not fix the "Part 2" scenario below.

(get-content file.txt -ReadCount 0) -replace '([^,]")"','$1' |
 set-content newfile.txt

Here is part 2 of the question. I need a solution for this: The extra " symbol can appear randomly in the string. Here's another example:

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"

Can you suggest an elegant way to automate the cleaning of the CSV to eliminate redundant " qualifiers?

도움이 되었습니까?

해결책

You just need a different regex:

(get-content file.txt -ReadCount 0) -replace '(?<!,)"(?!,|$)',''|
 set-content newfile.txt

That one will replace any double quote that is not immediately preceeded by a comma, or followed by either a comma or the end of the line.

$text = '1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"'
$text -replace '(?<!,)"(?!,|$)',''

1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2"
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top