Regular expression to replace every single double quote within a field with two double quotes in the FileHelpers BeforeReadRecord event

StackOverflow https://stackoverflow.com/questions/23335791

  •  10-07-2023

Question

In the FileHelpers BeforeReadRecord event, we need a regular expression that replaces every single double quote inside a field with two double quotes.

CSV Content:

"Mine"s Minesweeper", "Yours"s Minesweeper", "Uncle Sam"s Minesweeper"
"Mine"s Minesweeper2", "Yours"s Minesweeper2", "Unknown Minesweeper3"

We need help creating a VB.NET regular expression that replaces all of the inner double quotes. Currently we are using the approach below:

    Dim engine As New FileHelperEngine(cb.CreateRecordClass())
    AddHandler engine.BeforeReadRecord, AddressOf BeforeReadRecordHandler

Event Code

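    Sub BeforeReadRecordHandler(engine As EngineBase, e As FileHelpers.Events.BeforeReadEventArgs(Of Object))
        Try
            Dim newLine As String = ""
            Dim sep As String = ""
            ' Split on the field separator; this assumes no field contains a comma.
            Dim arr() As String = e.RecordLine.Split(","c)
            ' Process the fields in order; a parallel loop would race on newLine and sep.
            For Each x As String In arr
                If x.Length > 1 Then
                    ' Drop the first and last character, then double the remaining quotes.
                    newLine = String.Format("{0}{1}{2}", newLine, sep, x.Substring(1, IIf(x.Length <= 2, 0, x.Length - 2)).Replace("""", """"""))
                Else
                    newLine = String.Format("{0}{1}{2}", newLine, sep, x)
                End If

                sep = ","
            Next
            e.RecordLine = newLine
        Catch ex As Exception
            ' Exceptions are swallowed, so a failure leaves the record line unchanged.
        End Try
    End Sub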

We are trying to write a regular expression that can replace this expression:

String.Format("{0}{1}{2}", newLine, sep, x.Substring(1, IIf(x.Length <= 2, 0, x.Length - 2)).Replace("""", """"""))

Output should be

CSV Content:

"Mine""s Minesweeper", "Yours""s Minesweeper", "Uncle Sam""s Minesweeper"
"Mine""s Minesweeper2", "Yours""s Minesweeper2", "Unknown Minesweeper3"

Solution

.NET supports arbitrary-length lookbehind, so you can use the following:

(?<!(^|,)\s*)"(?!\s*($|,))

Use it with

Regex.Replace(input, "(?<!(^|,)\s*)""(?!\s*($|,))", """""", RegexOptions.Multiline)

This matches any " that is not preceded by the start of a line or a comma, and not followed by the end of a line or a comma; both conditions ignore an arbitrary amount of whitespace.

It will misbehave if an entry in the CSV is not bracketed by quotes, or if a comma occurs in the text of an entry.
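
A minimal sketch of how the pattern above might be applied to one record line, which is what the BeforeReadRecord handler receives. The module name QuoteFixDemo and the helper EscapeInnerQuotes are hypothetical names introduced for illustration; only the Regex class comes from .NET.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuoteFixDemo
    ' Pattern from the answer above. Inside a VB string literal, "" stands for one quote.
    Private ReadOnly InnerQuote As New Regex("(?<!(^|,)\s*)""(?!\s*($|,))", RegexOptions.Multiline)

    ' Hypothetical helper: doubles every quote that is not a field delimiter.
    Function EscapeInnerQuotes(line As String) As String
        ' The replacement literal """""" is a string holding two quote characters.
        Return InnerQuote.Replace(line, """""")
    End Function

    Sub Main()
        ' First sample line from the question.
        Dim line As String = """Mine""s Minesweeper"", ""Yours""s Minesweeper"", ""Uncle Sam""s Minesweeper"""
        Console.WriteLine(EscapeInnerQuotes(line))
        ' Prints: "Mine""s Minesweeper", "Yours""s Minesweeper", "Uncle Sam""s Minesweeper"
    End Sub
End Module
```

Inside the handler from the question, the same idea would presumably reduce to assigning the helper's result back to e.RecordLine. Since the handler sees one line at a time, RegexOptions.Multiline is likely only needed when a whole file is processed in a single call.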

Other tips

You can combine negative lookbehind and negative lookahead to accomplish this; see this regex, for example.

(?<!^)(?<!, )"(?!$)(?!, )(?!")

This regex has some limitations, of course. It assumes that:

  • No double quote to be replaced is followed by a comma and a space (, ).
  • No double quote to be replaced is preceded by a comma and a space (, ).
  • Separating commas are always followed by a space.

If you can make sure that the above is valid for your input, then use the regex I referenced.
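
The assumptions listed above can be checked with a short sketch. The module SpacedSeparatorDemo and the helper EscapeInnerQuotesSpaced are hypothetical names; the second call deliberately uses input that violates the third assumption.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SpacedSeparatorDemo
    ' Hypothetical helper wrapping the pattern above; "" inside a VB literal is one quote.
    Function EscapeInnerQuotesSpaced(line As String) As String
        Return Regex.Replace(line, "(?<!^)(?<!, )""(?!$)(?!, )(?!"")", """""")
    End Function

    Sub Main()
        ' Fields separated by ", ": only the inner quote is doubled.
        Console.WriteLine(EscapeInnerQuotesSpaced("""Mine""s Minesweeper"", ""Unknown Minesweeper3"""))
        ' Prints: "Mine""s Minesweeper", "Unknown Minesweeper3"

        ' Fields separated by "," alone: the delimiting quotes are doubled as well,
        ' which is the failure the third assumption guards against.
        Console.WriteLine(EscapeInnerQuotesSpaced("""a"",""b"""))
        ' Prints: "a"",""b"
    End Sub
End Module
```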

An alternative, though potentially more error-prone, is to use a positive lookbehind and a positive lookahead to replace any " characters that are surrounded by letters.

Obviously this assumes that the only " characters you wish to replace are immediately surrounded by letters.

(?<=[A-Za-z])"(?=[a-z]) // Use with a Regex.Replace.

I'm assuming that the trailing letter characters will all be lower case (usually an s), whereas you might have an upper case single character at the beginning.
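
A minimal sketch of this letter-based approach, reading the pattern as a positive lookbehind (?<=[A-Za-z]) followed by a positive lookahead (?=[a-z]), as the prose describes. The module LetterContextDemo and the helper EscapeQuotesBetweenLetters are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LetterContextDemo
    ' Hypothetical helper: doubles a quote only when a letter precedes it
    ' and a lowercase letter follows it.
    Function EscapeQuotesBetweenLetters(line As String) As String
        Return Regex.Replace(line, "(?<=[A-Za-z])""(?=[a-z])", """""")
    End Function

    Sub Main()
        ' Second sample line from the question.
        Dim line As String = """Mine""s Minesweeper2"", ""Yours""s Minesweeper2"", ""Unknown Minesweeper3"""
        Console.WriteLine(EscapeQuotesBetweenLetters(line))
        ' Prints: "Mine""s Minesweeper2", "Yours""s Minesweeper2", "Unknown Minesweeper3"

        ' An inner quote next to a digit or a space is left untouched.
        Console.WriteLine(EscapeQuotesBetweenLetters("""3.5"" disk"""))
        ' Prints: "3.5" disk"
    End Sub
End Module
```

The second call shows the cost of the stated assumption: an inner quote that is not flanked by letters is silently left unescaped.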

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow