Question

I would like to know a way (of course I want to know the best way) to edit a csv file, preferably without having to read the original and write the results to a new one.

Example 1: I want to split the contents of a column into two columns 123abc into 123, abc

Example 2: I want to remove letters from the beginning of a column AB123 into 123

Clarification: I do not need help solving my examples; I just need to be pointed in the right direction of reading and editing the same file.

Here is a code example of using a temporary file that seems to me is too slow

    Dim currentLine as string()
tempFile.AutoFlush = True

Do Until origFile.EndOfData
    currentLine = origFile.ReadFields
    currentLine(1) = currentLine(1).Replace("ABC","") 'would convert ABC123 to 123
    For index as Integer = 0 to currentLine.Count - 2
        tempFile.Write(currentLine(index) & ",")
    Next
        tempFile.Write(currentLine(currentLine.Count - 1))
        tempFile.WriteLine()
Loop

tempFile.Flush()
tempFile.Dispose()
origFile.Dispose()
IO.File.Copy(tempFile,OrigFile,True)
IO.File.Delete(tempFile)
Was it helpful?

Solution

You really will be best off writing a new file. You can write to a temporary file, delete the old file and rename the new file as the old file.

Something like:

Imports System.IO

Module Module1

    Sub Main()
        Dim tempfile = Path.GetTempFileName()
        Dim inputFile = "C:\Temp\input.txt"

        Using sw = New StreamWriter(tempfile)
            Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(inputFile)
                MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
                MyReader.Delimiters = New String() {","}
                Dim currentRow As String()
                While Not MyReader.EndOfData
                    Try
                        currentRow = MyReader.ReadFields()
                        If currentRow.Count >= 2 Then
                            currentRow(1) = currentRow(1).Replace("ABC", "")
                        End If
                        sw.WriteLine(String.Join(",", currentRow))
                    Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                        MsgBox("Line " & ex.Message & " is invalid.  Skipping")
                    End Try
                End While
            End Using
        End Using

        File.Delete(inputFile)
        File.Move(tempfile, inputFile)

    End Sub

End Module

(Basically copying the TextFieldParser example from http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.readfields.aspx )

OTHER TIPS

You can use ADO to connect to the CSV file. See here for more info:

How to read a CSV file into a .NET Datatable

However, I would recommend simply loading it into an array and then writing it back out with simple file IO methods. It's just a lot less code and overhead:

For Each row As String In File.ReadAllLines("C:\test.csv")
    For Each column As String In row.Split(New String() {","}, StringSplitOptions.None)
        ' Do work
    Next
Next

And then something like this to save the file:

File.WriteAllLines(rows)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top