CSV File Imports in .NET [closed]

https://stackoverflow.com/questions/1898

08-06-2019

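Question

I realize this is a newbie question, but I'm looking for a simple solution, and it seems like there should be one.

What's the best way to import a CSV file into a strongly typed data structure? Again, simple = better.

Solution

Other tips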

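Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component of the .NET Framework. Just add a reference to the global Microsoft.VisualBasic assembly.

If you're compiling for Windows (as opposed to Mono) and don't expect to parse "broken" (non-RFC-compliant) CSV files, this is the obvious choice: it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.

See also: How to: Read From Comma-Delimited Text Files in Visual Basic for a VB code example.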
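
Since the linked example is in VB, here is a rough C# sketch of the same idea. The class and method names (TextFieldParserExample, ReadRows) are made up for illustration; only TextFieldParser and its members come from Microsoft.VisualBasic.FileIO.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserExample
{
    // Reads every record from the reader and returns the fields of each row.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "a,b" inside double quotes as a single field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

To read from disk you would pass something like new StreamReader(path) instead of a StringReader. Mapping each string[] onto your own typed class is still up to you.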

Use an OleDB connection.

// Data Source is the folder that holds the CSV; the file name acts as the table name.
String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
DataTable dt = new DataTable();
using (OleDbConnection objConn = new OleDbConnection(sConnectionString))
using (OleDbDataAdapter objAdapter1 = new OleDbDataAdapter("SELECT * FROM file.csv", objConn))
{
    // Fill opens the connection if it is closed and closes it again afterwards.
    objAdapter1.Fill(dt);
}

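If you expect fairly complex CSV parsing scenarios, don't even think of rolling your own parser. There are plenty of excellent tools out there, such as FileHelpers, or even the ones from CodeProject.

The point is that this is a fairly common problem, and you can bet that a lot of software developers have already thought about it and solved it.

Brian gives a nice solution for converting it to a strongly typed collection.

Most of the CSV parsing methods given don't take into account escaped fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.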

public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQoutedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQoutedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQoutedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQoutedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQoutedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQoutedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQoutedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQoutedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

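Note that this doesn't handle the edge case of fields that are not delimited by double quotes but merely have a quoted string inside them. See this post for a better explanation as well as some links to proper libraries.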
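
For reference, the escaping that the Parse method above undoes looks roughly like this on the writing side. This is only a sketch of the usual RFC 4180 convention (quote a field when it contains a comma, a double quote, or a line break, and double any embedded quotes); CsvEscaping and Escape are names invented for this example.

```csharp
using System;

public static class CsvEscaping
{
    // Wraps the field in double quotes only when it needs them,
    // doubling any double quotes it already contains.
    public static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

So the value say "hi" is written as "say ""hi""" (with the outer quotes), which is the form the quoted-string branch of Parse expects. A field such as ab"c"d, with quotes in the middle of an unquoted value, is the unhandled case mentioned above.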

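I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you will eventually have to deal with if you do it yourself. Take a look at what FileHelpers does, and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers handles, or (2) you love writing this kind of stuff and will be overjoyed when you have to parse something like this: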

1,"Bill","Smith","Supervisor","No Comment"

2,"Drake","O'Malley","Janitor"

Oops, I'm not quoted and I'm on a new line!

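I was bored, so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of passes through the file; it only iterates once, at the top foreach.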

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {

            // Usage: this won't run as-is because GetStream is not implemented
            // in the parsers below, but it should get you started.
            CSVFileParser fileParser = new CSVFileParser();

            // TODO: configure fileParser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // if the file is large and there is a good way to limit
            // without having to reparse the whole file you can use a 
            // linq query if you desire
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // now we have a list of Person objects
        }
    }

    public abstract class CSVParser
    {
        protected String[] delimiters = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {
            // Dispose the reader (and the underlying stream) once enumeration ends.
            using (StreamReader reader = new StreamReader(GetStream()))
            {
                String line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Plain Split: quoted fields containing the delimiter are not handled.
                    yield return line.Split(delimiters, StringSplitOptions.None);
                }
            }
        }

        protected abstract Stream GetStream();
    }

    public class CSVFileParser : CSVParser
    {
        // to do: add logic to get a stream from a file

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        } 
    }

    public class CSVWebParser : CSVParser
    {
        // to do: add logic to get a stream from a web request

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser 
    {

        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public  IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}

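There are two articles on CodeProject that provide code for a solution: one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.

A good, simple way to do it is to open the file and read each line into an array, linked list, or data structure of your choice. Be careful about handling the first line, though.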
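
A minimal sketch of that line-by-line approach, assuming the first line is a header row and that no field contains a comma or a line break. LineReaderExample and ReadRows are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineReaderExample
{
    // Returns the data rows; the first line is handed back separately as the header.
    public static List<string[]> ReadRows(TextReader reader, out string[] header)
    {
        var rows = new List<string[]>();

        // Treat the first line as column names rather than data.
        string first = reader.ReadLine();
        header = first == null ? new string[0] : first.Split(',');

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

For a file on disk you could pass new StreamReader(path). If the file has no header row, the first ReadLine would have to be treated as data instead.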

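This may be over your head, but there also seems to be a direct way using a connection string.

Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.

I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I got bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

One limitation I ran into is that it cannot handle column names longer than 64 characters; it truncates them. This shouldn't be a problem, except that I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET.

EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It only strongly types the columns specified and infers the type of any unspecified field. I really appreciated this, as I was importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.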
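
Here is a rough sketch of what building a schema.ini section in code might look like. The section layout ([file name], Format, ColNameHeader, ColN=Name Type) follows the documented schema.ini format; the helper names, the column names, and the assumption that ColN refers to the Nth column of the file are mine, so check them against your own data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SchemaIniExample
{
    // Builds one schema.ini section. typedColumns maps a 1-based column
    // position to "ColumnName Type", e.g. 3 -> "IpAddress Text".
    public static string BuildSection(string fileName, SortedDictionary<int, string> typedColumns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + fileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=True");
        foreach (KeyValuePair<int, string> column in typedColumns)
        {
            sb.AppendLine("Col" + column.Key + "=" + column.Value);
        }
        return sb.ToString();
    }
}
```

The result could then be appended with File.AppendAllText(Path.Combine(folder, "schema.ini"), section), where folder is the same directory named in the connection string's Data Source.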

I typed in some code. The result in the DataGridView looked good. It parses a single line of text into an ArrayList of objects.

    enum quotestatus
    {
        none,
        firstquote,
        secondquote
    }
    public static System.Collections.ArrayList Parse(string line,string delimiter)
    {        
        System.Collections.ArrayList ar = new System.Collections.ArrayList();
        StringBuilder field = new StringBuilder();
        quotestatus status = quotestatus.none;
        foreach (char ch in line.ToCharArray())
        {                                
            string chOmsch = "char";
            if (ch == Convert.ToChar(delimiter))
            {
                if (status== quotestatus.firstquote)
                {
                    chOmsch = "char";
                }                         
                else
                {
                    chOmsch = "delimiter";                    
                }                    
            }

            if (ch == Convert.ToChar(34))
            {
                chOmsch = "quotes";           
                if (status == quotestatus.firstquote)
                {
                    status = quotestatus.secondquote;
                }
                if (status == quotestatus.none )
                {
                    status = quotestatus.firstquote;
                }
            }

            switch (chOmsch)
            {
                case "char":
                    field.Append(ch);
                    break;
                case "delimiter":                        
                    ar.Add(field.ToString());
                    field.Clear();
                    break;
                case "quotes":
                    if (status==quotestatus.firstquote)
                    {
                        field.Clear();                            
                    }
                    if (status== quotestatus.secondquote)
                    {                                                                           
                            status =quotestatus.none;                                
                    }                    
                    break;
            }
        }
        if (field.Length != 0)            
        {
            ar.Add(field.ToString());                
        }           
        return ar;
    }

If you can guarantee that there are no commas in the data, then the simplest way is probably to use String.Split.

For example:

String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);

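There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure the data can't contain commas; otherwise you will need a better parser.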
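
A slightly more defensive version of the same idea, as a sketch: it checks the field count and uses int.TryParse so a bad line is reported instead of throwing. Item and TryParseLine are hypothetical names for illustration, and it still assumes the data contains no commas or quotes.

```csharp
using System;
using System.Globalization;

// Hypothetical target type for this example.
public class Item
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class SplitMappingExample
{
    // Returns false (and a null item) when the line does not have exactly
    // two fields or the second field is not an integer.
    public static bool TryParseLine(string line, out Item item)
    {
        item = null;
        string[] values = line.Split(',');
        if (values.Length != 2)
        {
            return false;
        }

        int quantity;
        if (!int.TryParse(values[1].Trim(), NumberStyles.Integer,
                          CultureInfo.InvariantCulture, out quantity))
        {
            return false;
        }

        item = new Item { Name = values[0].Trim(), Quantity = quantity };
        return true;
    }
}
```

Calling TryParseLine on each line of the file and collecting the successful items gives you a strongly typed list, with the failures available for logging.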

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow