CSV File Imports in .NET [closed]

https://stackoverflow.com/questions/1898

08-06-2019

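Question

I realize this is a newbie question, but I'm looking for a simple solution. It seems like there should be one.

What is the best way to import a CSV file into a strongly typed data structure? Again, simpler is better.

Solution

Other tips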

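Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it is a standard component of the .NET Framework, so you only need to add a reference to the global Microsoft.VisualBasic assembly.

If you're compiling for Windows (as opposed to Mono) and don't expect to have to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice: it is free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.

See also: How to: Read From Comma-Delimited Text Files in Visual Basic, for a VB code example.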
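For C# callers, a minimal sketch of the same approach might look like the following. The class and method names (TextFieldParserExample, ReadAll) are made up for illustration; only TextFieldParser and the members called on it come from the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // needs a reference to the Microsoft.VisualBasic assembly

public static class TextFieldParserExample
{
    // Hypothetical helper: reads every record from CSV text into a list of field arrays.
    public static List<string[]> ReadAll(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // lets quoted fields contain commas

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Assuming a file named file.csv exists, a call such as TextFieldParserExample.ReadAll(File.OpenText("file.csv")) would return one string array per record; mapping those arrays onto typed objects is left to the caller.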

Use an OleDB connection.

// Requires: using System.Data; using System.Data.OleDb;
// Data Source is the folder that holds the CSV; the file name goes in the SELECT.
// Jet 4.0 is a 32-bit provider, so the process must run as x86.
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
DataTable dt = new DataTable();
using (OleDbConnection conn = new OleDbConnection(connectionString))
using (OleDbCommand select = new OleDbCommand("SELECT * FROM file.csv", conn))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(select))
{
    conn.Open();
    adapter.Fill(dt); // the connection is closed when the using block ends
}
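Since the question asks for a strongly typed structure, the filled DataTable still has to be mapped onto objects. One possible sketch follows; the PersonRow class and the "Name" and "Age" column names are assumptions for illustration and would have to match the header row of the actual CSV.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical target type; the properties mirror assumed CSV columns.
public class PersonRow
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class DataTableMappingExample
{
    // Copies each DataRow into a PersonRow.
    // Throws if a column is missing or a value is DBNull or not numeric.
    public static List<PersonRow> ToPersons(DataTable table)
    {
        var result = new List<PersonRow>();
        foreach (DataRow row in table.Rows)
        {
            result.Add(new PersonRow
            {
                Name = Convert.ToString(row["Name"]),
                // Convert.ToInt32 accepts whichever numeric type the driver inferred.
                Age = Convert.ToInt32(row["Age"])
            });
        }
        return result;
    }
}
```

The conversion is deliberately strict: a bad cell fails loudly instead of silently producing a default value.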

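If you're expecting fairly complex scenarios for CSV parsing, don't even think of rolling your own parser. There are a lot of excellent tools out there, such as FileHelpers, or even the ones on CodeProject.

The point is that this is a fairly common problem, and you can bet that a lot of software developers have already thought about and solved it.

Brian gives a nice solution for converting it to a strongly typed collection.

Most of the CSV parsing methods given don't take into account escaped fields or some of the other subtleties of CSV files (such as trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.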

// Requires: using System.Collections.Generic; using System.IO; using System.Text;
public static IList<IList<string>> Parse(string content)
{
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQoutedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQoutedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQoutedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQoutedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace
            }
            else if (readChar == '"')
            {
                inQoutedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQoutedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQoutedString)
            {
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQoutedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

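Note that this doesn't handle the edge case of fields that are not delimited by double quotes but merely have a quoted string inside them. See this post for a somewhat better explanation as well as links to some proper libraries.

I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does, and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse something like this:

1,"Bill","Smith","Supervisor", "No Comment"
2 , 'Drake,' , 'O'Malley', "Janitor,
Oops, I'm not quoted and I'm on a new line!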

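I was bored, so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of passes through the file; it only iterates once, at the top-level foreach.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;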

namespace ConsoleApplication1
{
    class Program
    {

        static void Main(string[] args)
        {
            // Usage:
            // Note this won't run as-is, because getting streams is not implemented,
            // but it will get you started.
            CSVFileParser fileParser = new CSVFileParser();

            // TODO: configure fileParser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // if the file is large and there is a good way to limit
            // without having to reparse the whole file you can use a 
            // linq query if you desire
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // now we have a list of Person objects
        }
    }

    public abstract  class CSVParser 
    {

        protected String[] deliniators = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {

            // Dispose the reader (and the underlying stream) once iteration finishes.
            using (StreamReader reader = new StreamReader(GetStream()))
            {
                // Note: a plain Split does not handle quoted fields or embedded commas.
                while (!reader.EndOfStream)
                {
                    yield return reader.ReadLine().Split(deliniators, StringSplitOptions.None);
                }
            }

        }

        protected abstract Stream GetStream(); 

    }

    public class CSVFileParser : CSVParser
    {
        // to do: add logic to get a stream from a file

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        } 
    }

    public class CSVWebParser : CSVParser
    {
        // to do: add logic to get a stream from a web request

        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser 
    {

        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public  IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}

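There are two articles on CodeProject that provide code for a solution: one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.

A good, simple way to do it is to open the file and read each line into an array, a linked list, or the data structure of your choice. Be careful about how you handle the first line, though.

This may be over your head, but there is also the connection string approach.

Why not try using Python instead of C# or VB? It has a nice CSV module that you can import and that does all the heavy lifting for you.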
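As a rough sketch of the line-by-line idea, assuming the first line is a header and that no field contains a comma, quote, or line break (the class and method names are invented for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineReaderExample
{
    // Hypothetical helper: splits each data line on commas, treating line one as a header.
    public static List<string[]> ReadRows(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                          // header row: column names, not data
            .Where(line => !string.IsNullOrWhiteSpace(line))  // ignore blank lines
            .Select(line => line.Split(','))
            .ToList();
    }
}
```

With a real file this could be called as LineReaderExample.ReadRows(File.ReadLines("file.csv")), where file.csv is a placeholder name and File lives in System.IO. If the file has no header row, the Skip(1) call would discard a data record, which is the first-line pitfall mentioned above.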

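I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL SELECT statement. You can specify strong types using a Schema.ini file. I didn't do this at first, but then I got bad results wherever the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

One limitation I ran into is that it cannot handle column names longer than 64 characters; it truncates them. This shouldn't be a problem, except that I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET.

EDIT: Also, there can only be one Schema.ini file per directory, so I dynamically appended to it to strongly type the columns I needed. It only strongly types the columns specified and infers the type of any unspecified field. I really appreciated this, as I was importing a fluid CSV with 70+ columns and didn't want to specify every column, only the misbehaving ones.

I typed in some code. The result in the DataGridViewer looked good. It parses a single line of text into an ArrayList of objects.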
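A sketch of how such a section might be generated and appended is shown below. The helper names, the file name, and the column names in the usage note are all hypothetical, and the claim that the driver tolerates listing only some columns comes from the answer above rather than from anything verified here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SchemaIniExample
{
    // Hypothetical helper: builds a Schema.ini section that types only the listed columns.
    // Keys are 1-based column positions; values are "ColumnName Type" pairs.
    public static string BuildSection(string csvFileName, IDictionary<int, string> typedColumns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + csvFileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=True");
        foreach (KeyValuePair<int, string> column in typedColumns)
        {
            sb.AppendLine("Col" + column.Key + "=" + column.Value);
        }
        return sb.ToString();
    }

    // Appends the section to the single Schema.ini allowed in the CSV's directory.
    public static void AppendTo(string directory, string section)
    {
        File.AppendAllText(Path.Combine(directory, "Schema.ini"), section);
    }
}
```

For instance, passing a SortedDictionary that maps 3 to "IPAddress Text" and 7 to "Version Text" would emit Col3 and Col7 lines forcing those two columns to text, which is the kind of fix needed for values like IP numbers that the driver otherwise guesses wrongly.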

    enum quotestatus
    {
        none,
        firstquote,
        secondquote
    }
    public static System.Collections.ArrayList Parse(string line,string delimiter)
    {        
        System.Collections.ArrayList ar = new System.Collections.ArrayList();
        StringBuilder field = new StringBuilder();
        quotestatus status = quotestatus.none;
        foreach (char ch in line.ToCharArray())
        {                                
            string chOmsch = "char";
            if (ch == Convert.ToChar(delimiter))
            {
                if (status== quotestatus.firstquote)
                {
                    chOmsch = "char";
                }                         
                else
                {
                    chOmsch = "delimiter";                    
                }                    
            }

            if (ch == Convert.ToChar(34))
            {
                chOmsch = "quotes";           
                if (status == quotestatus.firstquote)
                {
                    status = quotestatus.secondquote;
                }
                if (status == quotestatus.none )
                {
                    status = quotestatus.firstquote;
                }
            }

            switch (chOmsch)
            {
                case "char":
                    field.Append(ch);
                    break;
                case "delimiter":                        
                    ar.Add(field.ToString());
                    field.Clear();
                    break;
                case "quotes":
                    if (status==quotestatus.firstquote)
                    {
                        field.Clear();                            
                    }
                    if (status== quotestatus.secondquote)
                    {                                                                           
                            status =quotestatus.none;                                
                    }                    
                    break;
            }
        }
        if (field.Length != 0)            
        {
            ar.Add(field.ToString());                
        }           
        return ar;
    }

If you can guarantee that there are no commas in the data, then the simplest way is probably to use String.Split.

For example:

String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);

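There may be libraries you could use to help, but that's probably as simple as it gets. Just make sure there can't be commas in the data; otherwise you will need to parse it more carefully.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow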
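To make the comma caveat concrete, here is a small sketch (the class and method names are invented for illustration) that counts the pieces String.Split produces for one line:

```csharp
public static class SplitPitfallExample
{
    // Returns how many pieces String.Split yields for one CSV line.
    public static int CountFields(string line)
    {
        return line.Split(',').Length;
    }
}
```

CountFields("Smith,42") returns 2, as expected. CountFields("\"Smith, John\",42") returns 3, although the line holds only two logical fields, because Split knows nothing about quotes.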