Domanda

I have a flat file with an unfortunately dynamic column structure. There is a value that is in a hierarchy of values, and each tier in the hierarchy gets its own column. For example, my flat file might resemble this:

StatisticID|FileId|Tier0ObjectId|Tier1ObjectId|Tier2ObjectId|Tier3ObjectId|Status
1234|7890|abcd|efgh|ijkl|mnop|Pending
...

The same feed the next day may resemble this:

StatisticID|FileId|Tier0ObjectId|Tier1ObjectId|Tier2ObjectId|Status
1234|7890|abcd|efgh|ijkl|Complete
...

The thing is, I don't care much about all the tiers; I only care about the id of the last (bottom) tier, and all the other row data that is not a part of the tier columns. I need normalize the feed to something resembling this to inject into a relational database:

StatisticID|FileId|ObjectId|Status
1234|7890|ijkl|Complete
...

What would be an efficient, easy-to-read mechanism for determining the last tier object id, and organizing the data as described? Every attempt I've made feels kludgy to me.

Some things I've done:

  • I have tried to examine the column names for regular expression patterns, identify the columns that are tiered, order them by name descending, and select the first record... but I lose the ordinal column number this way, so that didn't look good.
  • I have placed the columns I want into an IDictionary<string, int> object to reference, but again reliably collecting the ordinal of the dynamic columns is an issue, and it seems this would be rather non-performant.
È stato utile?

Soluzione

I ran into a simular problem a few years ago. I used a Dictionary to map the columns, it was not pretty, but it worked.

First make a Dictionary:

private Dictionary<int, int> GetColumnDictionary(string headerLine)
    {
        Dictionary<int, int> columnDictionary = new Dictionary<int, int>();
        List<string> columnNames = headerLine.Split('|').ToList();

        string maxTierObjectColumnName = GetMaxTierObjectColumnName(columnNames);
        for (int index = 0; index < columnNames.Count; index++)
        {
            if (columnNames[index] == "StatisticID")
            {
                columnDictionary.Add(0, index);
            }

            if (columnNames[index] == "FileId")
            {
                columnDictionary.Add(1, index);
            }

            if (columnNames[index] == maxTierObjectColumnName)
            {
                columnDictionary.Add(2, index);
            }

            if (columnNames[index] == "Status")
            {
                columnDictionary.Add(3, index);
            }
        }

        return columnDictionary;
    }

    private string GetMaxTierObjectColumnName(List<string> columnNames)
    {
        // Edit this function if Tier ObjectId is greater then 9
        var maxTierObjectColumnName = columnNames.Where(c => c.Contains("Tier") && c.Contains("Object")).OrderBy(c => c).Last();

        return maxTierObjectColumnName;
    }

And after that it's simply running thru the file:

private List<DataObject> ParseFile(string fileName)
    {
        StreamReader streamReader = new StreamReader(fileName);

        string headerLine = streamReader.ReadLine();
        Dictionary<int, int> columnDictionary = this.GetColumnDictionary(headerLine);

        string line;
        List<DataObject> dataObjects = new List<DataObject>();
        while ((line = streamReader.ReadLine()) != null)
        {
            var lineValues = line.Split('|');

            string statId = lineValues[columnDictionary[0]];
            dataObjects.Add(
                new DataObject()
                {
                    StatisticId = lineValues[columnDictionary[0]],
                    FileId = lineValues[columnDictionary[1]],
                    ObjectId = lineValues[columnDictionary[2]],
                    Status = lineValues[columnDictionary[3]]
                }
            );
        }

        return dataObjects;
    }

I hope this helps (even a little bit).

Altri suggerimenti

Personally I would not try to reformat your file. I think the easiest approach would be to parse each row from the front and the back. For example:

itemArray = getMyItems();
statisticId = itemArray[0];
fileId = itemArray[1];
//and so on for the rest of your pre-tier columns

//Then get the second to last column which will be the last tier
lastTierId = itemArray[itemArray.length -1];

Since you know the last tier will always be second from the end you can just start at the end and work your way forwards. This seems like it would be much easier than trying to reformat the datafile.

If you really want to create a new file, you could use this approach to get the data you want to write out.

I don't know C# syntax, but something along these lines:

  1. split line in parts with | as separator
  2. get parts [0], [1], [length - 2] and [length - 1]
  3. pass the parts to the database handling code
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top