Question

Here is my problem.

I have the following array (for example):

string[] arr = new[] { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

I want to do something that gives the following output:

s_0001
sa_0004
sab_0006

I've tried everything with no luck! This will be the first step in a long project, and any help would be much appreciated.

[edit] I don't know when the letters will change, but I know there will always be an underscore separating the letters from the numbers. I need to somehow extract these letters and then get rid of the duplicates.

[edit] More specifically, I want the unique entries of each string's part before the underscore; I don't care about the numbers.

[edit] OK guys, you're really active, I'll give you that; I didn't expect such quick answers. But it seems (since I've been working on this for the last 8 hours) that I asked the wrong question.

Here is my code:

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

//Loop through the XML files in the directory and get
//the objectName and GUID of each file
string[] arr_xmlFiles = Directory.GetFiles(Dir, "*.xml");   //Array with all XML files in the directory

foreach (string xmlFile in arr_xmlFiles)
{
    try
    {
        //Get the XML's name
        XDocument xmlF = XDocument.Load(xmlFile);
        string objectName = xmlF.Root.Name.ToString();

        //Get the XML's GUID
        XElement oDcElement = xmlF.Root.FirstNode as XElement;
        Guid oGuid = new Guid(oDcElement.Attribute("DataclassId").Value);

        //Print out the results
        Console.WriteLine(" " + objectName + "    " + oGuid);
    }
    catch (XmlException) { }
}

Basically, I get all the XML files in a directory (they contain the ObjectName with its GUID), for example:

CM_Commands [0ee2ab91-4971-4fd3-9752-cf47c8ba4a01].xml    
CM_Commands [1f627f72-ca7b-4b07-8f93-c5750612c209].xml

Sorry, the separator is actually '[' rather than '_', but that doesn't matter.

I store all these XML files in an array, and then I want to extract the ObjectName and the GUID from each one.

After that, I want to make some modifications to only one XML file for each objectName.

That's all


Solution

EDIT #3: I added detailed comments to the snippet below (see the updated code under EDIT #2). Also note that if you want to return these results from a method, you'll need to set up a new class with these properties, such as:

public class MyClass 
{
    public string ObjectName { get; set; }
    public string Guid { get; set; }
    public string FileName { get; set; }
}

With a class available, the select statement would change from select new { ... } to:

/* start of query unchanged ... */
select new MyClass
{
    ObjectName = split[0],
    Guid = split[1],
    FileName = f.FullName
};

Your method, with all this code, would then have a return type of IEnumerable<MyClass>. You could easily change it to a List<MyClass> by using return results.ToList();.
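
A minimal sketch of such a method, assuming C# 9 / .NET 5 or later (top-level statements). To stay self-contained it returns named tuples instead of MyClass; the method name GetFirstFilePerObject and the sample file names are made up. It splits on ' ', '[' and ']' because the question's file names look like "CM_Commands [guid].xml".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: one file per ObjectName, assuming names like "CM_Commands [guid].xml"
List<(string ObjectName, string Guid, string FileName)> GetFirstFilePerObject(string path)
{
    var query = from f in new DirectoryInfo(path).GetFiles("*.xml")
                // Splitting on ' ', '[' and ']' yields { name, guid, ".xml" }
                let split = f.Name.Split(new[] { ' ', '[', ']' }, StringSplitOptions.RemoveEmptyEntries)
                select (ObjectName: split[0], Guid: split[1], FileName: f.FullName);

    // Group by ObjectName and keep the first file of each group
    return query.GroupBy(o => o.ObjectName)
                .Select(g => g.First())
                .ToList();
}

// Usage: build a throwaway folder with made-up file names in the question's format
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
foreach (string name in new[]
{
    "CM_Commands [0ee2ab91-4971-4fd3-9752-cf47c8ba4a01].xml",
    "CM_Commands [1f627f72-ca7b-4b07-8f93-c5750612c209].xml",
    "CM_Other [2a6d1c3e-0000-4000-8000-000000000001].xml"
})
{
    File.WriteAllText(Path.Combine(dir, name), "<root />");
}

var results = GetFirstFilePerObject(dir);
Directory.Delete(dir, true); // clean up the temporary folder

foreach (var item in results)
    Console.WriteLine("ObjectName: {0} -- Guid {1}", item.ObjectName, item.Guid);
```

GetFiles does not guarantee any particular order, so which of several files with the same ObjectName counts as "first" may vary; add an OrderBy before GroupBy if that matters.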

EDIT #2: To extract the objectName and Guid from your file names, you don't need to do all that tedious XML work to read the information from the files' contents.

Assuming your objectName and Guid are always separated by a space, you can use the following code. Otherwise more parsing (or, optionally, a regex) may be needed.

using System;
using System.IO;
using System.Linq;

string path = @"C:\Foo\Bar"; // your path goes here
var dirInfo = new DirectoryInfo(path);

// DirectoryInfo.GetFiles() returns a FileInfo[] array.
// FileInfo's Name property gives us the file's name without the full path.
// The LINQ let clause stores the split result: splitting the file name on spaces,
// square brackets and dots separates the objectName, the Guid and the extension
// (RemoveEmptyEntries drops the empty string between the space and the '[').
// The "select new" projects the results into an anonymous type with the specified
// properties and respectively assigned values. I stored the full path just in case.
var query = from f in dirInfo.GetFiles("*.xml")
            let split = f.Name.Split(new[] { ' ', '[', ']', '.' }, StringSplitOptions.RemoveEmptyEntries)
            select new 
            {
                ObjectName = split[0],
                Guid = split[1],
                FileName = f.FullName
            };

// Now that the above query has neatly separated the ObjectName, we use LINQ
// to group by ObjectName (the group key). Multiple files may exist under the same
// key so we then select the First item from each group.
var results = query.GroupBy(o => o.ObjectName)
                   .Select(g => g.First());

// Iterate over the results using the projected property names.
foreach (var item in results)
{
    Console.WriteLine(item.FileName);
    Console.WriteLine("ObjectName: {0} -- Guid {1}", item.ObjectName, item.Guid);
}

This fits your sample data; however, if you anticipate file names with additional . characters, the above will break. To remedy such a scenario, change the Split to: let split = f.Name.Split(new[] { ' ', '[', ']' }, StringSplitOptions.RemoveEmptyEntries). Without '.' as a separator, the extension stays in split[2] as ".xml", so Guid = split[1] still yields just the GUID.
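
The answer also mentions a regex as an option. Here is a minimal sketch, assuming file names always look like "<objectName> [<guid>].xml" as in the question; the pattern and the helper name ParseFileName are illustrative, not part of the original answer. Because the GUID is captured between the brackets, dots in the object name don't matter, and Guid.TryParse rejects text that isn't a valid GUID.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed format: "<objectName> [<guid>].xml"; the object name may contain dots
var pattern = new Regex(@"^(?<name>.+?) \[(?<guid>[^\]]+)\]\.xml$", RegexOptions.IgnoreCase);

// Hypothetical helper: returns null when the file name doesn't fit the pattern
(string ObjectName, Guid Guid)? ParseFileName(string fileName)
{
    Match m = pattern.Match(fileName);
    if (!m.Success || !Guid.TryParse(m.Groups["guid"].Value, out Guid guid))
        return null;
    return (m.Groups["name"].Value, guid);
}

var parsed = ParseFileName("CM.Commands [0ee2ab91-4971-4fd3-9752-cf47c8ba4a01].xml");
var invalid = ParseFileName("notes.txt");

Console.WriteLine(parsed?.ObjectName); // CM.Commands
Console.WriteLine(parsed?.Guid);       // 0ee2ab91-4971-4fd3-9752-cf47c8ba4a01
Console.WriteLine(invalid == null);    // True
```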


Since you know there'll always be an underscore, you can try this approach:

string[] arr = {"s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007"};

var query = arr.GroupBy(s => s.Substring(0, s.IndexOf('_')))
               .Select(g => g.First());

foreach (string s in query)
    Console.WriteLine(s);    // s_0001, sa_0004, sab_0006

This will take the first item of each group so unless your items are pre-sorted, you may want to throw in an OrderBy in the Select: .Select(g => g.OrderBy(s => s).First());
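
To see why the order matters, here is a small sketch using a made-up shuffled copy of the sample data: First() returns whichever element of a group appears first in the input, while OrderBy(...).First() returns the smallest element of each group.

```csharp
using System;
using System.Linq;

// Same values as the sample, in a made-up shuffled order
string[] unsorted = { "sa_0005", "s_0002", "sab_0007", "s_0001", "sa_0004", "sab_0006", "s_0003" };

// First() keeps the element that appears first in the input within each group
var firstSeen = unsorted.GroupBy(s => s.Substring(0, s.IndexOf('_')))
                        .Select(g => g.First())
                        .ToList();   // sa_0005, s_0002, sab_0007

// OrderBy inside the Select picks the smallest element of each group instead
var smallest = unsorted.GroupBy(s => s.Substring(0, s.IndexOf('_')))
                       .Select(g => g.OrderBy(s => s).First())
                       .ToList();    // sa_0004, s_0001, sab_0006

Console.WriteLine(string.Join(", ", firstSeen));
Console.WriteLine(string.Join(", ", smallest));
```

The groups themselves come out in the order their keys first appear (sa, s, sab), so sort the final result as well if you need the prefixes in order.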

EDIT: in response to your edit, to get the distinct letters before the underscore (i.e., s, sa, sab) you can use the Enumerable.Distinct method as follows:

var query = arr.Select(s => s.Substring(0, s.IndexOf('_')))
               .Distinct();    // s, sa, sab

That will give you an IEnumerable<string> that you can iterate over with a foreach as shown earlier.
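
One caveat: s.IndexOf('_') returns -1 for a string without an underscore, and Substring(0, -1) then throws ArgumentOutOfRangeException. The question says the underscore is always there, but if that might not hold, a guarded sketch could look like this (treating the whole string as its own prefix is an assumption, and the "nounderscore" entry is made up):

```csharp
using System;
using System.Linq;

string[] arr = { "s_0001", "s_0002", "sa_0004", "sab_0006", "nounderscore" };

// Everything before the first '_', or the whole string if there is no '_'
string Prefix(string s)
{
    int i = s.IndexOf('_');
    return i >= 0 ? s.Substring(0, i) : s;
}

var prefixes = arr.Select(s => Prefix(s)).Distinct().ToList();
Console.WriteLine(string.Join(", ", prefixes)); // s, sa, sab, nounderscore
```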

OTHER TIPS

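using System;
using System.Collections.Generic;

var lettersToRecords = new Dictionary<string, string>();
// Arrays have no ForEach instance method; Array.ForEach is the static equivalent
Array.ForEach(arr, record =>
    {
        // The letters are everything before the first underscore
        string letters = record.Split('_')[0];
        // Keep only the first record seen for each set of letters
        if (!lettersToRecords.ContainsKey(letters))
        {
            lettersToRecords[letters] = record;
        }
    });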

This was my first instinct:

using System.Linq;
using System.Text.RegularExpressions;

string[] arr = {"s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007"};

var prefixes = arr.Select(a => Regex.Match(a, @"([A-Za-z]+)_([0-9]+)").Groups[1].Value).Distinct(); // s, sa, sab
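
That expression yields only the distinct prefixes. If you want the first full element for each prefix instead, a sketch reusing the same pattern as the grouping key (the local function name Letters is illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] arr = { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

// Group 1 of the pattern captures the letters before the underscore;
// for a string that doesn't match, Groups[1].Value is an empty string
string Letters(string a) => Regex.Match(a, @"([A-Za-z]+)_([0-9]+)").Groups[1].Value;

var prefixes = arr.Select(a => Letters(a)).Distinct().ToList();                      // s, sa, sab
var firstPerPrefix = arr.GroupBy(a => Letters(a)).Select(g => g.First()).ToList();    // s_0001, sa_0004, sab_0006

Console.WriteLine(string.Join(", ", prefixes));
Console.WriteLine(string.Join(", ", firstPerPrefix));
```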

For this particular array, the elements you want are simply arr[0], arr[3] and arr[6].

So essentially each element of the array represents two values: the prefix ("s", "sa", "sab") and the suffix ("0001", "0002", "0003", "0004", "0005", "0006", "0007").

Here's an example using LINQ to break the strings apart into a prefix and a suffix and then group the elements by prefix. The final step iterates over the groups and outputs the prefix together with the suffix of the first element found with that prefix:

string[] arr = new[] { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

var elementsByPrefix = arr.Select(s =>
{
    int indexOfUnderscore = s.IndexOf('_');
    if (indexOfUnderscore >= 0)
    {
        return new { Prefix = s.Substring(0, indexOfUnderscore), Suffix = s.Substring(indexOfUnderscore + 1, s.Length - (indexOfUnderscore + 1)) };
    }
    else
    {
        return new { Prefix = s, Suffix = string.Empty };
    }
}).GroupBy(item => item.Prefix);

foreach (var element in elementsByPrefix)
{
    Console.WriteLine("{0}_{1}", element.Key, element.First().Suffix);
}

The output of this code matches your sample (s_0001, sa_0004, sab_0006), but you didn't really specify the rules for outputting one element rather than another with the same prefix, so I'm assuming you want either the first element with that prefix or an arbitrary one.

You could use string.Split('_') on each of the strings in the array.

Remember each prefix, and once you have extracted one word with a given prefix, ignore all other words with the same prefix.
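
The approach described above (split on '_', remember the prefixes, skip repeats) can be sketched with a HashSet<string>; the variable names are illustrative:

```csharp
using System;
using System.Collections.Generic;

string[] arr = { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

var seenPrefixes = new HashSet<string>();
var firstWords = new List<string>();

foreach (string word in arr)
{
    string prefix = word.Split('_')[0];   // part before the underscore
    // HashSet<T>.Add returns false if the prefix was already remembered
    if (seenPrefixes.Add(prefix))
        firstWords.Add(word);
}

Console.WriteLine(string.Join(", ", firstWords)); // s_0001, sa_0004, sab_0006
```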

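If the array has a specific order (for example, equal prefixes are always adjacent), you can even optimize a little, because you only need to compare each word's prefix with the previous one.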
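
A sketch of that optimization, assuming equal prefixes are adjacent as in the sample: no set is needed, only the previous prefix.

```csharp
using System;
using System.Collections.Generic;

// Assumes words with the same prefix are adjacent, as in the sample
string[] arr = { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

var firstWords = new List<string>();
string previousPrefix = null;

foreach (string word in arr)
{
    string prefix = word.Split('_')[0];
    // A change of prefix marks the first word of a new group
    if (prefix != previousPrefix)
    {
        firstWords.Add(word);
        previousPrefix = prefix;
    }
}

Console.WriteLine(string.Join(", ", firstWords)); // s_0001, sa_0004, sab_0006
```

If the adjacency assumption doesn't hold, this version emits a prefix again each time it reappears after a different one.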

As far as I understand, you want the distinct elements of the set by prefix, so do the following:

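using System.Collections.Generic;
using System.Linq;

// Treats two split strings as equal when their prefixes (first parts) match
class YourStringComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        // Distinct calls Equals for elements whose hash codes match, so it must be implemented
        return x[0] == y[0];
    }

    public int GetHashCode(string[] obj)
    {
        return obj[0].GetHashCode();
    }
}

string[] arr = new[] { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

// Split, keep one entry per prefix, then join the parts back together
var r = arr.Select(s => s.Split('_'))
           .Distinct(new YourStringComparer())
           .Select(parts => string.Join("_", parts));
// "s_0001", "sa_0004", "sab_0006"
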
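
On .NET 6 or later, Enumerable.DistinctBy removes the need for a custom comparer. This is a swapped-in alternative, not part of the answer above:

```csharp
using System;
using System.Linq;

string[] arr = { "s_0001", "s_0002", "s_0003", "sa_0004", "sa_0005", "sab_0006", "sab_0007" };

// Requires .NET 6+: keeps the first element for each distinct key (the prefix)
var r = arr.DistinctBy(s => s.Split('_')[0]).ToList();

Console.WriteLine(string.Join(", ", r)); // s_0001, sa_0004, sab_0006
```
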
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow