Question

I have a long list of words in C#, and I want to find the groups of words within that list that share the same first and last letters, considering only words whose length is between, say, 5 and 7 characters. For example, the list might have:

"wasted was washed washing was washes watched watches wilts with wastes wits washings"

It would return:

Length: 5-7, First letter: w, Last letter: d, "wasted, washed, watched"
Length: 5-7, First letter: w, Last letter: s, "washes, watches, wilts, wastes"

Then I might change the specification to a length of 3-4 characters, which would return:

Length: 3-4, First letter: w, Last letter: s, "was, wits"

I found this method of splitting, which is really fast, makes each item unique, groups the words by length, and gave me an excellent start: Spliting string into words length-based lists c#

Is there a way to modify/use that to take account of first and last letters?

EDIT

I originally asked about the 'fastest' way because I usually solve problems like this with lots of string arrays (which are slow and involve a lot of code). LINQ and lookups are new to me, but I can see that the ILookup used in the solution I linked to is remarkably simple and very fast. I don't actually need the absolute minimum processor time; any approach that saves me from creating separate arrays for this information would be fantastic.


Solution

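This one-liner gives you the groups of words that share the same first and last letters, restricted to your length range:

    using System.Linq;

    int min = 5;
    int max = 7;
    // str holds the space-separated input; Split() with no arguments splits on whitespace
    var results = str.Split()
                     .Distinct()   // keep each word once ("was" appears twice in the sample)
                     .Where(s => s.Length >= min && s.Length <= max)
                     .GroupBy(s => new { First = s.First(), Last = s.Last() });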
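To print the groups in the format shown in the question, a minimal sketch along these lines should work. It assumes that groups containing only one word (such as "washing" for 5-7, or "with" for 3-4) are meant to be left out, since the expected output in the question omits them. If you want those groups too, remove the `Count() > 1` filter. `GroupReport` and `Report` are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupReport
{
    // One output line per (first letter, last letter) group, in the question's format.
    // Assumption: single-word groups are dropped because the expected output omits them.
    public static List<string> Report(string str, int min, int max) =>
        str.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
           .Distinct()                                      // each word once
           .Where(s => s.Length >= min && s.Length <= max)  // length range
           .GroupBy(s => new { First = s.First(), Last = s.Last() })
           .Where(g => g.Count() > 1)                       // drop singleton groups
           .Select(g => $"Length: {min}-{max}, First letter: {g.Key.First}, " +
                        $"Last letter: {g.Key.Last}, \"{string.Join(", ", g)}\"")
           .ToList();

    public static void Main()
    {
        const string str = "wasted was washed washing was washes watched watches wilts with wastes wits washings";
        foreach (var line in Report(str, 5, 7).Concat(Report(str, 3, 4)))
            Console.WriteLine(line);
    }
}
```

With the sample string, this prints the two 5-7 lines and the single 3-4 line exactly as written in the question. GroupBy keeps groups in the order their first word appears, so the "d" group comes before the "s" group.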

OTHER TIPS

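    using System;
    using System.Collections.Generic;
    using System.Linq;

    var minLength = 5;
    var maxLength = 7;
    var firstPart = "w";
    var lastPart = "d";

    var words = new List<string> { "washed", "wash" }; // and so on

    // Ordinal comparison avoids culture-sensitive matching in StartsWith/EndsWith
    var matches = words.Where(w => w.Length >= minLength && w.Length <= maxLength &&
                                   w.StartsWith(firstPart, StringComparison.Ordinal) &&
                                   w.EndsWith(lastPart, StringComparison.Ordinal))
                       .ToList();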

For the most part this should be fast enough, unless you're dealing with tens of thousands of words and worrying about milliseconds. If so, we can look further.

I tried this quickly in LINQPad:

    void Main()
    {
        var words = new[] { "wasted", "was", "washed", "washing", "was", "washes", "watched", "watches", "wilts", "with", "wastes", "wits", "washings" };

        var firstLetter = "w";
        var lastLetter = "d";
        var minimumLength = 5;
        var maximumLength = 7;

        var matchingWords = words.Where(w => w.StartsWith(firstLetter) && w.EndsWith(lastLetter) && w.Length >= minimumLength && w.Length <= maximumLength);
        matchingWords.Dump(); // Dump() is LINQPad's extension method for displaying results
    }

If that isn't fast enough, I would create a lookup table:

    Dictionary<char, Dictionary<char, List<string>>> lookupTable;

and do:

    // keys are chars, e.g. 'w' and 'd'
    lookupTable['w']['d'].Where(w => w.Length >= minimumLength && w.Length <= maximumLength)
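Here is a minimal sketch of building that nested table. It assumes that each word should be stored once and that empty strings are skipped. When querying, it uses `TryGetValue`, because indexing `lookupTable[first][last]` directly throws `KeyNotFoundException` when a letter pair never occurs. `NestedLookup`, `Build` and `Find` are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedLookup
{
    // Builds table[first][last] -> distinct, non-empty words with that first and last letter.
    public static Dictionary<char, Dictionary<char, List<string>>> Build(IEnumerable<string> words)
    {
        var table = new Dictionary<char, Dictionary<char, List<string>>>();
        foreach (var word in words.Where(w => w.Length > 0).Distinct())
        {
            char first = word[0], last = word[word.Length - 1];
            if (!table.TryGetValue(first, out var byLast))
                table[first] = byLast = new Dictionary<char, List<string>>();
            if (!byLast.TryGetValue(last, out var list))
                byLast[last] = list = new List<string>();
            list.Add(word);
        }
        return table;
    }

    // Returns an empty list for a letter pair that never occurs, instead of throwing.
    public static List<string> Find(Dictionary<char, Dictionary<char, List<string>>> table,
                                    char first, char last, int min, int max)
    {
        if (table.TryGetValue(first, out var byLast) && byLast.TryGetValue(last, out var list))
            return list.Where(w => w.Length >= min && w.Length <= max).ToList();
        return new List<string>();
    }

    public static void Main()
    {
        var words = new[] { "wasted", "was", "washed", "washing", "was", "washes", "watched",
                            "watches", "wilts", "with", "wastes", "wits", "washings" };
        var table = Build(words);
        Console.WriteLine(string.Join(", ", Find(table, 'w', 'd', 5, 7))); // wasted, washed, watched
    }
}
```

The table is built once, so changing the length range only re-filters the small per-letter-pair list rather than rescanning every word.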

Here's a method that does what you describe. You are only given a list of strings and the min/max length, correct? You aren't given the first and last letters to filter on, so this method groups by every first/last letter combination found in the strings.

private static void ProcessInput(string[] words, int minLength, int maxLength)
{
    var groups = from word in words
                 where word.Length > 0 && word.Length >= minLength && word.Length <= maxLength
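                 // key each word by its first and last character
                 let key = new Tuple<char, char>(word.First(), word.Last())
                 group word by key into @group
                 // case-insensitive order by first letter, then last letter; the raw char breaks ties
                 orderby Char.ToLowerInvariant(@group.Key.Item1), @group.Key.Item1, Char.ToLowerInvariant(@group.Key.Item2), @group.Key.Item2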
                 select @group;
    Console.WriteLine("Length: {0}-{1}", minLength, maxLength);
    foreach (var group in groups)
    {
        Console.WriteLine("First letter: {0}, Last letter: {1}", group.Key.Item1, group.Key.Item2);
        foreach (var word in group)
            Console.WriteLine("\t{0}", word);
    }
}
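The question mentions the ILookup from the linked answer, so here is a sketch of the same grouping built as a lookup instead, assuming C# 7 or later for value tuples. It keys the distinct words on the (first letter, last letter) pair and applies the length range at query time, so a single lookup can answer both the 5-7 and the 3-4 queries. `FirstLastLookup`, `Build` and `Find` are hypothetical names for this sketch and are not taken from the linked answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLastLookup
{
    // Build once: distinct, non-empty words keyed by (first letter, last letter).
    public static ILookup<(char, char), string> Build(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToLookup(w => (w[0], w[w.Length - 1]));

    // Query as often as needed with any length range. Indexing an ILookup with a
    // key that never occurred returns an empty sequence rather than throwing.
    public static List<string> Find(ILookup<(char, char), string> lookup,
                                    char first, char last, int min, int max) =>
        lookup[(first, last)].Where(w => w.Length >= min && w.Length <= max).ToList();

    public static void Main()
    {
        var lookup = Build("wasted was washed washing was washes watched watches wilts with wastes wits washings");
        Console.WriteLine(string.Join(", ", Find(lookup, 'w', 's', 5, 7))); // washes, watches, wilts, wastes
        Console.WriteLine(string.Join(", ", Find(lookup, 'w', 's', 3, 4))); // was, wits
    }
}
```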

Just as a quick thought: I have no idea whether this would be faster or more efficient than the LINQ solutions posted, but this could also be done fairly easily with regular expressions.

For example, if you wanted to get words of 5-7 letters that begin with "w" and end with "s", you could use a pattern along the lines of:

\bw[A-Za-z]{3,5}s\b

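(This could fairly easily be made variable-driven: have variables for the first letter, min length, max length and last letter, and plug them into the pattern in place of w, 3, 5 and s. The repetition counts are min length - 2 and max length - 2, because the first and last letters take up two of the characters.)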

Then, using the Regex class from System.Text.RegularExpressions, you could simply take the matches as your list.
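Here is a sketch of the variable-driven version. It assumes that words consist only of ASCII letters and that `min` is at least 2. Matching is case-sensitive by default. `RegexFinder.Find` is a hypothetical name, while `Regex.Matches`, `Regex.Escape` and `Cast<Match>()` are standard .NET APIs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexFinder
{
    // Builds e.g. \bw[A-Za-z]{3,5}s\b for first='w', last='s', min=5, max=7.
    // The first and last letters use up two characters, so the middle run is {min-2,max-2}.
    // Assumes min >= 2 and words made only of ASCII letters; matching is case-sensitive.
    public static string[] Find(string text, char first, char last, int min, int max)
    {
        string pattern = $@"\b{Regex.Escape(first.ToString())}[A-Za-z]{{{min - 2},{max - 2}}}{Regex.Escape(last.ToString())}\b";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()   // the same word can match more than once
                    .ToArray();
    }

    public static void Main()
    {
        const string text = "wasted was washed washing was washes watched watches wilts with wastes wits washings";
        Console.WriteLine(string.Join(", ", Find(text, 'w', 's', 5, 7)));
        Console.WriteLine(string.Join(", ", Find(text, 'w', 's', 3, 4)));
    }
}
```

With the sample string, `Find(text, 'w', 's', 5, 7)` should yield washes, watches, wilts and wastes, which are the same words the LINQ answers return for that letter pair.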

Again, I don't know how this compares to LINQ in terms of efficiency, but I thought it deserved a mention.

Hope this helps!!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow