Question

For some game where one would need to find anagrams from a bunch of loose letters I ended up implementing a permutation algorithm to find all possible anagrams and filter those if needed for known letter positions (-match is great, by the way). But for longer words this proved very much error-prone, as skimming a large list of gibberish doesn't really reveal the proper words that were hidden within.

So I thought that if I would have a large list of English words (should be obtainable somewhere) I could just intersect my list of permutations with the list of proper words and get (hopefully) all real words from the permutation list.

Since many operators in PS work differently with collections I thought I could just do something like

$wordlist -contains $permlist

and get the intersection back. Unfortunately it's not that easy. Other options I have thought of would be to iterate over one list and do a -contains for each item:

$permlist | ? { $wordlist -contains $_ }

This probably would work but is also very slow, I think (especially when $wordlist is the result of a gc wordlist.txt). Or I could build a gigantic regular expression:

$wordlist -matches (($permlist | %{ "^$_`$" }) -join "|")

But that would probably not be very fast either. I could maybe also use findstr with above gigantic regex but that feels just wrong.

Are there any built-in solutions I could use and that are better than my attempts so far? Otherwise I'd probably put the word list into a hashtable and use the iterative -contains approach which should be fast enough then.

Was it helpful?

Solution

$left = New-HashSet string
$left.Add("foo")
$left.Add("bar")
$right = New-HashSet string
$right.Add("bar")
$right.Add("baz")

$left.IntersectWith($right)
$left.UnionWith($right)

(borrowing New-HashSet from Josh Einstein)

Warning: those methods on HashSet are in-place algorithms that modify the original collection. If you want functional-style transform on immutable objects, you'll need to bring LINQ to the party:

add-type system.core

$asqueryable = [system.linq.queryable].getmethods() | ? { $_.name -eq "AsQueryable" } | select -first 1
$asqueryable = $asqueryable.MakeGenericMethod([string])
$leftAsQueryable = $asqueryable.Invoke($null, (,$left))

$intersect = [system.linq.queryable].getmethods() | ? { $_.name -eq "Intersect" } | select -first 1
$intersect = $intersect.MakeGenericMethod([string])
$result = $intersect.Invoke($null, ($leftAsQueryable, $right))

Clearly, someone needs to wrap this static-generic-reflection crap into a friendly cmdlet! Don't worry, I'm working on it...

OTHER TIPS

You could spell-check your list of words and eliminate all spelling errors against a standard dictionary.

With the GNU aspell package installed,

 cat text.txt | aspell list

will give you a list of all the miss-spelled words.
You can work with other dictionaries with aspell.


Or just pickup an anagram generator like this one made for Scrabble players.

The Revolution Word Finder has two options; an Anagram Finder and a Scrabble Solver. The Anagram Finder takes a list of letters and returns all the valid anagrams which can be created using them relative to a fixed list of words. Each anagram is checked for validity against the SOWPODS word list which is the word list used in current International Scrabble Tournaments.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top