Question

There is the problem of trying to see if two unique strings are anagrams of each other. The first solution that I had considered would be to sort both strings and see if they were equal to each other.

I have been considering another solution and I would like to discuss if the same would be feasible.

The idea would be to assign a numerical value to each character and sum it up such that a unique set of characters would produce a unique value. As we are testing for anagrams, we do not mind if the checksum of "asdf" and "adsf" are the same -- in fact, we require it to be that way. However the checksum of strings "aa" and "b" shouldn't be equal.

I was considering assigning the first 52 prime numbers to alphabets "a" through "z" and then "A" through "Z"(assume we only have alphabets).

The above scheme would break if the sum of any two or more primes in the set of 52 primes could result in another prime existing in the set.

My doubts are :-

  1. Is there any numbering scheme that would satify my requirements?
  2. I'm unsure about the math involved ; is it possible to prove/is there any proof that suggests that the sum of two or more primes in the set of the first 52 primes has at least one value that exists in the same set?

Thanks.

Was it helpful?

Solution

Use multiplication instead of addition. Primes are "multiplicatively unique", but not "additively unique".

OTHER TIPS

A slightly more clunky way to do it would require the length of your longest string max_len (or the largest number of any specific character for slightly better performance). Given that, your hash could look like

number_of_a*max_len^51 + number_of_b*max_len^50 + ... + number_of_Z*max_len^0

If you preferred to use primes, multiplication will work better, as previously mentioned.

Of course, you could achieve the same effect by having an array of 52 values instead.

You are trying to compare two sorted strings for equality by comparing two n-bit numbers for equality. As soon as your strings are long enough that there are more than 2^n possible sorted strings you will definitely have two different sorted strings that produce the same n-bit number. It is likely, by the http://en.wikipedia.org/wiki/Birthday_problem, that you will hit problems before this, unless (as with multiplication of primes) there is some theorem saying that you cannot have two different strings from the same number.

In some cases you might save time by using this idea as a quick first check for equality, so that you only need to compare sorted strings if their numbers match.

Don't use prime numbers - prime numbers properties are related to division, not sums. However, the idea is good, you could use bit sets but you would hit another problem - duplicate letters (same problem with primes, 1+1+1=3). So, you can use an integer sets, an array 1...26 of frequency of letters.

def primes(n):
    array = [i for i in range(2,n+1)]
    p = 2

    while p <= n:
        i = 2*p
        while i <= n:
            array[i-2] = 0
            i += p
        p += 1

    return [num for num in array if num > 0]

def anagram(a):
    # initialize a list
    anagram_list = []
    for i in a: 
        for j in a: 
            if i != j and (sorted(str(i))==sorted(str(j))):
                anagram_list.append(i)
    return anagram_list

if __name__ == '__main__':
    print("The Prime Numbers are:\n",primes(1000),"\n")
    a = primes(1000)
    print("Prime Numbers between 0 to 100:")
    T100 = a[:25]
    print(T100,"\n")
    print("The Anagram elements from 0 to 100 are listed :", anagram(T100),"\n")
    print("Prime Numbers between 101 to 200:")
    T200 = a[25:46]
    print(T200,"\n")
    print("The Anagram elements from 101 to 200 are listed :",anagram(T200),"\n")
    print("Prime Numbers between 201 to 300:")
    T300 = a[46:62]
    print(T300,"\n")
    print("The Anagram elements from 201 to 300 are listed :",anagram(T300),"\n")
    print("Prime Numbers between 301 to 400:")
    T400 = a[62:78]
    print(T400,"\n")
    print("The Anagram elements from 301 to 400 are listed :",anagram(T400),"\n")
    print("Prime Numbers between 401 to 500:")
    T500 = a[78:95]
    print(T500,"\n")
    print("The Anagram elements from 401 to 500 are listed :",anagram(T500),"\n")
    print()
    print("Prime Numbers between 501 to 600:")
    T600 = a[95:109]
    print(T600,"\n")
    print("The Anagram elements from 501 to 600 are listed :",anagram(T600),"\n")
    print("Prime Numbers between 601 to 700:")
    T700 = a[109:125]
    print(T700,"\n")
    print("The Anagram elements from 601 to 700 are listed :",anagram(T700),"\n")
    print("Prime Numbers between 701 to 800:")
    T800 = a[125:139]
    print(T800,"\n")
    print("The Anagram elements from 701 to 800 are listed :",anagram(T800),"\n")
    print()
    print("Prime Numbers between 801 to 900:")
    T900 = a[139:154]
    print(T900,"\n")
    print("The Anagram elements from 801 to 900 are listed :",anagram(T900),"\n")
    print("Prime Numbers between 901 to 1000:")
    T1000 = a[154:168]
    print(T1000,"\n")
    print("The Anagram elements from 901 to 1000 are listed :",anagram(T1000),"\n")

Out Put:

The Prime Numbers are:
 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997] 

Prime Numbers between 0 to 100:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] 

The Anagram elements from 0 to 100 are listed : [13, 17, 31, 37, 71, 73, 79, 97] 

Prime Numbers between 101 to 200:
[101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199] 

The Anagram elements from 101 to 200 are listed : [113, 131, 137, 139, 173, 179, 193, 197] 

Prime Numbers between 201 to 300:
[211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293] 

The Anagram elements from 201 to 300 are listed : [239, 293] 

Prime Numbers between 301 to 400:
[307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397] 

The Anagram elements from 301 to 400 are listed : [313, 331, 337, 373, 379, 397] 

Prime Numbers between 401 to 500:
[401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499] 

The Anagram elements from 401 to 500 are listed : [419, 491] 


Prime Numbers between 501 to 600:
[503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599] 

The Anagram elements from 501 to 600 are listed : [] 

Prime Numbers between 601 to 700:
[601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691] 

The Anagram elements from 601 to 700 are listed : [613, 619, 631, 691] 

Prime Numbers between 701 to 800:
[701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797] 

The Anagram elements from 701 to 800 are listed : [] 


Prime Numbers between 801 to 900:
[809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887] 

The Anagram elements from 801 to 900 are listed : [] 

Prime Numbers between 901 to 1000:
[907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997] 

The Anagram elements from 901 to 1000 are listed : [919, 991] 

You can also re-frame it how you want if you are a python developer. If a newbee to Python please learn the concept of List.

I don't understand the complexity of the attempted examples so far, so I wrote a simple Python 3 example:

from operator import mul
from functools import reduce

TO_PRIME = dict( \
    a=2, b=3, c=5, d=7, e=11, \
    f=13, g=17, h=19, i=23, j=29, \
    k=31, l=37, m=41, n=43, o=47, \
    p=53, q=59, r=61, s=67, t=71, \
    u=73, v=79, w=83, x=89, y=97, z=101 \
    )

def anagram_product(string):
    return reduce(mul, (TO_PRIME[char.lower()] for char in string if char.isalpha()), 1)

def anagram_check(string_1, string_2):
    return anagram_product(string_1) == anagram_product(string_2)

# True examples

print(repr('Astronomer'), repr('Moon starer'), anagram_check('Astronomer', 'Moon starer'))

print(repr('The Morse code'), repr('Here come dots'), anagram_check('The Morse code', 'Here come dots'))

# False examples (near misses)

print(repr('considerate'), repr('cure is noted'), anagram_check('considerate', 'cure is noted'))

print(repr('debit card'), repr('bed credit'), anagram_check('debit card', 'bed credit'))

OUTPUT

> python3 test.py
'Astronomer' 'Moon starer' True
'The Morse code' 'Here come dots' True
'considerate' 'cure is noted' False
'debit card' 'bed credit' False
> 

The next step is to get this from a product to a sum. One approach I imagine is to map the letters to irrational numbers instead of primes. These irrational numbers would need to be of a type that don't become rational through any sort of addition. Here's a crude example:

from math import pi

ROUND = 4

TO_IRRATIONAL = {letter: pi ** n for n, letter in enumerate('abcdefghijklmnopqrstuvwxyz')}

def anagram_sum(string):
    return round(sum(TO_IRRATIONAL[char.lower()] for char in string if char.isalpha()), ROUND)

def anagram_check(string_1, string_2):
    return anagram_sum(string_1) == anagram_sum(string_2)

# True examples

print(repr('Astronomer'), repr('Moon starer'), anagram_check('Astronomer', 'Moon starer'))

print(repr('The Morse code'), repr('Here come dots'), anagram_check('The Morse code', 'Here come dots'))

# False examples (near misses)

print(repr('considerate'), repr('cure is noted'), anagram_check('considerate', 'cure is noted'))

print(repr('debit card'), repr('bed credit'), anagram_check('debit card', 'bed credit'))

OUTPUT

> python3 test2.py
'Astronomer' 'Moon starer' True
'The Morse code' 'Here come dots' True
'considerate' 'cure is noted' False
'debit card' 'bed credit' False
> 

I'm not saying this is an optimal set of irrational numbers, just a rough demonstration of the concept. (Note my need to use round() to make this work which points to one design flaw -- the finite representation of irrational numbers.)

Here is an implementation in c# using the prime numbers way:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

namespace Anag
{
    class Program
    {
        private static int[] primes100 = new int[]
                                            {
                                                3, 7, 11, 17, 23, 29, 37,
                                                47, 59, 71, 89, 107, 131,
                                                163, 197, 239, 293, 353,
                                                431, 521, 631, 761, 919,
                                                1103, 1327, 1597, 1931,
                                                2333, 2801, 3371, 4049,
                                                4861, 5839, 7013, 8419,
                                                10103, 12143, 14591, 17519,
                                                21023, 25229, 30293, 36353,
                                                43627, 52361, 62851, 75431,
                                                90523, 108631, 130363,
                                                156437, 187751, 225307,
                                                270371, 324449, 389357,
                                                467237, 560689, 672827,
                                                807403, 968897, 1162687,
                                                1395263, 1674319, 2009191,
                                                2411033, 2893249, 3471899,
                                                4166287, 4999559, 5999471,
                                                7199369
                                            };

        private static int[] getNPrimes(int _n)
        {
            int[] _primes;

            if (_n <= 100)
                _primes = primes100.Take(_n).ToArray();
            else
            {
                _primes = new int[_n];

                int number = 0;
                int i = 2;

                while (number < _n)
                {

                    var isPrime = true;
                    for (int j = 2; j <= Math.Sqrt(i); j++)
                    {
                        if (i % j == 0 && i != 2)
                            isPrime = false;
                    }
                    if (isPrime)
                    {
                        _primes[number] = i;
                        number++;
                    }
                    i++;
                }

            }

            return _primes;
        }

        private static bool anaStrStr(string needle, string haystack)
        {
            bool _output = false;

            var needleDistinct = needle.ToCharArray().Distinct();

            int[] arrayOfPrimes = getNPrimes(needleDistinct.Count());

            Dictionary<char, int> primeByChar = new Dictionary<char, int>();
            int i = 0;
            int needlePrimeSignature = 1;

            foreach (var c in needleDistinct)
            {
                if (!primeByChar.ContainsKey(c))
                {
                    primeByChar.Add(c, arrayOfPrimes[i]);

                    i++;
                }
            }

            foreach (var c in needle)
            {
                needlePrimeSignature *= primeByChar[c];
            }

            for (int j = 0; j <= (haystack.Length - needle.Length); j++)
            {
                var result = 1;
                for (int k = j; k < needle.Length + j; k++)
                {
                    var letter = haystack[k];
                    result *= primeByChar.ContainsKey(letter) ? primeByChar[haystack[k]] : 0;
                }

                _output = (result == needlePrimeSignature);
                if (_output)
                    break;
            }

            return _output;
        }


        static void Main(string[] args)
        {
            Console.WriteLine("Enter needle");
            var _needle = Console.ReadLine(); ;
            Console.WriteLine("Enter haystack");
            var _haystack = Console.ReadLine(); 

            Console.WriteLine(anaStrStr(_needle, _haystack));
            Console.ReadLine();

        }
    }
    }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top