Question

I'm new to this ERLANG, I know the basics. It's like scheme but broader. I know how to create a function but I'm having problems with creating a function that gets the Longest Common Subsequence.

lcs(str1,str2) is a function that will accept two string and output an integer:

lcs(algorithm,logarithm) will output 7 because the longest common subsequence that can be made is lgrithm which is 7 in size.

Any answer is greatly appreciated.

Was it helpful?

Solution

You have a pretty good implementation of an LCS algorithm available on Rosettacode, which is:

-module(lcs).
-compile(export_all).

lcs_length(S,T) ->
    {L,_C} = lcs_length(S,T,dict:new()),
    L.

lcs_length([]=S,T,Cache) ->
    {0,dict:store({S,T},0,Cache)};
lcs_length(S,[]=T,Cache) ->
    {0,dict:store({S,T},0,Cache)};
lcs_length([H|ST]=S,[H|TT]=T,Cache) ->
    {L,C} = lcs_length(ST,TT,Cache),
    {L+1,dict:store({S,T},L+1,C)};
lcs_length([_SH|ST]=S,[_TH|TT]=T,Cache) ->
    case dict:is_key({S,T},Cache) of
        true -> {dict:fetch({S,T},Cache),Cache};
        false ->
            {L1,C1} = lcs_length(S,TT,Cache),
            {L2,C2} = lcs_length(ST,T,C1),
            L = lists:max([L1,L2]),
            {L,dict:store({S,T},L,C2)}
    end.

lcs(S,T) ->
    {_,C} = lcs_length(S,T,dict:new()),
    lcs(S,T,C,[]).

lcs([],_,_,Acc) ->
    lists:reverse(Acc);
lcs(_,[],_,Acc) ->
    lists:reverse(Acc);
lcs([H|ST],[H|TT],Cache,Acc) ->
    lcs(ST,TT,Cache,[H|Acc]);
lcs([_SH|ST]=S,[_TH|TT]=T,Cache,Acc) ->
    case dict:fetch({S,TT},Cache) > dict:fetch({ST,T},Cache) of
        true ->
            lcs(S,TT,Cache, Acc);
        false ->
            lcs(ST,T,Cache,Acc)
    end.

Used as:

raringbeast:Code pierre $ erl
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.9.1  (abort with ^G)
1> c(lcs).
{ok,lcs}
2> lcs:lcs("logarithm", "algorithm").
"lgrithm"
3> lcs:lcs_length("logarithm", "algorithm").
7

--

Edit: Made the algorithm a little easier to understand

The cache here is just an interesting way to improve algorithm performance in some cases, but this isn't required here. We can just write, removing the cache:

lcs_length([],_T) ->
    0;
lcs_length(_S,[]) ->
    0;
lcs_length([_H|ST],[_H|TT]) ->
    1 + lcs_length(ST,TT);
lcs_length([_SH|ST]=S,[_TH|TT]=T) ->
    L1 = lcs_length(S,TT),
    L2 = lcs_length(ST,T),
    lists:max([L1,L2]). 

To sum up:

  1. The LCS of "" and anything is 0.
  2. Same for the LCS of anything and "".
  3. The LCS of two words starting by the same letter is the LCS of the two words minus the first letter plus 1.
  4. The LCS of two words starting by different letters is the max of (a) the LCS of the first word minus the first letter and the second, and (b) the LCS of the first word and the second minus the first letter.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top