Question

I need to write a function that finds repeated words in a string and returns them as a list of strings, in order of occurrence, ignoring non-letters.

e.g. at the Hugs prompt:

repetitions :: String -> [String]

repetitions > "My bag is is action packed packed."
output> ["is","packed"]
repetitions > "My name  name name is Sean ."
output> ["name","name"]
repetitions > "Ade is into into technical drawing drawing ."
output> ["into","drawing"]

Solution

To split a string into words, use the words function (in the Prelude). To eliminate non-letter characters, filter each word with Data.Char.isAlpha (isAlphaNum would also keep digits) and drop any word that ends up empty. Zip the list together with its tail to get adjacent pairs (x, y). Filter the pairs, keeping the x from every pair where x == y.

Something like:

import Data.Char (isAlpha)

repetitions s = map fst . filter (uncurry (==)) . zip l $ tail l
  where l = filter (not . null) . map (filter isAlpha) $ words s

I'm not sure that works, but it should give you a rough idea.
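The zip-with-tail idea can be looked at in isolation. The following is only a sketch of that approach, not the answerer's exact code: adjacentPairs is a hypothetical helper name, and isAlpha is assumed to be the right notion of "letter" for the question.

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper: pair every element with its successor,
-- e.g. [a, b, c] becomes [(a, b), (b, c)].
adjacentPairs :: [a] -> [(a, a)]
adjacentPairs xs = zip xs (drop 1 xs)

-- Keep letters only, drop tokens that become empty (such as a lone "."),
-- then keep each word that equals the word right after it.
repetitions :: String -> [String]
repetitions s = [x | (x, y) <- adjacentPairs ws, x == y]
  where
    ws = filter (not . null) (map (filter isAlpha) (words s))
```

Using drop 1 rather than tail keeps the helper total on the empty list; the list comprehension is just another way of writing the filter followed by map fst.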

OTHER TIPS

I am new to this language, so my solution may look ugly to a Haskell veteran, but anyway:

let repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') x)))))

This part removes everything that is not a letter or a space from a string s:

filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') s

This one splits a string s into words and groups runs of equal adjacent words, returning a list of lists (List is the Haskell 98 module name; in current GHC the module is Data.List):

List.group (words s)

Then this part removes all lists with fewer than two elements:

filter (\x -> (length x) > 1) s

After that we concatenate all the lists into one, dropping one element from each:

concat (map tail s)
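To make the four steps easier to inspect one at a time, here is a sketch that gives each stage its own name. The stage names are made up for illustration, and isAlpha is used in place of the explicit character ranges, so it also accepts non-ASCII letters.

```haskell
import Data.Char (isAlpha)
import Data.List (group)

-- Stage 1: keep letters and spaces only.
lettersAndSpaces :: String -> String
lettersAndSpaces = filter (\c -> isAlpha c || c == ' ')

-- Stage 2: split into words and group runs of equal adjacent words.
wordRuns :: String -> [[String]]
wordRuns = group . words

-- Stage 3: keep only the runs that contain at least two words.
repeatedRuns :: [[String]] -> [[String]]
repeatedRuns = filter (\run -> length run > 1)

-- Stage 4: drop one word from each run and flatten the result.
repetitions :: String -> [String]
repetitions s = concatMap (drop 1) (repeatedRuns (wordRuns (lettersAndSpaces s)))
```

Evaluating each stage separately at the prompt (for example wordRuns "is is action") shows the intermediate list of lists before it is filtered and flattened.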

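This might be inelegant, but it is conceptually very simple. I'm assuming that it's looking for consecutive duplicate words, as in the examples. Note that it compares the raw tokens produced by words, so punctuation is not stripped: "packed" and "packed." count as different words.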

-- a wrapper that lets you give the input as a String
repetitions :: String -> [String]
repetitions s = repetitionsLogic (words s)

-- does the real work: compares each word with the one after it
repetitionsLogic :: [String] -> [String]
repetitionsLogic [] = []
repetitionsLogic [_] = []
repetitionsLogic (a:as)
    | a == head as = a : repetitionsLogic as
    | otherwise = repetitionsLogic as
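A possible variant of this recursion, sketched under the assumption that non-letters should be stripped before comparing (as the question asks). The names cleanWords, adjacentRepeats and repetitionsClean are hypothetical; the recursion pattern-matches on the first two words, so no call to head is needed.

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper: strip non-letters from each token and drop tokens
-- that end up empty (such as a lone ".").
cleanWords :: String -> [String]
cleanWords = filter (not . null) . map (filter isAlpha) . words

-- Compare each word with its successor by matching on the first two
-- elements; lists with fewer than two words fall through to the last case.
adjacentRepeats :: [String] -> [String]
adjacentRepeats (a : rest@(b : _))
  | a == b    = a : adjacentRepeats rest
  | otherwise = adjacentRepeats rest
adjacentRepeats _ = []

repetitionsClean :: String -> [String]
repetitionsClean = adjacentRepeats . cleanWords
```

With the cleaning step in front, "packed packed." is read as two equal words, which matches the first example in the question.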

Building on what Alexander Prokofyev answered:

repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))

Remove unnecessary parentheses:

repetitions x = concat (map tail (filter (\x -> length x > 1) (List.group (words (filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x)))))

Use $ to remove more parentheses (each $ can replace an opening parenthesis if the matching closing parenthesis is at the end of the expression):

repetitions x = concat $ map tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x
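As a smaller illustration of the same rewriting steps, the three definitions below compute the same thing: nested parentheses, a chain of $, and a composition with the argument left out. The function names and the toy task (counting words after removing full stops) are invented for this sketch.

```haskell
-- Nested parentheses: the innermost call is evaluated conceptually first.
withParens :: String -> Int
withParens s = length (words (filter (/= '.') s))

-- Each $ stands in for an opening parenthesis that would close at the end.
withDollar :: String -> Int
withDollar s = length $ words $ filter (/= '.') s

-- Composition with (.) lets the argument s be dropped altogether.
withCompose :: String -> Int
withCompose = length . words . filter (/= '.')
```

All three read right to left: filter first, then words, then length.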

Replace the character ranges with isAlpha and isSeparator from Data.Char, and merge concat and map into concatMap:

repetitions x = concatMap tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x

Use a section and function composition to simplify (\x -> length x > 1) to ((>1) . length) in point-free style. This composes length with (>1) (a partially applied operator, or section) in a right-to-left pipeline.

repetitions x = concatMap tail $ filter ((>1) . length) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
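The equivalence of the lambda and the composed section can be checked on its own. This is a small sketch; the two predicate names are made up for the comparison.

```haskell
-- The lambda form: name the argument, then test its length.
longerThanOneLambda :: [a] -> Bool
longerThanOneLambda = \xs -> length xs > 1

-- The point-free form: (> 1) is a section, i.e. the operator > with its
-- right operand already supplied; (.) feeds the result of length into it.
longerThanOneSection :: [a] -> Bool
longerThanOneSection = (> 1) . length
```

Both predicates give the same answer for any finite list, so either can be passed to filter.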

Eliminate the explicit "x" variable to make the overall expression point-free:

repetitions = concatMap tail . filter ((>1) . length) . List.group . words . filter (\c -> isAlpha c || isSeparator c)

Now the entire function, reading from right to left, is a pipeline that keeps only alpha or separator characters, splits the result into words, groups runs of equal adjacent words, keeps only the groups with more than one element, and then drops the first element of each remaining group and concatenates what is left.
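For reference, here is a sketch of the final pipeline as a self-contained file with the imports it needs. It assumes GHC's Data.List and Data.Char modules and uses drop 1 in place of tail, which behaves the same here because every group that survives the filter has at least two elements. The examples list is an addition that repeats the three cases from the question.

```haskell
import Data.Char (isAlpha, isSeparator)
import Data.List (group)

-- Right-to-left pipeline: keep letters and separators, split into words,
-- group runs of equal words, keep runs longer than one, drop one per run.
repetitions :: String -> [String]
repetitions =
  concatMap (drop 1)
    . filter ((> 1) . length)
    . group
    . words
    . filter (\c -> isAlpha c || isSeparator c)

-- The three inputs from the question, paired with the expected outputs.
examples :: [(String, [String])]
examples =
  [ ("My bag is is action packed packed.", ["is", "packed"])
  , ("My name  name name is Sean .", ["name", "name"])
  , ("Ade is into into technical drawing drawing .", ["into", "drawing"])
  ]

-- True when every example produces its expected output.
allExamplesPass :: Bool
allExamplesPass = all (\(input, expected) -> repetitions input == expected) examples
```

One caveat: isSeparator matches the space character but not tab or newline, so for multi-line input isSpace may be the better predicate.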

Licensed under: CC-BY-SA with attribution