Sorry I don't quite get FP yet, I want to split a sequence of lines into a sequence of sequences of lines, assuming an empty line as paragraph division, I could do it in python like this:
def get_paraghraps(lines): paragraphs =  paragraph =  for line in lines: if line == "": # I know it could also be "if line:" paragraphs.append(paragraph) paragraph =  else: paragraph.append(line) return paragraphs
How would you go about doing it in Erlang or Haskell?
I'm only a beginning Haskell programmer (and the little Haskell I learnt was 5 years ago), but for a start, I'd write the natural translation of your function, with the accumulator ("the current paragraph") being passed around (I've added types, just for clarity):
type Line = String type Para = [Line] -- Takes a list of lines, and returns a list of paragraphs paragraphs :: [Line] -> [Para] paragraphs ls = paragraphs2 ls  -- Helper function: takes a list of lines, and the "current paragraph" paragraphs2 :: [Line] -> Para -> [Para] paragraphs2  para = [para] paragraphs2 ("":ls) para = para : (paragraphs2 ls ) paragraphs2 (l:ls) para = paragraphs2 ls (para++[l])
*Main> paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"] [["Line 1","Line 2"],["Line 3","Line 4"]]
So that's a solution. But then, Haskell experience suggests that there are almost always library functions for doing things like this :) One related function is called groupBy, and it almost works:
paragraphs3 :: [Line] -> [Para] paragraphs3 ls = groupBy (\x y -> y /= "") ls *Main> paragraphs3 ["Line 1", "Line 2", "", "Line 3", "Line 4"] [["Line 1","Line 2"],["","Line 3","Line 4"]]
Oops. What we really need is a "splitBy", and it's not in the libraries, but we can filter out the bad ones ourselves:
paragraphs4 :: [Line] -> [Para] paragraphs4 ls = map (filter (/= "")) (groupBy (\x y -> y /= "") ls)
or, if you want to be cool, you can get rid of the argument and do it the pointless way:
paragraphs5 = map (filter (/= "")) . groupBy (\x y -> y /= "")
I'm sure there is an even shorter way. :-)
Edit: ephemient points out that
(not . null) is cleaner than
(/= ""). So we can write
paragraphs = map (filter $ not . null) . groupBy (const $ not . null)
(not . null) is a strong indication that we really should abstract this out into a function, and this is what the Data.List.Split module does, as pointed out in the answer below.
I'm also trying to learn Haskell. A solution for this question could be:
paragraphs :: [String] -> [[String]] paragraphs  =  paragraphs lines = p : (paragraphs rest) where (p, rest) = span (/= "") (dropWhile (== "") lines)
where I'm using the functions from Data.List. The ones I'm using are already available from the Prelude, but you can find their documentation in the link.
The idea is to find the first paragraph using
span (/= ""). This will return the paragraph, and the lines following. We then recurse on the smaller list of lines which I call
Before splitting out the first paragraph, we drop any empty lines using
dropWhile (== ""). This is important to eat the empty line(s) separating the paragraphs. My first attempt was this:
paragraphs :: [String] -> [[String]] paragraphs  =  paragraphs lines = p : (paragraphs $ tail rest) where (p, rest) = span (/= "") lines
but this fails when we reach the final paragraph since
rest is then the empty string:
*Main> paragraphs ["foo", "bar", "", "hehe", "", "bla", "bla"] [["foo","bar"],["hehe"],["bla","bla"]*** Exception: Prelude.tail: empty list
Dropping empty lines solves this, and it also makes the code treat any number of empty lines as a paragraph separator, which is what I would expect as a user.
The cleanest solution would be to use something appropriate from the split package.
You'll need to install that first, but then
Data.List.Split.splitWhen null should do the job perfectly.
get_paragraphs  paras para = paras ++ [para] get_paragraphs ("":ls) paras para = get_paragraphs ls (paras ++ [para])  get_paragraphs (l:ls) paras para = get_paragraphs ls paras (para ++ [l])
You want to group the lines, so
Data.List seems like a good candidate. It uses a custom function to determine which lines are "equal" so one can supply something that makes lines in the same paragraph "equal". For example:
import Data.List( groupBy ) inpara :: String -> String -> Bool inpara _ "" = False inpara _ _ = True paragraphs :: [String] -> [[String]] paragraphs = groupBy inpara
This has some limitations, since
inpara can only compare two adjacent lines and more complex logic doesn't fit into the framework given by
groupBy. A more elemental solution if is more flexible. Using basic recursion one can write:
paragraphs  =  paragraphs as = para : paragraphs (dropWhile null reminder) where (para, reminder) = span (not . null) as -- splits list at the first empty line
span splits a list at the point the supplied function becomes false (the first empty line),
dropWhile removes leading elements for which the supplied function is true (any leading empty lines).
Better late than never.
import Data.List.Split (splitOn) paragraphs :: String -> [[String]] paragraphs s = filter (not . null) $ map words $ splitOn "\n\n" s paragraphs "a\nb\n\nc\nd" == [["a", "b"], ["c", "d"]] paragraphs "\n\na\nb\n\n\nc\nd\n\n\n" == [["a", "b"], ["c", "d"]] paragraphs "\n\na\nb\n\n \n c\nd\n\n\n" == [["a", "b"], ["c", "d"]]