I'll be using the following imports.
import Data.Char
import Data.List
import qualified Data.Text.Lazy as T
import Criterion.Main
import Test.QuickCheck
I managed to get blazing speeds compared to this reference implementation from the blog post:
capitalize :: T.Text -> T.Text
capitalize = T.tail . T.scanl (\a b -> if isSpace a then toUpper b else b) ' '
Using mapAccumL
is much faster. Here's String
and Text
versions.
{-# INLINE f #-}
f a b = (b, if isSpace a then toUpper b else b)
string :: String -> String
string = snd . mapAccumL f ' '
text :: T.Text -> T.Text
text = snd . T.mapAccumL f ' '
First, let's make sure that the optimization is valid
λ. quickCheck $ \xs ->
capitalize (T.pack xs) == text (T.pack xs)
+++ OK, passed 100 tests.
Now for some benchmark results from criterion
, running each function on a 3.2 M file of Lorem Ipsum. Here's our reference speed.
benchmarking reference
collecting 100 samples, 1 iterations each, in estimated 56.19690 s
mean: 126.4616 ms, lb 126.0039 ms, ub 128.6617 ms, ci 0.950
std dev: 4.432843 ms, lb 224.7290 us, ub 10.55986 ms, ci 0.950
String
is only about 30% slower than the optimized reference Text
version and the mapAccumL
version using Text
is almost twice as fast!
benchmarking string
collecting 100 samples, 1 iterations each, in estimated 16.45751 s
mean: 165.1451 ms, lb 165.0927 ms, ub 165.2112 ms, ci 0.950
std dev: 301.0338 us, lb 250.2601 us, ub 370.2991 us, ci 0.950
benchmarking text
collecting 100 samples, 1 iterations each, in estimated 16.88929 s
mean: 67.67978 ms, lb 67.65432 ms, ub 67.72081 ms, ci 0.950
std dev: 162.8791 us, lb 114.9346 us, ub 246.0348 us, ci 0.950
But there are even more easy gains to be had. Data.Char.isSpace
is known for its performance issues, so lets try the fast Data.Attoparsec.Char8.isSpace
instead. Our quickcheck
test won't pass, but the performance is great.
benchmarking string/atto
collecting 100 samples, 1 iterations each, in estimated 12.91881 s
mean: 129.2176 ms, lb 129.1328 ms, ub 129.4941 ms, ci 0.950
std dev: 705.3433 us, lb 238.2757 us, ub 1.568524 ms, ci 0.950
benchmarking text/atto
collecting 100 samples, 1 iterations each, in estimated 15.76300 s
mean: 38.63183 ms, lb 38.62850 ms, ub 38.63730 ms, ci 0.950
std dev: 21.41514 us, lb 15.27777 us, ub 33.98801 us, ci 0.950
We're now about 3x
faster than the original reference. For comparison, the very fast python code (which is just calling out to C),
print open('lorem.txt').read().title()
rips through the text file in 30ms
.