Question

I try process file which writen by russian symbols. When read and after write text to file i get something like:

"\160\192\231\229\240\225\224\233\228\230\224\237"

How i can get normal symbols ?

Thanks

Was it helpful?

Solution 3

I have got success.

{-# LANGUAGE ImplicitParams #-}

import Network.HTTP
import Text.HTML.TagSoup
import Data.Encoding
import Data.Encoding.CP1251
import Data.Encoding.UTF8

openURL x =  do 
        x <- simpleHTTP (getRequest x)
        fmap (decodeString CP1251) (getResponseBody x)

main :: IO ()
main = do
    tags <- fmap parseTags $ openURL "http://www.trade.su/search?ext=1"
    let TagText r  = partitions (~== "<input type=checkbox>") tags !! 1 !! 4
    appendFile "out" r

OTHER TIPS

If you are getting strings with backslashes and numbers in, then it sounds like you might be calling "print" when you want to call "putStr".

If you deal with Unicode, you might try utf8-string package

import System.IO hiding (hPutStr, hPutStrLn, hGetLine, hGetContents, putStrLn)
import System.IO.UTF8
import Codec.Binary.UTF8.String (utf8Encode)
main = System.IO.UTF8.putStrLn "Вася Пупкин"

However it didn't work well in my windows CLI garbling the output because of codepage. I expect it to work fine on other Unix-like systems if your locale is set correctly. However writing to file should be successfull on all systems.

UPDATE:

An example on encoding package usage.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top