Tuesday, March 27, 2012

A more readable Haskell string tokenizer

Another crack at a tokenizer:
otherTok :: String -> [Char] -> [[Char]]
otherTok [] _ = []
otherTok cs delim = foldl(\acc c -> if c `elem` delim then [] : acc else (head acc ++ [c]) : (tail acc) ) [] cs

*Main> otherTok "blah,blah blah. blah! blah" " ,.!"
["blah","","blah","","blah","blah","*** Exception: Prelude.head: empty list

Each delimiter character produces an empty string, which I can remove later with filter. I'm not sure yet what to do about the exception.

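P.S. Problem solved: seed the fold with [""] instead of [], so that head and tail always have a non-empty accumulator to work on.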
otherTok :: String -> [Char] -> [[Char]]
otherTok [] _ = []
otherTok cs delim = foldl(\acc c -> if c `elem` delim then [] : acc else (head acc ++ [c]) : (tail acc) ) [""] cs

*Main> filter (/="") (otherTok "blah,blah blah. blah! blah" " ,.!")
["blah","blah","blah","blah","blah"] 
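The same idea can be written without head and tail at all. What follows is only a sketch, not the version from this post: tokenizeSafe and step are made-up names, and it assumes the accumulator is seeded with [""] exactly as above. Pattern matching on the accumulator makes the "current token" explicit, and the filter is folded into the function.

```haskell
-- Sketch only: tokenizeSafe and step are hypothetical names, not from the post.
tokenizeSafe :: String -> [Char] -> [String]
tokenizeSafe cs delim = filter (/= "") (foldl step [""] cs)
  where
    -- cur is the token being built; rest holds the tokens finished so far.
    step (cur : rest) c
      | c `elem` delim = "" : cur : rest       -- delimiter: start a new, empty token
      | otherwise      = (cur ++ [c]) : rest   -- ordinary char: append to the current token
    -- Not reachable with the [""] seed; included so step is total.
    step [] c = step [""] c
```

Like otherTok, this still returns the tokens in reverse order: tokenizeSafe "abc,def.ghi" ",." gives ["ghi","def","abc"].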


P.P.S. The resulting tokens come out in the reverse of the order in which the words appear in the line, because each new token is consed onto the front of the accumulator. That is also easily fixed:

rev :: [a] -> [a]
rev [] = []
rev xs = foldr(\x acc -> acc ++ [x]) [] xs
 

*Main> rev (filter (/="") (otherTok "abc,def.ghi" ",."))
["abc","def","ghi"]
 

Or just use the built-in reverse and stop trying to reinvent the wheel.
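For what it's worth, here is roughly what that looks like, as a sketch rather than a final version. otherTok is copied from above; tokensViaReverse, tokensViaFoldr and step are names I made up for illustration. The first option uses the Prelude's reverse. The second folds from the right, so the tokens come out in source order and no reversal is needed.

```haskell
-- otherTok is copied from the post; everything below it is a hypothetical sketch.
otherTok :: String -> [Char] -> [[Char]]
otherTok [] _ = []
otherTok cs delim = foldl (\acc c -> if c `elem` delim then [] : acc else (head acc ++ [c]) : tail acc) [""] cs

-- Option 1: drop the empty strings, then restore source order with Prelude's reverse.
tokensViaReverse :: String -> [Char] -> [String]
tokensViaReverse cs delim = reverse (filter (/= "") (otherTok cs delim))

-- Option 2: fold from the right, so the last character is processed first
-- and each character is consed onto the front of the current token.
tokensViaFoldr :: String -> [Char] -> [String]
tokensViaFoldr cs delim = filter (/= "") (foldr step [""] cs)
  where
    step c acc@(cur : rest)
      | c `elem` delim = "" : acc           -- delimiter: open a new token in front
      | otherwise      = (c : cur) : rest   -- ordinary char: prepend to the current token
    -- Not reachable with the [""] seed; included so step is total.
    step c [] = step c [""]
```

Both give ["abc","def","ghi"] for the input "abc,def.ghi" with delimiters ",.". One more reason to prefer the built-in: rev appends with ++ at every step, so its cost grows with the square of the list length, while the Prelude's reverse is linear.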
