Sunday, February 19, 2012

Haskell string tokenizer

I recently needed a function to split strings on a delimiter and found this one at http://blog.julipedia.org/2006/08/split-function-in-haskell.html:


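-- Split a string into tokens at any character found in the delimiter list.
tokenizeString :: String -> [Char] -> [[Char]]
-- Base case: one empty token, so rest is never empty and head/tail are safe.
tokenizeString [] _ = [""]
tokenizeString (c:cs) delim
  | c `elem` delim = [] : rest                 -- delimiter: start a new, empty token
  | otherwise = (c : head rest) : tail rest    -- ordinary char: prepend to the current token
    where rest = tokenizeString cs delim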

It took me a little while to wrap my mind around the recursion and what goes on here. Once I traced it out with a pen and some paper, it clicked, and it is really, really cool. It's a shame that I was not able to come up with this on my own.
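
Before tracing the recursion, it helps to see what the function returns for a few inputs. The block below is a minimal sketch: the definition is repeated so it runs standalone, and the `examples` table (a hypothetical name) and its inputs are illustrations added here, not part of the original post.

```haskell
-- Same definition as above, repeated so this block is self-contained.
tokenizeString :: String -> [Char] -> [[Char]]
tokenizeString [] _ = [""]
tokenizeString (c:cs) delim
  | c `elem` delim = [] : rest
  | otherwise = (c : head rest) : tail rest
    where rest = tokenizeString cs delim

-- (input, delimiter list, expected tokens); illustrative cases only.
examples :: [(String, String, [String])]
examples =
  [ ("a,b",   ",",  ["a", "b"])
  , ("a,,b",  ",",  ["a", "", "b"])   -- adjacent delimiters yield an empty token
  , (",a",    ",",  ["", "a"])        -- leading delimiter yields a leading empty token
  , ("a,",    ",",  ["a", ""])        -- trailing delimiter yields a trailing empty token
  , ("",      ",",  [""])             -- empty input is the base case
  , ("a b,c", " ,", ["a", "b", "c"])  -- any character in the list acts as a delimiter
  ]
```

Note that empty tokens are kept rather than dropped; if that is not wanted, filtering the result with `filter (not . null)` is one option.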

Execution trace. Original string: "a,b"; delimiter list: ","
I shortened some names to save space: hmT is really head $ mT and tmT is tail $ mT, where mT is myTokenizer. I am no expert, but the secret sauce appears to be in the base case, which always returns [""] (a list containing one empty string), and in how the elements are consed together:

(Char : [Char]) : [[Char]]  -- a single character is consed onto a string, and that string onto a list of strings
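
The trace for "a,b" with delimiter list "," can be written out as a sketch in Haskell itself. The names `step0` to `step3` are hypothetical, added here for illustration; each one is the value returned by one recursive call, innermost first, built with exactly the expressions from the two guards.

```haskell
-- Same definition as above, repeated so this block is self-contained.
tokenizeString :: String -> [Char] -> [[Char]]
tokenizeString [] _ = [""]
tokenizeString (c:cs) delim
  | c `elem` delim = [] : rest
  | otherwise = (c : head rest) : tail rest
    where rest = tokenizeString cs delim

-- The recursion descends "a,b" -> ",b" -> "b" -> "" and then builds
-- the result on the way back up.
step0, step1, step2, step3 :: [String]
step0 = [""]                             -- tokenizeString ""    ","  (base case)
step1 = ('b' : head step0) : tail step0  -- tokenizeString "b"   ","  gives ["b"]
step2 = [] : step1                       -- tokenizeString ",b"  ","  gives ["", "b"]
step3 = ('a' : head step2) : tail step2  -- tokenizeString "a,b" ","  gives ["a", "b"]
```

Because the base case returns [""] rather than [], `head rest` always has a token to extend, which is why the non-delimiter branch never fails.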
