Tuesday, March 27, 2012

A more readable haskell string tokenizer

Another crack at a tokenizer
otherTok :: String -> [Char] -> [[Char]]
otherTok [] _ = []
otherTok cs delim = foldl (\acc c -> if c `elem` delim then [] : acc else (head acc ++ [c]) : tail acc) [] cs

*Main> otherTok "blah,blah blah. blah! blah" " ,.!"
["blah","","blah","","blah","blah","*** Exception: Prelude.head: empty list

Encountering a delimiter char produces an empty string, which I can remove later with filter; I'm not sure what to do about the exception, though.

P.S. Problem solved
otherTok :: String -> [Char] -> [[Char]]
otherTok [] _ = []
otherTok cs delim = foldl (\acc c -> if c `elem` delim then [] : acc else (head acc ++ [c]) : tail acc) [""] cs

*Main> filter (/="") (otherTok "blah,blah blah. blah! blah" " ,.!")
["blah","blah","blah","blah","blah"] 


P.P.S. The tokens that result are in reverse order of how the words show up in the line, but that is also easily fixed:

rev :: [a] -> [a]
rev [] = []
rev xs = foldr (\x acc -> acc ++ [x]) [] xs
 

 *Main> rev(filter(/="") (otherTok "abc,def.ghi" ",."))
["abc","def","ghi"]
 

Or just use the built-in reverse and stop trying to reinvent the wheel.
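A foldr-based variant (just a sketch; otherTok' is an arbitrary name I picked) builds the result left to right, so the tokens come out in their original order and only the filter step is left:

otherTok' :: String -> [Char] -> [[Char]]
otherTok' cs delim = foldr step [""] cs
  where
    -- sketch of a helper: start a new token on a delimiter,
    -- otherwise grow the token at the front of the list
    step c acc@(cur:rest)
      | c `elem` delim = "" : acc
      | otherwise      = (c : cur) : rest
    step _ [] = [""]   -- never reached (the seed is [""]); just keeps the pattern total

*Main> filter (/="") (otherTok' "blah,blah blah. blah! blah" " ,.!")
["blah","blah","blah","blah","blah"]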

Sunday, March 25, 2012

Mac OS finder/account slowness

I've noticed that it sometimes takes Finder a good 15 seconds to start up.  None of the forums I visited had any good answers; most dealt with generic account slowness and recommended just creating a new account.  I decided to run top to see what was running, along with the process flags, and noticed mount and mount_nfs sleeping at the top of the list.

PID   COMMAND      %CPU TIME     #TH  #WQ  #POR #MRE RPRVT  RSHRD  RSIZE  ...
2429  mount_nfs    0.0  00:00.00 1    0    15   28   112K   288K   428K   17M    586M   2385 2428 sleeping ...
2428  mount        0.0  00:00.00 1    0    14   26   104K   284K   404K   9496K  578M   2385 2385 sleeping 
(other output omitted for brevity)
180   Finder       0.0  01:35.63 4    1    213  522  18M    51M    62M    37M    908M   180  174  sleeping 

I booted up my NFS server (which for my own reasons I do not keep constantly powered on) and tried launching Finder again; this time the window popped up with no delay.

I shut off the NFS server, then closed and reopened Finder just to make sure the delay is repeatable and reproducible.  Indeed it is: every time I open a Finder window, Mac OS (I'm not sure which of its internals) attempts the mount_nfs.

I added the following to my NFS mount options:
-dumbtimer -timeo=3

I closed the Finder window and clicked on the Finder icon again, and this time the delay was much shorter.  Hope this helps someone.

P.S. Having solved this problem does not make me a Mac convert.  Figuring out why it takes Finder so long to list a directory, when ls only takes a split second, might get me halfway there.


P.P.S. Deleting the related items from /Users/$USER/Library/Preferences


Finder:
com.apple.finder.plist
Digital Photo Professional:
com.canon.Digital Photo Professional.LSSharedFileList.plist
com.canon.Digital Photo Professional.plist

helped, and now they work well even without NFS.  So the forums were right when they mentioned the preference cache; they just pointed to a different place.  In my case I found the root cause but resolved it the same way everyone else did.


Thursday, March 1, 2012

HP-UX getent

I spent quite a bit of time looking for a getent equivalent for HP-UX.  I needed something that would return a non-zero value if a user or a group I queried was not there.  In case you are having as much fun as I am, the commands are pwget and grget.

Sunday, February 19, 2012

haskell string tokenizer

I recently needed a function to split strings on a delimiter and found this one on http://blog.julipedia.org/2006/08/split-function-in-haskell.html:


tokenizeString :: String -> [Char] -> [[Char]]
tokenizeString [] _ = [""]
tokenizeString (c:cs) delim
  | c `elem` delim = [] : rest
  | otherwise = (c : head rest) : tail rest
    where rest = tokenizeString cs delim

It took me a little while to wrap my mind around the recursion and what goes on here; with a pen, some paper, and a traced-out run it turns out to be really, really cool.  It's a shame that I was not able to come up with this on my own.

I traced the execution for the original string "a,b" with delimiter list ",".  (In my notes I shortened some names to save space: hMt is really head $ mT and tmT is tail $ mT, where mT is the tokenizer itself.)  I am no expert, but the secret sauce appears to be in the base case, which always returns [""] (a list containing one empty string), and in how the string elements are concatenated:

Char : [Char] : [[Char]]  -- single character onto a string onto a list of strings
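
Written out with the full function name, the trace goes roughly like this (a reconstruction, as comments):

-- tokenizeString "a,b" ","  =  ('a' : head rest) : tail rest
--                              where rest   = tokenizeString ",b" ","
-- tokenizeString ",b" ","   =  [] : rest'              -- [] here is the empty string ""
--                              where rest'  = tokenizeString "b" ","
-- tokenizeString "b" ","    =  ('b' : head rest'') : tail rest''
--                              where rest'' = tokenizeString "" ","
-- tokenizeString "" ","     =  [""]                    -- base case
--
-- Unwinding from the base case back up:
--   tokenizeString "b"   ","  =  ('b' : "") : []    =  ["b"]
--   tokenizeString ",b"  ","  =  "" : ["b"]         =  ["", "b"]
--   tokenizeString "a,b" ","  =  ('a' : "") : ["b"] =  ["a", "b"]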

Wednesday, February 15, 2012

Fun with Haskell

Had some more time to play with Haskell today.  I am still very new at this, and not being able to use iteration makes coding "interesting".  It also makes nesting data structures a little challenging.

For example, take a function that returns a list:
multAddFunc :: Integer -> Integer -> [Integer]
multAddFunc a b = [a*b, a+b]

*Main> multAddFunc 2 3
[6,5]

Now let's say we had two lists of numbers, [1,2,3] and [4,5,6], and we wanted to write a function that processed them:
*Main> [ multAddFunc x y | x<-[1,2,3], y<-[4,5,6] ]
[[4,5],[5,6],[6,7],[8,6],[10,7],[12,8],[12,7],[15,8],[18,9]]

This just returned a list of lists; observe the type:

*Main> :t [ multAddFunc x y | x<-[1,2,3], y<-[4,5,6] ]
[ multAddFunc x y | x<-[1,2,3], y<-[4,5,6] ] :: [[Integer]]

But what if we wanted a flattened list?  For this we would have to concatenate the inner lists together using the ++ operator:

*Main> :t (++)
(++) :: [a] -> [a] -> [a]

listMath :: [Integer] -> [Integer] -> [Integer]
listMath [] []         = []
listMath (x:xs) (y:ys) = (multAddFunc x y) ++ (listMath xs ys)
-- (assumes the two lists are the same length)

*Main> listMath [1,2,3] [4,5,6]
[4,5,10,7,18,9]

And there we have it, a flattened list.
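
The same pairwise result can also be written without explicit recursion; here is a minimal sketch using zipWith and concat from the Prelude (listMath' is just a name picked for the variant):

-- sketch: zipWith applies multAddFunc to the lists pairwise,
-- and concat flattens the resulting list of lists
listMath' :: [Integer] -> [Integer] -> [Integer]
listMath' xs ys = concat (zipWith multAddFunc xs ys)

*Main> listMath' [1,2,3] [4,5,6]
[4,5,10,7,18,9]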

Monday, January 30, 2012

Shingles


It is easy to create a naive Data Leakage Protection (DLP) product that looks for exact data or pattern matches; it is a lot more difficult to spot similarity between documents, as in "this document is x% similar to this reference."  This article looks interesting and the approach seems easy to implement:  http://nlp.stanford.edu/IR-book/html/htmledition/near-duplicates-and-shingling-1.html
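
To get a feel for it, here is a minimal sketch of the k-shingle idea in Haskell (a toy version, not the article's implementation; a real system would hash the shingles and compare small sketches rather than full sets):

import           Data.List (tails)
import qualified Data.Set  as Set

-- the k-shingles of a document: every window of k consecutive words
shingles :: Int -> String -> Set.Set [String]
shingles k doc =
  Set.fromList [ w | w <- map (take k) (tails (words doc)), length w == k ]

-- Jaccard similarity of two shingle sets: |intersection| / |union|
similarity :: Ord a => Set.Set a -> Set.Set a -> Double
similarity a b
  | Set.null a && Set.null b = 1.0
  | otherwise = fromIntegral (Set.size (Set.intersection a b))
              / fromIntegral (Set.size (Set.union a b))

similarity (shingles 4 docA) (shingles 4 docB) would then give the "x% similar" number as a fraction between 0 and 1.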