Haskell

Connecting to a Riak node using Haskell and viewing Riak buckets

Recently at $DAYJOB I've been lucky enough to start playing with Riak for a couple of projects that I thought it would be great for. So far I've been extremely impressed with Riak's speed, configurability, and overall ease of use, and I look forward to using it more in the future. For capacity planning reasons, I ran speed tests using Bash and curl, Python and urllib, and even moved on to Python with Riak's protocol buffers client. (I also fixed a bug in the documentation. WINNING!) But one day the question came to me (this shouldn't be a surprise to anyone who's been following this blog): "Can I connect to Riak with Haskell?" The answer is "Yes!" You can use the Network.Riak library.

I went looking through the documentation for some quick examples to get started, but I couldn't find any. I guess I'm so spoiled by all the examples in the Python documentation that I just assumed they would exist. So I decided to make my own. I'll be doing a mini-series of posts showing how to do various things using Riak and Haskell. Let's get started.

I'm going to assume that you already have both the Haskell Platform and the riak-haskell-client installed (cabal install riak). Also, Riak should be installed and running (basic configuration is fine) on your local machine. If you're not connecting to your local machine, there is a commented-out line in the code that shows how to connect to a remote node.

Let's copy a command from the Riak documentation so we have something to test with:

    $ curl -v -XPOST http://localhost:8098/buckets/test/keys/test_key \
        -H 'Content-Type: text/plain' \
        -d 'this is a test'

You should see output similar to:

    * About to connect() to 127.0.0.1 port 8098 (#0)
    * Trying 127.0.0.1...
    * Adding handle: conn: 0xd1f3e0
    * Adding handle: send: 0
    * Adding handle: recv: 0
    * Curl_addHandleToPipeline: length: 1
    * - Conn 0 (0xd1f3e0) send_pipe: 1, recv_pipe: 0
    * Connected to 127.0.0.1 (127.0.0.1) port 8098 (#0)
    > POST /buckets/test/keys/test_key HTTP/1.1
    > User-Agent: curl/7.33.0
    > Host: 127.0.0.1:8098
    > Accept: */*
    > Content-Type: text/plain
    > Content-Length: 14
    >
    * upload completely sent off: 14 out of 14 bytes
    < HTTP/1.1 204 No Content
    < Vary: Accept-Encoding
    * Server MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) is not blacklisted
    < Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
    < Date: Sun, 08 Dec 2013 07:42:19 GMT
    < Content-Type: text/plain
    < Content-Length: 0
    <
    * Connection #0 to host 127.0.0.1 left intact

And with just a quick verification we know that there is data now in the server:

    $ curl http://127.0.0.1:8098/buckets/test/keys/test_key
    this is a test
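The same read can be scripted, which is handy for the kind of speed tests mentioned at the top of this post. Here is a minimal Python 3 sketch, assuming the node from the curl examples (127.0.0.1, HTTP port 8098). The helper names key_url and fetch_value are made up for illustration and are not part of any Riak client.

```python
from urllib.parse import quote
from urllib.request import urlopen


def key_url(bucket, key, host="127.0.0.1", port=8098):
    """Build the HTTP URL for one key, using the /buckets/<bucket>/keys/<key> layout from the curl calls."""
    return "http://{}:{}/buckets/{}/keys/{}".format(
        host, port, quote(bucket, safe=""), quote(key, safe="")
    )


def fetch_value(bucket, key, **kwargs):
    """Fetch the stored value as text. Needs a running Riak node; hypothetical helper."""
    with urlopen(key_url(bucket, key, **kwargs)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the URL here; call fetch_value("test", "test_key") against a live node.
    print(key_url("test", "test_key"))
```

With a node running and the test key stored as above, fetch_value("test", "test_key") should hand back the same text the curl call printed.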

Now onto the Haskell part! Copy and paste the code below into a file, riak_get_buckets.hs:

    module Main where

    import qualified Network.Riak as R
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
        -- Default client settings point at the local node
        let client = R.defaultClient
        -- If you want to connect to a remote Riak server use this line
        -- let client = R.Client "FQDN_or_IP_address" "8087" (L.empty)
        con <- R.connect client
        buckets <- R.listBuckets con
        print buckets

Now run the code:

    $ runhaskell riak_get_buckets.hs
    fromList ["test"]

Voilà! You now have a Seq of your buckets. I think the code is pretty easy to explain and should look familiar to anyone who's connected to a database through a programming language before. First, you create a connection to the server:

    con <- R.connect client

The next line asks the server for its buckets:

    buckets <- R.listBuckets con

Finally, we print the response:

    print buckets

In the next post or two I will show you how to "GET" and "POST" data into Riak using the Network.Riak library.

My First Yesod App

First off, I just wanted to say that I hope everyone had a relaxing and enjoyable holiday season and that you enjoyed your New Year's celebration. Whatever you did that day or night, don't name it after me.

In my last post, I showed you how to create a simple web service that responded to three different URLs and interacted with a database using Python and the Flask framework. Now I'm going to show you how to program the same thing in Haskell using the Yesod framework. For those of you too “efficient” to look them up on the previous page HERE, I'm going to repost the requirements:
Using Python with gevent 0.13.x and your choice of additional libraries and/or frameworks, implement a single HTTP server with API endpoints that provide the following functionalities:

  • A Fibonacci endpoint that accepts a number and returns the Fibonacci calculation for that number, and returns result in JSON format. Example:

        $ curl -s 'http://127.0.0.1:8080/fib/13'
        {"response": 233}
        $ curl -s 'http://127.0.0.1:8080/fib/12'
        {"response": 144}

  • An endpoint that fetches the Google homepage and returns the sha1 of the response message-body (HTTP body data). Example:

        $ curl -s 'http://127.0.0.1:8080/google-body'
        {"response": "272cca559ffe719d20ac90adb9fc4e5716479e96"}

  • Using some external storage of your choice (can be redis, memcache, sqlite, mysql, etc), provide a means to store and then retrieve a value. Example:

        $ curl -d 'value=something' 'http://127.0.0.1:8080/store'
        $ curl 'http://127.0.0.1:8080/store'
        {"response": "something"}

Like the last post, I'm going to talk about the individual functions first, then post the whole code at the end. Let's start with the first requirement, creating a good old Fibonacci sequence:

    handleFibR :: Int -> Handler RepJson
    handleFibR num = jsonToRepJson $ object ["response" .= show_fib]
      where
        -- Render the Fibonacci number as a String for the JSON response
        show_fib = show $ fib num
        -- Naive recursive Fibonacci
        fib :: Int -> Int
        fib 0 = 0
        fib 1 = 1
        fib n = fib (n - 1) + fib (n - 2)

I'm going to describe the code from the bottom up; it's a little weird, but it's a lot easier to explain that way, trust me. The fib function computes the requested Fibonacci number, and show_fib simply converts that result to a String with show. That string is used as the “value” component of a Pair, which is created with the “.=” operator and the “response” key and placed in a list. The object function takes a list of Pairs as its input and creates a Value type, which is described in the documentation as “A JSON value represented as a Haskell value.” This Value is then passed as the input into the jsonToRepJson function. All of these functions come together beautifully so that when you point your browser to http://localhost:3000/fib/24, you get this response (the number comes back as a quoted string because of show):
{"response":"46368"}
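To make the quoted-versus-bare number difference concrete, here is a small Python 3 sketch of the same endpoint logic. It is not the Flask code from the previous post. The names fib and fib_response are illustrative, and the as_string flag is an assumption added to show both output shapes.

```python
import json


def fib(n):
    """Naive recursive Fibonacci for non-negative n, mirroring the Haskell definition."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fib_response(n, as_string=True):
    """Build the JSON body; as_string=True mimics what `show` does in the Haskell handler."""
    value = fib(n)
    return json.dumps({"response": str(value) if as_string else value})


if __name__ == "__main__":
    print(fib_response(24))                   # value rendered as a JSON string
    print(fib_response(13, as_string=False))  # value rendered as a JSON number
```

Dropping the string conversion is what yields the bare number shown in the original requirements.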

For my next trick, I'm going to pull a SHA1 hash out from the Google homepage source code.

    getGoogR :: Handler RepJson
    getGoogR = do
        -- try captures any exception thrown by the download in an Either
        body <- try (simpleHttp "http://www.google.com")
        case body of
            Left (SomeException ex) -> jsonToRepJson $ object ["response" .= ("ERROR: " ++ (show ex))]
            Right val -> jsonToRepJson $ object ["response" .= (showDigest $ sha1 val)]

Much like the last function, this function returns a Handler containing a RepJson. First I use the simpleHttp function to travel to the interwebs and pull the Google homepage. Because simpleHttp throws an HttpException for any non-2xx status code, I call it inside try and put the result into “body”. body is of the Either type, which means it can hold one of two possible values (like Schrodinger's cat). If something went wrong, the value is on the “Left” side of the Either. I don't really care what kind of exception it was, so I match on SomeException and return an error message containing its text. If everything flowed smoothly like all code does (snicker), the data is on the “Right” side, and pattern matching on the Right constructor lets me pull it out and name it val. The code after this point is extremely similar to the previous example, the difference being the output. The page body is used as input for the sha1 function, creating a Digest, which I hand to showDigest; that returns the 160-bit hash as a string of 40 hexadecimal characters. All of this is bubbled up to the handler and the user sees:

{"response":"ddd27a244477532f7be5207582afca72b9f74224"}
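The try-then-branch shape of this handler maps directly onto Python's try/except. Here is a minimal Python 3 sketch, assuming the download is passed in as a zero-argument callable so the logic can be exercised without a network. The body_digest name and the fetch parameter are hypothetical.

```python
import hashlib


def body_digest(fetch):
    """Return the JSON-ready dict for the endpoint.

    `fetch` is any zero-argument callable returning the page body as bytes
    (a hypothetical stand-in for the real HTTP download).
    """
    try:
        body = fetch()
    except Exception as ex:  # plays the role of the Left branch
        return {"response": "ERROR: " + str(ex)}
    # plays the role of the Right branch: hex digest of the SHA-1 hash
    return {"response": hashlib.sha1(body).hexdigest()}


if __name__ == "__main__":
    print(body_digest(lambda: b"abc"))
    print(body_digest(lambda: 1 / 0))  # forces the error branch
```

The digest is always 40 hexadecimal characters, whatever the size of the page.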

Your results will differ! For dealing with the database, we need functions that can handle both GET and POST requests. Before I explain those functions, I want to take a quick moment to share the database schema and the “runDB” function:

    share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
    Stuff
        value Text
        deriving Show
    |]

    runDB action = do
        Challenge pool <- getYesod
        runSqlPool action pool

If you are a little confused by these two things, don't fear. I will do my best to describe them in a moment. If that still doesn't help then maybe viewing the entire code base below will. The first code block above is responsible for creating the Stuff table, which holds a single column of my own, called “value”. It reminds me of user-defined datatypes created with the “data” Haskell keyword.

The second block is really me cargo cult programming. I've seen this technique used in a lot of the examples in the Yesod book, so I copied it while I was writing this project. The best way I can describe it is as a wrapper function that borrows a connection from the pool of database connections and uses it to run the query.

Now that you know what the database looks like and how we access it, we can move onto the functions that interact with it. Here is the code for the POST request:

    postStoreR :: Handler ()
    postStoreR = do
        mvalue <- runInputPost $ ireq textField "value"
        runDB $ insert $ Stuff mvalue
        sendResponseStatus status200 ()

This function just returns a Handler unit. Using the ireq function, we look through the POST request for the required field keyed as “value”. That form goes through runInputPost, which deposits the contents into mvalue. We wrap mvalue in the Stuff constructor and pass it to the insert function, which, when run, returns an automatically created key. runDB then runs that insert against the database. The last line returns the 200 status to the client using sendResponseStatus.

Finally, for the GET request we have:

    getStoreR :: Handler RepJson
    getStoreR = do
        mvalue <- runDB $ selectFirst [] [Desc StuffValue, LimitTo 1]
        case mvalue of
            Nothing -> jsonToRepJson $ object ["response" .= (show "NO DATA IN DATABASE")]
            Just mvalue' -> jsonToRepJson $ object ["response" .= (show . stuffValue $ entityVal mvalue')]

The selectFirst call provides the input for runDB. The first argument to selectFirst is an empty list; this argument is for filtering on some kind of value (greater than, less than, not equal to, etc.). I have left it empty because I really don't care what the value of “value” is; I just want it. The second list tells the database to sort by the value column in descending order. The first line is roughly the Haskell equivalent of the following SQL code:

SELECT * FROM Stuff ORDER BY value DESC LIMIT 1;

The results of which are named mvalue. Since it's possible to have nothing in the response, I use the case statement to dig inside mvalue and look around. If “Nothing” was returned, I send back a little JSON blurb letting the user know that nothing was found, most likely because there isn't any data in the database. If something was returned, I pull that value out, mix it in with the JSON recipe you've seen me using thus far, and send the data on its way.
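The query can be tried directly against an in-memory SQLite database. This is a Python 3 sketch using the standard sqlite3 module. The table layout (an auto-generated id plus a value column) is an assumption about what the migration produces, and make_store and largest_value are illustrative names.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a table shaped like the Stuff entity (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, value TEXT NOT NULL)")
    return conn


def largest_value(conn):
    """Return the first value in descending order, or None when the table is empty."""
    row = conn.execute(
        "SELECT value FROM stuff ORDER BY value DESC LIMIT 1"
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = make_store()
    print(largest_value(conn))  # empty table
    conn.executemany("INSERT INTO stuff (value) VALUES (?)",
                     [("apple",), ("pear",), ("banana",)])
    print(largest_value(conn))
```

Sorting on the value column returns the value that sorts last, which is not necessarily the one stored most recently. Ordering by the id column instead would give "latest stored" behaviour.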

As the title says, this was my first Yesod web app. I know that I have only scratched the surface of what this framework can do and I'm really interested in creating more with it. I will admit that I initially found the interaction with the database a little cumbersome when compared to Django or Flask. That doesn't mean I don't like it; it was just a little awkward when I was first trying to understand how to work with it. Once I got over those differences, I realized that it mentally translates to SQL better than the other frameworks. Again, I really like Yesod and look forward to using it in the future.

As always, I and my code welcome questions, comments, and the occasional funny and creative insult.

    {-# LANGUAGE TypeFamilies, QuasiQuotes, MultiParamTypeClasses, TemplateHaskell #-}
    {-# LANGUAGE GADTs, OverloadedStrings, FlexibleContexts, FlexibleInstances #-}

    import Yesod as Y
    import Data.Text (pack, Text)
    import Network.HTTP.Conduit (simpleHttp)
    import Network.HTTP.Types (status200)
    import Data.Digest.Pure.SHA (showDigest, sha1)
    import Database.Persist.Sqlite
    import Data.Maybe
    import Control.Exception.Lifted hiding (Handler)
    import Data.ByteString.Lazy.Internal (ByteString)

    -- Database schema: one entity with a single Text field
    share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
    Stuff
        value Text
        deriving Show
    |]

    -- The foundation type carries the database connection pool
    data Challenge = Challenge ConnectionPool

    mkYesod "Challenge" [parseRoutes|
    /fib/#Int FibR
    /google-body GoogR GET
    /store StoreR POST GET
    |]

    instance Yesod Challenge

    instance RenderMessage Challenge FormMessage where
        renderMessage _ _ = defaultFormMessage

    instance YesodPersist Challenge where
        type YesodPersistBackend Challenge = SqlPersist

        runDB action = do
            Challenge pool <- getYesod
            runSqlPool action pool

    handleFibR :: Int -> Handler RepJson
    handleFibR num = jsonToRepJson $ object ["response" .= show_fib]
      where
        show_fib = show $ fib num
        fib :: Int -> Int
        fib 0 = 0
        fib 1 = 1
        fib n = fib (n - 1) + fib (n - 2)

    getGoogR :: Handler RepJson
    getGoogR = do
        body <- try (simpleHttp "http://www.google.com")
        case body of
            Left (SomeException ex) -> jsonToRepJson $ object ["response" .= ("ERROR: " ++ (show ex))]
            Right val -> jsonToRepJson $ object ["response" .= (showDigest $ sha1 val)]

    postStoreR :: Handler ()
    postStoreR = do
        mvalue <- runInputPost $ ireq textField "value"
        runDB $ Y.insert $ Stuff mvalue
        sendResponseStatus status200 ()

    getStoreR :: Handler RepJson
    getStoreR = do
        mvalue <- runDB $ Y.selectFirst [] [Y.Desc StuffValue, Y.LimitTo 1]
        case mvalue of
            Nothing -> jsonToRepJson $ object ["response" .= (show "NO DATA IN DATABASE")]
            Just mvalue' -> jsonToRepJson $ object ["response" .= (show . stuffValue $ Y.entityVal mvalue')]

    -- Run the migration against an in-memory SQLite pool, then serve on port 3000
    main = withSqlitePool ":memory:" 10 $ \pool -> do
        runSqlPool (runMigration migrateAll) pool
        warpDebug 3000 $ Challenge pool

Project Euler: Problem 16

I'm not dead yet! I've just been insanely busy the last month or two with changing jobs and preparing my first programming presentation for BayPiggies and Silicon Valley Code Camp (which is a post for the near future). Both of these have kept me away from my blog. Let me make it up to you with a solution to Project Euler problem #16.

The challenge is:

2^15 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.
What is the sum of the digits of the number 2^1000?

Let's start with some Python code:

    #!/usr/bin/python

    print sum([int(i) for i in str(2 ** 1000)])

For this solution, using a more functional approach definitely reduced the code base. But one thing I was a little surprised about is that having a list comprehension within the sum function is actually faster than a generator expression. Usually one hears how generator expressions are preferred over list comprehensions because they are more efficient with memory, among other reasons. However, it's actually faster to give sum a list. One quick caveat: I first noticed this in Python 2. The same also seems to be true for Python 3, at least from the interpreter:

    >>> import timeit
    >>> timeit.timeit("sum(int(x) for x in str(2 ** 1000))", number=1000)
    0.11109958100132644
    >>> timeit.timeit("sum([int(x) for x in str(2 ** 1000)])", number=1000)
    0.09597363900684286
    >>> timeit.timeit("sum(int(x) for x in str(2 ** 1000))", number=10000)
    1.051396899012616
    >>> timeit.timeit("sum([int(x) for x in str(2 ** 1000)])", number=10000)
    0.9054670640034601
    >>> timeit.timeit("sum(int(x) for x in str(2 ** 1000))", number=100000)
    10.498383879996254
    >>> timeit.timeit("sum([int(x) for x in str(2 ** 1000)])", number=100000)
    8.992312036993098

On to the Haskell code:

    module Main where

    import Data.Char

    main :: IO ()
    main = print . sum . map digitToInt . show $ 2 ^ 1000

Maybe it's just me and my Haskell/Python-centric brain, but I think the algorithm is simple enough to easily see the similarities and differences between the two languages. If I wanted to write the Haskell code to better match the Python code (syntactic differences aside), it would look like this: (inside the Haskell interpreter)

    Prelude Data.Char> print . sum $ [ digitToInt x | x <- show (2 ^ 1000)]

Even though this code may be easier to read for a Python programmer, it's not “good” Haskell code. It'll get the job done, but the map is obfuscated by the list comprehension. We can also adjust the Python code to make it resemble Haskell by using map:

print sum(map(int, str(2 ** 1000)))

But that might get you “dinged” because some people think that using map is “too functional” or “not Pythonic”, even if the code might be faster. I don't subscribe to that line of thinking...but that's a discussion for another time.
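For anyone who wants to compare the three Python spellings side by side, here they are as Python 3 functions, checked against the worked example from the problem statement (2^15 has a digit sum of 26). The function names are illustrative only.

```python
def digit_sum_listcomp(n):
    """Sum the decimal digits of n using a list comprehension."""
    return sum([int(d) for d in str(n)])


def digit_sum_genexp(n):
    """Same calculation with a generator expression."""
    return sum(int(d) for d in str(n))


def digit_sum_map(n):
    """Same calculation with map, the closest match to the Haskell version."""
    return sum(map(int, str(n)))


if __name__ == "__main__":
    for f in (digit_sum_listcomp, digit_sum_genexp, digit_sum_map):
        print(f.__name__, f(2 ** 15))
```

All three agree on the result, so the choice between them comes down to taste and timing.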

Times:
python - list comprehension : .032s
python - map : .030s
haskell - list (interpreted) : .155s
haskell - map (interpreted) : .155s
haskell - list (compiled) : .006s
haskell - map (compiled) : .006s

As always, questions, comments, and complaints are encouraged. I hope everyone will forgive me for not posting for so long... sometimes life happens.

Project Euler: Problem 14

“The following iterative sequence is defined for the set of positive integers: n → n/2 (n is even) n → 3n + 1 (n is odd).  Using the rule above and starting with 13, we generate the following sequence: 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1.  It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms.  Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.  Which starting number, under one million, produces the longest chain?  NOTE: Once the chain starts the terms are allowed to go above one million.”

Not many of you may be aware of this, but about a year ago I wrote up a blog post that discussed Collatz chains in Haskell.  You can find that post here: . Having some of the code already written made coming up with the solution easier.  However, just because I had one function doesn't mean I had the whole problem licked.  I still had a fair amount of work in front of me.  Below is my code from the first attempt at a solution:

    module Main where

    import Data.List

    chain' :: Integer -> [Integer]
    chain' 1 = [1]
    chain' n
        | n <= 0 = []
        | even n = n : chain' (n `div` 2)
        | odd n = n : chain' (n * 3 + 1)

    main :: IO ()
    main = do
        let seqx = map chain' [3..1000000]
        let lengthx = map length seqx
        print . maximum $ zip lengthx seqx

This code appears to be logically correct but was incredibly slow - so slow that after over 2 minutes it still hadn’t completed.  I admit I can be a little impatient with these things from time to time, but in this case something was obviously wrong.

I devised two optimizations:

  • Reverse the order of the list. I will be more likely to find the number with the longest chain near 1,000,000 than 3.
  • Use odd numbers only. This is based on the fact that in the chain' function an odd number gets multiplied right off the bat, whereas an even number is instantly divided by 2, and also on the assumption that a higher number will be more likely to have a longer chain.  (I admit this was a complete experiment - I had no proof that it would work ahead of time, and knew it gave me the right answer only after the fact.)
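A cheap way to sanity-check the odd-only shortcut is to try it on a smaller limit first. The Python 3 sketch below compares a search over every start below 1,000 with a search over odd starts only. The helper names chain_length and best_start are hypothetical and not part of the solutions in this post. Agreement on a small range is evidence, not proof, that the shortcut is safe at one million.

```python
def chain_length(n):
    """Number of terms in the Collatz chain starting at n (n >= 1), counting the final 1."""
    count = 1
    while n > 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count


def best_start(candidates):
    """Return the candidate with the longest chain (first one wins on ties)."""
    return max(candidates, key=chain_length)


if __name__ == "__main__":
    limit = 1000
    print("all starts :", best_start(range(1, limit)))
    print("odd starts :", best_start(range(1, limit, 2)))
```

If the two lines ever disagree for some limit, the odd-only search has skipped the real winner.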

The code then morphed into:

    module Main where

    import Data.List

    chain' :: Integer -> [Integer]
    chain' 1 = [1]
    chain' n
        | n <= 0 = []
        | even n = n : chain' (n `div` 2)
        | odd n = n : chain' (n * 3 + 1)

    main :: IO ()
    main = do
        let seqx = map chain' [999999,999997..3]
        let lengthx = map length seqx
        print . maximum $ zip lengthx seqx

The problem I ran into with this code was that I received stack overflow errors; my list of tuples, each holding another long list of integers, was taking up too much memory.  I fixed this problem by computing the length of the list immediately after generating it.  The new code looked like this:

    import Data.List

    chain' :: Integer -> [Integer]
    chain' 1 = [1]
    chain' n
        | n <= 0 = []
        | even n = n : chain' (n `div` 2)
        | odd n = n : chain' (n * 3 + 1)

    main :: IO ()
    main = do
        -- Pair each starting number with the length of its chain
        let seqx = map (\x -> (x, length $ chain' x)) [999999,999997..3]
        print . maximum $ seqx

This got me a result within the one minute time frame, but it still wasn't the right answer.  Can you figure out why?  Using the great code Jedai posted in the comments of my Apache log post, I was able to get my answer and finally complete the problem:

    module Main where

    import Data.Tuple
    import Data.List (sortBy)
    import Data.Function (on)

    chain' :: Integer -> [Integer]
    chain' 1 = [1]
    chain' n
        | n <= 0 = []
        | even n = n : chain' (n `div` 2)
        | odd n = n : chain' (n * 3 + 1)

    main :: IO ()
    main = do
        let seqx = map (\x -> (x, length $ chain' x)) [999999,999997..3]
        -- Sort by chain length, longest first, and report the starting number
        print . fst . head $ sortBy (flip compare `on` snd) seqx
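For anyone still puzzling over why the maximum-based version misfired: tuples compare element by element from the left, so the maximum of (start, length) pairs is decided by the starting number and the length is only a tie-breaker. Python tuples behave the same way, so the effect is easy to see in a Python 3 sketch on a tiny range (chain_length is a hypothetical helper):

```python
def chain_length(n):
    """Number of terms in the Collatz chain starting at n (n >= 1), counting the final 1."""
    count = 1
    while n > 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        count += 1
    return count


# (start, length) pairs for the odd starts 3, 5, 7, 9, 11
pairs = [(n, chain_length(n)) for n in range(3, 12, 2)]

# Plain max compares the first element first, so the biggest start wins.
by_tuple_order = max(pairs)

# Comparing on the second element picks the longest chain instead.
by_length = max(pairs, key=lambda p: p[1])

if __name__ == "__main__":
    print(pairs)
    print("max(pairs)          :", by_tuple_order)
    print("max by chain length :", by_length)
```

Sorting on snd in the Haskell code does the same job as the key function here. Putting the length first in the tuple would also work.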

After figuring that out, getting the Python answer was a breeze:

    #!/usr/bin/python
    """Python solution for Project Euler problem #14."""

    from itertools import imap

    def sequence(number):
        t_num = number
        count = 1

        while(t_num > 1):
            if t_num % 2 == 0:
                t_num /= 2
            else:
                t_num = (t_num * 3) + 1

            count += 1

        return (count, number)

    if __name__ == "__main__":
        print max(imap(sequence, xrange(999999,3,-2)))

Here are the speed numbers:
Haskell (compiled) : 14.758s
Python : 18.537s
Haskell (runghc): 15.217s

I think the use of recursion in my Haskell code is affecting its speed of computation.  As I learned from problem 12, I can use the State Monad again to speed things up.  But I also learned from the comments of problem 12 that some people were able to substitute a scan or fold in the State Monad’s place.  So I decided to shoot for one more solution.  After studying up on scan and fold, and finding that neither was really what I wanted, I found iterate. Using iterate I was able to change the program to this:

    module Main where

    import Data.Tuple
    import Data.List (sortBy, iterate)
    import Data.Function (on)

    chain' :: Integer -> Int
    chain' n
        | n < 1 = 0
        | otherwise = 1 + (length $ (takeWhile ( > 1) $ iterate (\x -> if even x then x `div` 2 else x * 3 + 1) n))

    main :: IO ()
    main = do
        let seqx = map (\x -> (x, chain' x)) [999999,999997..3]
        print . fst . head $ sortBy (flip compare `on` snd) seqx

The new chain' function doesn't read as cleanly as the old one, but it does remove the recursion I was talking about earlier.  The computer gods rewarded my efforts by reducing the run times to these:

Haskell (compiled) : 10.933s
Haskell (runghc): 11.744s

From 14.758 to 10.933 - almost 4 seconds taken off the clock!  I think a speed up like that calls for some celebrating.  Which is exactly what I'm going to do before I start on problem 15.
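For readers coming from the Python side, the iterate-and-takeWhile idea can be sketched in Python 3 as well. Python has no built-in iterate, so the generator below is a hand-rolled stand-in; itertools.takewhile is the real standard library function. The names iterate, collatz_step and chain_length are illustrative.

```python
from itertools import takewhile


def iterate(f, x):
    """Yield x, f(x), f(f(x)), ... forever (hand-rolled stand-in for Haskell's iterate)."""
    while True:
        yield x
        x = f(x)


def collatz_step(x):
    """One step of the Collatz rule."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def chain_length(n):
    """Count the terms above 1, then add one for the final 1, as the Haskell chain' does."""
    if n < 1:
        return 0
    above_one = takewhile(lambda x: x > 1, iterate(collatz_step, n))
    return 1 + sum(1 for _ in above_one)


if __name__ == "__main__":
    print(chain_length(13))
```

Because the generator is lazy, only as many terms are produced as takewhile consumes, which is the same property the Haskell version relies on.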

If you made it this far down into the article, hopefully you liked it enough to share it with your friends. Thanks if you do, I appreciate it.
