The Devil is in the Details, part 2

In the last blog post I made with this title, I told a little story on how I helped a learning Haskell programmer see why his version of the code wasn't as complete as the example given. I also thought that this series wouldn't be having a second part to it, but it looks like I was wrong.

I've recently been given a copy of “Learn You a Haskell For Great Good” to write up a review for it. I've been trying to go through that book as fast and thoroughly as I can because I really want to learn the material and because I will be able to give a better review if I actually complete the book. Yeah, big shocker there, I know.

Just a quick personal note before I continue. I'm not trying to point out these short coming in code that I see to please my ego. I'm doing this because I see a pattern that I don't think is correct and I don't only want to prevent this problem from happening within my own code, but also to bring the idea to public forefront so that other people can see what is going on to and (hopefully) learn from it.

In “Learn You a Haskell”, more specifically Chapter 5 – High Order Functions, the book walks someone through how to construct a Collatz chain. The code in the book is:

  1. chain :: Integer[Integer]
  2. chain 1 = [1]
  3. chain n
  4. | even n = n : chain ( n ` div` 2)
  5. | odd n = n : chain ( n * 3 + 1)

and as an example from the book as well, when you run the above code you get this as a result:

  1. ghci > chain 10
  2. [10,5,16,8,4,2,1]

This is the right answer, but much like the code in the last blog post, what happens if you give it a negative argument, let's say -10:

  1. ghci> chain (-10)
  2. [-10,-5,-14,-7,-20,-10,-5,-14,-7,-20,-10,...] #(I'm sure you can see the pattern at this point).

So I propose to make a small change to fix this problem. Here is my version of chain:

  1. chain :: Integer -> [Integer]
  2. chain' 1 = [1]
  3. chain' n
  4. | n <= 0 = []
  5. | even n = n : chain' (n `div` 2)
  6. | odd n = n : chain' ( n * 3 + 1)

At the end of the day, I'm just making the case that when I, or anyone really, programs something up, a quick thought should be given towards what type of input verification should be done before the function/application/whatever runs so that it runs cleanly and correctly. Ok, time to get off my soap box and start looking for these same holes on the programs I've written.

A Comment about Form

Wouldn't it be more logical to place the definition of chain 1 = [1] right after the out of bounds input verification?

chain :: Integer -> [Integer]
chain' n
| n <= 0 = []
| n == 1 = [1]
| even n = n : chain' (n `div` 2)
| odd n = n : chain' ( n * 3 + 1)

I just think that looks more elegant.

Hey Avery, I definitely feel

Hey Avery,

I definitely feel that the whole guards vs. pattern matching "thang" boils down to coding style. Some people prefer pattern matching (my version), and other people prefer guards (your version). At the end of the day, all Haskell programmers should be able to read both, and choose for themselves which one they prefer to use when they write code.

One thing that was bugging me

One thing that was bugging me was the continued checking for n<=0 through the recursion process. I think we can agree intuitively that no matter what positive integer >0 we have we will always get a positive integer greater than 0. So I've rewritten the program.

  1. chain :: Integer -> [Integer]
  2. chain' n
  3. | n <= 0 = []
  4. | otherwise = chain_recursive' n
  5.  
  6. chain_recursive :: Interger -> [Integer]
  7. | n == 1 = [1]
  8. | even n = n : chain_recursive' (n `div` 2)
  9. | odd n = n : chain_recursive' ( n * 3 + 1)

oops missed line

oops missed line "chain_recursive' n" above.

So I was curious to see how

So I was curious to see how much of an improvement the change you made would have on the system. So I put together 2 scripts which looked like so (using my chain code because it's smaller and conveys the idea):

  1. module Main where
  2.  
  3. chain :: Integer -> [Integer]
  4. chain 1 = [1]
  5. chain n
  6. | even n = n : chain ( n ` div` 2)
  7. | odd n = n : chain ( n * 3 + 1)
  8.  
  9. main :: IO ()
  10. main = do
  11. let x = map (chain) [1..10000]

I compiled in ghc 7.0.3, with -rtsopts, ran both and this is what I got:

chain 2 (Ivan's version)
477,519,704 bytes allocated in the heap
302,848 bytes copied during GC
28,056 bytes maximum residency (1 sample(s))
20,576 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 910 collections, 0 parallel, 0.01s, 0.01s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed

INIT time 0.00s ( 0.00s elapsed)
MUT time 0.48s ( 1.73s elapsed)
GC time 0.01s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.49s ( 1.75s elapsed)

%GC time 2.2% (0.7% elapsed)

Alloc rate 990,256,904 bytes per MUT second

Productivity 97.7% of total user, 27.6% of total elapsed

chain 1 (my version)
477,185,208 bytes allocated in the heap
301,696 bytes copied during GC
28,056 bytes maximum residency (1 sample(s))
20,576 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 909 collections, 0 parallel, 0.01s, 0.01s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed

INIT time 0.00s ( 0.00s elapsed)
MUT time 0.49s ( 1.76s elapsed)
GC time 0.01s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.50s ( 1.78s elapsed)

%GC time 2.2% (0.7% elapsed)

Alloc rate 972,186,709 bytes per MUT second

Productivity 97.7% of total user, 27.6% of total elapsed

Thought you might enjoy that. :)

Negative Collatz

Collatz is frequently extended to the negative integers; see http://mathworld.wolfram.com/CollatzProblem.html

Either only accepting positive natural numbers or stopping when the number closest to zero of a cycle is reached (1, 0, -1, -5, -17) would be a "correct" answer as I see it.

Euler

Also, there was just a C/Python/Erlang/Haskell Project Euler #12 speed comparison question on Stack Overflow, http://stackoverflow.com/questions/6964392/speed-comparison-c-vs-python-vs-erlang-vs-haskell; I pointed them at your blog.

agf, A personal double thanks

agf,

A personal double thanks is in order. Thank you VERY much for posting my blog up on stackoverflow. If you live in the area or ever visit it, let me know I'll buy you a beer.

Secondly, thank you for pointing me to the documentation on how Collatz chains are suppose to function with negative numbers.

Collatz Chains

Hey mate,

For this example the key issue is not the lack of coverage of the integer space. Take a look at the definition of a Collatz Chain on Wikipedia (http://en.wikipedia.org/wiki/Collatz_conjecture):

  • The Collatz conjecture is a conjecture in mathematics named after Lothar Collatz, who first proposed it in 1937. The conjecture is also known as the 3n + 1 conjecture, the Ulam conjecture (after Stanisław Ulam), Kakutani's problem (after Shizuo Kakutani), the Thwaites conjecture (after Sir Bryan Thwaites), Hasse's algorithm (after Helmut Hasse), or the Syracuse problem; the sequence of numbers involved is referred to as the hailstone sequence or hailstone numbers, or as wondrous numbers.Take any natural number n. If n is even, divide it by 2 to get n / 2, if n is odd multiply it by 3 and add 1 to obtain 3n + 1. Repeat the process (which has been called "Half Or Triple Plus One", or HOTPO) indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1. The property has also been called oneness.

I've emphasised the key point. Natural Numbers (http://en.wikipedia.org/wiki/Natural_number)

  • In mathematics, natural numbers are the ordinary counting numbers 1, 2, 3, ... (sometimes zero is also included). There is no universal agreement about which set of numbers is designated by the term "natural numbers": some use it to designate the positive integers {1, 2, 3, ...}, others include the number 0, so that the term designates the non-negative integers {0, 1, 2, 3, ...}. The former definition is the traditional one, the use of the latter definition appears first in the 19th century. Some authors use the term natural number to exclude zero and whole number to include it; others use whole number in a way that excludes zero, or in a way that includes both zero and the negative integers.

Pretty much however you look at it, Natural Numbers don't include negatives. As a result, I prefer the Author's version of the Collatz chain function as it actually results in an error if you pass in a negative.

The devil is indeed in the detail ;)

Just my $0.02 :)

PS. It'd be really nice to be able to add blockquotes and links in here ;)

Wow, I'm stupid!

Time to eat some humble pie!

His solution isn't better, for some reason I thought he had an error condition. I am totally wrong, for that I apologise and revoke my statement. Your solution is indeed better as it doesn't result in an infinite list.

Thanks :)
OJ

Damn Dude, I think you did

Damn Dude,

I think you did more research on this problem than the book's author. And I'll admit that I sure as hell didn't put in this much effort into the function.

All things considered, the title of the blog this blog post in particular, I think in the future I should be doing more research like this, as you pointed out "the devil is in the details." :P

If you made it this far down into the article, hopefully you liked it enough to share it with your friends. Thanks if you do, I appreciate it.

Bookmark and Share