Pangrams


If you haven’t figured it out by now, I enjoy solving problems. And over the course of the last year or two, I’ve learned that interview questions make for great problems to work on. Actual interview problems are nice because they are usually quick, but have a quirk or two in there that makes them challenging, unlike simple questions like the Fizz Buzz problem that just checks if you have the most basic coding skills. (Has anyone actually been asked that question in an interview?)

The most recent problem I got to sink my teeth into (found it on a recruiting site, but not going to share where I got it; wouldn’t be fair to the company posting the problem) is for finding pangrams in sentences. If you don’t know what a pangram is Wikipedia defines them as, “a sentence using every letter of the alphabet at least once.” Yeah, I didn’t know what they were either until I started programming this little puzzle. Here is the code:

  1. module Main (main) where
  2.  
  3. import System (getArgs)
  4. import qualified Data.Set as S
  5. import qualified Data.Text as T
  6. import qualified Data.Text.IO as TI (readFile)
  7.  
  8. buildList :: FilePath -> IO [T.Text]
  9. buildList filename = TI.readFile filename >>=
  10. return . map (T.toLower . T.filter (/=' ')) . T.lines
  11.  
  12. compareAndPrint :: S.Set Char -> String
  13. compareAndPrint sset = if S.null result
  14. then "NULL"
  15. else S.toList result
  16. where result = S.difference (S.fromList ['a'..'z']) sset
  17.  
  18. main = do
  19. args <- getArgs
  20. sentences <- buildList $ head args
  21. mapM_ (putStrLn . compareAndPrint) $ map( S.fromList . T.unpack) sentences

I came up with the solution pretty quickly by using Sets. Having a set of the alphabet and finding the difference of the letters used in the sentence makes the problem almost trivial. The hard part for me was figuring out how to filter out the spaces and change all characters to lower case in the buildList function. I eventually figured it out, but it took some head against wall action to get it right.

This is going to be my last post for this year. I would like to wish you all Happy Holidays and a Happy New Year. Thank for reading and see you again in 2012. I would also like to thank everyone from planet.haskell.org who decided to read this. Welcome!

Axio, thank you for

Axio, thank you for clarifying your position. I agree with you that agf's answer does cover up the complexity of the algorithm. There is a point though that if a candidate do pull that little code snippet out that a part of me would be impressed because it does display a stronger knowledge of python. Maybe, like you implied, it might be best to use that code snippet, then follow through with some explanation (before the interviewer even has a chance to ask) about the underlying algorithm and thus able to show both a strong knowledge of the language as well as knowledge about the underlying algo. That'r my opinion anyway. ;)

Context & Complexity

Certainly, which solution you use depends on the context -- if you're in an interview and they want you to write an algorithm, an answer like this could be considered snide.

However, in practice, a solution that is hard to screw up, is easily readable, and requires little code, is often a good solution. Anyone with knowledge of sets should be able to read it without trouble.

If this step turns out to be time critical, or this is for an interview, knowing the time complexity is indeed important. For CPython, the subset test is implemented in "setobject.c" -- see http://hg.python.org/cpython/file/082eea3fc9bc/Objects/setobject.c#l1747.

The algorithm is O(n), linear in the length of the set, with regard to set membership tests. As you can see at http://wiki.python.org/moin/TimeComplexity#set each set membership test is O(1) average case and O(n) worst case. The algorithm also requires that the string you're testing against be converted to a set, which is O(k), linear with the length of the string.

This makes the complexity of the subset test O(n + k) average case and O(n^2) worst case. In an interview, knowing it would be linear time average case and quadratic time worst case (and why) would probably be plenty. Knowing the basics of how sets and dictionaries work in Python should be enough to determine that.

As all of the items in the set are single characters, I'm sure that the hash implementation is such that you will actually get O(1) membership tests. Furthermore, n is actually a known constant, 26, since the set contains only the lowercase ASCII characters. Combine this with the algorithm short circuiting if the the set created from the string isn't long enough to be a pangram (the O(n) step doesn't take place when k < n) and the subset test is O(k) in this particular case.

It scales linearly with the length of the string to be tested.

It's actually a one-liner in Haskell

import qualified Data.Set as S
import Data.Char
import Data.Function

pangram = (S.isSubsetOf `on` (fromList . map toLower)) ['a'..'z']

Or even using only the Prelude

  1. pangram = ($['a'..'z']) . all . flip elem

Hussain, That makes my code

Hussain,

That makes my code look large and clunky. Thank you for sharing and to help me understand more of the Haskell language and how I could use it more efficiently. I will definitely be keeping this in my little recipe book.

Different approach

I use a different approach.
Store the ascii code of alphabetic chars (only) in a table, and then check for the size of this table.
That should be 26 if the whole alphabet was in the sentence…
Using the integer value of letters allows for some quick validity checks.

  1. (define sentence "The quick brown fox jumps over the lazy dog.")
  2.  
  3. (define (pangram-p str)
  4. (= 26
  5. (table-length
  6. (fold-left
  7. (lambda (h e)
  8. (let ((ord (char->integer e)))
  9. (cond
  10. ((and (>= ord 65) (<= ord 90))
  11. (table-set! h ord #t))
  12. ((and (>= ord 97) (<= ord 122))
  13. (table-set! h (- ord 32) #t))
  14. (else #t))
  15. h))
  16. (make-table)
  17. (string->list str)))))
  18. <python>

Axio, what language is this?

Axio, what language is this?

That's Scheme. Gambit-C

That's Scheme. Gambit-C Scheme, to be more precise (with extra fold-left, since it is not there by default)

Python module to check if a string is a pangram

  1. from string import ascii_lowercase
  2.  
  3. letters = set(ascii_lowercase)
  4.  
  5. def pangram(sentence):
  6. return letters.issubset(sentence.lower())

Jeezzz... I'm absolutely

Jeezzz... I'm absolutely speechless. That is an absolutely beautiful reduction of my code. Thank you for sharing that, I will definitely remember that little nugget.

Happy New Year agf!

Happy New Year ...

... to you as well.

Hum,

I doubt this is a valid interview answer, though…

Axio, why do you think this

Axio, why do you think this is not a valid interview answer?

Because I think interviews

Because I think interviews questions are more about the algorithm you use for solving, than about using the tool that does it for you, because the tool often hides the computational complexity (and its implementation may even be inefficient!)

Of course, you could (and should?) say "and it can be done by using the builtin FOO and BAR" but I believe it is more important to show that you know how to solve the problem without relying on a specific brand/language.

On the other hand, I have no grudge against using this kind of solutions in practice, as it may be sufficiently efficient, and easy to read/understand.

If you made it this far down into the article, hopefully you liked it enough to share it with your friends. Thanks if you do, I appreciate it.

Bookmark and Share