Google encrypted searching


That “series of tubes” known as the Internet went nuts two weeks ago because Google now allows you to use their search engine over https. I consider this a good thing: the more traffic that flows around encrypted, the better. Yeah, there is some overhead to it, but everything has a price, and I feel this one is well worth paying.

Of course there are a slew of blogs out there telling Firefox users how to use this feature. They all say pretty much the same thing: install a search engine plugin (there are 10 of them on Add-ons for Firefox) and you're good to go. But there is a small problem: I don’t use the search box. I type my searches in the address bar, and the one plugin I did try didn’t change that; I was still searching unencrypted. So like a good geek I went searching for an answer and found it. It's a little preference called keyword.URL.

All you have to do is:

  • (in a new window or tab) open up about:config
  • Promise to be really careful ;)
  • Type “keyword” in the text field at the top and you’ll get back two entries:
    1. keyword.URL
    2. keyword.enabled
  • You’ll notice that the default value of keyword.URL is:

    http://www.google.com/search?ie=UTF-8&oe=UTF-8&sourceid=navclient&gfns=1&q=

    We’ll just make a small change, adding an “s” to the end of http:

    https://www.google.com/search?ie=UTF-8&oe=UTF-8&sourceid=navclient&gfns=1&q=

  • Verify that “keyword.enabled” is true; otherwise this little hack won’t work.
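As far as I understand it, keyword.URL works as a prefix: whatever you type in the address bar is escaped and tacked onto the end of it, which is why changing the scheme at the front is all it takes. Here is a small Python sketch of that idea. It assumes simple form-style escaping (Firefox's own escaping rules may differ), and the names `keyword_search_url` and `is_encrypted` are made up for illustration.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical model of a keyword search: the typed text is assumed to be
# URL-escaped and appended to keyword.URL. Firefox may escape differently.
KEYWORD_URL = ("https://www.google.com/search?ie=UTF-8&oe=UTF-8"
               "&sourceid=navclient&gfns=1&q=")


def keyword_search_url(prefix, typed_text):
    """Return the URL a keyword search for typed_text would load."""
    return prefix + quote_plus(typed_text)


def is_encrypted(url):
    """True if the URL uses the https scheme."""
    return urlsplit(url).scheme == "https"


if __name__ == "__main__":
    url = keyword_search_url(KEYWORD_URL, "series of tubes")
    print(url)
    print(is_encrypted(url))
    print(is_encrypted(KEYWORD_URL.replace("https", "http", 1)))
```

The only thing the hack changes is the scheme of the prefix, so every search built from it inherits https.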

And that is all. Now the next time you use the address bar to search, you too will be searching through https. One note for Vimperator users, though: this trick unfortunately doesn’t work (yeah, I’m bummed about that too). I’m not sure why, and I hope to figure that out next, but for now we’ll just have to wait a little longer.

Websites by Email

I'd just like to share with you these fifty or so lines of Python code that amaze me. Not because they do anything special, but because they show off something pretty important in programming languages: modules (or libraries, depending on the language). Let's take a quick gander at the code:

    #!/usr/bin/python
    """
    This is a simple script to grab the contents of a website,
    encode them into a MIME message, and email them.
    Usage: pass the URL to fetch as the first argument.
    """
    import sys
    import smtplib
    import pycurl
    from cStringIO import StringIO
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart

    to = 'add_your_own'
    gmail_user = 'add_your_own'
    gmail_pwd = 'add_your_own'

    data_buf = StringIO()
    curl = pycurl.Curl()

    # Set up pycurl to grab the page; each chunk it receives is
    # written straight into the StringIO buffer
    curl.setopt(pycurl.URL, sys.argv[1])
    curl.setopt(pycurl.WRITEFUNCTION, data_buf.write)
    curl.perform()
    curl.close()

    # Create the plain-text and HTML parts of the email
    # ('plain' is the MIME subtype for plain text; 'text' would produce text/text)
    part1 = MIMEText(data_buf.getvalue(), 'plain')
    part2 = MIMEText(data_buf.getvalue(), 'html')

    # Put it all together. In multipart/alternative the last part is the
    # preferred one, so the HTML part goes last.
    msg = MIMEMultipart('alternative')
    msg['Subject'] = 'Website by Email'
    msg['From'] = gmail_user
    msg['To'] = to
    msg.attach(part1)
    msg.attach(part2)

    # Email away, using STARTTLS to encrypt the connection
    # (based on the example at
    # http://www.mkyong.com/python/how-do-send-email-in-python-via-smtplib/)
    smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
    smtpserver.ehlo()
    smtpserver.starttls()
    smtpserver.login(gmail_user, gmail_pwd)
    smtpserver.sendmail(gmail_user, to, msg.as_string())
    smtpserver.quit()
    print 'done!'

This little script started out as a work assignment (which I have modified to make it more general). The task was to grab the contents of a particular website and email them. Not terribly complicated, but also not something I expected to accomplish with only about thirty lines of actual code.

Let's take a second to review which modules are being used above:

The PycURL module, to grab the website and deposit the data into the string buffer.

The StringIO module, to hold the contents of the website. (I wasn't able to get a regular string to work here. If you know how, please tell me.)
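On the question of a regular string: pycurl hands the downloaded page to the WRITEFUNCTION callback one chunk at a time, and a Python string can't be grown in place, so the callback needs something mutable to write into. A plain list works just as well as StringIO. The sketch below is written for Python 3 (where the chunks are bytes); `make_collector` is a hypothetical helper, and the loop in the demo stands in for pycurl calling the callback.

```python
def make_collector():
    """Return (write, get_value) functions that share one list of chunks."""
    chunks = []

    def write(chunk):
        # Called once per chunk of received data. Returning None is how a
        # pycurl write callback signals that the whole chunk was handled.
        chunks.append(chunk)

    def get_value():
        # Join once at the end instead of concatenating on every chunk
        return b"".join(chunks)

    return write, get_value


if __name__ == "__main__":
    write, get_value = make_collector()
    for piece in (b"<html>", b"<body>hi</body>", b"</html>"):
        write(piece)
    print(get_value())  # b'<html><body>hi</body></html>'
```

With pycurl itself, the same idea would be `curl.setopt(pycurl.WRITEFUNCTION, chunks.append)` followed by `b"".join(chunks)` once `perform()` returns.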

The email package's MIME classes (MIMEText and MIMEMultipart), for wrapping the website contents into an email message.
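One detail worth knowing about multipart/alternative messages: the parts are meant to be listed from least to most preferred (RFC 2046), so mail clients generally show the last part they can display. Here is a Python 3 sketch of building such a message with the plain-text part first and the HTML part last. `build_alternative` is a hypothetical helper, and, like the script above, it simply reuses the raw HTML as the plain-text body.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative(html, subject, sender, recipient):
    """Wrap one HTML page in a multipart/alternative email message."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Least preferred part first, most preferred (HTML) part last
    msg.attach(MIMEText(html, "plain"))
    msg.attach(MIMEText(html, "html"))
    return msg


if __name__ == "__main__":
    m = build_alternative("<p>hi</p>", "Website by Email",
                          "me@example.com", "you@example.com")
    print([part.get_content_type() for part in m.get_payload()])
```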

The smtplib module, which has all the code needed to talk properly to an SMTP server, including TLS for encryption, to send said email across the Internet.
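The STARTTLS exchange that smtplib takes care of looks roughly like this in Python 3. This is a sketch, not a drop-in replacement: `send_with_starttls` and its `smtp_factory` parameter are hypothetical, the latter added only so the function can be exercised without a real mail server. It also passes `ssl.create_default_context()` so that the server's certificate is checked.

```python
import smtplib
import ssl


def send_with_starttls(msg, host, port, user, password,
                       smtp_factory=smtplib.SMTP):
    """Send an email.message.Message over an SMTP connection upgraded to TLS.

    smtp_factory is a hypothetical testing hook; it defaults to smtplib.SMTP.
    """
    server = smtp_factory(host, port)   # starts as a plain connection (e.g. port 587)
    try:
        server.ehlo()                   # introduce ourselves
        server.starttls(context=ssl.create_default_context())  # switch to TLS
        server.ehlo()                   # introduce ourselves again, now encrypted
        server.login(user, password)    # credentials travel over TLS
        server.send_message(msg)        # sender/recipients read from msg's headers
    finally:
        server.quit()
```

For example, `send_with_starttls(msg, "smtp.gmail.com", 587, gmail_user, gmail_pwd)` would send a message that has its From and To headers set.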

It wasn't until doing this project that I fully realized the importance of modules. These four modules saved me countless hours of coding, testing, and debugging. Even if I had written the functionality I needed from scratch, what I wrote would not have come anywhere near what these modules provide. Along those lines, I wonder if there is some kind of connection between how popular a language is and how easily extendable it is. I have no way to prove this, of course, but Perl and Python could be good examples (and also happen to be the languages I'm most familiar with). Both languages are popular, as shown by the normalized graph on langpop.com, and both are also extremely easy to extend. The PycURL, CLyther, and PyCUDA modules are great cases in point: PycURL lets Python tap into the libcurl library, CLyther lets Python use OpenCL, and PyCUDA lets Python access the CUDA libraries. It makes my head hurt just to think about the amount of code it would take to perform these same functions if written from scratch.

After this moment of realization, I find I am extremely grateful to all the programmers out there who put in their time to create modules like these and to make them perform as well as they do. I tip my hat to you all.

Getting all installed software, and their versions, on Debian/Ubuntu pt 2: Now with Threads!

Even though Poisonbit's solution to the original problem is the fastest (thanks again, Poisonbit!), I decided to use the original code as an excuse to learn multithreaded programming in Perl and see if it might improve the performance of the original code. One of those "for shits and giggles" moments.

First off, here is the new and improved code:

  1. #!/usr/bin/perl
  2. #######################################################################
  3. # Created By: Bryce Verdier
  4. # on 4/14/10
  5. #
  6. # Function: grab all installed packages, using threads
  7. # find their exact versions, and display them
  8. # NOTE: FOR USE ON DEBIAN BASED MACHINES
  9. #######################################################################
  10.
  11. use threads;
  12. use threads::shared;  # provides the :shared attribute and a working lock()
  13.
  14. my %pack_hash :shared;
  15. my $thread_count = 0;
  16. my $pack_count;
  17. my @return = `dpkg --get-selections`;
  18.
  19. # Look up one package's version and store it in the shared hash
  20. sub get_package_ver
  21. {
  22.     my %args = @_;
  23.     my $returned_ver = `dpkg -s $args{package}`;
  24.
  25.     # List assignment leaves $temp_ver undef if the match fails
  26.     my ($temp_ver) = $returned_ver =~ m/^Version: (.+)$/m;
  27.
  28.     # Lock the shared hash itself, not the reference pointing to it
  29.     lock(%{$args{hash}});
  30.     $args{hash}{$args{package}} = $temp_ver;
  31. }
  32.
  33. # Reduce each "name<whitespace>state" line to just the package name;
  34. # $_ is an alias to each element, so this edits @return in place
  35. foreach (@return)
  36. {
  37.     $_ =~ m/^(\S+)[ \t].*/;
  38.
  39.     $_ = $1;
  40. }
  41.
  42. # Look up the packages two at a time
  43. $pack_count = @return;
  44. while ( $pack_count - $thread_count >= 2)
  45. {
  46.     my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash,
  47.         package => $return[$thread_count]);
  48.     my $th2 = threads->create(\&get_package_ver, hash => \%pack_hash,
  49.         package => $return[$thread_count+1]);
  50.
  51.     $th1->join();
  52.     $th2->join();
  53.
  54.     $thread_count = $thread_count + 2;
  55. }
  56.
  57. # Get the odd package, if there is one
  58. if ($pack_count - $thread_count == 1)
  59. {
  60.     my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash,
  61.         package => $return[$thread_count]);
  62.
  63.     $th1->join();
  64.
  65.     $thread_count++;
  66. }
  67.
  68. while((my $key, my $value) = each(%pack_hash))
  69. {
  70.     print "$key : $value\n";
  71. }

For all the number crunchers out there, putting things into two threads reduced the program's execution time by more than a minute. For a program that took two and a half minutes to complete, a reduction to one and a half minutes is pretty significant. Well, in my book anyway.
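The two-thread loop starts threads in pairs and joins both before starting the next pair, so each pair waits on its slower member, and an odd package count needs its own special case. A worker pool avoids both problems. Below is a Python 3 sketch of that swapped-in technique (a pool, not what the Perl code does). `dpkg_version` and `versions_in_parallel` are hypothetical names. The threads spend most of their time waiting on `dpkg -s` child processes, which is why threads can help here at all.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dpkg_version(package):
    """Return the Version field from `dpkg -s PACKAGE`, or None if absent."""
    result = subprocess.run(["dpkg", "-s", package],
                            capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if line.startswith("Version: "):
            return line[len("Version: "):]
    return None


def versions_in_parallel(packages, get_version=dpkg_version, workers=2):
    """Map each package name to its version using a fixed pool of threads.

    A worker picks up the next package as soon as it finishes its last one,
    so nobody waits on a slower partner, and odd counts need no special case.
    """
    packages = list(packages)  # iterated twice below, so make it a list
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `packages`
        return dict(zip(packages, pool.map(get_version, packages)))


if __name__ == "__main__":
    # Offline demo: a stand-in lookup function instead of dpkg
    print(versions_in_parallel(["a", "b", "c"], get_version=str.upper))
```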

After following Sam's suggestions for the regexes (thanks again, Sam!), I pulled the version-checking code out into its own function so that each thread would have one very specific thing to do. After that I realized I needed to clean up the package names that were being sent to the threads, which led to the foreach loop on line 35. Within the foreach loop I tried something that I didn't expect to work, the line:

$_ = $1;

There's no reason the line above shouldn't work: inside a foreach loop, $_ is an alias for the current element of the array, so assigning to it changes @return itself. But in the Perl code I'd seen, "$_" was only ever read from, never written to, and I guess that is why I did not expect it to work. Then again, that's why I'm doing this: to learn things. :-D
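Python behaves differently here, which makes the Perl aliasing easier to see by contrast: a Python for-loop variable is just a new name bound to each element, so assigning to it leaves the list untouched. To rewrite the elements in place, you go through the index. A small sketch; `strip_to_names` is a hypothetical helper, and it assumes every line is non-empty, as in `dpkg --get-selections` output.

```python
def strip_to_names(lines):
    """Rewrite each 'name<whitespace>state' line to just the name, in place."""
    # `for line in lines: line = ...` would only rebind the local name,
    # so assign through the index instead (the effect Perl's $_ alias gives)
    for i, line in enumerate(lines):
        lines[i] = line.split()[0]


if __name__ == "__main__":
    selections = ["bash\t\t\t\tinstall\n", "vim\t\t\t\t\tinstall\n"]
    strip_to_names(selections)
    print(selections)  # ['bash', 'vim']
```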

After these changes and adding the threads code, I was almost done. For some reason, though, the data from each thread wasn't getting stored in the hash. It wasn't until I looked a little deeper at the examples on perldoc that I saw what I needed to do. In the section "Shared And Unshared Data" I learned that Perl threads don't share variables by default: each new thread gets its own copy of the data, so every thread was filling in its own private hash. I needed to mark %pack_hash as shared so all the threads could access it, which I did like so:

my %pack_hash :shared;
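For comparison, Python threads share all objects by default, so the Python counterpart of %pack_hash needs no :shared marker. A lock is still the explicit way to coordinate writes, much like lock() in the Perl code. A minimal Python 3 sketch follows; `record_version` and `run_workers` are hypothetical helpers, and the package/version pairs are placeholders rather than real dpkg output.

```python
import threading


def record_version(results, lock, package, version):
    """Store one package's version in the dict shared by all threads."""
    # Hold the lock while writing, mirroring lock() on %pack_hash
    with lock:
        results[package] = version


def run_workers(pairs):
    """Start one thread per (package, version) pair and wait for them all."""
    results = {}              # visible to every thread without any marker
    lock = threading.Lock()
    threads = [
        threading.Thread(target=record_version, args=(results, lock, pkg, ver))
        for pkg, ver in pairs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder pairs, not real package versions
    print(run_workers([("pkg-a", "1.0"), ("pkg-b", "2.0")]))
```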

All in all, my first multithreaded coding experience in Perl wasn't bad. Granted, the program isn't complicated, but this was truly new territory for me. I haven't tried any kind of multi-process or multithreaded programming since my operating systems class almost 3 years ago, so there were some battles to fight in my head over how to modify things to work with multiple threads. But again, it was a good experience. And I'm going to reiterate this so everyone remembers: don't use this code in production. Poisonbit's solution is MUCH faster than mine. Like me, use this code for learning.

Getting all installed software, and their versions, on Debian/Ubuntu

AAAHHH work! Everyone has those horror stories where your boss, or the client, comes and asks for some horrible feature that will require an entire rewrite of the program. Fortunately, that hasn't happened to me (yet), and that is not what this blog entry is about. It's about the even rarer (I believe) scenario where someone wants a feature they think will be difficult to create, and after some research you implement it in a small amount of time. It's a rare event, but a great confidence booster when it does happen.

A co-worker wanted a feature added to a project. He thought it might take a while to complete, so he talked to me about it first to “plant the seed” and get my brain started on a solution, not expecting a quick turnaround. As the title hints, the problem was to find all the packages installed on our Debian boxes, along with their version numbers. After a little bit of googling and man page reading I had a basic algorithm to build on. Twenty minutes of coding and testing later, I had this script:

    #!/usr/bin/perl
    #######################################################################
    # Created By: Bryce Verdier
    # on 4/14/10
    #
    # Function: grab all installed packages, find their exact
    # versions, and display them
    # NOTE: FOR USE ON DEBIAN BASED MACHINES
    #######################################################################

    my %pack_hash;
    my @return = `dpkg --get-selections`;

    foreach (@return)
    {
        # The package name is everything before the first space or tab
        my ($temp_pack) = $_ =~ m/(\S*)[ \t].*/i;
        next unless $temp_pack;

        my $returned_version = `dpkg -s $temp_pack`;

        # Capturing in list context leaves $temp_ver undef if there is no
        # match, instead of silently reusing the $1 from the match above
        my ($temp_ver) = $returned_version =~ m/Version: (.*)/i;

        $pack_hash{$temp_pack} = $temp_ver;
    }

    while ((my $key, my $value) = each(%pack_hash))
    {
        print "$key : $value\n";
    }

I will admit that it is a little slow (on my desktop it takes around two and a half minutes to complete) and could probably benefit from some parallelization. However, that might be over-engineering for such a simple task. I'll code that feature up next, grab a stopwatch, and test it just to find out. Anybody gonna place bets one way or another? In the meantime, I'm proud of my use of regexes in this script: capturing only "\S" (non-whitespace) characters, instead of just using ".*", leaves the trailing whitespace off the package names, which should speed things up a bit because I don't have to call chomp on each package name. Also, using a hash for storage simplifies the data management and should make it easier to port the code into a larger script later.
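The same two regexes translate almost directly to Python 3. This sketch uses a tightened version of the patterns (an anchored `^(\S+)` for the name, and `^Version: (.+)$` in multiline mode so a field like `Config-Version:` can't match). It returns None rather than a stale capture when there is no Version field, and stores the results in a dict, the Python counterpart of the hash. `package_name`, `package_version`, and `build_table` are hypothetical names, and `status_for` stands in for running `dpkg -s`.

```python
import re

# First run of non-whitespace: the package name, without the trailing
# tabs and "install" column, so no chomp/strip step is needed
NAME_RE = re.compile(r"^(\S+)")
# "Version:" only at the start of a line
VERSION_RE = re.compile(r"^Version: (.+)$", re.MULTILINE)


def package_name(selection_line):
    m = NAME_RE.match(selection_line)
    return m.group(1) if m else None


def package_version(status_text):
    # None when there is no Version field, never a leftover earlier match
    m = VERSION_RE.search(status_text)
    return m.group(1) if m else None


def build_table(selection_lines, status_for):
    """Map package name -> version; status_for(name) returns `dpkg -s` text."""
    table = {}
    for line in selection_lines:
        name = package_name(line)
        if name:
            table[name] = package_version(status_for(name))
    return table
```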
