Getting all installed software, and their versions, on Debian/Ubuntu, Part 2: Now with Threads!

Even though Poisonbit's solution to the original problem is the fastest (thanks again, Poisonbit!), I decided to use the original code as an excuse to learn multi-threaded programming in Perl and to see whether threads might improve its performance. One of those "for shits and giggles" moments.

First off, here is the new and improved code:

  1. #!/usr/bin/perl
  2. #######################################################################
  3. # Created By: Bryce Verdier
  4. # on 4/14/10
  5. #
  6. # Function: grab all installed packages, using threads
  7. # find their exact versions, and display them
  8. # NOTE: FOR USE ON DEBIAN BASED MACHINES
  9. #######################################################################
  10.
  11. use strict;
  12. use warnings;
  13. use threads;
  14. use threads::shared;   # needed for :shared and lock() to take effect
  15.
  16. my %pack_hash :shared; # package => version, written to by every thread
  17. my $thread_count = 0;
  18. my $pack_count;
  19. my @return = `dpkg --get-selections`;
  20.
  21. sub get_package_ver
  22. {
  23.     my %args = @_;
  24.     my $returned_ver = `dpkg -s $args{package}`;
  25.
  26.     # Skip the package if dpkg did not report a Version: line
  27.     return unless $returned_ver =~ m/^Version: (.+)$/m;
  28.     my $temp_ver = $1;
  29.
  30.     # lock() follows the reference one level, so this locks %pack_hash
  31.     lock($args{hash});
  32.     $args{hash}{$args{package}} = $temp_ver;
  33. }
  34.
  35. foreach (@return)
  36. {
  37.     # Keep only the package name (the first field of each line)
  38.     $_ =~ m/^(\S+)[ \t].*/;
  39.     $_ = $1;
  40. }
  41.
  42.
  43. $pack_count = @return;
  44. while ( $pack_count - $thread_count >= 2)   # at least two packages left
  45. {
  46.     my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash,
  47.                               package => $return[$thread_count]);
  48.     my $th2 = threads->create(\&get_package_ver, hash => \%pack_hash,
  49.                               package => $return[$thread_count+1]);
  50.
  51.     $th1->join();
  52.     $th2->join();
  53.
  54.     $thread_count = $thread_count + 2;
  55. }
  56.
  57. # Get the odd package, if there is one
  58. if ($pack_count - $thread_count == 1)
  59. {
  60.     my $th1 = threads->create(\&get_package_ver, hash => \%pack_hash,
  61.                               package => $return[$thread_count]);
  62.
  63.     $th1->join();
  64.
  65.     $thread_count++;
  66. }
  67.
  68. while((my $key, my $value) = each(%pack_hash))
  69. {
  70.     print "$key : $value\n";
  71. }

For all the number crunchers out there, splitting the work across two threads reduced the program's execution time by more than a minute. For a program that took two and a half minutes to complete, a reduction to one and a half minutes is pretty significant. Well, in my book anyway.

After first following Sam's suggestions for the regexes (thanks again, Sam!), I pulled the version-checking code out into its own function so that each thread would have one very specific job. After that I realized I needed to clean up the package names being sent to the threads, which is what the foreach loop on line 35 does. Within that loop I tried something I didn't expect to work, the line:

$_ = $1;

There's no reason the line above shouldn't work, but in the Perl code I've seen, $_ is only ever read from, never written to, and I guess that's why I didn't expect it to work. But then, that's why I'm doing this: to learn things. :-D
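A minimal sketch of why the assignment works, using a couple of made-up lines in the "name, whitespace, selection state" format that dpkg --get-selections prints: inside a foreach loop, $_ is an alias for the current array element rather than a copy, so writing to it rewrites the array in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines in the dpkg --get-selections format
my @return = ("bash\t\t\t\t\tinstall\n", "coreutils\t\t\t\tinstall\n");

foreach (@return)
{
    # $_ is aliased to the current element, so assigning to it
    # replaces the element inside @return itself
    $_ =~ m/^(\S+)[ \t].*/;
    $_ = $1;
}

print join(",", @return), "\n";    # prints "bash,coreutils"
```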

After these changes and adding the threads code, I was almost done, but for some reason the data from each thread wasn't getting stored in the hash. It wasn't until I looked more closely at the examples on perldoc that I saw what I needed to do. The section "Shared And Unshared Data" explains that by default every thread gets its own private copy of each variable, so %pack_hash had to be marked as shared before all the threads could write to the same hash (the :shared attribute only takes effect when threads::shared is loaded). I did that like so:

my %pack_hash :shared;
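A minimal sketch of the difference the shared marking makes. The hash and sub names are illustrative, and it assumes a Perl built with ithreads support (Debian's perl is). Every thread writes into both a :shared hash and an ordinary hash, but after the threads are joined only the shared one is populated in the parent, because each thread was writing to its own private copy of the ordinary hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my %shared_hash :shared;   # one copy, visible to and writable by every thread
my %plain_hash;            # each thread gets its own private copy of this

# Hypothetical worker: records the length of the name it was given
sub worker
{
    my ($name) = @_;
    {
        lock(%shared_hash);                 # serialize writers to the shared hash
        $shared_hash{$name} = length $name;
    }                                       # lock released at end of block
    $plain_hash{$name} = length $name;      # lands in this thread's copy only
}

my @threads = map { threads->create(\&worker, $_) } qw(alpha beta gamma);
$_->join() for @threads;

print "shared: ", scalar(keys %shared_hash), " keys\n";   # shared: 3 keys
print "plain:  ", scalar(keys %plain_hash),  " keys\n";   # plain:  0 keys
```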

All in all, my first multi-threaded coding experience in Perl wasn't bad. Granted, the program isn't complicated, but this was truly new territory for me. I hadn't tried any kind of multi-process or multi-threaded programming since my operating systems class almost 3 years ago, so there were some battles to fight in my head over how to modify things to work with multiple threads. But again, it was a good experience. And I'm going to reiterate this so everyone remembers: don't use this code in production. Poisonbit's solution is MUCH faster than mine. Like me, use this code for learning.

Assigning to $_

Hi again,

Assigning to $_ is generally considered an unsafe thing to do unless you also localize $_:


local $_ = 'whatever';

In your example, doing that will possibly (I confess I haven't tested) break the real effect you're seeing, which is that the loop variable in a foreach statement is actually an alias to the original item rather than a "normal variable".
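A quick sketch that checks this, using made-up sample lines: a plain assignment to $_ writes through the alias into the array, while "local $_ = ..." installs a temporary $_ that is thrown away at the end of each iteration, so the array is left untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines in the dpkg --get-selections format
my @aliased   = ("bash\tinstall", "coreutils\tinstall");
my @localized = @aliased;

foreach (@aliased)
{
    m/^(\S+)[ \t]/;
    $_ = $1;            # writes through the alias into @aliased
}

foreach (@localized)
{
    m/^(\S+)[ \t]/;
    local $_ = $1;      # a new, temporary $_; the array element is untouched
}

print "@aliased\n";     # prints "bash coreutils"
print "@localized\n";   # still the original, unmodified lines
```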

You're probably better off explicitly naming your loop variable as a lexical:


foreach my $line ( @return )
{
$line =~ /^(\S+)[ \t]/;
$line = $1;
}

This makes it clearer that you're assigning back to the original array entry, rather than setting $_ with the intent of later using its "magic default argument" behaviour - which is the usual reason for assigning to $_.

If you wanted to be really clear with your intent, you'd probably want to rewrite it as a map statement instead of a foreach:


@return = map { /^(\S+)[ \t]/; } @return;

This also has the side effect of filtering out lines that don't match.


That'll teach me not to preview... the map example doesn't need the ; in the block.

Although keeping it in won't break anything, it's generally considered less visually noisy to leave it out, so that the only ; is at the end of the line.
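A short sketch of the map version with the semicolon dropped, run over made-up sample lines (the last one is deliberately malformed). In list context the match returns its capture group for a matching line and an empty list for a non-matching one, which is where the filtering comes from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input; the last entry has no whitespace-separated state
my @return = ("bash\t\t\tinstall\n", "vim\t\t\tdeinstall\n", "garbage\n");

# Each match returns ($1) on success and () on failure, so map keeps
# only the package names and silently drops non-matching lines
@return = map { /^(\S+)[ \t]/ } @return;

print join(",", @return), "\n";    # prints "bash,vim"
```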


Hey Sam,

Thanks for the last two comments, and for pointing out why one shouldn't use $_ as a variable to write to. I greatly appreciate the insight, as well as the alternative of using map.

Thanks again for the lesson.
