Getting all installed software, and their versions, on Debian/Ubuntu

AAAHHH work! Everyone has horror stories where the boss, or the client, asks for some horrible feature that will require an entire rewrite of the program. Fortunately, that hasn't happened to me (yet), and that is not what this blog entry is about. It's about what I believe to be an even more unlikely scenario: someone wants a feature they think will be difficult to create, and after some research you implement it in a small amount of time. It's a rare event, and a great confidence booster when it does happen.

A co-worker wanted a feature added to a project. He thought it might take a while to complete, so he talked to me about it first to “plant the seed” and get my brain started on figuring out a solution, not expecting a quick turnaround. As the title hints, the problem was to find all the packages installed on our Debian boxes, along with their version numbers. After a little bit of Googling and man page reading I had a basic algorithm to build on. Twenty minutes of coding and testing later, I had this script:

#!/usr/bin/perl
#######################################################################
# Created By: Bryce Verdier
# on 4/14/10
#
# Function: grab all installed packages, find their exact
# versions, and display them
# NOTE: FOR USE ON DEBIAN BASED MACHINES
#######################################################################

my $temp_pack;
my $temp_ver;
my $returned_version;
my %pack_hash;
my @return = `dpkg --get-selections`;

foreach (@return)
{
    $_ =~ m/(\S*)[ \t].*/i;
    $temp_pack = $1;

    $returned_version = `dpkg -s $temp_pack`;
    $returned_version =~ m/Version: (.*)/i;
    $temp_ver = $1;

    $pack_hash{$temp_pack} = $temp_ver;
}

while((my $key, my $value) = each(%pack_hash))
{
    print "$key : $value\n";
}

I will admit that it is a little slow (on my desktop it takes around two and a half minutes to complete) and could probably benefit from some parallelization. However, that might be over-engineering for such a simple task. I'll code that feature up next, grab a stopwatch, and test it just to find out. Anybody gonna place bets one way or another? In the meantime, I'm proud of my use of regexes in this script. Capturing with "\S*" instead of ".*" leaves the trailing whitespace out of the package names, which should speed things up a bit because I don't have to call chomp on each package name. Also, using a hash for storage simplifies the data management and should make it easier to port the code into a larger script later.

Always anchor your regexps...

Other people have made good suggestions that avoid running commands multiple times by getting the info from a single command, but I thought I'd point out some problems in the regexps too:


$_ =~ m/(\S*)[ \t].*/i;

You're matching from the start of the line, so anchor with ^.

You also don't care what comes after the space or tab, so don't bother matching it. Nothing you are matching depends on case, so lose the /i:


$_ =~ m/^(\S*)/;

$_ is the default target, and a match in list context returns the captured groups. You always want to match _something_, so use + instead of *:


( $temp_pack ) = /^(\S+)/;
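
To make the difference concrete, here is a minimal sketch of the list-context form. The helper name package_name and the sample lines are made up for illustration. With + a line that has no leading token fails to match and yields undef, whereas * would happily capture an empty string. Assigning from the match also avoids reading $1, which keeps its value from the last successful match when the current one fails.

```perl
use strict;
use warnings;

# Return the package name from one `dpkg --get-selections`-style line,
# or undef when the line does not start with a non-whitespace token.
# (package_name is a hypothetical helper, not part of the original script.)
sub package_name {
    my ($line) = @_;
    my ($name) = $line =~ /^(\S+)/;    # list context: the match returns its captures
    return $name;
}

# Sample lines are invented for this example.
for my $line ("perl\t\t\t\t\tinstall\n", "\tinstall\n") {
    my $name = package_name($line);
    print defined $name ? "matched: $name\n" : "no package name\n";
}
```

Because the capture stops at the first whitespace character, the name comes back without the trailing tabs or newline, so no chomp is needed.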


$returned_version =~ m/Version: (.*)/i;

Version is at the start of a line, and you already know the capitalization:


$returned_version =~ m/^Version: (.*)$/m;

/m turns on multiline processing so that ^ and $ match start and end of line respectively.
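
As a rough illustration of what the anchors and /m buy you, the sketch below runs both patterns over a made-up status block. The helper name version_of and the X-Previous-Version field are hypothetical, and the field exists only as a decoy that happens to contain "Version: ".

```perl
use strict;
use warnings;

# Extract the Version field from a multi-line, `dpkg -s`-style status block.
# Returns undef when no line starts with "Version: ".
# (version_of is a hypothetical helper name.)
sub version_of {
    my ($status) = @_;
    my ($version) = $status =~ /^Version: (.*)$/m;
    return $version;
}

# Made-up status text. "X-Previous-Version" is an invented field used
# only as a decoy containing the substring "Version: ".
my $status = <<'END';
Package: example
X-Previous-Version: 0.9
Version: 1.2.3-4
END

# The unanchored pattern stops at the first place the text appears,
# which here is inside the decoy line.
my ($unanchored) = $status =~ /Version: (.*)/i;
print "unanchored: $unanchored\n";
print "anchored:   ", version_of($status), "\n";
```

The unanchored pattern picks up the decoy's value, while the anchored one only accepts a line that begins with "Version: ".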

What the other commenters have suggested will give you far faster results, but these tips are ones you should bear in mind for all regexps. The most important is anchoring with ^ or $: for regexps that match at the start or end of a line, it's probably the single biggest thing you can do to improve their performance.

Hey Sam,
Thanks for posting. Also, thank you for the regex pointers. Regexes are kind of a weak spot for me, so those little hints in your comment will come in real handy.

More on the subject...

I'm glad the tips will be helpful. Your post has actually inspired my Ironman posting for the week, where I've gone into a little more detail on anchoring regexps:

http://www.illusori.co.uk/perl/2010/04/22/anchoring_regexps.html

Hopefully the explanation of why anchoring is a good thing will be something people find useful.

dpkg -l ?

Maybe you can refactor using dpkg -l.

See:

$ time test.pl
real 1m5.461s
user 0m52.539s
sys 0m11.613s

$ time dpkg -l | awk -F\ '/ii/{print $2" : "$3}'
real 0m0.169s
user 0m0.040s
sys 0m0.036s

(Output omitted from this comment; it is identical.)

Happy hacking.

The comment eats spaces:

Even when using code or pre tags.

Two spaces between -F\ and '

dpkg -l | awk -F\  '/ii/{print $2" : "$3}'

refactorized!

:D


time dpkg -l | perl -lane 'print "$F[1] : $F[2]"'
... output deleted ...
real 0m0.150s
user 0m0.060s
sys 0m0.012s

if...

dpkg -l prints header lines, and each package line starts with its status. 'ii' means correctly installed; I lost that bit in the previous comment:


time dpkg -l | perl -lane 'print "$F[1] : $F[2]" if m/^ii/'
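
If the result needs to live in a hash, as in the original script, the same idea can be sketched as a small function. This is only one possible shape, not the poster's code: the name installed_versions is hypothetical, and the sample lines are an abbreviated, invented imitation of dpkg -l output.

```perl
use strict;
use warnings;

# Build a package => version hash from `dpkg -l`-style lines, keeping only
# rows whose status column is "ii" (installed).
# (installed_versions is a hypothetical helper name.)
sub installed_versions {
    my (@lines) = @_;
    my %version_for;
    for my $line (@lines) {
        # Anchored at the start: status "ii", then the name and version columns.
        my ($name, $version) = $line =~ /^ii\s+(\S+)\s+(\S+)/ or next;
        $version_for{$name} = $version;
    }
    return %version_for;
}

# Invented sample input: two header lines, one installed package, and one
# removed package whose status is "rc".
my @sample = (
    "||/ Name           Version      Description\n",
    "+++-==============-============-=================\n",
    "ii  bash           4.1-3        The GNU Bourne Again SHell\n",
    "rc  oldpkg         0.1-1        removed, config files remain\n",
);

my %version_for = installed_versions(@sample);
print "$_ : $version_for{$_}\n" for sort keys %version_for;
```

On a Debian box the sample array could presumably be replaced with the output of a single `dpkg -l` call in backticks, so dpkg runs once instead of once per package.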

Hey Poisonbit,

Thanks for the comments, all four of them. ;)

And thanks for pointing out that little "-l" argument to dpkg. Somehow I missed that in the man page.
