Python

Hiding Keyboard Input From the Screen

For my current job I'm in charge of writing a script that requires a person to input their password and me being the semi-security conscious individual that I am knew that regular input for a script like this would be bad, as that would allow for over the shoulder (literally) view of someone's password. This is just unacceptable.

To solve this problem I hit the internets for a way to hide user input in Perl script. And in my travels I came across this old PERL FAQ site. I decided to try out the code and update it, throw it into a little dummy script. This is what I got (minus the dummy script):

  1. sub get_password
  2. {
  3. print "enter your password: ";
  4. system("stty -echo");
  5. chop(my $password=<>);
  6. print "\n";
  7. system("stty echo");
  8.  
  9. return $password;
  10. }

Saved it, ran it, and Holy Missing Characters Batman, it worked. So just as a mental exercise, I wanted to see if I could get it to emulate the password request when you run sudo, and I wrote this up in Python (because I need the practice):

  1. #!/usr/bin/python
  2.  
  3. import os
  4. import getpass
  5.  
  6. def get_password():
  7. #print "Please enter your password:"
  8. os.system("stty -echo")
  9. password = raw_input( getpass.getuser() + "'s password")
  10. os.system("stty echo")
  11. print "\n"
  12.  
  13. return password
  14.  
  15. def main():
  16. local_password = get_password()
  17. print local_password
  18.  
  19. main()

And again... it worked. Of course I finally realized that the reason why it worked is because of the line "stty -echo". Taking that I typed it into a standard terminal and after hitting enter the results were the same. My terminal functioned normally, but without showing me the characters I was typing. After returning my terminal back to normal I decided to read a little more about stty. After digging through the man page for a little while I came across this argument which is an alias for echo:

[-]crterase
echo erase characters as backspace-space-backspace

Thus the reason why I couldn't see anything change is because my terminal is treating all characters I type on the screen as a three part sequence of non alphanumeric characters. Although I find this a little odd; I have to just go with the words of Bugs Bunny on this one, “I don't ask questions. I just have fun.”

Tweet Dump part 3

Welcome back to another, and probably the last, instalment of the tweet dump project.

The old tweet dump code was done locally. I wanted to see how things would be affected if I ran the tweet dump code to a remote server. Just to get an idea on how travel would affect latency.

So I changed things up a bit, I created a sql database through my hosting provider and imported the schema into the remote database. At which point I also modified up the tweet dump code to use this new remote database. And after running the python script 5 times, here are my numbers:

Run 1 43.998
Run 2 46.352
Run 3 45.029
Run 4 55.024
Run 5 49.174
Average 47.92

In my tweetdump code I have a lot of going back and forth between the server and client;

  1. def getId(in_id):
  2. sql = "select id from ids where twit_id = '" + in_id + "'"
  3. return runQuery(sql)
  4.  
  5. def addId(in_id):
  6. sql= "insert into ids values (null, '" + in_id + "')"
  7. runQuery(sql)
  8.  
  9. def addTweet(in_id, in_tweet):
  10. sql = "insert into tweets values ('" + in_id + "',\"" + in_tweet + "\")"
  11. runQuery(sql)
  12. ...
  13. site_id = getId(ids[j])
  14. if( not site_id):
  15. addId(ids[j])
  16. site_id = getId(ids[j])

When you see the three functions making up the code block near the bottom, you might realize that there are three seperate sql calls to the server. And while this might be fine for a local database, this is not good for a remote one. So I decided to try my hand at creating a sql function to reduce some of the back and forth between the two entities.

  1. delimiter //
  2.  
  3. drop function if exists `get_set_ids`//
  4. create function get_set_ids( in_tweet_id BIGINT )
  5. RETURNS int(10) unsigned
  6. BEGIN
  7. DECLARE new_id int(10) unsigned;
  8. select id into new_id from ids where twit_id = in_tweet_id;
  9. IF ISNULL(new_id) THEN
  10. insert into ids values (null, in_tweet_id);
  11. select id into new_id from ids where twit_id = in_tweet_id;
  12. END IF;
  13. return new_id;
  14. END //
  15.  
  16. delimiter ;

The function above takes the guess work away from the client, and keeps it within the server. By doing this we avoid and entire round of communication between the client & server. So in theory we should see at least some kind of speed-up.

With the function above created I then went and modified the script to take advantage of the function:

  1. def get_set_id(in_id):
  2. sql = "select get_set_ids(" + in_id + ")"
  3. return runQuery(sql)
  4. ...
  5. site_id = get_set_id(ids[j])
  6. addTweet(str(site_id[0][0]), final_texts[j])

After a bit of testing to make sure things worked, I ran five timed tests (on the same hardware and from the same location, to try and reduce any variables that might crop up).

Run 1 25.948
Run 2 24.35
Run 3 26.181
Run 4 24.667
Run 5 25.352
Average 25.3

The difference between the two averages is about 22 seconds. And to be honest, I did not expect these changes to cut down my times by about half. This was a little bit of a shock to me. I guess in this end this is an example of how design can really matter.

Tweet Dump

So one day I was browsing the API of Twitter. And on this particular page on Twitter's website, they talked using a command line utility to interact with Twitter. I was intrigued by this, so I decided to write up a simple script to deposit the tweets and twitter ID's into a database. Why? Because I could. ;)

So if for whatever reason you have to recreate my little experiment, you'll need a couple of things:


    1) A unix shell
      1.1) grep

    2)python installed and working
      2.1) the mysqldb python module installed

    3)mysql database installed

So here is my MySQL schema:

  1. -- MySQL dump 10.11
  2. --
  3. -- Host: localhost Database: twit_bank
  4. -- ------------------------------------------------------
  5. -- Server version 5.0.84-log
  6.  
  7. /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
  8. /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
  9. /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
  10. /*!40101 SET NAMES utf8 */;
  11. /*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
  12. /*!40103 SET TIME_ZONE='+00:00' */;
  13. /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
  14. /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
  15. /*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
  16. /*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
  17.  
  18. --
  19. -- Table structure for table `ids`
  20. --
  21.  
  22. DROP TABLE IF EXISTS `ids`;
  23. /*!40101 SET @saved_cs_client = @@character_set_client */;
  24. /*!40101 SET character_set_client = utf8 */;
  25. CREATE TABLE `ids` (
  26. `id` int(10) unsigned NOT NULL auto_increment,
  27. `twit_id` bigint(20) default NULL,
  28. UNIQUE KEY `id` (`id`)
  29. ) ENGINE=MyISAM AUTO_INCREMENT=50250 DEFAULT CHARSET=latin1;
  30. /*!40101 SET character_set_client = @saved_cs_client */;
  31.  
  32. --
  33. -- Table structure for table `tweets`
  34. --
  35.  
  36. DROP TABLE IF EXISTS `tweets`;
  37. /*!40101 SET @saved_cs_client = @@character_set_client */;
  38. /*!40101 SET character_set_client = utf8 */;
  39. CREATE TABLE `tweets` (
  40. `ids` int(10) unsigned default NULL,
  41. `tweet` char(150) default NULL
  42. ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
  43. /*!40101 SET character_set_client = @saved_cs_client */;
  44. /*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
  45.  
  46. /*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
  47. /*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
  48. /*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
  49. /*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
  50. /*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
  51. /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
  52. /*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
  53.  
  54. -- Dump completed on 2009-10-19 0:22:26

And once you have that schema install, you can use the script below (Obviously you'll need to change the username and password to your particular database, but you don't need me to tell you that. My readers are S-M-A-R-T!)

  1. #!/usr/bin/python
  2.  
  3. import os
  4. import subprocess
  5. import MySQLdb
  6. import re
  7. import sys
  8.  
  9. #variables
  10. databaseUser = "*****"
  11. databasePassword = "*****"
  12. databaseName = "twit_bank"
  13.  
  14.  
  15. def runQuery(query):
  16.  
  17. db = MySQLdb.connect(user=databaseUser, passwd=databasePassword, db=databaseName)
  18. c = db.cursor()
  19. c.execute(query)
  20. db.commit()
  21.  
  22. return c.fetchall()
  23.  
  24. def getId(in_id):
  25. sql = "select id from ids where twit_id = '" + in_id + "'"
  26. return runQuery(sql)
  27.  
  28. def addId(in_id):
  29. sql= "insert into ids values (null, '" + in_id + "')"
  30. runQuery(sql)
  31.  
  32. def addTweet(in_id, in_tweet):
  33. sql = "insert into tweets values ('" + in_id + "',\"" + in_tweet + "\")"
  34. runQuery(sql)
  35.  
  36. def main():
  37.  
  38. final_texts = list()
  39. j = 0
  40. site_id = list()
  41.  
  42. #code structure grabbed from:
  43. #http://buffis.com/2007/01/09/getting-output-of-a-system-call-in-python/comment-page-1/
  44.  
  45. command = "curl <a href="http://twitter.com/statuses/public_timeline.xml" title="http://twitter.com/statuses/public_timeline.xml">http://twitter.com/statuses/public_timeline.xml</a> | grep -A 3 '<status>' "
  46. process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
  47. os.waitpid(process.pid, 0)
  48. curl = process.stdout.read().strip()
  49. sys.stdout.flush()
  50.  
  51. ids = re.findall("[0-9]{10,13}", curl)
  52. texts = re.findall("<text>[\x20-\x7E]{1,150}", curl)
  53.  
  54. if ( len(ids) == len(texts)):
  55. for i in texts:
  56. temp = i.lstrip('<text>')
  57. final_texts.append(temp.rstrip('</text>'))
  58.  
  59. while( j < 19):
  60. site_id = getId(ids[j])
  61. if( not site_id):
  62. addId(ids[j])
  63. site_id = getId(ids[j])
  64.  
  65. addTweet(str(site_id[0][0]), final_texts[j])
  66. j = j + 1
  67.  
  68. main()

And this final last piece to put it all together, so you don't have to do all the grabbing manually.(Don't forget to change the location of the script.)

  1. #!/bin/bash
  2.  
  3. while [ 1 ]
  4. do
  5. /home/bryce/programming/personal/scripts/python/twit2mysql.py
  6. sleep 30
  7. done

Now we get to the fun part. After about 24 hours of grabbing the public timeline at 40 tweets a minute I got 50249 ID's and 50234 tweets on record. Now while watching the numbers grow, there was a weird issue. Most of the time there were more ID's than tweets. I'm not sure why this is at the moment, but I'm curious enough to try and solve this bug.

Oh, and something that people should probably know. The reason why I'm only grabbing 40 tweets a minute is because the first time I tried to run this, I was grabbing more like 120 tweets a minute. And I was blocked by Twitter for viewing the public timeline more than 150 times in one hour. And I was blocked for the rest of the hour. So consider that your warning.

My First Python Script

I would like to introduce everyone to my first Python script. Just for the sake of full disclosure it was originally put together by a coworker and then passed off to me for completion. I haven't had much interest until recently to even touch Python. I guess all of us should be so lucky to have a job that forces us to grow in unexpected directions every once and a while.

The script itself is nothing overly complicated, but I'm still proud of it. The reason for that is because I had to dissect the language and semantics to figure out what functionality was still needed to be added to the script. As well as learn to read the language in order to work with the parts that were existing before I touched it.

Just a quick overview of what it actually does ( or is at least suppose to do ;) ). In a MySQL database somewhere in the world, there is a database that holds a schema that was created to try and allow for dynamic recreation of Apache virtualhosts config files at the push of a button. (I'm being a little terse regarding things because I really enjoy the project and have an upcoming blog post with more details. So I don't want to spoil all the fun now. :) )

  1. # -*- coding: utf-8 -*-
  2. import commands
  3. import datetime
  4. import unittest
  5. import MySQLdb
  6. import sys
  7.  
  8. ###################### Constants #############################
  9.  
  10. # What user should we connect to the database as, for updating tables and such?
  11. databaseUser = "******"
  12.  
  13. # What is the password to the database for the given user?
  14. databasePassword = "******"
  15.  
  16. # What is the name of the database that we are connected to?
  17. databaseName = "apache_config"
  18.  
  19. ##################### Begin Code Implementation ###################
  20.  
  21. # Returns a set containing all tuples satisfying the given query.
  22. def runQuery(query):
  23. db = MySQLdb.connect(user=databaseUser, passwd=databasePassword, db=databaseName)
  24. #db = MySQLdb.connect(host=host,user=databaseUser, passwd=databasePassword, db=databaseName)
  25.  
  26. # Execute the query and return the data.
  27. c = db.cursor()
  28. c.execute(query)
  29. return c.fetchall()
  30.  
  31. # Retrieves the master table that holds information on each of the virtualhosts currently being stored.
  32. def retrieveMasterVirtualhostTable(in_virtualhost):
  33. return runQuery("select name, ipaddress from virtualhosts WHERE name = \"" + in_virtualhost + "\"" )
  34.  
  35. #get all information for each virtualhosts based on virtualhost name
  36. def retrieveSitesbyVirtualhosts(in_virtualhost):
  37. return runQuery("SELECT virtualhosts.name, virtualhosts.ipaddress, site.site_name, site.docroot, site.admin_email, site.redirects, site.misc, site.site_id, site.misc FROM site,virtualhosts WHERE site.vhost_id = virtualhosts.id AND virtualhosts.name= \"" + in_virtualhost + "\"")
  38.  
  39. # retrieve all aliaeses for a particular site
  40. def retrieveAliasesbysite_id(in_aliases):
  41. return runQuery("SELECT alias_name FROM site_aliases WHERE site_id= " + in_aliases )
  42.  
  43. # Given a virtualhost entry, returns the header for the associated virtualhost.
  44. def formatVirtualhostHeader(virtualhost):
  45. return 'NameVirtualHost ' + virtualhost[1] + '\nLogFormat "%v %h %l %u %t \\\"%r\\\" %>s %b" vhost\n'
  46.  
  47. def convertWebsiteToVirtualhostEntry(site):
  48. if( not site[5] ):
  49. if ( site[4]):
  50. ret= "\n<VirtualHost " + site[1] + ">\n" + \
  51. " ServerAdmin " + "webmaster@" + site[2] + "\n" + \
  52. " DocumentRoot " + site[3] + "\n" + \
  53. " ServerName " + site[2] + "\n"
  54. else:
  55. ret= "\n<VirtualHost " + site[1] + ">\n" + \
  56. " DocumentRoot " + site[3] + "\n" + \
  57. " ServerName " + site[2] + "\n"
  58.  
  59. # Add in site aliases.
  60. for alias in retrieveAliasesbysite_id(str(site[7])):
  61. ret += " ServerAlias " + alias[0] + "\n"
  62.  
  63. ret+=\
  64. " ErrorLog /usr/local/var/apache/log/" + site[0] + ".error_log\n" + \
  65. " CustomLog /usr/local/var/apache/log/" + site[0] + ".access_log vhost\n"
  66.  
  67. if site[8]:
  68. ret += site[8] + "\n"
  69.  
  70. else:
  71. ret= "\n<VirtualHost " + site[1] + ">\n" + \
  72. " ServerName " + site[2] + "\n"
  73.  
  74. for alias in retrieveAliasesbysite_id(str(site[7])):
  75. ret += " ServerAlias " + alias[0] + "\n"
  76.  
  77. ret += " Redirect " + site[5] + "\n"
  78.  
  79. ret += "</Virtualhost>"
  80. return ret
  81.  
  82. # Execution begins here.
  83. def main():
  84.  
  85. # grab information pertaining
  86. virtualhost = retrieveMasterVirtualhostTable(sys.argv[1])
  87.  
  88. # print the format stuff out at the top of the file
  89. print formatVirtualhostHeader(virtualhost[0])
  90.  
  91. #now grab sites based on the virtualhost called
  92. virtualhostMaster = retrieveSitesbyVirtualhosts(virtualhost[0][0])
  93.  
  94. # For each site in the virtualhostMaster "list"..
  95. for v in virtualhostMaster:
  96.  
  97. ### Print entire output to the screen
  98. print convertWebsiteToVirtualhostEntry(v)
  99.  
  100. main()

I would just like to point out that although I'm proud of my work. I'm not sharing this to try and receive praise from the internet. I'd probably have better luck getting a STD from the internet than praise for this little script. On the contrary, I'm posting this because I'm new to Python, and I would like someone who has a little more experience to guide me in the places that I might not have used the best python methods to get things accomplished. For example, the line:

  1. virtualhostMaster = retrieveSitesbyVirtualhosts(virtualhost[0][0])

Although this works, my intuition tells me that this isn't a very python way of getting that data, using a double index like that. However I just haven't been able to find the correct way to do this. So if someone out there knows of a better way, I'm all eyes!

Syndicate content