Free, Automatic Twisted Error Reporting
Posted 30 Apr 2012 to python, twisted, error reporting, airbrake
When you have a Twisted server running somewhere in the cloud, there really aren’t that many options for automatic notifications when there’s an error. If you’re using Rails, then there are a number of options to do this sort of thing (New Relic, Airbrake, etc). Many of these will work for WSGI applications, but there isn’t much in the way of automatic error reporting for Python otherwise.
I’ve gotten used to using Airbrake with Rails, so I wanted to figure out how to integrate a Python application with it. It turns out, there’s been at least one attempt, but it’s synchronous (which could potentially stall a Twisted application if it can’t make a connection to the Airbrake server).
Enter txairbrake. It’s written specifically for Twisted applications and is non-blocking (using twisted.web.client.getPage). It’s also dead simple to use:
# import the observer
from txairbrake.observers import AirbrakeLogObserver
# Create observer. Params are api key, environment, and use SSL. The last two are optional.
ab = AirbrakeLogObserver("mykey", "production", True)
# start observing errors
ab.start()
Any uncaught exceptions will then be reported to the Airbrake server, where you can set up email notifications.
Additionally, if you’re tight on cash and don’t want to shell out tons of money to Airbrake, consider setting up Errbit instead. It’s free, Airbrake API compliant, and you can run it for free on Heroku. If you go this route, just pass a new constructor argument of airbrakeHost to the AirbrakeLogObserver with the location of your Errbit server.
There, now you have free, automatic error reporting from within a Twisted application.
Twisted adbapi.ConnectionLost and MySQLdb
Posted 29 Mar 2012 to twisted, mysql, twistar
Twisted’s connection pool doesn’t support automagical reconnecting, which means that if the MySQLdb driver loses its connection, you get a
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
exception that doesn’t result in a new connection being established for the failed query. Here’s the full trace:
2012-03-29 19:43:02-0400 [HTTPChannel,0,127.0.0.1] Rollback failed
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "build/bdist.macosx-10.7-intel/egg/twistar/dbconfig/mysql.py", line 24, in _runInteraction
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/enterprise/adbapi.py", line 455, in _runInteraction
conn.rollback()
--- <exception caught here> ---
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/enterprise/adbapi.py", line 56, in rollback
self._connection.rollback()
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
2012-03-29 19:43:02-0400 [HTTPChannel,0,127.0.0.1] Rollback failed
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/python/threadpool.py", line 207, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "build/bdist.macosx-10.7-intel/egg/twistar/dbconfig/mysql.py", line 24, in _runInteraction
--- <exception caught here> ---
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/enterprise/adbapi.py", line 455, in _runInteraction
conn.rollback()
File "/Library/Python/2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.7-intel.egg/twisted/enterprise/adbapi.py", line 70, in rollback
raise ConnectionLost()
twisted.enterprise.adbapi.ConnectionLost:
This is a huge PITA. To take care of this in Twistar, I added a reconnecting class. A ReconnectingMySQLConnectionPool can be used instead of an adbapi.ConnectionPool:
from twisted.enterprise import adbapi
from twistar.registry import Registry
from twistar.dbobject import DBObject
from twistar.dbconfig.mysql import ReconnectingMySQLConnectionPool

Registry.DBPOOL = ReconnectingMySQLConnectionPool("MySQLdb",
                                                  user="username",
                                                  passwd="pass",
                                                  db="dbname",
                                                  host="host",
                                                  cp_reconnect=True)

class User(DBObject):
    pass

def done(user):
    print("A user was just created with the name %s" % user.first_name)

u = User(first_name="John", last_name="Smith", age=25)
u.save().addCallback(done)
When using the ReconnectingMySQLConnectionPool, any connection breaks from MySQL will result in the ConnectionPool reconnecting and attempting to execute a transaction a second time. This at least alleviates the problem when using the MySQLdb driver.
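The retry behavior described above can be sketched in plain Python, without Twisted or MySQLdb. This only illustrates the retry-once idea; it is not Twistar's actual implementation, and `ConnectionLost`, `run_with_reconnect`, and `FlakyConnection` are hypothetical names made up for the example.

```python
class ConnectionLost(Exception):
    """Stand-in for a driver error such as MySQL's 2006 'server has gone away'."""


def run_with_reconnect(connect, interaction, retries=1):
    """Run interaction(conn); on ConnectionLost, reconnect and try again.

    connect is a zero-argument callable that returns a fresh connection.
    The last failure is re-raised once the retries are used up.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return interaction(conn)
        except ConnectionLost:
            if attempt == retries:
                raise
            conn = connect()  # drop the dead connection and open a new one


class FlakyConnection(object):
    """Hypothetical connection whose first instance is already dead."""
    created = 0

    def __init__(self):
        FlakyConnection.created += 1
        self.alive = FlakyConnection.created > 1

    def query(self, sql):
        if not self.alive:
            raise ConnectionLost()
        return "ok: " + sql


result = run_with_reconnect(FlakyConnection, lambda c: c.query("SELECT 1"))
```

Here the first connection fails, a second one is opened, and the query runs once more before its result is returned.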
Testing Twisted Web Resources
Posted 20 Feb 2012 to python, twisted
Testing web resources in Twisted can be a bit of a pain, and the Twisted docs don’t describe how best to go about writing tests for twisted.web.resource.Resource objects.
Generally, usage of twisted.web resources looks something like this:
from twisted.internet.defer import inlineCallbacks
from twisted.internet import defer, reactor
from twisted.web import resource
from twisted.web.server import NOT_DONE_YET

class ChildPage(resource.Resource):
    def render(self, request):
        d = defer.Deferred()
        d.addCallback(self.renderResult, request)
        reactor.callLater(1, d.callback, "hello")
        return NOT_DONE_YET

    def renderResult(self, result, request):
        request.write(result)
        request.finish()

class MainPage(resource.Resource):
    def __init__(self):
        resource.Resource.__init__(self)
        self.putChild('childpage', ChildPage())
I created a small bit of code that wraps some of the testing library in Twisted. This code can be used to easily create tests by just using a DummySite instead of a twisted.web.server.Site. You can then call get and post on that site (and pass optional dictionaries of get/post arguments and headers). Here’s what a test looks like:
from twisted.internet.defer import inlineCallbacks
from twisted.trial import unittest
from twisted_web_test_utils import DummySite

class WebTest(unittest.TestCase):
    def setUp(self):
        self.web = DummySite(MainPage())

    @inlineCallbacks
    def test_get(self):
        response = yield self.web.get("childpage")
        self.assertEqual(response.value(), "hello")

        # if you have params / headers:
        response = yield self.web.get("childpage", {'paramone': 'value'}, {'referer': "http://somesite.com"})
Here’s the testing code if you want to use it:
And with that, a few hours’ worth of work will save me at least a few 10-minute segments in the future.
Stopping Time During Python Tests
Posted 12 Feb 2012 to python, testing
When running unit tests in Python, it’s often the case that I need to “stop time” so that the current time remains the same during the entire execution of the test. This matters, for instance, when I expect the result of a slow (networked) operation to return a value based on a creation time. If this creation process crosses into a new second, then the creation time of each of the objects will not be the same. This becomes a problem when there is latency associated either with the request to create the object or in the response after the object has been created (causing a potentially large difference between the time the request was made and the time of the response). To compensate, I use a decorator for the unit test methods that need it.
Here’s the decorator function:
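import functools
import time

def stopTime(f):
    """Decorator that freezes time.time() while f runs."""
    original = time.time

    @functools.wraps(f)
    def newf(*args, **kwargs):
        now = original()
        time.time = lambda: now
        try:
            return f(*args, **kwargs)
        finally:
            # always restore the real clock, even if the test fails
            time.time = original
    return newf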
Here’s an example of usage in a unit test:
import unittest, time

class TestSomething(unittest.TestCase):
    @stopTime
    def test_something(self):
        a = time.time()
        time.sleep(3)
        b = time.time()
        self.assertEqual(a, b)
In this case, a and b will be the same, thus demonstrating your awesome ability to alter the space-time continuum.
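If you’d rather not patch `time.time` by hand, the standard library’s `unittest.mock.patch` (Python 3.3 and later; the separate `mock` package on Python 2) can do the same job. This is a minimal sketch, assuming the code under test calls `time.time()` through the `time` module rather than via `from time import time`. The name `frozen_time` is made up for this example.

```python
import time
from contextlib import contextmanager
from unittest import mock


@contextmanager
def frozen_time():
    """Freeze time.time() at the moment the block is entered."""
    now = time.time()
    # patch() puts the real time.time back on exit, even if the body raises
    with mock.patch("time.time", return_value=now):
        yield now


with frozen_time() as now:
    a = time.time()
    b = time.time()
```

Inside the `with` block every call to `time.time()` returns the same value; once the block exits, the real clock is back.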
To further illustrate what’s occurring, here’s a picture of what you’re doing:
Bandit: An A/B Testing Alternative for Rails
Posted 12 Nov 2011 to rails, vanity, statistics, testing
In a typical A/B test, two alternatives are compared to see which produces the most “conversions” (that is, desired results). For instance, if you have a website with a big “Sign Up” button that you want visitors to click, you may wish to choose different background colors. Under typical A/B testing guidelines, you would pick a number (say, N) of users for a test and show half of them one color and half of them another color. After users are shown the button, you record the number of clicks that result from viewing each color. Once N users view one of the two alternatives, a statistical test (generally categorical, like a Chi-Square Test or a G-Test) is run to determine whether or not the number of clicks (aka, “conversions”) for one color were higher than the number of clicks for the other color. This test determines whether the difference you observed was likely due simply to chance or whether the difference you saw was more likely due to an actual difference in the rate of conversion.
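To make the statistical step concrete, here is a rough sketch of a Pearson chi-square test on a 2x2 table of conversions, written in plain Python. The click counts are made up for illustration, the function name is hypothetical, and a real analysis would more likely lean on a statistics library. The 3.841 cutoff is the standard critical value for one degree of freedom at the 5% level.

```python
import math


def chi_square_2x2(conv_a, total_a, conv_b, total_b):
    """Pearson chi-square statistic and p-value for two conversion rates."""
    observed = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, (total_a - conv_a) + (total_b - conv_b)]
    grand_total = total_a + total_b

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / float(grand_total)
            stat += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the upper tail of the chi-square
    # distribution is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical data: 500 visitors saw each color.
stat, p_value = chi_square_2x2(conv_a=60, total_a=500, conv_b=90, total_b=500)
significant = stat > 3.841  # critical value for df=1, alpha=0.05
```

With these invented numbers the statistic comes out a little above 7, so the difference would be called significant at the 5% level.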
This method of testing is popular, but is fraught with issues (practical and statistical). The bandit gem provides an implementation of an alternative method of testing for Rails that solves many of these issues.
Issues with A/B Testing
There are a number of issues with A/B testing (some of which have been described in more detail here):
- You can’t try anything too crazy without having to worry about half of your users not converting. For instance, you may want to try a horrendous color for your “Buy Now” button but are too afraid about potentially harming sales if your users hate it. In this case, the risk of a big change may outweigh the possible benefit if your users like it.
- A/B testing provides a way of only testing two alternatives at once. Pick two, wait, pick two more, wait - this is not the easiest workflow if you want to test 50 options.
- With A/B Testing, you need to have a fixed sample size to make the test valid (otherwise, you run the risk of repeated significance testing errors, as described in more detail here).
- Due to the fixed sample size requirement, you may have to wait a while before you get any results from your test (especially if the expected improvement is marginal, in which case your sample size would need to be larger). This problem can be compounded if you don’t get much traffic.
- Designers and developers generally don’t want to (and shouldn’t have to) understand statistical concepts like power, p-values, or confidence when creating and evaluating tests.
- There are no good answers for what you should do when A performs just as well as B. Was the sample size just too small (implying you should try again with a large sample)? Go with A? Go with B? Does it matter? The reality is it may matter - but you won’t know.
The Bandit Method
The ultimate goal of A/B testing is to increase conversions. The problem can be described in terms that differ greatly from the multitude of questions A/B testing brings (i.e., “Is A better than B?” followed by “Is B better than C?” followed by “Is C better than D?” ad infinitum). Instead, imagine you have a multitude of possible alternatives, and you want to make a decent choice between alternatives you know perform well and alternatives you haven’t tried very often each time a user requests a page. With each page load, pick the best alternative most of the time and an alternative that hasn’t been displayed much some of the time. After each display, monitor the conversions and update what you consider the “better” alternatives to be. This is the basic method of a solution to what is called the multi-armed bandit problem.
With a bandit solution, there is no concept of a “test”. At no point does the system announce a winner and a loser. Alternatives can be added or removed at any time. The better performing alternatives will be displayed more often, and the worst alternatives will rarely be displayed. At any point, if one of the poorly performing alternatives begins to perform better it will be shown more often. This provides solutions to all of the problems listed above:
- Go ahead and try something crazy. If it performs poorly, it won’t be shown very often.
- Pick as many alternatives as you’d like and add them.
- There’s no “test”, and no minimal sample size needed before optimization can start.
- Information about conversions is utilized as users convert or do not convert. There is no pause before results can be immediately used in selecting the next alternative to display to a visitor.
- Designers and developers can add alternatives or remove them at any time. The system will adjust immediately. If an alternative seems to be consistently performing poorly, it can be removed at any time. Alternatively, it can just be left forever. The best option will always be displayed the most often. There are no complicated decisions that have to be made up front or requirements that designers or developers know anything about proper statistical hypothesis testing.
- If one alternative performs the same as another, they will both be displayed with the same regularity. There would be no need to choose one over the other or remove either of them.
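One simple way to realize this idea is an epsilon-greedy strategy: show the best-known alternative most of the time, and a random one the rest of the time. The sketch below is plain Python for illustration only. It is not the bandit gem’s code or API, and the class name, the 10% exploration rate, and the conversion rates in the simulation are all assumptions.

```python
import random


class EpsilonGreedy(object):
    """Hypothetical epsilon-greedy chooser; not the bandit gem's API."""

    def __init__(self, alternatives, epsilon=0.1, rng=None):
        self.epsilon = epsilon            # fraction of displays spent exploring
        self.rng = rng or random.Random()
        self.displays = {}
        self.conversions = {}
        for name in alternatives:
            self.add(name)

    def add(self, name):
        """Alternatives can be added at any time."""
        self.displays.setdefault(name, 0)
        self.conversions.setdefault(name, 0)

    def remove(self, name):
        """Alternatives can be removed at any time, too."""
        self.displays.pop(name, None)
        self.conversions.pop(name, None)

    def rate(self, name):
        """Observed conversion rate so far (0.0 if never displayed)."""
        shown = self.displays[name]
        return self.conversions[name] / float(shown) if shown else 0.0

    def choose(self):
        """Pick the alternative to display for one page load."""
        names = sorted(self.displays)
        if self.rng.random() < self.epsilon:
            name = self.rng.choice(names)        # explore: any alternative
        else:
            name = max(names, key=self.rate)     # exploit: best rate so far
        self.displays[name] += 1
        return name

    def convert(self, name):
        """Record that a display of this alternative led to a conversion."""
        self.conversions[name] += 1


# Simulate 5000 page loads with made-up "true" conversion rates.
true_rates = {"blue": 0.05, "green": 0.20}
rng = random.Random(42)
bandit = EpsilonGreedy(true_rates, epsilon=0.1, rng=rng)
for _ in range(5000):
    shown = bandit.choose()
    if rng.random() < true_rates[shown]:
        bandit.convert(shown)
```

In a run like this the better alternative should end up with most of the displays, while the weaker one still gets an occasional look in case it improves.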
Bandit Gem
While there are a few A/B testing libraries for Rails out there, the preeminent one (Vanity) has statistical issues and is unreliable in a production environment. Bandit was created to test the feasibility of a multi-armed bandit based alternative to A/B testing and to solve the issues with the Rails-based A/B testing gems. It is still in development, though - use at your own risk.
Resources
- bandit gem
- http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit
- http://en.wikipedia.org/wiki/Multi-armed_bandit
- http://www.evanmiller.org/how-not-to-run-an-ab-test.html
Campfirer.com - A Jabber to Campfirenow.com Gateway
Posted 13 Sep 2011 to campfirer project, campfirenow, jabber
Campfire is a web-based group chat service that is directed at businesses. Rather than using a standard protocol, the folk at 37 Signals decided to invent their own. This has led to the necessary creation of a number of custom clients to interact with the API using their unique, one-of-a-kind protocol (for those who don’t want to have to chat in a browser window).
I heart Jabber (XMPP). There’s a good reason Google and Facebook chose that protocol to power their chat. I have no idea why 37 Signals didn’t use Jabber too. Maybe they’re mavericks.
Naturally, I’d like to be able to use one of many Jabber clients to access Campfire, along with all of my other Jabber based accounts. To do this, I wrote a Jabber Component. It provides Multi-User Chat (MUC) support for Jabber servers that utilizes Campfire’s API, so you can “join” a room, “talk”, and see other posts by other users. It’s called Campfirer (campfire + jabber = campfirer).
I’ve set up a running instance of the service at campfirer.com. A description of how to download / set up the code for your own Jabber server can be found there.
The code and more info can be found on the github project page. Pull requests welcome.
Incr/Decr Counters Using memcache-client
Posted 13 Aug 2011 to memcache, ruby
Based on some recent changes in the memcached library, the incr method in the memcache-client gem no longer works as expected. For instance, the following:
require 'rubygems'
require 'memcache-client'
m = MemCache.new 'localhost'
m.set('counter', 0)
m.incr('counter')
will result in the following error:
MemCache::MemCacheError: cannot increment or decrement non-numeric value
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:926:in `raise_on_error_response!'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:831:in `cache_incr'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `call'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:865:in `with_socket_management'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:827:in `cache_incr'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:342:in `incr'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:886:in `with_server'
from /usr/lib/ruby/gems/1.8/gems/memcache-client-1.8.5/lib/memcache.rb:341:in `incr'
from (irb):5
from /usr/local/lib/site_ruby/1.8/rubygems.rb:123
This is caused by the memcache-client gem marshalling everything before it’s stored in memcache. Memcache needs the actual, unmarshalled, integer value to be stored. The code above should be changed to:
require 'rubygems'
require 'memcache-client'
m = MemCache.new 'localhost'
# set the raw value initially by passing in a fourth argument of true
m.set('counter', 0, 0, true)
# increment the raw integer value
m.incr('counter')
# you can now decrement the raw integer value as well
m.decr('counter')
The fix is simple, but not noted anywhere (that I can find) in the memcache-client documentation. Besides a few mentions on Google Groups sans solution, I couldn’t find any references to this issue elsewhere on the world wide intertubes. I find the atomic incr/decr functionality in memcache to be quite useful; I hope this can help alleviate any issues others might be having with this problem.
HBaseRB: A Ruby HBase Library
Posted 01 Aug 2011 to hbase, ruby, hadoop
I recently upgraded the HBaseRB library I wrote a few months ago. HBaseRB provides a means for Ruby to interact with HBase using a Thrift interface. Most other libraries (like hbase-ruby, for instance) use the REST interface provided by HBase. This may work in many situations, but for our applications at LivingSocial we wanted the benefit of using a binary protocol without the overhead of XML parsing.
Some Google searching elucidated the fact that HBaseRB is a bit hard to find, so I thought I’d mention it here.
Changing Namenode Hostname Breaks Hive
Posted 18 Jul 2011 to hive, hadoop
Hive is a great piece of software - but there are still some major issues. I ran into one recently when I changed the hostname of the Hadoop namenode. I couldn’t figure out why Hive was using the old hostname, even after changing all of the config files in $HADOOP_HOME to use the new one and testing other map/reduce jobs.
Apparently, Hive stores all partition information with full references to the location (for instance, hdfs://host:9000/user/hive/warehouse/some/path). This makes lookups faster in the metastore, but makes it impossible to easily change the hostname of your namenode.
The best way I could find to do this was the following:
- mysqldump the metadata database to a local file
- Edit the dump and do a global search and replace on any instances of the old hostname
- Reimport the dump
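The search-and-replace step can be scripted. Below is a small sketch in Python that rewrites `hdfs://` URIs in the text of a dump. The function name, hostnames, and sample line are hypothetical, and it assumes the old hostname only needs changing where it appears as the host of an `hdfs://` URI. Try it on a copy of the dump first.

```python
import re


def rewrite_namenode(dump_text, old_host, new_host):
    """Replace old_host with new_host in every hdfs:// URI in dump_text."""
    # The lookahead refuses a match when the hostname continues, so
    # "namenode1" does not also rewrite "namenode10" or "namenode1.example.com".
    pattern = re.compile(r"hdfs://" + re.escape(old_host) + r"(?![\w.-])")
    return pattern.sub(lambda match: "hdfs://" + new_host, dump_text)


# Made-up line in the style of a mysqldump INSERT statement.
sample = "INSERT INTO some_table VALUES (1,'hdfs://old-nn:9000/user/hive/warehouse/some/path');"
fixed = rewrite_namenode(sample, "old-nn", "new-nn.example.com")
```

Anchoring the match on the `hdfs://` prefix keeps the replacement away from unrelated strings that merely contain the old hostname.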
If the location was saved in a separate table (w/ a one to many relationship between partitions and hosts / locations) it would make this process quite a bit easier.
Good DC Coffee Shops
Posted 19 Jun 2011 to dc, coffee
I moved to DC about four months ago, and since then, my weekends have been frequently occupied with one quest: find the best DC coffee shop. When I was in Charleston, the answer used to be easy (Kudu Coffee, if you’re wondering). In Baltimore, it was even easier (Red Emma’s Bookstore Coffeehouse). In The District, however, I’ve had a much harder time. There are many meretricious options to choose from, and few are real winners. There are quite a few convenience stores, bars, and restaurants that call themselves a “cafe” and really shouldn’t.
What’s a winner? Admittedly, it has a lot to do with a place that I can break out a laptop, drink some coffee, do some work, and be just distracted enough by nearby conversation that I don’t mind the fact that I’m working. Here are the metrics I take into consideration:
- outdoor seating
- free wifi
- ample seating
- power outlets
- eavesdropping payoffs (audible interesting conversations, often philosophical in nature)
- quality music or live performances
- good collaborative space (big tables, etc)
- proximity to public transportation
So here are some top performers on this list, with a final entry of what I believe to be the winner.
Ebenezer’s Coffeehouse
This is the first place I went to in DC. It’s right next to Union Station, so it’s quite accessible. I didn’t realize it at first, but this establishment is owned and operated by a Christian church. This, naturally, leads to a rather homogeneous clientele makeup, which often consists of small Bible study groups and prayer groups. Seating is generally available, the coffee is alright, and there is free timed wifi (with a purchase) - but don’t expect an interesting space, interesting characters, or any stimulating conversation.
Tryst
This place is more of a restaurant / cafe. It’s generally completely packed on the weekends with hungover college students looking for food and coffee. This is not a good work place, even if you decide to wait for a seat.
Big Bear Cafe
A great location with plenty of hits on my list of important qualities. There’s outdoor seating, free wifi, good collaborative space, great music, and more. The disadvantages are major, though - seating is impossible on the weekends and there’s no nearby metro stop.
Chinatown Coffee Co.
Excellent coffee can be found here. There’s generally enough seating, free wifi, and it’s right next to the Chinatown metro stop. You’re not likely to overhear any juicy conversations, though; most people keep to themselves at tables meant for one or two.
Filter Coffeehouse and Espresso Bar
Great coffee here, too, and it’s a short walk from the Dupont metro stop. There’s outdoor seating as well, though that and all of the few seats indoors are generally taken. With better seating options or fewer patrons, this place would be a real winner.
MidCity Caffe
The winner at this point is MidCity. They always have enough seating (though all seats are really close to each other, so you’ll probably make a friend), free wifi, great coffee, and excellent music. I’ve even seen a live performance or two there. It’s not too far from the U St metro stop. Another great thing about this place is the owners have made a special effort to put power strips everywhere.
There are plenty of mediocre places I’ve left off (Jolt n Bolt Coffee & Tea House, Windows Cafe & Market, and many more not worth mentioning), so this short list is by no means comprehensive. I’ll add to it if I find any other locations worth a plug.