Calculon: Aggregate Time Functions in ActiveRecord
Posted 27 Apr 2013 to activerecord, rails, ruby, gems and has Comments
While Rails does have the ability to run aggregate functions over certain columns in ActiveRecord (like sum, average, min, max, etc), there’s no easy way to do this while grouping by time buckets. For instance, it’s often the case that you want to know not just the sum of some value between two points in time, but also that sum bucketed by minute (or hour/day/week/month/year/etc).
Using just ActiveRecord, this can be a bit nasty looking and error prone. For instance, if I want to get the sum over a column named ‘a_column’ by hour for today, the code would end up looking something like this:
grouping = "concat(date(created_at), ' ', hour(created_at), ':00:00')"
SomeModel.where(['date(created_at) = ?', Date.today]).group(grouping).sum('a_column')
Which is a PITA. ActiveRecord gives so many shortcuts for doing all kinds of things - why not time based groupings?
I think I know the answer to that question now - after going through the trouble of implementing a gem to do this. The gem is named calculon, after the famous actor. It’s even more of a PITA to deal with all of the various time manipulation functions in each database (for instance, sqlite has only 5 date and time functions) - so I understand why no one has seemingly tackled this problem before.
I started with only MySQL support, and it’s made my life quite a bit easier. Using calculon, the example above becomes:
SomeModel.by_hour(:a_column => :sum).on(Date.today)
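To make the idea of a time bucket concrete without a database, here is a minimal plain-Ruby sketch of what a sum by hour computes. The `sample_rows` data and the `sum_by_hour` helper are made up for illustration; this is not how calculon works internally, since calculon does the grouping in SQL.

```ruby
# Hypothetical rows standing in for database records: [created_at, a_column]
def sample_rows
  [
    [Time.utc(2013, 4, 27, 9, 5), 3],
    [Time.utc(2013, 4, 27, 9, 40), 4],
    [Time.utc(2013, 4, 27, 10, 15), 5]
  ]
end

# Truncate each timestamp to the start of its hour and sum the values per bucket.
def sum_by_hour(rows)
  rows.each_with_object(Hash.new(0)) do |(time, value), sums|
    bucket = time.strftime('%Y-%m-%d %H:00:00')
    sums[bucket] += value
  end
end

sum_by_hour(sample_rows).each do |bucket, total|
  puts "#{bucket}: #{total}"
end
```

The two 9 o'clock rows collapse into one bucket, which is the same thing the SQL grouping expression does on the database side.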
You can even make shortcuts if there are certain values you want to perform a given aggregate function over routinely. For instance, let’s say you have a class named Game that has points for team a and team b, and you want to get their average scores.
class Game < ActiveRecord::Base
  attr_accessible :team_a_points, :team_b_points
  calculon_view :points, :team_a_points => :avg, :team_b_points => :avg
end
# get avg points by day
Game.points_by_day.each { |game|
  puts "#{game.time_bucket}: A Avg: #{game.team_a_points}, B Avg: #{game.team_b_points}"
}
# these work too
Game.points_by_hour
Game.points_by_month
Game.points_by_year
Check out the code/further docs at github.com/opbandit/calculon.
Gestating in the Old Gray Lady
Posted 07 Apr 2013 to OpBandit, NYTimes, TimeSpace and has Comments
The OpBandit team just finished our first week in the New York Times timeSpace. So far, everything has been going really well. We’ve been able to get some great feedback from some exceptionally talented people at the Times.
When describing the program, though, it can be tough because it’s really neither an accelerator nor an incubator. From the timeSpace website:
The Times will not seek equity in your company as part of the program. If and when you raise an institutional round of financing, The New York Times Company will separately consider participating if invited… We may become one of your customers during or after your time here. But this is not the purpose of timeSpace. You may call it an accelerator or an incubator; right now we are calling it an experiment.
While “accelerator” and “incubator” are frequently used interchangeably, there are distinct differences between the two terms. An accelerator helps early stage startups bring an idea to a prototype and then hopefully to market. They’re typically intensive, last only for a short window, and provide small amounts of capital for single digit equity. An incubator typically provides space, business services, and mentorship to foster gradual growth over a much longer period for a larger share of equity.
Unlike either of these, however, timeSpace takes no equity. It’s the duration of an accelerator with the services and support of an incubator. Rather than “an experiment,” though, I’d like to propose a new term. Since the New York Times is essentially “carrying” three companies during these early stages, it’s basically just gestation.
Gestator: A program hosted by an established company to provide young startups with short term space, mentorship, business development support, and optional capital in exchange for ideas and exposure (but no equity).
There are an increasing number of examples (Nike+, BBC Worldwide Labs, Kaplan EdTech, etc) of larger companies welcoming smaller ones into their space for this type of embedded gestation (unlike timeSpace, though, many of these do provide funding). It reminds me of a “startup in residence” type program - the larger entity gets new ideas, fresh perspectives, and a drive to shake things up a bit, while the startups get legitimacy and support.
Hopefully, this trend will continue. I think it’s great.
Asynchronous Array Mapping in Ruby
Posted 14 Mar 2013 to ruby, eventmachine and has Comments
EventMachine is great, but may require a few fights to get started.
Recently, I wanted to be able to get the sizes of each image in an array using FastImage. It’s fast, but it’s not that fast when you’ve got tens or hundreds of images you’re trying to size at once.
What I needed was an asynchronous map method for Arrays, like this:
images = [ "http://example.com/one.png", "http://example.com/two.png", ... ]
images.async_map { |img| FastImage.size(img) }
EventMachine did not make this easy. After some fighting, here’s what I came up with:
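require 'eventmachine'

class Array
  # Map over the array concurrently using EventMachine's deferred thread pool.
  def async_map(&block)
    results = nil
    EventMachine.run do
      # Run the block for each item on a deferred thread, then hand the result to the iterator.
      operation = proc { |item, iter|
        EventMachine.defer(proc { block.call(item) }, proc { |r| iter.return(r) })
      }
      # Once every item has returned, capture the results and stop the reactor.
      callback = proc { |rs|
        results = rs
        EventMachine.stop
      }
      EventMachine::Iterator.new(self, length).map(operation, callback)
    end
    results
  end
end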
It works (10x speed increase!), but I’d like to believe that there’s got to be an easier way to do this.
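One possibly simpler route, if EventMachine isn't otherwise needed, is plain threads: blocking network I/O releases Ruby's global lock, so one thread per image should get a similar effect. A minimal sketch, where `threaded_map` is a hypothetical helper name and there is no cap on the number of threads (reasonable for tens or hundreds of items, not thousands):

```ruby
# Run the block for every item on its own thread and collect the results
# in the original order. Thread#value waits for the thread to finish and
# re-raises any exception the block raised.
def threaded_map(items, &block)
  threads = items.map { |item| Thread.new { block.call(item) } }
  threads.map(&:value)
end

# Stand-in for slow network calls: each "request" sleeps briefly.
doubled = threaded_map([1, 2, 3]) do |n|
  sleep 0.01
  n * 2
end
puts doubled.inspect
```

With FastImage this would look like `threaded_map(images) { |img| FastImage.size(img) }`.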
The Role of Fear: A Startup Retrospective
Posted 25 Feb 2013 to livingsocial, startups, deep thoughts and has Comments
While there are plenty of emotions that can affect your psyche while at a startup, I think that fear is one of the most interesting. Unlike passion/ego/etc., I think it is unique in that the role it plays undergoes a dramatic change over time. While that change is decidedly gradual, the mutation is easily noticed, and is quite possibly preventable (which, I contend, is a worthy effort).
Fear as a Motivator
When I began my two year stay at LivingSocial the engineering team had fewer than 10 people (I was number 6). Changes were made to the product (just daily deals at that point) and email systems many times each day. New ideas were constantly being tried to decrease latency, increase efficiency, increase signups, decrease bounces, and everything else you can imagine. There was an understanding that a failure to succeed and dominate could quite easily mean we’d all be looking for new jobs in the near future. That fear of failure was a fantastic motivator to continuously perturb our system with new code, new products, and new approaches in an attempt to find the maximal “steady state” for our product.
That low level continual fear feels somewhat healthy. Forgetting the reality of what a “payoff” may actually mean, knowing that your small group of people has the ability to steer the boat to either rocky shoals or sandy beaches of success is fantastic. The rocks are large and sometimes loom ominously on the horizon - but that’s the adventure. Success may never come in a financial sense, but the possibility of being a part of something awesome can lead you to channel the fear of potential failure into a scrappy tenacity to find success.
It’s not just fear of failure that is the motivator; there’s clearly a fear of what a partial success could mean in the face of failing to “go big” when you had the chance. No one wants to look back months or years later wondering, “Did we do the most we could have with what we had, or did we leave too much on the table?” Fear of future regret can produce a powerful incentive to make sure you’re doing everything you can at every point to take advantage of opportunities.
Fear as the Mind Killer
Eventually, the positive aspects of fear diminish and the negative ones begin to dominate. As the boat gets bigger, the value of the cargo increases, and so does the number of people who think their opinions should help steer the organization (incidentally, this is also the point at which you probably refer to your company as “the organization” or “the org”). Fear of not succeeding is replaced by the fear of losing the success that has already been gained. New hires have never known the fear that their jobs may not be around in a few months unless they and a small group excel at their jobs (though new hires at LivingSocial may now arguably have some of that fear).
Failure is the norm in the beginning. So many ideas are tried that a majority of them are going to naturally be terrible. Once the product reaches that “steady state”, however, and those major changes are no longer being made, the culture shifts. No one wants to fail at anything, especially when so many people are watching. It leads to a type of paralysis that eventually favors group decisions, spread blame/responsibility, and then action only after consensus. No one (understandably) wants to be responsible for breaking anything (which is natural in the beginning), especially when millions of dollars could be on the line.
This culture shift is reinforced internally by demands for consensus. Product changes that once would have just been implemented by an engineer testing something eventually required a room full of product/project/design managers to argue about every change over the course of multiple meetings. Adaptation and maneuvering are simply painful at this point. People become individually tied to specific long-term projects and initiatives, which makes it hard for them to admit when their specific endeavor is failing to meet expectations. Instead of dropping it quickly and moving on to the next idea, they hang on and keep pushing for as long as possible. In the beginning, global success is the most important (without it, no one has a job) so people aren’t so much scared about the success of their own ideas as the success of the company in general. It’s easy to move on to the next idea when all you care about is that some idea ends up being successful, not just your idea’s success. Once the company is stable and established, people will focus more on their own questionable success given that the company will likely still be around even if they fail. This completely shifts the global view of “the antagonist” from external forces (chief competitors, inability to raise money, lack of market penetration, etc) to internal ones (how can I make sure my idea is completed instead of Suzy’s so I can get the raise and promotion).
Mitigating the Mutation
Maintaining flexibility and fluidity is tough once the boat gets big. The shift is natural and may be largely unavoidable. Here are some ideas, though, that may be worth trying:
- Make sure that your technology choices give you the freedom to make rapid changes and evaluate their effectiveness. The faster you can try/test an idea and branch and bound early, the better. If changes can be made that only affect 1% of the user base, then it’s less likely that multiple meetings of many product/project/design/etc managers will be necessary.
- Be honest internally. Pick honest metrics for measuring success and stick with them for each internal project. Kill or pivot anything that doesn’t meet goals.
- Make it easy for teams to form into inner startups and rekindle some of the useful, initial fear. Let these teams pitch their ideas (effectively having to “sell” them), give them necessary KPIs that they have to hit, and then give them (almost) complete autonomy. This is the same model adapted to startups that visionaries like Paul Romer are trying to use to help correct failed states (for a great overview of the process, see the This American Life episode).
- Artificially perturb the system. You might have a big ship with exceptionally valuable cargo - but if you can regularly try crazy ideas (“I know we’ve always done X, but what if we stopped…”) with a small subset of your users/clients/etc then you may be able to keep teams from becoming paralyzed.
- Know the difference between maintainers and builders in your workforce, and keep the builders happy and a solid percent of the total workforce (“builder” doesn’t necessarily mean “build from scratch” - could be “create some new feature”). It takes a different kind of person and a different set of skills to prototype a new product than you want working on maintaining your primary/static product.
These are a few ideas, but they’re certainly not exhaustive. The role of fear in a startup is an interesting and useful one - but one that has to be monitored and shaped carefully over the long term.
Reading an SSL Cert in Ruby Over a Socket
Posted 13 Jan 2013 to ruby, ssl and has Comments
I couldn’t find any examples of reading an SSL certificate from a socket connection, so I thought I’d share my approach here. Ultimately, I just wanted to get the X509 certificate information for a few websites that have HTTPS - which this bit of code will do.
require 'socket'
require 'openssl'
tcp_client = TCPSocket.new "example.com", 443
ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client
ssl_client.connect
cert = OpenSSL::X509::Certificate.new(ssl_client.peer_cert)
ssl_client.sysclose
tcp_client.close
certprops = OpenSSL::X509::Name.new(cert.issuer).to_a
issuer = certprops.select { |name, data, type| name == "O" }.first[1]
results = {
  :valid_on => cert.not_before,
  :valid_until => cert.not_after,
  :issuer => issuer,
  :valid => (ssl_client.verify_result == 0)
}
The last line creates a hash that contains when the cert became valid, when it will become invalid, who issued it, and whether or not it actually is valid.
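One fragile spot in the snippet above is `.first[1]`, which raises if the issuer has no `O` (organization) field. A slightly safer lookup is sketched below against a locally built name, so it runs without a network connection. The `organization_of` helper and the example distinguished names are made up for illustration.

```ruby
require 'openssl'

# Return the organization ("O") component of an X509 name, or nil if absent.
# Name#to_a yields [field, value, type] triples.
def organization_of(name)
  entry = name.to_a.find { |field, _value, _type| field == 'O' }
  entry && entry[1]
end

with_org = OpenSSL::X509::Name.parse('/C=US/O=Example CA/CN=Example Root')
without_org = OpenSSL::X509::Name.parse('/CN=No Organization Here')

puts organization_of(with_org).inspect
puts organization_of(without_org).inspect
```

With a real certificate this would be called as `organization_of(cert.issuer)`.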
Showoff: Github Repo Lists on Pages
Posted 22 Oct 2012 to github, moonshinedevco and has Comments
I was looking around for a decent way to embed github repository information in web pages but couldn’t find anything. All I wanted was the ability to show the same things that are listed for each repo on a user’s homepage, but I wanted to be able to show repos from across accounts (I have code under a few different organizations). So I created a little JavaScript library called showoff.
It’s pretty simple to show off two repos, each under a different organization. You just give a div you want to fill, and the usernames and repos you want to see.
$(function() {
  SHOWOFF.load('examplerepos', {
    'bmuller': [ 'sexmachine' ],
    'moonshinedevco': [ 'showoff' ]
  });
});
This results in the following display:
It’s not as pretty as the one github shows, but they have their own octicons. If you want to use it, check out the showoff github repo to play.
Sample Size Calculations in Ruby
Posted 17 Aug 2012 to statistics, ruby and has Comments
I really love the abba tool. It’s great. So is R. So are Zed Shaw’s rants on statistics.
What really sucks is the complete lack of basic statistics libraries in Ruby. After spending more time than I’d like to admit going over some reference implementations in R, I added both sample size and confidence interval calculations to ABAnalyzer. Here are some excerpts from the docs.
Sample Size Calculations
Let’s say you want to determine how large your sample size needs to be for an A/B test. Let’s say your baseline is 10%, and you want to be able to determine if there’s at least a 10% relative lift (1% absolute) to 11%. Let’s assume you want a power of 0.8 and a significance level of 0.05 (that is, an 80% chance that you’ll recognize a difference when there is one, and a 5% chance of a false positive).
require 'rubygems'
require 'abanalyzer'
ABAnalyzer.calculate_size(0.1, 0.11, 0.05, 0.8)
=> 14751
This means that you will need at least 14,751 people in each group sample. You can see this same example worked in R on the 37signals blog.
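For the curious, the 14,751 figure can be reproduced with the standard normal-approximation formula for comparing two proportions, which appears to be what R's `power.prop.test` computes. The sketch below is an assumption about the math, not ABAnalyzer's actual source. The names `normal_cdf`, `normal_quantile`, and `sample_size` are hypothetical, and the quantile is found by simple bisection on `Math.erf` to avoid needing a stats library.

```ruby
# Standard normal CDF via the error function.
def normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Inverse of the standard normal CDF, found by bisection on [-10, 10].
def normal_quantile(prob)
  lo = -10.0
  hi = 10.0
  100.times do
    mid = (lo + hi) / 2.0
    if normal_cdf(mid) < prob
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0
end

# Per-group sample size for a two-sided test of two proportions.
def sample_size(p1, p2, significance, power)
  z_alpha = normal_quantile(1.0 - significance / 2.0)
  z_beta = normal_quantile(power)
  pooled = (p1 + p2) / 2.0
  numerator = z_alpha * Math.sqrt(2.0 * pooled * (1.0 - pooled)) +
              z_beta * Math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
  ((numerator / (p1 - p2))**2).ceil
end

puts sample_size(0.1, 0.11, 0.05, 0.8)
```

Under these assumptions the unrounded value is about 14,750.8, which rounds up to the 14,751 shown above.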
Confidence Intervals
You can also get a confidence interval. Let’s say you have the results of a test where there were 711 successes out of 4000 trials. To get a 95% confidence interval of the “true” value of the conversion rate, use:
ABAnalyzer.confidence_interval(711, 4000, 0.95)
=> [0.1659025512617185, 0.1895974487382815]
This means (roughly) that if you ran this experiment over and over, 95% of the time the resulting proportion would be between 17% and 19%.
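The numbers above are consistent with the textbook normal-approximation (Wald) interval, p ± z·sqrt(p(1 − p)/n). A minimal sketch under that assumption follows. It is not taken from ABAnalyzer's source, and `normal_cdf`, `normal_quantile`, and `wald_interval` are hypothetical names.

```ruby
# Standard normal CDF via the error function.
def normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Inverse of the standard normal CDF, found by bisection on [-10, 10].
def normal_quantile(prob)
  lo = -10.0
  hi = 10.0
  100.times do
    mid = (lo + hi) / 2.0
    if normal_cdf(mid) < prob
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0
end

# Two-sided Wald confidence interval for a proportion.
def wald_interval(successes, trials, confidence)
  p_hat = successes.to_f / trials
  z = normal_quantile(1.0 - (1.0 - confidence) / 2.0)
  margin = z * Math.sqrt(p_hat * (1.0 - p_hat) / trials)
  [p_hat - margin, p_hat + margin]
end

low, high = wald_interval(711, 4000, 0.95)
puts format('%.4f to %.4f', low, high)
```

The Wald interval is the simplest choice and gets unreliable for very small samples or rates near 0% or 100%, so treat it as a rough guide in those cases.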
You can also determine what the relative confidence intervals would be. Let’s say that your old conversion rate was 13%, and you wanted to know what sort of relative lift you could get.
ABAnalyzer.relative_confidence_interval(711, 4000, 0.13, 0.95)
=> [0.27617347124398833, 0.45844191337139606]
This means (roughly) that if you ran this experiment over and over, 95% of the time the resulting proportion would be a relative lift of between 28% and 46%. Go buy yourself a beer!
Genderator: Gender from First Name with Python
Posted 23 Jun 2012 to python and has Comments
I couldn’t find a Python library that would give me a guess for gender based on first name that would handle international names as well, so I started an interface to one written in C. That was taking too long (due to the added complexity around turning the C code into something suitable for a library), so I just wrote a parser for the data file in that project. The code currently doesn’t handle country selection (yet), but it should be suitable enough for most needs. Here’s some basic use:
from genderator.detector import *
d = Detector()
d.getGender('Bob') == MALE # True
d.getGender('Sally') == FEMALE # True
d.getGender('Pauley') == ANDROGYNOUS # True
I18N is fully supported:
d.getGender(u'\301lfr\372n') == FEMALE # True
The code can be found on github.
txyam: Yet Another Memcached Twisted Client
Posted 09 Jun 2012 to twisted, python, memcache and has Comments
There are a number of memcached client libraries written for Python Twisted (like twisted-memcached, txconnpool, etc). None of them did everything I wanted, though. Here’s what I needed:
- A reconnecting client: if a connection is closed the client should keep trying to reconnect
- Partitioning: You should be able to use as many memcached servers as you’d like and partition the keys between them
- Pickling/Compression: You should be able to effortlessly store objects (and have them compressed if you’d like)
Naturally, I went ahead and wrote a new client that does all of this. I’m calling it txyam, as in “Yet Another Memcached” client. Here’s some example usage:
# import the client
from txyam.client import YamClient
# create a new client - hosts are either hostnames (default port of 11211 will be used) or host/port tuples
hosts = [ 'localhost', 'otherhost', ('someotherhost', 123) ]
client = YamClient(hosts)
# Run some commands. You can use all of the typical get/add/replace/etc
# listed at http://twistedmatrix.com/documents/current/api/twisted.protocols.memcache.MemCacheProtocol.html
client.set('akey', 'avalue').addCallback(someHandler)
# Additionally, you can set / add / get pickled objects
client.addPickled('anotherkey', { 'dkey': [1, 2, 3] }, compress=True)
client.getPickled('anotherkey', uncompress=True)
# get stats for all servers
def printStats(stats):
    for host, statlist in stats.items():
        print host, statlist['bytes']
client.stats().addCallback(printStats)
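The partitioning requirement from the list above boils down to mapping each key to one server in a stable way. Below is a minimal sketch of one common approach (hash the key, then take it modulo the number of servers), sketched here in Ruby rather than Python. It is an illustration only, not a description of how txyam actually picks a server, and `server_for` is a made-up helper.

```ruby
require 'zlib'

# Pick a server for a key: the same key always maps to the same host
# as long as the host list does not change.
def server_for(key, hosts)
  hosts[Zlib.crc32(key) % hosts.size]
end

hosts = ['localhost', 'otherhost', 'someotherhost']
%w[akey anotherkey].each do |key|
  puts "#{key} -> #{server_for(key, hosts)}"
end
```

The catch with plain modulo is that adding or removing a server remaps most keys; consistent hashing is the usual way around that.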
Free, Automatic Twisted Error Reporting
Posted 30 Apr 2012 to python, twisted, error reporting, airbrake and has Comments
When you have a Twisted server running somewhere in the cloud, there really aren’t that many options for automatic notifications when there’s an error. If you’re using Rails, then there are a number of options to do this sort of thing (New Relic, Airbrake, etc). Many of these will work for WSGI applications, but there isn’t much in the way of automatic error reporting for Python otherwise.
I’ve gotten used to using Airbrake with Rails, so I wanted to figure out how to integrate a Python application with it. It turns out, there’s been at least one attempt, but it’s synchronous (which could potentially stall a Twisted application if it can’t make a connection to the Airbrake server).
Enter txairbrake. It’s written specifically for Twisted applications and is non-blocking (using twisted.web.client.getPage). It’s also dead simple to use:
# import the observer
from txairbrake.observers import AirbrakeLogObserver
# Create observer. Params are api key, environment, and use SSL. The last two are optional.
ab = AirbrakeLogObserver("mykey", "production", True)
# start observing errors
ab.start()
Any uncaught exceptions will then be reported to the Airbrake server, where you can set up email notifications.
Additionally, if you’re tight on cash and don’t want to shell out tons of money to Airbrake, consider setting up Errbit instead. It’s free, Airbrake API compliant, and you can run it for free on Heroku. If you go this route, just pass a new constructor argument of airbrakeHost to the AirbrakeLogObserver with the location of your Errbit server.
There, now you have free, automatic error reporting from within a Twisted application.