A/B Test Analysis Library for Ruby
There are quite a few A/B testing libraries out there now for Rails and other frameworks. Most of these, I’ve noticed, do not provide any sort of analysis component for validating differences in the results. While most people are content to simply “eyeball” results, that process is subjective and can produce misleading interpretations.
For instance, let’s say you start an email campaign and have two subject lines you wish to test. You pick some number of recipients for each test, and 300 people in Group A end up receiving one subject line and 335 in Group B receive the other. You then wait a few days and look at the results, which are in the table below.
| | Group A | Group B |
| Open / Not | 0.5 | 0.675 |
If you simply “eyeball” the results, you might conclude from a comparison of the ratios that Group B performed almost 20% better than Group A. Here’s where the problem lies: the apparent difference isn’t a statistically significant one. Neither a G-test nor Pearson’s chi-square test finds a significant difference between the two results at the conventional 5% level (you can verify this with an online calculator for either test). In that case, the change should either be abandoned or more tests run.
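To make that claim concrete, here is a minimal sketch of both tests using only Ruby's standard library. It is not how abanalyzer is implemented. The raw counts are an assumption inferred from the group sizes and ratios above: 100 opened and 200 not in Group A (ratio 0.5, total 300), and 135 opened and 200 not in Group B (ratio 0.675, total 335). The p-value shortcut via `Math.erfc` only holds for a 2x2 table, which has one degree of freedom.

```ruby
# Observed counts per group: [opened, not opened].
# Assumption: inferred from the group sizes (300, 335) and the
# open/not ratios (0.5, 0.675) quoted in the post.
observed = [[100, 200], [135, 200]]

row_totals = observed.map(&:sum)
col_totals = observed.transpose.map(&:sum)
total = row_totals.sum.to_f

# Expected count for each cell if opening were independent of the group.
expected = row_totals.map do |r|
  col_totals.map { |c| r * c / total }
end

pairs = observed.flatten.zip(expected.flatten)

# Pearson chi-square statistic: sum of (O - E)^2 / E.
chi_square = pairs.sum { |o, e| (o - e)**2 / e }

# G statistic: 2 * sum of O * ln(O / E).
g_stat = 2 * pairs.sum { |o, e| o * Math.log(o / e) }

# With one degree of freedom, the chi-square upper-tail probability
# for a statistic x is erfc(sqrt(x / 2)).
p_value = ->(stat) { Math.erfc(Math.sqrt(stat / 2)) }

chi_p = p_value.call(chi_square)
g_p = p_value.call(g_stat)

puts format('chi-square = %.3f, p = %.3f', chi_square, chi_p)
puts format('G          = %.3f, p = %.3f', g_stat, g_p)
puts "significant at 0.05? #{chi_p < 0.05}"
```

Under these assumed counts both p-values come out at roughly 0.07, above the 0.05 threshold, which is why the difference does not count as significant.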
Misinterpretation is unfortunately an easy result of the common A/B split testing design. When results can be easily misread and the resulting effort to implement changes wasted, it is important to determine rigorously whether or not a change will really matter. With so many existing libraries that already make the actual testing part easy, I decided to create a Ruby library that handles only the analysis of the results, in as simple a manner as possible. The result, called abanalyzer, provides a dirt-simple method for determining whether or not there is a statistically significant difference within categorical data (such as the results of an A/B test).
To install the gem:
    gem install abanalyzer
Usage is just as straightforward. Here is an example Ruby script using the numbers above for the test:
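A minimal sketch of such a script is below. The class and method names (`ABAnalyzer::ABTest`, `different?`, `chisquare_p`, `gtest_p`) follow the gem's README as best recalled and should be treated as assumptions to check against the installed version. The counts are likewise inferred from the group sizes and ratios above (100 of 300 opened in Group A, 135 of 335 in Group B), and the group and outcome labels are arbitrary. The `require` is guarded so the script still runs, with a warning, when the gem is not installed.

```ruby
# Counts per group, inferred from the post's group sizes and ratios
# (an assumption; the labels :agroup, :bgroup, :opened, :unopened are
# illustrative names, not anything the gem requires).
values = {
  agroup: { opened: 100, unopened: 200 },
  bgroup: { opened: 135, unopened: 200 }
}

begin
  require 'abanalyzer'

  # Assumed API, per the gem's README: ABTest takes a hash of groups,
  # each mapping outcome labels to counts.
  tester = ABAnalyzer::ABTest.new(values)

  # true or false: is there a statistically significant difference?
  puts tester.different?

  # The underlying p-values, for the chi-square test and the G-test.
  puts tester.chisquare_p
  puts tester.gtest_p
rescue LoadError
  warn 'abanalyzer is not installed; run `gem install abanalyzer` first'
end
```

For these numbers the expectation, given the discussion above, is that the test reports no significant difference.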
The library itself is pretty simple; my hope is that it will be used to introduce some statistical rigour to what currently seems to be mostly guesswork and conjecture. I’d hate to see effort wasted on implementing changes that will not produce any actual differences in opens/clicks/views/etc.