
Gender and Data: Survey Results

Posted 10 Dec 2014 to ruby

As a followup to the post describing my thoughts around renaming the gender_detector gem, I wanted to release the results of the survey at the end (if you haven’t read the original post, you should definitely go back to get a sense of context).

First, there were way more responses than I hoped for (545 total responses!). 15% were female (83), 81% male (444), and 3% (18) provided another gender (mostly antagonistic answers, like “squirrel”). Just over half (283) provided additional comments.

I’ve got some summary stats below - and while I’d love to provide some inferences based on the responses, I’d much rather see what analysis others may do with the data. Leave a comment below if you do a write up, and I’ll add a link in this post.

Question Summaries

When I first heard the name of the gem SexMachine I felt…

For the male respondents, 80% mentioned indifference or happiness (355) and 11% mentioned feeling uncomfortable (50). For females, 77% mentioned indifference or happiness (64) and 14% mentioned feeling uncomfortable (12).

On a scale of 0 to 5, how strongly do you feel that the name should have been changed?

The average was 1.1 for males and 0.7 for females.

Additional comments

About half of the males (226) and half the females (44) provided additional comments. 3% of the males (12) and 2% of females (2) used some form of “thanks”.


  1. The responses are self-reported. There’s no way to tell how many people provided honest responses.
  2. I didn’t clean up the data. For instance, duplicates and invalid responses were not removed (I’m assuming the few squirrels were actually the same squirrel responding multiple times).
  3. While 545 responses was about 500 higher than I was expecting, it’s still a minuscule representation of the entire coding world. Typical warnings about a small n apply.


You can download the response data from Google Docs. Leave a comment below if you do a write up, and I’ll add a link in this post.

Why I’m Renaming A Gem

Posted 17 Nov 2014 to ruby

Almost two and a half years ago, I made a gem that determines someone’s most likely gender based on their first name. I named it SexMachine. It’s a machine that detects the gender (most often, the sex) of a person. The name seemed to fit, and it’s also the name of a song everyone has heard.

Two and a half years later, someone stepped forward to say that the name I chose was harmful.

While the comments on the Github pull request have inevitably spiraled into fatuous inanity, I do think there are a few items that are absolutely worth consideration.

First, to clarify, I don’t think it’s healthy to view the word “sex” with the prudishness that most Americans still seem to possess. Sex is not shameful, and it is not degrading. As to how appropriate of a name it is, I think that depends completely on the context. The question of whether or not the name is appropriate for a work environment isn’t one I think matters that much in this case. I’m not your coworker and my personal projects (made in my spare time with my personal resources) should not be conflated with anyone’s workplace. Github and rubygems are more like flea markets full of free art where you are welcome to take whatever you think is useful, but there’s no guarantee that you’re not going to stumble across a tasteless painting of someone doing something naughty with a banana. That’s part of what makes open source so wonderful - because someone may actually be looking for that exact painting, even if it’s not something you’re willing to hang in your corporate boardroom.

That said, I think there’s a much more convincing argument (that unfortunately has become conflated with workplace fit). I think the best argument basically boils down to a question of whether or not I, as an open source developer, want to be as welcoming as possible to a group that has long felt marginalized and uncomfortable. This is the one that I believe is worth substantial and thoughtful consideration. If enough people from that group claim that some words that I chose make it harder for them, who am I to question their feelings? I believe there is genuine sincerity in the request, and this means that I need to listen.

This is why I’m going to rename the gem. I’ve received a ton of emails and comments from both sides, and I want to make it clear that my decision has nothing to do with feeling the pressure of one group over the other. It’s because of comments like this, from a woman engineer that I corresponded with:

While the idealist in me would love to aim for a world where sex was treated more equally and openly, the unfortunate reality of tech is that it has been a haven for misogynistic men and the environment is heavily male dominated. While in an ideal world the name SexMachine would be something that both genders could joke about, the reality is that the tech community is not ready or capable of that today.

I want to bring about that day as quickly as possible. This is my contribution.

PS - If you have a moment, please take a second to fill out this short, anonymous survey. I’m a scientist and can’t help trying to collect some data.

Hubot as an AWS Gatekeeper

Posted 20 Aug 2014 to hubot, aws


On AWS, it’s easy to fall into a situation where you’re using the equivalent of a poor man’s VPN. Each time you want to access a server, you go into the AWS console online and authorize some ports in a security group for your current IP address. Then you forget to remove those rules, naturally. What if there was a way to just externally monitor your online presence, and then open access when you show up online and remove that access when you sign off… Like some sort of presence protocol, perhaps with messaging… That’s extensible too… Like some sort of Extensible Messaging and Presence Protocol…

But wait, such a thing exists. And it’s not constrained to XMPP - most other chat protocols have a way to track presence, and most of them can be accessed by Github’s Hubot. So I decided to just build a Hubot script that will open ports in security groups when I sign online, and close them when I leave. Easy.

Installation and usage are easy - see the hubot-aws-sesame github page for more info. The way Hubot can get your IP when you sign online is a bit tricky, but it works. When you sign online, Hubot sends you an image URL for an image served by Hubot’s built-in web server. Most chat clients will automatically load the image, giving Hubot access to your IP. Wooo - easy poor man’s auto-connecting VPN!
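To make the image trick a bit more concrete, here is a minimal sketch of the bookkeeping involved, written in Ruby rather than as an actual Hubot script. The class name `PresenceTracker`, the `/presence/<token>.png` path, and the example host are all hypothetical and are not taken from hubot-aws-sesame; the sketch only shows how a per-user token in the image URL lets the web server tie an incoming request (and its source IP) back to a chat user.

```ruby
require 'securerandom'

# Hypothetical helper: ties one-time image tokens to chat users so that
# the IP of whoever fetches the image can be attributed to that user.
class PresenceTracker
  def initialize(base_url)
    @base_url = base_url
    @pending = {} # token => user still waiting to load their image
    @ips = {}     # user  => last IP seen for that user
  end

  # Called when a user signs on: returns the image URL to send them in chat.
  def image_url_for(user)
    token = SecureRandom.hex(16)
    @pending[token] = user
    "#{@base_url}/presence/#{token}.png"
  end

  # Called by the web server for each image request.
  # Returns [user, ip] for a known token, or nil otherwise.
  def record_request(path, remote_ip)
    token = path[%r{\A/presence/(\h+)\.png\z}, 1]
    user = @pending.delete(token) # each token can only be used once
    return nil unless user

    @ips[user] = remote_ip
    [user, remote_ip]
  end

  def ip_for(user)
    @ips[user]
  end
end
```

Once `record_request` returns a user and an IP, the bot would authorize that IP in the relevant security groups, and revoke it again when the user signs off. Note that the address seen this way is whatever the web server sees, which may be a proxy rather than your machine if the chat client or service fetches images on your behalf.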

Github repo is here, NPM package here.

Stunning: Determining Your Public IP

Posted 17 May 2014 to internet, ruby

Programmatically fetching your public IP address (aka, internet visible IP) can be tough. Most often, I’ve done something silly like fetching and then parsing the page. That introduces a pretty bad dependency.

Fortunately, there’s actually a protocol for asking for your internet visible IP called Session Traversal Utilities for NAT (or STUN). It’s used by services that require peer-to-peer connection negotiation (like Skype, Google Hangouts, etc).

Here’s an example Ruby script that will hit up a few public STUN servers (starting with Google’s) and return your found IP:
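The script itself is not included in this text. As a rough stand-in, here is a minimal sketch of a single STUN Binding Request in plain Ruby, assuming the message layout from RFC 5389. The helper names (`build_binding_request`, `parse_binding_response`, `public_ip`) and the default server `stun.l.google.com:19302` are assumptions for illustration, not taken from the original script, and the sketch only handles IPv4 and one server rather than a list of fallbacks.

```ruby
require 'socket'
require 'securerandom'
require 'ipaddr'

MAGIC_COOKIE       = 0x2112A442 # fixed value defined by RFC 5389
BINDING_REQUEST    = 0x0001
BINDING_SUCCESS    = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

# Build the 20-byte header of a Binding Request:
# type, body length (0), magic cookie, 96-bit transaction ID.
def build_binding_request(transaction_id = SecureRandom.random_bytes(12))
  [BINDING_REQUEST, 0, MAGIC_COOKIE].pack('nnN') + transaction_id
end

# Pull the IPv4 address out of the XOR-MAPPED-ADDRESS attribute of a
# Binding Success response. Returns nil if the response is not usable.
def parse_binding_response(data)
  type, length, cookie = data.unpack('nnN')
  return nil unless type == BINDING_SUCCESS && cookie == MAGIC_COOKIE

  offset = 20
  limit = 20 + length
  while offset + 4 <= limit && offset + 4 <= data.bytesize
    attr_type, attr_len = data[offset, 4].unpack('nn')
    value = data[offset + 4, attr_len]
    if attr_type == XOR_MAPPED_ADDRESS && value && value.bytesize >= 8
      _reserved, family, _xport, xaddr = value.unpack('CCnN')
      # The address is XORed with the magic cookie on the wire.
      return IPAddr.new(xaddr ^ MAGIC_COOKIE, Socket::AF_INET).to_s if family == 0x01
    end
    offset += 4 + attr_len + ((4 - attr_len % 4) % 4) # attributes are padded to 4 bytes
  end
  nil
end

# Ask one STUN server (assumed default: Google's) for our public IPv4 address.
def public_ip(host = 'stun.l.google.com', port = 19302, timeout = 2)
  transaction_id = SecureRandom.random_bytes(12)
  socket = UDPSocket.new
  socket.send(build_binding_request(transaction_id), 0, host, port)
  return nil unless IO.select([socket], nil, nil, timeout)

  data, _sender = socket.recvfrom(1024)
  return nil unless data[8, 12] == transaction_id # ignore stray packets

  parse_binding_response(data)
ensure
  socket&.close
end
```

Calling `puts public_ip` should print the address the STUN server saw, or nothing if no reply arrives within the timeout. Trying several servers would just be a loop over host/port pairs that stops at the first non-nil result.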

The Internet, 3.0

Posted 18 Mar 2014 to internet, the future

So, you think you have a great idea for an online product? Nice. In the brave new world of the new Internet - the steps are easy:

Get a Permit

Submit an application to the governance board. You’ll need a permit, so make sure you describe what the service is / will do in detail. Also, you’ll need to guarantee that you won’t allow hateful language, harassment, or intimidation on your site. Government-issued photo ID will, of course, be required. Also, it’s a big investment (you must purchase access in 5-year chunks), so make sure your domain is a good one! Finally, you should feel good that a huge chunk of the $10K you’re paying for the domain is going to a Universal Service Fund to help pay for internet infrastructure in remote regions of the world.

Find a Hosting Company

If your application is approved, and the domain you requested isn’t taken, then you’ll be able to pay your fee and will be assigned the domain. Here’s where the fun begins! You now need to choose a hosting provider. Good news though! You have lots of choices - Verizon or Comcast/Time Warner. You could go with an independent host (like AWS) if you really want to, but they will tack on steep fees to cover their costs to connect to the ISPs. Those fees, though, don’t cover your actual bandwidth costs - just the rent that the ISPs charge so the tubes are actually connected to AWS. The other problem is that there really aren’t too many of them - the ISPs aren’t very keen on the idea of letting AWS take business away from their own cloud computing (this already exists, no joke), so they try to recoup via “access charges” that are passed on to you the customer.

Submit your domain ownership certifications along with your application and application fees. Once you’re accepted, you’ll be online in no time!

Get Connected

You now need to pick an ISP to provide access to your domain. If you went with an ISP as a host already, then the decision has been made. If you went with an independent hosting provider, the great news is that you have lots of choices! You can choose from Verizon or Comcast/Time Warner. Actually, your choice may be more limited, because ISPs really prefer exclusive contracts with their hosting providers, so that decision may have already been made for you too (wasn’t that easy?).

Submit your hosting provider’s ID (HPID) along with your universal connection number (UCN) to the ISP, along with your access application and associated fees. You’ll need to select your bandwidth class (make sure you pick a big enough pipe - especially if your idea is as great as you think!) and access period. Be careful of overage fees if you end up with a huge burst of traffic - this is where the ISP makes a lot of their money (just like overage minutes on your phone bill) - but it’s your fault for not choosing a high enough bandwidth class in the first place. You’ll also need to choose a delivery preference level - this is what affects connection speed and is based on the number of other clients in higher preference levels and the traffic they’re getting. With limited resources - it makes sense that sites that pay more should have less latency.

Just select which countries you’d like to be able to access your site (some may be off-limits depending on the type of content you’re providing), and you’re almost done.

If you want to serve up images, video, or sound - good news! Your ISP has invested in some great media hosting services that you get to utilize (usage is mandatory for all “rich media assets”). Just make sure you picked a good bandwidth class and delivery preference level - media files can be big and really expensive to deliver.

Finally - if you want to accept in-site payments, there’s good news for that too! Your ISP contract gives you the ability (requirement, really) to utilize some great payment gateway infrastructure. There’s a tiered fee structure, but you’ll never pay more than 10% per transaction, and you’ll never have to worry about complicated payment gateways.


Finally, you’re online! Your site won’t load as fast as Facebook’s, but with your awesome idea, you may one day have enough money to kick that speed up a few tiers.

The number of websites will, thankfully, shrink down to a manageable number. Free content sites will be a thing of the past, but with publishing platforms available online (who better to host your content than the same service that delivers it?) you can find something within your price range.

Some public service sites (non-profits, etc.) will get reduced tier access for free or small administrative costs (if this is you, just be warned that application approval times range from 6 months to 2 years).

It’s a brave new world, but if you follow these easy steps, your site will be online in no time.

So You Think You Found a Technical Co-founder

Posted 18 Feb 2014 to business, startups, opbandit, not science

A friend of mine is about to launch a new startup, and thinks he found a potential technical co-founder. There’s lots of advice for what to do if you’re a non-technical person looking for a technical co-founder (earn one, stop looking, date, pay, give up and learn to code, etc.). It’s not clear, though, what you should do when you think you’ve found that special someone to share in your folie à deux.

I think there are some basic questions that you need to ask yourself and a different set of questions you should ask your new potential co-founder. As the co-founder of a current startup, these are the questions I asked of my fellow founder and that I asked myself; it’s been almost 2 years and we still haven’t killed each other, so I figure these may be a good place to start.

For the Potential Tech Co-founder

  1. How much time would he be able to devote to something he has a major stake in?
  2. How long can he go at that rate without taking a paycheck?
  3. What do his current obligations look like - kids? wife? parents he takes care of? Obviously, the fewer the better, though one of the most prolific engineers I know has 9 kids.
  4. Will you be physically working together in the same space? If not, how often can you skype/hangout/etc, and do you already have a good working relationship? While distributed teams can certainly succeed, the process of brainstorming / creating can be much easier if you’re face to face.
  5. Does he think your idea is fucking awesome and brilliant and so cool that he can’t wait to start building?
  6. Can he give examples of projects (either in a company or solo) that he thought were brilliant in the start and that he eventually worked on for more than a year? How did he feel at the end of a year? Enthusiasm certainly fades, but excitement needs to have sustainable cycles to keep him (and you!) motivated in the long term.
  7. How does he feel about uncertainty? Could he work on a project for a year without knowing for sure that it will ever be successful (and maintain his sanity)?
  8. Has he ever had a job that lasted more than 6 months/a year/2 years where he worked with the same small group of people every day?
  9. Has he failed at a startup before? A “yes” answer is better than “no” - but it should come with thoughtful reasoning about why the company failed.

For Yourself

  1. Do you get along with this guy well enough that you would trust him with the details of your bank account? Your passwords to every service online? If not, what would it take to get to that point?
  2. Can you communicate well enough that you both clearly understand each other (at least most of the time)?
  3. Is he able to hack things together (done right now is better than perfect later) and JSIO? Basically, is he a motherfucking programmer? Does his github account have more than 10 repositories? If not, then maybe this person would be better at a later stage (when you need a manager who knows a little about engineering).
  4. Does your potential tech co-founder have the sense to know when he’s out of his depth? Many skillsets are needed when you only have a handful (or 2!) of people, but eventually specialization will be necessary (if you’re successful). Will he know when it’s time to bring in someone else to help scale your systems, for instance?

These are, of course, not exhaustive, but hopefully they provide a good starting point.

Good luck!

Kademlia: A DHT in Python

Posted 14 Feb 2014 to python, kademlia, dht

A distributed hash table (DHT) is a decentralized dictionary that is comprised of many nodes that each store a portion of a key/value lookup table. Any participating node can write to and read from the entire hash table.

The Kademlia distributed hash table is one of the better known DHT descriptions, and it’s used by BitTorrent for trackerless torrents and by the Gnutella network (originally “LimeWire”).

I couldn’t find a good implementation in Python (that followed the paper and wasn’t buggy), so I wrote one. Naturally, it uses Twisted to provide asynchronous communication. The nodes communicate using RPC over UDP, meaning that it is capable of working behind a NAT.

The library aims to be as close to a reference implementation of the Kademlia paper as possible.
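For readers who haven’t seen the paper, the core idea that decides which nodes store which keys is Kademlia’s XOR distance metric over 160-bit IDs. The following is a minimal sketch of just that metric in Ruby; it is not the library’s code or API, and the helper names (`node_id`, `distance`, `closest_nodes`) are made up for illustration.

```ruby
require 'digest'

# Derive a 160-bit ID (as an Integer) by hashing a name with SHA-1.
def node_id(name)
  Digest::SHA1.hexdigest(name).to_i(16)
end

# Kademlia's notion of distance: the bitwise XOR of two IDs,
# interpreted as an unsigned integer.
def distance(a, b)
  a ^ b
end

# The k known node IDs closest to a key, nearest first.
# A value is stored on (and looked up from) the nodes closest to its key.
def closest_nodes(key_id, known_ids, k = 2)
  known_ids.min_by(k) { |id| distance(id, key_id) }
end
```

With small IDs the effect is easy to see: for key `0b1000` and nodes `0b0001`, `0b1001`, `0b1100`, `0b0111`, the distances are 9, 1, 4 and 15, so the two closest nodes are `0b1001` and `0b1100`. Because XOR is symmetric, a node that considers another node close is also considered close by it, which is part of what keeps routing tables useful in both directions.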

Check out the code and examples here -

Leaders on Horses

Posted 20 Dec 2013 to not science

Putin





Gurbanguli (President of Turkmenistan)

Toquen: Capistrano 3, Chef-solo, and AWS

Posted 16 Dec 2013 to sysadmin, aws, capistrano, chef

A Toque

Toquen combines Capistrano 3, Chef, and AWS instance tags into one bundle of joy. Instance roles are stored in AWS tags and Toquen can suck those out, put them into data bags for chef, and create stages in capistrano. You can then selectively run chef on individual servers or whole roles that contain many servers with simple commands.

A Toque is a chef’s cap. Chef + cap = Toque.


Toquen is a gem - and simply extends capistrano with tasks to make AWS tag information (roles, names, etc) available both within Chef as well as in capistrano as stages. You can then run chef-solo on single servers, all servers with a given role, or on all servers. For instance, the following command will create all relevant stages for capistrano as well as create a servers data bag:

cap update_roles

And this will then run chef-solo on all machines:

cap all cook

There’s also a bootstrapping feature that can be run to initialize a server. The process will:

  1. Update all packages
  2. Set the hostname to the value of the Name tag in AWS
  3. Set ruby 1.9.3 as the default ruby
  4. Install rubygems
  5. Install the chef and bundler gems
  6. Reboot

Right now the only supported distributions are Ubuntu and Debian, but alternatives like Red Hat could easily be added by creating additional templates for the bootstrapping script.


Before beginning, you should already understand how chef-solo works and have some cookbooks, roles defined, and at least a folder for data_bags (even if it’s empty). The rest of this guide assumes you have these ready as well as an AWS PEM key and access credentials.

Generally, it’s easiest if you start off in an empty directory. First, create a file named Gemfile that contains these lines:

source ''
gem 'toquen'

Then, create a file named Capfile that contains the following line:

require 'toquen'

And then on the command line execute:

cap toquen_install

This will create a config directory with a file named deploy.rb. Edit this file, setting the location of your AWS key, AWS credentials, and chef cookbooks/data bags/roles.

Then, in AWS, create an AWS instance tag named “Roles” for each instance, using a space separated list of chef roles as the value. The “Name” tag must also be set or the instance will be ignored.

Then, run:

cap update_roles

This will create a data_bag named servers in your data_bags path that contains one item per server name, as well as create stages per server and role for use in capistrano.
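To illustrate the mapping from tags to stages, here is a small sketch of the grouping logic as described above: a `server-<name>` stage per server, a stage per role, and an `all` stage. It is not Toquen’s actual code, and the function name `servers_by_stage` and the plain-hash input format are assumptions made for the example.

```ruby
# Hypothetical input: one hash of AWS tags per instance, e.g.
#   { 'Name' => 'web1', 'Roles' => 'web app' }
# Returns a hash of stage name => list of server names.
def servers_by_stage(instances)
  stages = Hash.new { |hash, key| hash[key] = [] }

  instances.each do |tags|
    name = tags['Name']
    next if name.nil? || name.strip.empty? # instances without a Name are ignored

    roles = tags.fetch('Roles', '').split  # space separated list of chef roles
    stages['all'] << name
    stages["server-#{name}"] << name
    roles.each { |role| stages[role] << name }
  end

  stages
end
```

Given a `web1` instance tagged with the roles `web app` and a `db1` instance tagged `database`, this yields the stages `all`, `server-web1`, `server-db1`, `web`, `app` and `database`, which lines up with the three forms of the `cook` command.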

At this point you can run chef-solo using the cook task:

# one server
cap server-<server name> cook

# Or all the servers with a given role
cap <role name> cook

# Or on all servers
cap all cook


There are a few alternatives (including toque and other toque) out there - but most haven’t yet moved to the magic available in capistrano 3 and none can pull roles out of AWS. Toquen is small and delightful and will play nice if you already have a ton of cap tasks.

To see the rest of the docs, check out the toquen github page.

Apache mod_auth_openid 0.8

Posted 22 Oct 2013 to mod auth openid project

It’s been such a long time, and I finally got around to releasing a new version of mod_auth_openid (downloadable here). It has dual support for both Apache 2.2 and 2.4. Apache 2.4 support was a long time coming (thanks to osteenbergen for the help).