Those of you who follow me on Twitter no doubt noticed that yesterday was no fun at all when it came to this site, mostly due to some unexpected actions by my web host, the venerable Pair.

Red Alert #1
The first sign of trouble was a biggie. At 1:42PM local time I discovered that this site was completely down. As in, every page would produce an HTTP 500 "Internal Server Error". This is the kind of thing that can make a small ISV just about wet himself. I depend on the site to sell my software, and I depend on software sales to keep my house and buy food and stuff like that.

Now I've just done a completely new site design, with a whole mess of PHP code, and I had been fixing a couple of things in the morning. Naturally, I assumed the problem was my fault, and so I started desperately trying to figure out what I had broken.

This is where I first ran into trouble with Pair. If you're getting internal server errors, the first step is to look at the server error log and see what it's telling you. On Pair you can't get raw logs on a shared server, but their Account Control Center is supposed to show you errors related to your sites. But there was nothing there. As far as the ACC was concerned my site was fine, even though the site kept spewing 500s.

Left with no clues to go on, all I knew was "something's broke!". I checked my local copy of the site (the one that runs on my Mac) versus the live files. I made sure to undo every change I had made since I last knew things were working. I investigated whether the database had been hacked. Nothing.

When I asked friends in #macsb for help, one pointed out that Pair had announced a PHP update yesterday, about 90 minutes before I noticed the site was broken:

php5.cgi will be upgraded today to PHP 5.2.2. This upgrade is necessary to patch a security vulnerability that was recently announced.

This will only affect you if you are currently using scripts with .php5 extensions or using php5.cgi through an '.htaccess' file for PHP-CGIWrap.

That's me right there. I run PHP as a CGI rather than an Apache module because it means I don't have to make my site files writable to everyone on the same shared server. Unix gave me file permissions, and I'm going to use them. The above was followed by instructions on how to upgrade to the latest PHP CGI.

It didn't help, though. I got on the phone to Pair. Eventually they managed to get me a working copy of PHP. The site came back up.

Normally I trust Pair to be reliable and not to break things unexpectedly. In this case I think they blew it in a major way. Their PHP upgrade immediately broke my web site, and this was done with no more than 90 minutes' notice (the time from when they made the upgrade until I noticed something amiss). The announcement of the change was made on an internal Pair usenet server and on an RSS feed. You can't reasonably expect all of your customers to be constantly watching those. With this sudden, drastic change you need to be calling people's cell phones to alert them. And you can't make a change that's going to break people's web sites without telling them it's going to break the site. The announcement made it sound like the upgrade was something I should do, not something I must immediately do to prevent my site from going offline. Geez, what if I'd been out of town? The site could have been down for weeks. I'm just lucky I was online to catch this before it went on too long.

Red Alert #2
It turned out that not all was well.

For Chimey and MondoMouse I use Aquatic Prime to generate license files. Aquatic Prime requires some server support if you want these to be generated automatically. For me the normal process is:

  • Someone orders the software through my eSellerate-powered web store.
  • eSellerate sends my server an HTTP POST which contains information on the sale.
  • If the POST data looks good, Aquatic Prime generates a license file and emails it to the customer.

And of course, I'm using a PHP implementation of Aquatic Prime.

I soon discovered that although the site was up, my Aquatic Prime code was broken. Gaah! This meant that if someone ordered my software, they wouldn't get a registration code! Oh shit!

This led to much near-panicked research involving having eSellerate send some "preview" mode test sales to the server and new debug code in my copy of Aquatic Prime. I discovered that while eSellerate was correctly sending the POST data, said data was not actually making it into my PHP code. No sale data, no license file, no email.

The stock Aquatic Prime code for working with eSellerate (contributed to the project by yours truly) grabs the POST data out of PHP's global $HTTP_RAW_POST_DATA variable. eSellerate sends XML instead of key/value pairs, so that's a convenient way to get at the XML. But now $HTTP_RAW_POST_DATA was always an empty string.

I tried running a phpinfo() on the new PHP, and discovered that the always_populate_raw_post_data setting was "Off". That certainly explained what I was seeing, because without that you don't get $HTTP_RAW_POST_DATA. But why would it suddenly be off? I guessed that Pair had changed the PHP configuration file without warning, a charge they denied (although it did take several emails back and forth before I could even get someone from support to understand what I was asking about).
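
For anyone who wants to check the same thing without wading through phpinfo() output, here is a minimal sketch that asks PHP directly. The function name raw_post_data_enabled() is my own invention for illustration, and the list of "on" spellings is an assumption about how the value might be written in a config file.

```php
<?php
// Hypothetical helper: report whether PHP is configured to fill in
// $HTTP_RAW_POST_DATA. ini_get() returns false when this PHP build does
// not know the setting at all (it was removed in PHP 7.0), and a string
// such as "", "0", "1" or "Off" otherwise.
function raw_post_data_enabled() {
    $value = ini_get('always_populate_raw_post_data');
    if ($value === false) {
        return false; // the setting does not exist in this PHP version
    }
    // Treat the usual "on" spellings as enabled and anything else as off.
    return in_array(strtolower(trim($value)), array('1', 'on', 'yes', 'true'), true);
}

if (!raw_post_data_enabled()) {
    error_log('always_populate_raw_post_data is off; $HTTP_RAW_POST_DATA will not be set');
}
```

Dropping something like this into a test page and watching the error log would have told me about the setting, though not about why it had changed.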

Having no useful clues from Pair, or from PHP's release notes or changelog, I proceeded to find a work-around to get the license code going again. My fix is as follows:

// Read the raw POST body directly, since PHP no longer fills this in for us.
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    $HTTP_RAW_POST_DATA = file_get_contents("php://input");
}

This bypasses the always_populate_raw_post_data setting to get at the POST data using a PHP URL wrapper. I'm not sure this is the best solution but it does the job.
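
Since the real damage here was that missing sale data failed silently, a slightly more defensive version might read the body, parse it, and complain to the error log when something is wrong. This is only a sketch under assumptions: read_xml_post() is a hypothetical helper of mine, not part of Aquatic Prime or eSellerate, it assumes the SimpleXML extension is available, and the $source parameter exists only so the function can be pointed at a file when testing.

```php
<?php
// Hypothetical helper, not part of Aquatic Prime or eSellerate.
// Reads a raw request body and parses it as XML. Returns a
// SimpleXMLElement, or null (after logging why) when the body is
// empty or is not well-formed XML.
function read_xml_post($source = 'php://input') {
    $body = file_get_contents($source);
    if ($body === false || trim($body) === '') {
        error_log('POST body was empty; no sale data received');
        return null;
    }
    // Collect XML parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        error_log('POST body was not well-formed XML');
        return null;
    }
    return $xml;
}
```

In the license script this would be called as read_xml_post() with no arguments, and a null result would mean "stop and tell me" rather than carrying on with an empty string. Note that php://input is not available for multipart/form-data requests, which doesn't matter here because the POST body is plain XML.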

That got me running again, but left me with one glaring question: WHY had this happened? If Pair didn't change their PHP config, and if the PHP changelog made no mention of this, why was my script suddenly broken after running correctly for so long?

Fortunately nobody tried to buy my software while the license system was down. But you know you're having a really bad day when you start a sentence with "Fortunately nobody tried to buy my software...", regardless of how you finish the sentence.

Apparently this is a known bug in PHP 5.2.2. And it's not just me who's affected. XML-RPC is broken by this for many people, such as WordPress's implementation (they've arrived at more or less the same solution as me). Drupal's implementation of XML-RPC was already using the php://input approach and is therefore unaffected.

Technically I guess I have to lay this problem on the PHP developers. They seem to have a problem with $HTTP_RAW_POST_DATA, because the changelog indicates that this exact bug was fixed previously in version 5.0.2 and version 5.1. Now it's back for a third round.

But a big part of the reason I'm with Pair in the first place is that I trust them not to stick me with buggy code. They don't always have the latest versions of everything, and that's just fine with me if it means I'm trading currency of releases for stability. I don't need the latest features of everything, I need my site to run reliably. Of course you're unlikely to get a release of something like PHP without there being some known bugs. At the same time, XML-RPC is an enormously popular system, and I would have expected a PHP release that broke it to also have failed Pair's vetting process.

What to do about all this? I've hosted with Pair since 2002. This is the first time I've had any serious trouble with my site since some time in 2004, and even that turned out not to be Pair's fault. And of course they still beat the crap out of places like Dreamhost for reliability. At the same time this is a very disappointing and worrying failure on their part. I'm not looking at switching just yet but I'm a lot less confident in my current setup than I was a week ago.