Poor-guy's CDN (ish)


For fun, I set up a couple of Squid proxies in reverse-proxy fashion to see how they performed. Overall, I am happy with the result. A key thought behind the idea is to provide front-end resilience for the resources being published, and to that end I built what can best be described as a poor-guy's CDN. It's not truly a CDN in the sense of global presence, nor the ability to choose a front-end server closer to the end-user (not least because some of that functionality is patent-encumbered), but it does provide some degree of resilience.

Once you have a reverse-proxy setup (which I am not covering here), the key elements are testing the front-end servers and then updating a dynamic-DNS entry with the results. Then you may feel the need to pre-load your caches. This is all simple to achieve – it can be done with some shell scripting, a tiny bit of Perl and the BIND nsupdate utility.

Testing front-end availability

To probe each front-end reverse-proxy we use the GET tool that is installed with the LWP package (libwww-perl) in Perl. This provides a very simple, no-bells mechanism for performing an HTTP request. We configure GET with a proxy server – that of the reverse-proxy we wish to test – and the URL of a resource that it should succeed in fetching. Since we want to run this often, we choose something small and with minimal CPU impact. You do need to decide whether to fetch a cacheable object or one that will always force a fetch from the origin-server – this depends on how deeply you want to test the system. For this example, we'll just fetch this blog's favicon.ico.
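A sketch of such a probe – the address is a documentation-range placeholder and the URL is illustrative, not my real setup:

```shell
# Ask the front-end at 192.0.2.10 to proxy a fetch of a small object.
# GET is lwp-request from libwww-perl: -p sets the proxy to use,
# -d discards the body, -s prints just the response status code.
GET -d -s -p http://192.0.2.10:80/ http://www.flirble.org/favicon.ico
```

A 200 status here means the front-end answered and could serve the object; GET's exit status is non-zero when the fetch fails, which is what the scripts below rely on.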

Yes, I am using fake IP addresses there.

Fixing LWP for IPv6

Unfortunately, LWP doesn't handle IPv6 very well; two workarounds are needed to make it work. Firstly, the code that parses HTTP proxy configuration doesn't understand the URI form http://[2001:0db8::1]:80/ – anything with square brackets makes it croak with “Bad http proxy specification”. This is a simple regexp fix, per the diff below. Hopefully this will be fixed in CPAN sometime.

The second problem is more fundamental, but also easier to fix. LWP doesn't know that IPv6 connections use a different module library from IPv4 (as a result, IPv6 in Perl is approximately broken in general). Thankfully someone has a workaround for this: the module Net::INET6Glue::INET_is_INET6 (available in Ubuntu/Debian as the package libnet-inet6glue-perl), which does some low-level Perl hackery to make the normal socket routines work for IPv6 as well.

Armed with a modified UserAgent.pm and this INET6Glue module, we can do this:
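A sketch of the resulting invocation – the address is again a documentation placeholder, and PERL5OPT is one way to preload the glue module into the GET script:

```shell
# Preload the IPv6 glue so LWP's sockets speak IPv6, then probe an
# IPv6 front-end through the (patched) proxy parser.
PERL5OPT=-MNet::INET6Glue::INET_is_INET6 \
    GET -d -s -p 'http://[2001:db8::1]:80/' http://www.flirble.org/favicon.ico
```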

and achieve the desired result.

Dynamic DNS

There are several aspects to updating a DNS record dynamically. You need a DNS server that is authoritative for a zone, is the primary authoritative server for that zone, and is configured to allow updates to a record, records, or a sub-zone.
Then you can use a client utility to update DNS records with the results of the tests performed above.

DNS server configuration

My setup uses BIND (for better or for worse) and I keep my dynamic records under the zone “dyn.flirble.org”. The record for this reverse-proxy setup has the name “ac”.

The configuration for this zone looks a bit like this:
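A minimal sketch of the relevant named.conf stanzas – the key name and file path are assumptions, and the secret shown is the placeholder discussed below:

```
// A shared key (TSIG) for signed updates.
key "dyn-update-key" {
    algorithm hmac-md5;
    secret "kmL...Q==";          // placeholder – generate your own
};

// The dynamic zone, allowing updates signed with that key.
zone "dyn.flirble.org" {
    type master;
    file "dynamic/dyn.flirble.org";
    allow-update { key "dyn-update-key"; };
};
```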

There are other, stronger, crypto schemes, but this works for my purposes. You can generate a key simply with dnssec-keygen as follows:
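Something like this, assuming the key name sketched above:

```shell
# Generates Kdyn-update-key.+157+NNNNN.key and .private in the current
# directory; the base64 secret sits on the "Key:" line of the .private file.
dnssec-keygen -a HMAC-MD5 -b 128 -n HOST dyn-update-key
```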

Delete the two files generated when you’re done. That string kmL...Q== is the one you want. It’s random and you can generate it however you like.

Sending in updates

We'll use another BIND tool for this: nsupdate. It works by collecting together a batch of commands and then sending them as a unit to the name server. As a result, the operation is (probably) atomic, meaning you can simply erase the existing records and add the new set without worrying about a window of time in which you return no A or AAAA records.

I do this as follows:
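A sketch of such a batch – the server name, record name and addresses are placeholders, and in practice the update add lines are built from whichever front-ends passed the probes:

```shell
# Sign the batch with the key generated earlier (-k takes the .private
# file). Delete-then-add within one batch, so the change lands as a unit.
nsupdate -k Kdyn-update-key.+157+12345.private <<'EOF'
server ns.flirble.org
zone dyn.flirble.org
update delete ac.dyn.flirble.org A
update delete ac.dyn.flirble.org AAAA
update add ac.dyn.flirble.org 120 A 192.0.2.10
update add ac.dyn.flirble.org 120 AAAA 2001:db8::1
send
EOF
```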

Putting it all together

Here’s the entire script:
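A hedged reconstruction along these lines captures the shape of it – every name, address, path and TTL below is a placeholder, not my real configuration:

```shell
#!/bin/sh
# Probe each front-end via GET, then push the survivors into DNS.

URL="http://www.flirble.org/favicon.ico"
NAME="ac.dyn.flirble.org"
KEY="/etc/bind/Kdyn-update-key.+157+12345.private"
TTL=120

# Candidate front-ends as address|type pairs.
SERVERS="192.0.2.10|A 192.0.2.20|A 2001:db8::1|AAAA"

BATCH="server ns.flirble.org
zone dyn.flirble.org
update delete $NAME A
update delete $NAME AAAA"

for s in $SERVERS; do
    addr=${s%|*}
    type=${s#*|}
    case $type in
        AAAA) proxy="http://[$addr]:80/" ;;
        *)    proxy="http://$addr:80/" ;;
    esac
    # GET exits non-zero when the fetch fails; PERL5OPT preloads the
    # IPv6 glue for the AAAA probes.
    if PERL5OPT=-MNet::INET6Glue::INET_is_INET6 \
            GET -d -p "$proxy" "$URL" >/dev/null 2>&1; then
        BATCH="$BATCH
update add $NAME $TTL $type $addr"
    fi
done

printf '%s\nsend\n' "$BATCH" | nsupdate -k "$KEY"
```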

Then you just CNAME your resources to this dynamic entry. Presto, site resilience. I run this script every two minutes from cron.

Pre-loading the cache

Finally, I also have another simple script that I use to pre-load the contents of the caches. This is as simple as recursively iterating the page structure, downloading the contents – in this case using wget. There are two options to ensure the cache is loaded: a forced load, which always fetches from the origin server, or one that validates the cached object and fetches from the origin only if it's out of date or not yet loaded.

This script specifically takes only the reverse-proxies currently listed as available by the dynamic DNS method above. At the moment it skips IPv6 addresses since they point to the same servers as the IPv4 addresses and would be redundant.
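A sketch of that pre-loader, assuming the names used above – dig pulls the currently-published A records, and wget's --no-cache option provides the forced load:

```shell
#!/bin/sh
# Warm the caches through each front-end currently in DNS.
# "force" as the first argument always refetches from the origin;
# otherwise cached objects are merely revalidated.
SITE="http://www.flirble.org/"
NAME="ac.dyn.flirble.org"

case "$1" in
    force) OPTS="--no-cache" ;;   # tells caches not to answer from cache
    *)     OPTS="" ;;
esac

# A records only – the AAAA records point at the same servers.
for addr in $(dig +short "$NAME" A); do
    http_proxy="http://$addr:80/" \
        wget -q -r -l 3 -np --delete-after $OPTS "$SITE"
done
```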

Other thoughts

And that's my poor-guy's CDN. It does have limitations, of course. It does not replace origin-server resilience: the reverse-proxy, if it follows the RFCs, will frequently re-validate cached objects, and if it cannot reach the origin server, will say so. There are tweaks to overcome this but they can have side effects. And there's of course the issue that the net bandwidth benefit of a reverse-proxy only comes from cacheable objects – but since images are generally the larger items and generally static, the benefit should be there.

If you're using something like WordPress with one of the caching plugins, it will make pages look like static HTML to unregistered users, and you may get a caching benefit there – but note that this means any other plugins have to work well with a static page (server-side functionality is limited on a statically cached page!).

It works for me, for now. YMMV.