For fun, I set up a couple of Squid proxies in reverse-proxy fashion to see how they performed. Overall I am happy with the result, but the key idea behind the exercise is to provide front-end resilience for the resources they publish. To that end, I built what can best be described as a poor-guy's CDN. It's not truly a CDN in the sense of global presence, nor the ability to steer an end-user to a nearby front-end server (not least because some of that functionality is patent-encumbered), but it does provide some degree of resilience.
Once you have a reverse-proxy setup (which I am not covering here), the key elements are testing the front-end servers and then updating a dynamic-DNS entry with the results. Then you may feel the need to pre-load your caches. This is all simple to achieve: it can be done with some shell scripting, a tiny bit of Perl and the BIND nsupdate utility.
Testing front-end availability
To probe each front-end reverse-proxy we use the GET tool that is installed with the LWP package in Perl. This provides a very simple, no-bells mechanism to perform an HTTP request. We configure GET with a proxy server (that of the reverse-proxy we wish to test) and the URL of a resource that it should succeed in fetching. Since we want to run this often, we choose something small and with minimal CPU impact. You do need to decide whether to fetch a cacheable object or one that will always force a fetch from the origin server; this depends on how deeply you want to test the system. For this example, we'll just fetch this blog's favicon.ico.
```
#!/bin/sh

proxy_port=80
target_url="http://blog.flirble.org/favicon.ico"
target_proxies="10.1.0.1 10.2.0.2"
final_proxies=""

for proxy in ${target_proxies}; do
    echo "Testing proxy: \"${proxy}\"..."
    GET -t 10 -P -p "http://${proxy}:${proxy_port}/" -d "${target_url}" && \
        final_proxies="${final_proxies} ${proxy}"
done

# Strip leading space
final_proxies=$(echo "${final_proxies}" | sed -e 's/^ //')

echo "Final proxies: \"${final_proxies}\""
```
Yes, I am using fake IP addresses there.
Fixing LWP for IPv6
Unfortunately, LWP doesn't handle IPv6 very well. There are two workarounds needed to make it work. Firstly, the code that parses the HTTP proxy configuration doesn't understand the URI form http://[2001:0db8::1]:80/ – anything with square brackets makes it croak with "Bad http proxy specification". This is a simple regexp fix, per the diff below. Hopefully this will be fixed in CPAN sometime.
```
*** UserAgent-orig.pm	Tue Oct 19 10:28:43 2010
--- UserAgent.pm	Tue Oct 19 10:15:47 2010
***************
*** 914,920 ****
      my $url = shift;
      if (defined($url) && length($url)) {
          Carp::croak("Proxy must be specified as absolute URI; '$url' is not") unless $url =~ /^$URI::scheme_re:/;
!         Carp::croak("Bad http proxy specification '$url'") if $url =~ /^https?:/ && $url !~ m,^https?://\w,;
      }
      $self->{proxy}{$key} = $url;
      $self->set_my_handler("request_preprepare", \&_need_proxy)
--- 914,920 ----
      my $url = shift;
      if (defined($url) && length($url)) {
          Carp::croak("Proxy must be specified as absolute URI; '$url' is not") unless $url =~ /^$URI::scheme_re:/;
!         Carp::croak("Bad http proxy specification '$url'") if $url =~ /^https?:/ && $url !~ m,^https?://[\w\[],;
      }
      $self->{proxy}{$key} = $url;
      $self->set_my_handler("request_preprepare", \&_need_proxy)
```
The second problem is more fundamental, but also easier to fix. LWP doesn't know that IPv6 connections use a different module library from IPv4 (IPv6 in Perl is approximately broken in general as a result). Thankfully someone has a workaround for this: the module Net::INET6Glue::INET_is_INET6 (available in Ubuntu/Debian as the package libnet-inet6glue-perl), which does some low-level Perl hackery to make the normal socket routines work for IPv6 also.
Armed with a modified UserAgent.pm and this INET6Glue module, we can do this:
```
perl -MNet::INET6Glue::INET_is_INET6 $(which GET) -t 10 -P \
    -p "http://[${proxy}]:${proxy_port}/" -d "${target_url}"
```
and achieve the desired result.
Dynamic DNS
There are several aspects to updating a DNS record dynamically. You need a DNS server that is the primary authoritative server for the zone in question, and it must be configured to allow updates to a record, a set of records, or a sub-zone.
Then you can use a client utility to cause DNS records to be updated with the results of the tests performed above.
DNS server configuration
My setup uses BIND (for better or for worse) and I keep my dynamic records under the zone "dyn.flirble.org". The record for this reverse-proxy setup has the name "ac".
The configuration for this zone looks a bit like this:
key "ac-key" { algorithm hmac-md5; secret "encryptedkeytexthere=="; }; ... zone "dyn.flirble.org" { type master; file "dynamic/dyn.flirble.org"; allow-update { key ac-key; }; also-notify { 1.2.3.5; ... }; }; |
There are other, stronger crypto schemes, but this works for my purposes. You can generate a key simply with dnssec-keygen, as follows:
```
$ dnssec-keygen -a HMAC-MD5 -b 128 -n HOST ac-key
Kac-key.+157+53816
$ cat Kac-key.+157+53816.key
ac-key. IN KEY 512 3 157 kmLKD48bOaodPm0vkUyLqQ==
$
```
Delete the two files generated when you're done. That string kmL...Q== is the one you want. It's random and you can generate it however you like.
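That also means you don't strictly need dnssec-keygen at all. For instance (assuming OpenSSL is installed), 128 bits of base64-encoded randomness will serve just as well as an HMAC-MD5 secret:

```
# 16 random bytes, base64-encoded: a usable TSIG secret.
openssl rand -base64 16
```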
Sending in updates
We'll use another BIND tool for this: nsupdate. It works by collecting together a batch of commands and then sending them as a unit to the name server. As a result, the operation is (probably) atomic, meaning you can simply erase the existing records and add the new set without worrying about a window of time in which you return no A or AAAA records.
I do this as follows:
```
ns_ttl=60
ns_server=1.2.3.4
ns_zone=dyn.flirble.org
ns_hostname=ac.${ns_zone}
nu_key="ac-key:encryptedkeytexthere=="
nu_cmd="/usr/bin/nsupdate -v -y ${nu_key}"

...

{
    echo server ${ns_server}
    echo zone ${ns_zone}
    echo update delete ${ns_hostname} A
    echo update delete ${ns_hostname} AAAA
    for proxy in ${final_proxies}; do
        echo update add ${ns_hostname} ${ns_ttl} A ${proxy}
    done
    for proxy in ${final_proxies6}; do
        echo update add ${ns_hostname} ${ns_ttl} AAAA ${proxy}
    done
    echo send
    echo
} | ${nu_cmd}
```
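After a run, you can check what actually got published; with the example server and record names above, these queries should return the addresses of the proxies that passed their tests:

```
dig +short @1.2.3.4 ac.dyn.flirble.org A
dig +short @1.2.3.4 ac.dyn.flirble.org AAAA
```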
Putting it all together
Here’s the entire script:
```
#!/bin/sh

ns_ttl=60
ns_server=1.2.3.4
ns_zone=dyn.flirble.org
ns_hostname=ac.${ns_zone}
nu_key="ac-key:encryptedkeytexthere=="
nu_cmd="/usr/bin/nsupdate -v -y ${nu_key}"

target_url="http://blog.flirble.org/favicon.ico"
target_proxies="10.1.0.1 10.2.0.2"
target_proxies6="2001:0db8::1 2001:0db8::2"
proxy_port=80

final_proxies=""
final_proxies6=""

for proxy in ${target_proxies}; do
    echo "Testing proxy: \"${proxy}\"..."
    GET -t 10 -P -p "http://${proxy}:${proxy_port}/" -d "${target_url}" && \
        final_proxies="${final_proxies} ${proxy}"
done

for proxy in ${target_proxies6}; do
    echo "Testing proxy: \"${proxy}\"..."
    # Requires libnet-inet6glue-perl
    perl -MNet::INET6Glue::INET_is_INET6 $(which GET) -t 10 -P \
        -p "http://[${proxy}]:${proxy_port}/" -d "${target_url}" && \
        final_proxies6="${final_proxies6} ${proxy}"
done

# Strip leading spaces
final_proxies=$(echo "${final_proxies}" | sed -e 's/^ //')
final_proxies6=$(echo "${final_proxies6}" | sed -e 's/^ //')

echo "Final proxies: \"${final_proxies}\" \"${final_proxies6}\""

if [ -z "${final_proxies}" ]; then
    # Oh dear. Just point at them all, in case they come back.
    final_proxies="${target_proxies}"
fi

{
    echo server ${ns_server}
    echo zone ${ns_zone}
    echo update delete ${ns_hostname} A
    echo update delete ${ns_hostname} AAAA
    for proxy in ${final_proxies}; do
        echo update add ${ns_hostname} ${ns_ttl} A ${proxy}
    done
    for proxy in ${final_proxies6}; do
        echo update add ${ns_hostname} ${ns_ttl} AAAA ${proxy}
    done
    echo send
    echo
} | ${nu_cmd}
```
Then you just CNAME your resources to this dynamic entry. Presto, site resilience. I run this script every two minutes from cron.
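For completeness, the wiring looks something like this. In the flirble.org zone file, a record (the name "blog" here is hypothetical) pointing a public name at the dynamic entry:

```
blog    IN  CNAME   ac.dyn.flirble.org.
```

and a crontab entry (the script path is also hypothetical) to re-run the health checks every two minutes:

```
*/2 * * * *   /usr/local/bin/proxy-dns-update.sh >/dev/null 2>&1
```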
Pre-loading the cache
Finally, I also have another simple script that I use to pre-load the contents of the caches. This is as simple as recursively iterating the page structure and downloading the contents, in this case using wget. There are two options to ensure the cache is loaded: a forced load, which always fetches from the origin server, or one that validates the cached object and fetches from the origin only if it's out of date or not yet loaded.
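In wget terms the two options boil down to the headers sent with each request; a minimal sketch (the full script below selects between these with a --force flag):

```
# Validate: ask the proxy to revalidate its cached copy, fetching from
# the origin only if the object is stale or missing.
wget --header=Pragma:max-age=0 --header=Cache-Control:max-age=0 "${url}"

# Force: send no-cache headers so the fetch always goes to the origin.
wget --no-cache "${url}"
```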
This script specifically takes only the reverse-proxies currently listed as available by the dynamic DNS method above. At the moment it skips IPv6 addresses since they point to the same servers as the IPv4 addresses and would be redundant.
```
#!/bin/sh

force=no
quiet=yes
args=

while [ ! -z "$1" ]; do
    case "$1" in
    --force|-f)   force=yes ;;
    --noforce)    force=no ;;
    --quiet|-q)   quiet=yes ;;
    --verbose|-v) quiet=no ;;
    --help|-h|-*)
        cat << EOT
Usage: $0 [options] <url> ...
Options:
  --force    Force cache refresh
  --verbose  Be noisy
EOT
        exit 1
        ;;
    *)
        if [ -z "${args}" ]; then
            args="$1"
        else
            args="${args} $1"
        fi
        ;;
    esac
    shift
done
set -- ${args}

if [ -z "$1" ]; then
    echo "You need to give a base url on the command line!"
    exit 1
fi

host=ac.dyn.flirble.org
proxies=$(host ${host}. | sort -u | grep -v IPv6 | awk '/address/{print $4;}')

if [ -z "${proxies}" ]; then
    echo "Can't resolve proxies, ${host} is empty!"
    exit 1
fi

echo Proxies to refresh: ${proxies}

export no_proxy=
export NO_PROXY=
export http_proxy=
export HTTP_PROXY=
export ftp_proxy=
export FTP_PROXY=

cache_opt="--header=Pragma:max-age=0 --header=Cache-Control:max-age=0"
[ "${force}" = yes ] && cache_opt="--no-cache"

quiet_opt=
[ "${quiet}" = yes ] && quiet_opt="--no-verbose --progress=dot"

other_opt="--user-agent=tfo-prefetch --recursive --no-directories --no-parent --delete-after"

for url in $*; do
    for proxy in ${proxies}; do
        export http_proxy=http://${proxy}:80/
        export HTTP_PROXY=${http_proxy}

        tmpdir=/tmp/force-cache-load
        rm -rf "${tmpdir}"
        mkdir -p "${tmpdir}"
        cd "${tmpdir}"

        wget ${cache_opt} ${quiet_opt} ${other_opt} "${url}"
    done
done
```
Other thoughts
And that's my poor-guy's CDN. It has limitations, of course. It does not replace origin-server resilience: a reverse-proxy, if it follows the RFCs, will frequently re-validate cached objects, and if it cannot reach the origin server it will return an error to the client. There are tweaks to overcome this, but they can have side effects. There's also the issue that the net bandwidth benefit from using a reverse-proxy only comes from cacheable objects; but since images are generally the larger items and generally static, the benefit should be there.
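As one example of such a tweak (a sketch only: max_stale is a Squid 3.x directive and the value here is arbitrary), you can let Squid serve stale objects when revalidation against the origin fails:

```
# squid.conf fragment: serve objects up to a day stale if the origin
# cannot be reached. The side effect is the obvious one: visitors may
# see day-old content while the origin is down.
max_stale 1 day
```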
If you're using something like WordPress with one of the caching plugins, pages will look like static HTML to unregistered users and you may get a caching benefit there; but note that this means any other plugins have to work well with a static page (server-side functionality is limited on a statically cached page!).
It works for me, for now. YMMV.