Cache Hierarchies


Getting your caches to talk to each other can be confusing, boggling, occasionally painful and sometimes extremely worthwhile. Hierarchies may not be what you need, but in some cases they help tremendously. Here are a few examples along with the basic theory.

When a cache gets a query from any other machine (be it a browser or some other client), it checks to see if the object (i.e. the page, graphic, executable or whatever the client requested) is on its disk.

If it is, the cache checks that the object is up to date and then feeds it back to the client. If the object isn't there, or is out of date, the cache checks to see if any of its parents or siblings has it (I will explain the difference in a moment). It does this by simultaneously (or as simultaneously as possible) sending UDP packets containing the URL to these machines.

The other servers then check if they have the object on their disks, and then send a message back saying "I have it" (or "I don't have it" as the case may be).

The original machine waits for the answers to come back and then decides which cache it should get the object from, or if it should just go straight off and get it itself.
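The lookup described above can be sketched in a few lines of Python. This is only a toy model, not Squid's code: the `Peer` class, the `choose_source` function and the peer names are made up for illustration, the freshness check is left out, and the queries are looped over rather than sent in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A neighbouring cache (parent or sibling) in this toy model."""
    name: str
    objects: set = field(default_factory=set)

    def icp_reply(self, url: str) -> str:
        # A peer answering a UDP query only looks at its own store.
        return "HIT" if url in self.objects else "MISS"


def choose_source(url: str, local_store: set, peers: list) -> str:
    """Return the name of the place the object would be fetched from."""
    if url in local_store:
        return "local"
    # Ask every peer; a real cache sends these queries at the same time.
    replies = {peer.name: peer.icp_reply(url) for peer in peers}
    for name, reply in replies.items():
        if reply == "HIT":
            return name
    # Nobody has it, so go straight to the origin server.
    return "origin"


peers = [Peer("sibling-a", {"http://example.com/logo.gif"}), Peer("sibling-b")]
print(choose_source("http://example.com/logo.gif", set(), peers))    # sibling-a
print(choose_source("http://example.com/other.html", set(), peers))  # origin
```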

Parents vs Siblings

If you configure your cache to have (only) siblings, your cache will send UDP queries to the list of siblings, and if none of them has the page, Squid will connect directly to the remote (origin) web/ftp server.

If you configure your cache to have a parent, it means that if you don't have the object, and none of your siblings do, your cache will open a TCP connection to the parent server and ask it to get the page on your behalf. Since it's a TCP connection, the parent may go through the motions of checking its own parents and siblings if it doesn't have the page on disk. (Sending a UDP query to another cache doesn't get it to check with any parents or siblings - it only checks its on-disk cache.) What can make things confusing is having multiple parents and siblings. If you have 2 parent caches, how does Squid know which one to connect to for a page?

Obviously, if only one parent has the page, your cache will download it from there. If none of them has it, your cache will give the request to the parent that responded fastest, believing that machine to be the least loaded or to have the best link. You can weight parents too: give a parent a high weight and it will get the most requests, unless it becomes excessively slow. You can also set up a default parent, so that if none of your parents has the object, the query is forwarded to the default.
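Here is a rough Python sketch of that choice between several parents. It is one reading of the rules above, not Squid's actual algorithm: the `Reply` tuple and `pick_parent` function are invented for this example, dividing the reply time by the weight is an assumption about how a weight might favour a parent, and the default parent is only used here when no reply came back at all.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    """One answer to a UDP query (hypothetical structure)."""
    parent: str
    hit: bool
    rtt_ms: float    # how long the reply took to arrive
    weight: int = 1  # higher weight means more favoured


def pick_parent(replies: list, default: Optional[str] = None) -> Optional[str]:
    # A parent that has the object wins; the fastest one if several do.
    hits = [r for r in replies if r.hit]
    if hits:
        return min(hits, key=lambda r: r.rtt_ms).parent
    # Nobody has it: favour the fastest reply, scaled by weight (assumption).
    if replies:
        return min(replies, key=lambda r: r.rtt_ms / r.weight).parent
    # No replies at all: fall back to the default parent, if one is set.
    return default


misses = [Reply("parent-a", False, 40.0), Reply("parent-b", False, 60.0, weight=2)]
print(pick_parent(misses))  # parent-b, since 60/2 is less than 40/1
```

Note how a very slow parent still loses even with a high weight: with a reply time of 100 ms, parent-b in this example would score 50 and parent-a would be picked instead.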

In this example:

cache_host parent 3128 3130

there is only one cache that you are talking to, and it is unlikely to be down. In this case sending a UDP query can just waste time, so a possibly better configuration could be:

cache_host parent 3128 3130 no-query default

That means that your cache won't send ICP (UDP) packets to the parent to see if it's up, and will connect to it for every request it can't answer from its own disk. This is almost identical to the hierarchy system that the Netscape proxy supports. (Microsoft's cache server doesn't even do this, let alone the other stuff below.)

Another very useful feature of ICP is its support for siblings. Say you don't want to spend a huge amount of money on a single large cache machine and would rather balance the load across several machines, but you don't want duplicate copies of the pages on every machine. You can set up your machines to speak to one another with the "proxy-only" flag like this:

On cache1:

cache_host sibling 3128 3130 proxy-only

On cache2:

cache_host sibling 3128 3130 proxy-only
This means that if a query comes to cache1 for a page that's not on its disk, it will send an ICP query to cache2. If cache2 has the page, cache1 will download it from there and NOT save it on its own disk; it will just get it from cache2 when it needs it again. This can effectively double your disk space without you having to spend huge amounts of money on a RAID disk system that will allow for many gigs of disk. The Netscape cache can't do this.
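To see why this saves disk space, here is a small Python simulation of two siblings. It is a sketch under simple assumptions, not how Squid is implemented: the `Cache` class and its `proxy_only` switch are hypothetical, the origin server is just a dictionary, and the ICP exchange is reduced to a method call.

```python
class Cache:
    """A toy cache with one sibling (hypothetical model)."""

    def __init__(self, name, proxy_only=True):
        self.name = name
        self.proxy_only = proxy_only
        self.store = {}      # objects held on this cache's "disk"
        self.sibling = None

    def has(self, url):
        # Stands in for answering an ICP query from the sibling.
        return url in self.store

    def fetch(self, url, origin):
        if url in self.store:
            return self.store[url]
        if self.sibling is not None and self.sibling.has(url):
            body = self.sibling.store[url]
            if not self.proxy_only:
                self.store[url] = body  # without proxy-only, keep a copy
            return body
        body = origin[url]              # nobody has it: go to the origin
        self.store[url] = body
        return body


origin = {"/a.html": "page A", "/b.html": "page B"}
cache1, cache2 = Cache("cache1"), Cache("cache2")
cache1.sibling, cache2.sibling = cache2, cache1

cache2.fetch("/a.html", origin)  # cache2 fetches and stores /a.html
cache1.fetch("/a.html", origin)  # cache1 relays it from cache2, stores nothing
cache1.fetch("/b.html", origin)  # neither has /b.html, so cache1 stores it
print(sorted(cache1.store), sorted(cache2.store))  # ['/b.html'] ['/a.html']
```

Each object ends up on exactly one of the two disks, which is where the "double your disk space" effect comes from.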

The Squid Users guide is copyright Oskar Pearson

If you like the layout (I do), I can only thank William Mee and hope he forgives me for stealing it