Saving a Request

This chapter is for the system administration course only.

Table 18: Connotation of Saving a Request

  Mechanism                     Rescue   Economization   Protection
  ----------------------------  -------  --------------  ----------
  Directors                     x        x
  Health Checks                 x
  Grace Mode                    x        x
  Retry a Request               x
  Saint Mode                    x
  Tune Backend Properties                                x
  Access Control Lists (ACL)                             x
  Compression                            x

Varnish offers many mechanisms to save a request. By saving a request we mean:

  1. Rescue: mechanisms to handle requests when backends are in problematic situations.
  2. Economization: mechanisms to spend fewer resources, i.e., send fewer requests to the backend.
  3. Protection: mechanisms to restrict access to cache invalidation from unauthorized entities.

Table 18 shows how different mechanisms are mapped to their saving meaning. This chapter explains how to make your Varnish setup more robust by using these mechanisms.


Directors

  • Loadable VMOD
  • Contains 1 or more backends
  • All backends must be known
  • Selection methods:
    • round-robin
    • fallback
    • random
      • seeded with a random number
      • seeded with a hash key

Round-robin director example:

vcl 4.0;

import directors;    // load the directors VMOD

backend one {
    .host = "localhost";
    .port = "80";
}

backend two {
    .host = "localhost";
    .port = "81";
}

sub vcl_init {
    new round_robin_director = directors.round_robin();
    round_robin_director.add_backend(one);
    round_robin_director.add_backend(two);

    new random_director = directors.random();
    random_director.add_backend(one, 10);  # 2/3 to backend one
    random_director.add_backend(two, 5);   # 1/3 to backend two
}

sub vcl_recv {
    set req.backend_hint = round_robin_director.backend();
}

Varnish can have several backends defined, and it can set them together into clusters for load balancing purposes. Backend directors, usually just called directors, provide logical groupings of similar web servers by re-using previously defined backends. A director must have a name.

There are several different director selection methods available, they are: random, round-robin, fallback, and hash. The next backend to be selected depends on the selection method. You can specify the timeout before unused backend connections are closed by setting the backend_idle_timeout parameter. How to tune this and other parameters is further explained in the Tuning section.

A round-robin director takes only a backend list as argument. This director type picks the first backend for the first request, then the second backend for the second request, and so on. Once the last backend has been selected, selection starts again from the top. If a health probe has marked a backend as sick, the round-robin director skips it.

A fallback director will always pick the first backend unless it is sick, in which case it would pick the next backend and so on. A director is also considered a backend so you can actually stack directors. You could for instance have directors for active and passive clusters, and put those directors behind a fallback director.
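A minimal sketch of such stacking, reusing the backends one and two defined earlier (the director names and the one-backend-per-cluster setup are purely illustrative):

```vcl
sub vcl_init {
    # two clusters, here reduced to one backend each for brevity
    new active = directors.round_robin();
    active.add_backend(one);

    new passive = directors.round_robin();
    passive.add_backend(two);

    # the fallback director tries 'active' first, and only uses
    # 'passive' when every backend in 'active' is sick
    new fb = directors.fallback();
    fb.add_backend(active.backend());
    fb.add_backend(passive.backend());
}

sub vcl_recv {
    set req.backend_hint = fb.backend();
}
```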

Random directors are seeded with either a random number or a hash key. Next section explains their commonalities and differences.


Health probes are explained in the Health Checks section.


Directors are defined as loadable VMODs in Varnish 4. See the vmod_directors man page for more information and examples.


If you declare backend servers, but do not use them, varnishd returns an error by default. You can avoid this situation by turning off the runtime parameter vcc_err_unref. However, this practice is strongly discouraged; instead, we advise you to declare only what you use.

Random Directors

  • Random director: seeded with a random number
  • Hash director: seeded with hash key typically from a URL or a client identity string

Hash director that uses client identity for backend selection

sub vcl_init {
    new h = directors.hash();
    h.add_backend(one, 1);   // backend 'one' with weight '1'
    h.add_backend(two, 1);   // backend 'two' with weight '1'
}

sub vcl_recv {
    // pick a backend based on the cookie header of the client
    set req.backend_hint = h.backend(req.http.cookie);
}

The random director picks a backend randomly. It has one per-backend parameter called weight, which provides a mechanism for balancing the selection of the backends. The selection mechanism of the random director may be regarded as traffic distribution if the amount of traffic is the same per request and per backend. The random director also has a director-wide counter called retries, which increases every time the director selects a sick backend.

Both the random and hash directors select a backend randomly. The difference between the two is the seed they use: the random director is seeded with a random number, whereas the hash director is seeded with a hash key.

Hash directors typically use the requested URL or the client identity (e.g. session cookie) to compute the hash key. Since the hash key is always the same for a given input, the output of the hash director is always the same for a given hash key. Therefore, hash directors always select the same backend for a given input. This is also known as sticky session load balancing.

Hash directors are useful to load balance in front of other Varnish caches or other web accelerators. In this way, cached objects are not duplicated across different cache servers.
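Reusing the hash director h declared in the example above, sharding a cache tier by URL is a one-line change (a sketch, not a complete configuration):

```vcl
sub vcl_recv {
    // hash on the URL: each object maps to exactly one member
    // of the cache tier, so objects are not duplicated
    set req.backend_hint = h.backend(req.url);
}
```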


In Varnish 3 there is a client director type, which is removed in Varnish 4. This client director type is a special case of the hash director. Therefore, the semantics of a client director type are achieved using hash.backend(client.identity).
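Assuming a hash director h as in the example above, the Varnish 3 client director behavior can be sketched like this:

```vcl
sub vcl_recv {
    // client.identity defaults to the client IP address, so the
    // same client is always routed to the same backend
    set req.backend_hint = h.backend(client.identity);
}
```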

Health Checks

  • Poke your web server every N seconds
  • Affects backend selection
  • std.healthy(req.backend_hint)
  • Set using .probe
  • Varnish allows at most .threshold amount of failed probes within a set of the last .window probes
  • varnishlog: Backend_health
backend server1 {
    .host = "";
    .probe = {
        .url = "/healthtest";
        .timeout = 1s;
        .interval = 4s;
        .window = 5;
        .threshold = 3;
    }
}

You can define a health check for each backend. A health check defines a probe to verify whether a backend replies on a given URL every given interval.

The above example causes Varnish to send a request to /healthtest every 4 seconds. This probe requires that at least 3 requests succeed within a sliding window of 5 requests.

Varnish initializes backends marked as sick. .initial is another variable of .probe. This variable defines how many probes must succeed to mark the backend as healthy initially. The default value of .initial is equal to .threshold - 1.
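A probe declaration with .initial set explicitly might look like this (the host, port, and URL are placeholders):

```vcl
backend www {
    .host = "localhost";
    .port = "8080";
    .probe = {
        .url = "/healthtest";
        .interval = 4s;
        .window = 5;
        .threshold = 3;
        # consider the backend healthy as soon as
        # 3 probes have succeeded after startup
        .initial = 3;
    }
}
```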

When Varnish has no healthy backend available, it attempts to use a graced copy of the cached object that a request is looking for. The next section Grace Mode explains this concept in detail.

You can also declare standalone probes and reuse them for several backends. It is particularly useful when you use directors with identical behaviors, or when you use the same health check procedure across different web applications.

import directors;

probe www_probe {
    .url = "/health";
}

backend www1 {
    .host = "localhost";
    .port = "8081";
    .probe = www_probe;
}

backend www2 {
    .host = "localhost";
    .port = "8082";
    .probe = www_probe;
}

sub vcl_init {
    new www = directors.round_robin();
    www.add_backend(www1);
    www.add_backend(www2);
}


Varnish does not send a Host header with health checks. If you need that, you can define an entire request using .request instead of .url.

backend one {
    .host = "";
    .probe = {
        .request =
            "GET / HTTP/1.1"
            "Host: localhost"
            "Connection: close";
    }
}


The healthy function is implemented as a VMOD in Varnish 4. req.backend.healthy from Varnish 3 is replaced by std.healthy(req.backend_hint). Do not forget to include the import line: import std;

Analyzing health probes

  • Backend_health tag in varnishlog -g raw -i Backend_health

    # varnishlog -g raw -i Backend_health
    0 Backend_health - default Still healthy 4--X-RH 5 3 5 0.012166 0.013693 HTTP/1.0 200 OK
  • varnishadm in Varnish 4.0 or varnishadm backend.list -p in Varnish 4.1:

    Backend default is Healthy
    Current states  good:  5 threshold:  3 window:  5
    Average responsetime of good probes: 0.016226
    Oldest                                                    Newest
    44444444444444444444444444444444444444444444--44----444444444444 Good IPv4
  • varnishadm backend.list:

    Backend name                   Refs   Admin      Probe
    default(,,8081)       1      probe      Healthy 4/5

Every health test is recorded in the shared memory log with 0 VXID (see Transactions). If you want to see Backend_health records in varnishlog, you have to change the default grouping by VXID to raw:

varnishlog -g raw -i Backend_health

Backend_health records start with 0, which is the VXID number. The rest of the probe record has the following format:

Backend_health - %s %s %s %u %u %u %f %f %s
                 |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  +- Probe HTTP response
                 |  |  |  |  |  |  |  +---- Average response time
                 |  |  |  |  |  |  +------- Response time
                 |  |  |  |  |  +---------- Probe window size
                 |  |  |  |  +------------- Probe threshold level
                 |  |  |  +---------------- Number of good probes in window
                 |  |  +------------------- Probe window bits
                 |  +---------------------- Status message
                 +------------------------- Backend name

Most of the fields are self-descriptive, but the Probe window bits and Status message fields deserve clarification.

The Probe window bits field details the last probe with the following format:

%c %c %c %c %c %c %c
|  |  |  |  |  |  |
|  |  |  |  |  |  +- H -- Happy
|  |  |  |  |  +---- R -- Good Received (response from the backend received)
|  |  |  |  +------- r -- Error Received (no response from the backend)
|  |  |  +---------- X -- Good Xmit (Request to test backend sent)
|  |  +------------- x -- Error Xmit (Request to test backend could not be sent)
|  +---------------- 6 -- Good IPv6
+------------------- 4 -- Good IPv4

Status message is a two-word state indicator, which can be:

  • Still healthy
  • Back healthy
  • Still sick
  • Went sick

Note that Still indicates unchanged state, Back and Went indicate a change of state. The second word, healthy or sick, indicates the present state.

Another method to analyze health probes is by calling varnishadm in Varnish 4.0 or varnishadm backend.list -p in Varnish 4.1. This command first presents data from the last Backend_health record:

Backend default is Healthy
Current states  good:  5 threshold:  3 window:  5
Average responsetime of good probes: 0.016226

and the last 64 window bits of probes:

Oldest                                                    Newest
44444444444444444444444444444444444444444444--44----444444444444 Good IPv4

Demo: Health Probes

See the power of health probes!

Suggested steps for the demo:

  1. Configure a probe as shown in Health Checks.
  2. For Varnish 4.0, run watch -n.5 varnishadm in one terminal
  3. For Varnish 4.1, run watch -n.5 varnishadm backend.list -p in one terminal
  4. Start and stop your backend. For this, you might want to quickly simulate a backend with the command python -m SimpleHTTPServer [port].
  5. The watch command gives the effect of an animated health probe monitor!

Grace Mode

  • A graced object is an object that has expired, but is kept in cache for a given grace time
  • Grace mode is when Varnish uses a graced object
  • Grace mode is a feature to mitigate the accumulation of requests for expired objects
  • Grace mode allows Varnish to build responses from expired objects
  • beresp.grace defines the time that Varnish keeps an object after beresp.ttl has elapsed

The main goal of grace mode is to avoid letting requests pile up whenever a popular object has expired in cache. To better understand grace mode, recall Fig. 2, which shows the lifetime of cached objects. When possible, Varnish delivers a fresh object; otherwise Varnish builds a response from a stale object and triggers an asynchronous refresh request. This procedure is also known as stale-while-revalidate.

The typical way to use grace is to store an object for several hours after its TTL has elapsed. In this way, Varnish always has a copy to be delivered immediately, while fetching a new object asynchronously. This asynchronous fetch ensures that graced objects do not get older than a few seconds, unless there are no available backends.
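Setting such a policy is a one-liner in vcl_backend_response (the six-hour value below is illustrative):

```vcl
sub vcl_backend_response {
    # keep objects for six hours beyond their TTL, so a stale
    # copy can be delivered while a fresh one is fetched
    set beresp.grace = 6h;
}
```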

The following VCL code illustrates a typical use of grace:

import std;

sub vcl_hit {
    if (obj.ttl >= 0s) {
        # Normal hit
        return (deliver);
    } elsif (std.healthy(req.backend_hint)) {
        # The backend is healthy
        # Fetch the object from the backend
        return (fetch);
    } else {
        # No fresh object and the backend is not healthy
        if (obj.ttl + obj.grace > 0s) {
            # Deliver graced object
            # Automatically triggers a background fetch
            return (deliver);
        } else {
            # No valid object to deliver
            # No healthy backend to handle request
            # Return error
            return (synth(503, "Backend is down"));
        }
    }
}

Graced objects are those whose grace time has not yet expired. The grace time is stored in obj.grace, whose default value is 10 seconds. You can change this value in three ways:

  1. by parsing the HTTP Cache-Control field stale-while-revalidate that comes from the backend,
  2. by setting the variable beresp.grace in VCL, or
  3. by changing the grace default value with varnishadm param.set default_grace <value>.

Varnish 4.1 parses stale-while-revalidate automatically from the Cache-Control header field. For example, when receiving "Cache-Control: max-age=5, stale-while-revalidate=30", Varnish 4.1 sets obj.ttl=5 and obj.grace=30 automatically. To see a working example on how Varnish works with Cache-Control, see the VTC in Understanding Grace using varnishtest.


obj.ttl and obj.grace are countdown timers. Objects are valid in cache as long as they have a positive remaining time equal to obj.ttl + obj.grace.

Timeline Example

Backend response HTTP Cache-Control header field:

"Cache-control: max-age=60, stale-while-revalidate=30"

or set in VCL:

set beresp.ttl = 60s;
set beresp.grace = 30s;

  • 50s: Normal delivery
  • 62s: Normal cache miss, but grace mode possible
  • 80s: Normal cache miss, but grace mode possible
  • 92s: Normal cache miss, object is removed from cache

In this timeline example, it is assumed that the object is never refreshed. If you do not want objects with a negative TTL to be delivered, set beresp.grace = 0s. The downside is that all grace functionality is disabled, regardless of the reason.

Exercise: Grace

  1. Copy the following CGI script in /usr/lib/cgi-bin/test.cgi:

    #!/bin/sh
    sleep 10
    echo "Content-type: text/plain"
    echo "Cache-control: max-age=10, stale-while-revalidate=20"
    echo
    echo "Hello world"
  2. Make the script executable.

  3. Issue varnishlog -i VCL_call,VCL_return in one terminal.

  4. Test that the script works outside Varnish by typing http http://localhost:8080/cgi-bin/test.cgi in another terminal.

  5. Send a single request, this time via Varnish, to cache the response from the CGI script. This should take 10 seconds.

  6. Send three requests: one before the TTL (10 seconds) elapses, another after 10 seconds and before 30 seconds, and a last one after 30 seconds.

  7. Repeat until you understand the output of varnishlog.

  8. Play with the values of max-age and stale-while-revalidate in the CGI script, and the beresp.grace value in the VCL code.

With this exercise you should see that as long as the cached object is within its TTL, Varnish delivers the cached object as normal. Once the TTL expires, Varnish delivers the graced copy, and asynchronously fetches an object from the backend. Therefore, after 10 seconds of triggering the asynchronous fetch, an updated object is available in the cache.

retry Return Action

  • Available in vcl_backend_response and vcl_backend_error
  • Re-enters vcl_backend_fetch
  • Any changes made are kept
  • Parameter max_retries safeguards against infinite loops
  • Counter bereq.retries registers how many retries are done
sub vcl_backend_response {
    if (beresp.status == 503) {
        return (retry);
    }
}

The retry return action is available in vcl_backend_response and vcl_backend_error. This action re-enters the vcl_backend_fetch subroutine. This only influences the backend thread, the client-side handling is not affected.

You may want to use this action when the backend fails to respond. In this way, Varnish can retry the request to a different backend. For this, you must define multiple backends.

You can use directors to let Varnish select the next backend to try. Alternatively, you may use bereq.backend to specifically select another backend.
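A sketch of the second approach, assuming a spare backend named two is defined elsewhere in the VCL:

```vcl
sub vcl_backend_fetch {
    if (bereq.retries > 0) {
        # the first attempt failed: retry against the spare backend
        set bereq.backend = two;
    }
}
```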

return (retry) increments the bereq.retries counter. If the number of retries is higher than max_retries, control is passed to vcl_backend_error.


In Varnish 3.0 it was possible to do return (restart) after a failed backend response. This is now called return (retry), and jumps back to vcl_backend_fetch.

Saint Mode

  • Saint mode is implemented as a backend director with the following capabilities:
    • Fine-grained health checks; maintains a blacklist of relations between objects and backends
    • Objects have a blacklist TTL
    • Backends in the blacklist have a threshold of related objects
      • Backends with objects below the threshold can be selected to serve other objects
      • Backends with objects above the threshold are marked as sick for all objects
  • Available in Varnish Cache 4.1 or later

Saint mode complements regular Health Checks by marking backends as sick for specific objects. Saint mode is a VMOD that maintains a blacklist of objects and their related backends. Each blacklisted object has a TTL, which denotes the time it stays in the blacklist.

If the number of blacklisted objects for a backend is below a threshold, the backend is considered partially sick, and requests for its blacklisted objects may be sent to another backend. When the number of blacklisted objects for a backend exceeds the threshold, the backend is marked as sick for all requests.

vcl/saintmode.vcl below shows typical usage of saint mode. In this example, a request that receives a 500-level response is retried against another backend.

vcl 4.0;

import saintmode;
import directors;

backend server1 { .host = ""; .port = "80"; }
backend server2 { .host = ""; .port = "80"; }

sub vcl_init {
        # create two saint mode backends with a threshold
        # of 5 blacklisted objects each
        new sm1 = saintmode.saintmode(server1, 5);
        new sm2 = saintmode.saintmode(server2, 5);

        # group the backends in the same cluster
        new fb = directors.fallback();
        fb.add_backend(sm1.backend());
        fb.add_backend(sm2.backend());
}

sub vcl_backend_fetch {
        # get healthy backend from director
        set bereq.backend = fb.backend();
}

sub vcl_backend_response {
        if (beresp.status >= 500) {
                # blacklist the object for the failing backend for 5 seconds,
                # then retry the request against a different backend
                saintmode.blacklist(5s);
                return (retry);
        }
}

An alternative is to build the response with a stale object. For that, you would return (abandon) in vcl_backend_response, restart the request in vcl_synth, and check req.restarts in vcl_recv. To get a better idea of how to do this, take a look at the stale-if-error snippet.
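A rough sketch of that control flow might look as follows; this illustrates the abandon/restart mechanics only, not a complete stale-if-error implementation:

```vcl
sub vcl_backend_response {
    if (beresp.status >= 500) {
        # give up on this fetch; the client side receives
        # a synthetic 503 via vcl_synth
        return (abandon);
    }
}

sub vcl_synth {
    if (resp.status == 503 && req.restarts == 0) {
        # try once more; a graced copy may satisfy the request
        return (restart);
    }
}
```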

The fine-grained checks of saint mode help to spot problems in malfunctioning backends. For example, if the request for the object foo returns a 200 OK HTTP response without content (Content-Length: 0), you can blacklist that specific object for that specific backend. You can also log the object with std.log and filter it in varnishlog.


For more information, please refer to the vmod_saintmode documentation.

Tune Backend Properties

backend default {
    .host = "localhost";
    .port = "80";
    .connect_timeout = 0.5s;
    .first_byte_timeout = 20s;
    .between_bytes_timeout = 5s;
    .max_connections = 50;
}

If a backend does not have enough resources, it might be advantageous to set max_connections so that only a limited number of simultaneous connections are handled by that specific backend. All backend-specific timers are available as parameters and can be overridden in VCL on a per-backend level.


Varnish only accepts hostnames for backend servers that resolve to at most one IPv4 address and one IPv6 address. The parameter prefer_ipv6 defines which IP address Varnish prefers.

Access Control Lists (ACLs)

  • An ACL is a list of IP addresses
  • VCL programs can use ACLs to define and control the IP addresses that are allowed to purge, ban, or do any other regulated task.
  • Compare with client.ip or server.ip
# Who is allowed to purge...
acl local {
    "localhost";      /* myself */
    ""/24;            /* and everyone on the local network */
    !"";              /* except for the dialin router */
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip ~ local) {
            return (purge);
        } else {
            return (synth(405));
        }
    }
}

An Access Control List (ACL) declaration creates and initializes a named list of IP addresses and ranges, which can later be used to match client or server IP addresses. ACLs are typically used to control which IP addresses are allowed to send PURGE or ban requests, or to bypass the cache entirely.
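As an illustration of the last use case, an ACL can gate a cache bypass (the ACL name and the bypass rule are hypothetical):

```vcl
acl trusted {
    "localhost";
}

sub vcl_recv {
    # let trusted clients force a fetch from the backend
    if (client.ip ~ trusted && req.http.Cache-Control ~ "no-cache") {
        return (pass);
    }
}
```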

You may also setup ACLs to differentiate how your Varnish servers behave. You can, for example, have a single VCL program for different Varnish servers. In this case, the VCL program evaluates server.ip and acts accordingly.

ACLs are fairly simple to create. A single IP address or hostname should be in quotation marks, as in "localhost". ACLs use CIDR notation to specify IP addresses and their associated routing prefixes. In Varnish's ACLs the slash "/" character is appended outside the quoted IP address, for example ""/24.

To exclude an IP address or range from an ACL, an exclamation mark "!" should precede the quoted IP address, for example !"". This is useful when, for example, you want to include all the IP addresses in a range except the gateway.


If you declare ACLs, but do not use them, varnishd returns an error by default. You can avoid this situation by turning off the runtime parameter vcc_err_unref. However, this practice is strongly discouraged; instead, we advise you to declare only what you use.


Compression

  • Where to compress? backend or Varnish?
  • Parameter to toggle: http_gzip_support
  • VCL variables beresp.do_gzip to compress and beresp.do_gunzip to decompress
sub vcl_backend_response {
    if (beresp.http.content-type ~ "text") {
       set beresp.do_gzip = true;
    }
}
  • Avoid compressing already compressed files
  • Works with ESI

It is sensible to compress objects before storing them in cache. Objects can be compressed either at the backend or your Varnish server, so you have to make the decision on where to do it. Factors that you should take into consideration are:

  • where to store the logic of what should be compressed and what not
  • available CPU resources

Also, keep in mind that files such as JPEG, PNG, GIF or MP3 are already compressed. So you should avoid compressing them again in Varnish.
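Such rules might be sketched in vcl_backend_response like this (the content-type patterns are illustrative, not exhaustive):

```vcl
sub vcl_backend_response {
    if (beresp.http.content-type ~ "^(image|audio|video)/") {
        # these formats are already compressed; leave them alone
        set beresp.do_gzip = false;
    } elsif (beresp.http.content-type ~ "^(text/|application/json)") {
        set beresp.do_gzip = true;
    }
}
```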

By default, http_gzip_support is on, which means that Varnish handles gzip negotiation as described in the Varnish documentation. If you want full control over what is compressed and when, set the http_gzip_support parameter to off, and activate compression based on specific rules in your VCL code. Implement these rules in vcl_backend_response and then set beresp.do_gzip or beresp.do_gunzip as in the example above.

If you compose your content using Edge Side Includes (ESI), you should know that ESI and gzip work together. The next chapter explains how to compose your content using Varnish and ESI.


Compression in Varnish uses and manipulates the Accept-Encoding and Content-Encoding HTTP header fields. ETag validation might also be weakened. Refer to the Varnish documentation for all details about compression.