Cache Invalidation

  • Cache invalidation is an important part of your cache policy
  • Varnish automatically invalidates expired objects
  • You can proactively invalidate objects with Varnish
  • You should define your cache invalidation rules before caching objects, especially in production environments

There are four mechanisms to invalidate caches in Varnish:

  1. HTTP PURGE
  • Use the vcl_purge subroutine
  • Invalidate caches explicitly using objects’ hashes
  • vcl_purge is called via return(purge) from vcl_recv
  • vcl_purge removes all variants of an object from cache, freeing up memory
  • The restart return action can be used to immediately update a purged object
  2. Banning
  • Use the built-in function ban(regex)
  • Invalidates objects in cache that match the regular-expression
  • Does not necessarily free up memory at once
  • Also accessible from the management interface
  3. Force Cache Misses
  • Use req.hash_always_miss in vcl_recv
  • If set to true, Varnish disregards any existing objects and always (re)fetches from the backend
  • May create multiple objects as side effect
  • Does not necessarily free up memory at once
  4. Surrogate keys
  • For websites with the need for cache invalidation at a very large scale
  • Varnish Software’s implementation of surrogate keys
  • Flexible cache invalidation based on cache tags
  • Available as hashtwo VMOD in Varnish Plus 4.0
  • Available as xkey VMOD in Varnish Cache 4.1 and later

Purge - Bans - Cache Misses - Surrogate Keys

Which and when to use?

Table 17 Comparison Between: Purge, Softpurge, Bans, Force Cache Misses and Surrogate keys (hashtwo/xkey)
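|              | Purge                                   | Soft Purge                              | Bans                                 | Force Cache Misses                          | Surrogate keys                                                   |
|--------------|-----------------------------------------|-----------------------------------------|--------------------------------------|---------------------------------------------|------------------------------------------------------------------|
| Targets      | Specific object (with all its variants) | Specific object (with all its variants) | Regex patterns                       | One specific object (with all its variants) | All objects with a common hashtwo key                            |
| Frees memory | Immediately                             | After grace time                        | After pattern is checked and matched | No                                          | Immediately                                                      |
| Scalability  | High                                    | High                                    | High if used properly                | High                                        | High                                                             |
| CLI          | No                                      | No                                      | Yes                                  | No                                          | No                                                               |
| VCL          | Yes                                     | Yes                                     | Yes                                  | Yes                                         | Yes                                                              |
| Availability | Varnish Cache                           | Varnish Cache                           | Varnish Cache                        | Varnish Cache                               | Hashtwo VMOD in Varnish Plus 4.0 or xkey VMOD in Varnish Cache 4.1 |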

Whenever you deal with caching, you eventually have to deal with the challenge of cache invalidation, or content update. Varnish has different mechanisms to address this challenge, but which one should you use?

There is rarely a need to pick only one solution, as you can implement many of them. However, you can try to answer the following questions:

  • Am I invalidating one or many specific objects?
  • Do I need to free up memory or just replace the content?
  • How long does it take to replace the content?
  • Is this a regular or a one-off task?

or follow these guidelines:

  • If you need to invalidate more than one item at a time, consider using bans or hashtwo/xkey.
  • If it takes a long time to pull content from the backend into Varnish, consider forcing cache misses by using req.hash_always_miss.

The rest of the chapter teaches you more about these cache invalidation mechanisms.

Note

Purge and hashtwo/xkey work very similarly. The main difference is that they act on different hash keys.

HTTP PURGE

  • If you know exactly what to remove, use HTTP PURGE
  • Frees up memory, removes all Vary:-variants of the object
  • Leaves it to the next client to refresh the content
  • Often combined with return(restart);
  • As easy as handling any other HTTP request

A purge is what happens when you pick out an object from the cache and discard it along with its variants. A resource can exist in multiple Vary:-variants. For example, you could have a desktop version, a tablet version and a smartphone version of your site, and use the Vary HTTP header field in combination with device detection to store different variants of the same resource.
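The device example above can be sketched in VCL roughly as follows. This is a minimal illustration, not a production device detector: the X-Device header name and the User-Agent patterns are assumptions made for this sketch.

```vcl
sub vcl_recv {
    # Hypothetical, deliberately crude device detection.
    # X-Device is an assumed header name, not a Varnish built-in.
    if (req.http.User-Agent ~ "(?i)ipad|tablet") {
        set req.http.X-Device = "tablet";
    } else if (req.http.User-Agent ~ "(?i)mobile|iphone|android") {
        set req.http.X-Device = "smartphone";
    } else {
        set req.http.X-Device = "desktop";
    }
}

sub vcl_backend_response {
    # Ask Varnish to store one variant per device class
    # by adding X-Device to the Vary header.
    if (beresp.http.Vary) {
        set beresp.http.Vary = beresp.http.Vary + ", X-Device";
    } else {
        set beresp.http.Vary = "X-Device";
    }
}
```

With a setup like this, a single purge of the resource discards the desktop, tablet and smartphone variants together.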

Usually a purge is invoked through HTTP with the method PURGE. An HTTP PURGE is just another request method, like HTTP GET. In fact, you can call the PURGE method whatever you like, but PURGE has become the de facto naming standard. Squid, for example, uses the PURGE method name for the same purpose.

Purges apply to a specific object, since they use the same lookup operation as in vcl_hash. Therefore, purges find and remove objects really fast!

There are, however, two clear downsides. First, purges cannot use regular expressions, and second, purges evict content from cache regardless of the availability of the backend. That means that if you purge some objects and the backend is down, Varnish will end up having no copy of the content.

VCL – vcl_purge

  • You may add actions to be executed once the object and its variants are purged
  • Called after the purge has been executed
sub vcl_purge {
    return (synth(200, "Purged"));
}

Note

Cache invalidation with purges is done by calling return (purge); from vcl_recv in Varnish 4. The keyword purge; from Varnish 3 has been retired.

Example: PURGE

vcl/purge.vcl

sub vcl_recv {
    if (req.method == "PURGE"){
       return (purge);
    }
}

In the example above, return (purge) ends execution of vcl_recv and jumps to vcl_hash. When vcl_hash calls return(lookup), Varnish purges the object and then calls vcl_purge.

You can test this code with HTTPie by issuing:

http -p hH --proxy=http:http://localhost PURGE www.example.com

Alternatively, you can test it with varnishtest as in the subsection PURGE in varnishtest.

In order to control the IP addresses that are allowed to send PURGE, you can use Access Control Lists (ACLs). A purge example using ACLs is in the Access Control Lists (ACLs) section.

Exercise: PURGE an article from the backend

  • Send a PURGE request to Varnish from your backend server after an article is published.
    • Simulate the article publication.
    • The result is that the article is evicted in Varnish.

You are provided with article.php, which fakes an article. We recommend creating a separate PHP file to implement purging.

article.php

<?php
header("Cache-Control: max-age=10");
$utc = new DateTimeZone("UTC");
$date = new DateTime("now", $utc);
$now = $date->format( DateTime::RFC2822 );
?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head></head>
    <body>
        <h1>This article is cached for 10 seconds</h1>

        <h2>Cache timestamp: <?php echo $now; ?></h2>
        <a href="<?=$_SERVER['PHP_SELF']?>">Refresh this page</a>
    </body>
</html>

If you need help, see Solution: PURGE an article from the backend.

Tip

Remember to place your php files under /var/www/html/.

PURGE with restart return action

  • Start the VCL processing again from the top of vcl_recv
  • Any changes made are kept
acl purgers {
    "127.0.0.1";
    "192.168.0.0"/24;
}

sub vcl_recv {
    # allow PURGE from localhost and 192.168.0...
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Purging not allowed for " + client.ip));
        }
        return (purge);
    }
}

sub vcl_purge {
    set req.method = "GET";
    return (restart);
}

The restart return action allows Varnish to re-run the VCL state machine with different variables. This is useful in combination with PURGE, because a purged object can be immediately replaced with a newly fetched object.

Every time a restart occurs, Varnish increments the req.restarts counter. If the number of restarts is higher than the max_restarts parameter, Varnish emits a guru meditation error. In this way, Varnish safeguards against infinite loops.
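The req.restarts counter can also be checked explicitly in VCL. The following variant of vcl_purge is a minimal sketch that restarts only on the first pass and otherwise answers with a synthetic response. The extra guard is illustrative only, since switching the method to GET already prevents a second purge.

```vcl
sub vcl_purge {
    # Restart once to refetch the purged object;
    # req.restarts is 0 on the first pass through the state machine.
    if (req.restarts == 0) {
        set req.method = "GET";
        return (restart);
    }
    # Fallback: report the purge without triggering another restart.
    return (synth(200, "Purged"));
}
```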

Warning

Restarts are likely to cause a hit against the backend, so do not increase max_restarts thoughtlessly.

Softpurge

  • Sets TTL to 0
  • Allows Varnish to serve stale content to users if the backend is unavailable
  • Asynchronous and automatic backend fetching to update object

Softpurge is a cache invalidation mechanism that sets TTL to 0 but keeps the grace value of a cached object. This is useful if you want to build responses using the cached object while updating it.

Softpurge is a VMOD that is part of varnish-modules https://github.com/varnish/varnish-modules. For installation and usage details, please refer to its own documentation https://github.com/varnish/varnish-modules/blob/master/docs/vmod_softpurge.rst.
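As a rough orientation, a softpurge setup might look like the sketch below. It assumes that the softpurge VMOD is installed and that your version exposes the softpurge.softpurge() function described in its documentation; verify the function name against the version you install.

```vcl
import softpurge;

sub vcl_recv {
    if (req.method == "PURGE") {
        # Look the object up so that vcl_hit or vcl_miss can act on it.
        return (hash);
    }
}

sub vcl_hit {
    if (req.method == "PURGE") {
        # Set TTL to 0 but keep grace, so stale content can still be served.
        softpurge.softpurge();
        return (synth(200, "Successful softpurge"));
    }
}

sub vcl_miss {
    if (req.method == "PURGE") {
        # Also covers variants when the request itself did not hit.
        softpurge.softpurge();
        return (synth(200, "Successful softpurge"));
    }
}
```

As with regular purges, access to this method should be restricted with an ACL.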

Tip

The xkey VMOD has the softpurge functionality too.

Banning

  • Use ban to invalidate caches on cache hits

  • Frees memory when ban patterns match

  • Examples in the varnishadm command line interface:

    • ban req.url ~ /foo
    • ban req.http.host ~ example.com && obj.http.content-type ~ text
    • ban.list
  • Example in VCL:

    • ban("req.url ~ /foo");
  • Example of VCL code to act on HTTP BAN request method:

    sub vcl_recv {
        if (req.method == "BAN") {
            ban("req.http.host == " + req.http.host +
                " && req.url == " + req.url);
            # Throw a synthetic page so the request won't go to the backend.
            return(synth(200, "Ban added"));
        }
    }
    

Banning in the context of Varnish refers to adding a ban expression that prohibits Varnish from serving certain objects from the cache. Ban expressions are most useful when they use regular expressions.

Bans work on objects already in the cache, i.e., they do not prevent new content from entering the cache or being served. Cached objects that match a ban are marked as obsolete. Obsolete objects are expunged by the expiry thread like any other object with obj.ttl == 0.

Ban expressions match against req.* or obj.* variables. Think of a ban expression as “the requested URL starts with /sport”, or “the cached object has a header field with a value matching lighttpd”. You can add ban expressions in three ways: 1) in VCL code, 2) through a customized HTTP request method, or 3) by issuing commands in the varnishadm CLI.
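The two expressions described above could be written in VCL roughly as follows. This is a sketch only: the X-Ban-Kind request header is a hypothetical trigger invented for this illustration, and matching lighttpd against the Server header is an assumption about which header field is meant.

```vcl
sub vcl_recv {
    if (req.method == "BAN") {
        if (req.http.X-Ban-Kind == "sport") {
            # "The requested URL starts with /sport"
            ban("req.url ~ ^/sport");
        } else if (req.http.X-Ban-Kind == "lighttpd") {
            # "The cached object has a header field with a value
            # matching lighttpd" (here assumed to be Server)
            ban("obj.http.Server ~ lighttpd");
        }
        return (synth(200, "Ban added"));
    }
}
```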

Ban expressions are inserted into a ban-list. The ban-list contains:

  • ID of the ban,
  • timestamp when the ban entered the ban-list,
  • counter of objects that have matched the ban expression,
  • a C flag for completed that indicates whether a ban is invalid because it is duplicated,
  • the ban expression.

To inspect the current ban-list, issue the ban.list command in the CLI:

0xb75096d0 1318329475.377475    10      obj.http.x-url ~ test0
0xb7509610 1318329470.785875    20C     obj.http.x-url ~ test1

Varnish tests bans whenever a request hits a cached object. A cached object is checked against bans added after the last checked ban. That means that each object checks against a ban expression only once.

Bans that match only against obj.* are also checked by a background worker thread called the ban lurker. The parameter ban_lurker_sleep controls how often the ban lurker tests obj.* bans. The ban lurker can be disabled by setting ban_lurker_sleep to 0.

Bans can free memory in a very scalable manner if used properly. Bans free memory only after a ban expression hits an object. However, since bans do not prevent new backend responses from being inserted into the cache, client requests that trigger the eviction of an object will most likely insert a new one. Therefore, ban lurker banning is more effective at freeing memory, as we shall see next.

Note

You should avoid ban expressions that match against req.*, because these expressions are tested only by client requests, not the ban lurker. In other words, a req.* ban expression will be removed from the ban list only after a request matches it. Consequently, you risk accumulating a very large number of ban expressions. This might impact CPU usage and thereby performance.

Therefore, we recommend that you avoid req.* variables in your ban expressions and use obj.* variables instead. Ban expressions using only obj.* are called lurker-friendly bans.

Note

If the cache is completely empty, only the last added ban stays in the ban-list.

Tip

You can also execute ban expressions via the Varnish Administration Console (VAC).

Fig. 26 Executing ban expressions via the Varnish Administration Console (VAC).

Lurker-Friendly Bans

  • Ban expressions that match only against obj.*
  • Evaluated asynchronously by the ban lurker thread
  • Similar to the concept of garbage collection

Ban expressions are checked in two cases: 1) when a request hits a cached object, or 2) when the ban lurker wakes up. The first case is efficient only if you know that the cached objects to be banned are frequently accessed. Otherwise, you might accumulate a lot of ban expressions in the ban-list that are never checked. The second case is a better alternative because the ban lurker can help you keep the ban-list at a manageable size. Therefore, we recommend that you create ban expressions that are checked by the ban lurker. Such ban expressions are called lurker-friendly bans.

Lurker-friendly ban expressions are those that use only obj.* variables, not req.* variables. Since lurker-friendly ban expressions lack req.*, you might need to copy some of the req.* contents into the obj structure. In fact, this copy operation is a mechanism to preserve the context of the client request in the cached object. For example, you may want to copy useful parts of the client context, such as the requested URL, from req to obj.

The following snippet shows an example on how to preserve the context of a client request in the cached object:

sub vcl_backend_response {
   set beresp.http.x-url = bereq.url;
}

sub vcl_deliver {
   # The X-Url header is for internal use only
   unset resp.http.x-url;
}

Now imagine that you just changed a blog post template, which requires invalidating all blog posts that have been cached. For this you can issue a ban such as:

$ varnishadm ban 'obj.http.x-url ~ ^/blog'

Since it uses a lurker-friendly ban expression, the ban inserted in the ban-list will be gradually evaluated against all cached objects until all blog posts are invalidated. The snippet below shows how to insert the same expression into the ban-list in the vcl_recv subroutine:

sub vcl_recv {
   if (req.method == "BAN") {
      # Assumes the X-Ban header is a regex;
      # this might be a bit too simple.
      ban("obj.http.x-url ~ " + req.http.x-ban);
      return(synth(200, "Ban added"));
   }
}

Exercise: Write a VCL program using purge and ban

  • Write a VCL program that handles the PURGE and BAN HTTP methods.
  • When handling the BAN method, use the request header fields req.http.x-ban-url and req.http.x-ban-host
  • Use Lurker-Friendly Bans
  • To build further on this, you can also use the REFRESH HTTP method that fetches new content, using req.hash_always_miss, which is explained in the next subsection

To test this exercise, you can use HTTPie:

http -p hH PURGE http://localhost/testpage
http -p hH BAN http://localhost/ 'X-Ban-Url: .*html$' \
                                 'X-Ban-Host: .*\.example\.com'
http -p hH REFRESH http://localhost/testpage

For information on cache invalidation in varnishtest, refer to the subsection Cache Invalidation in varnishtest. If you need help, see Solution: Write a VCL program using purge and ban.

Force Cache Misses

  • set req.hash_always_miss = true; in vcl_recv
  • Causes Varnish to look the object up in cache, but ignore any copy it finds
  • Useful way to do a controlled refresh of a specific object
  • If the server is down, the cached object is left untouched
  • Useful to refresh slowly generated content

Putting a request in pass mode instructs Varnish to always ask a backend for content, without storing the fetched object in the cache. A purge (vcl_purge) removes old content, but what if the web server is down?

Setting req.hash_always_miss to true tells Varnish to look up the content in cache, but to always treat the lookup as a miss. This means that Varnish first calls vcl_miss, then (presumably) fetches the content from the backend, caches the updated object, and delivers the updated content.

The distinctive behavior of req.hash_always_miss occurs when the backend server is down or unresponsive. In this case, the current cached object is untouched. Therefore, client requests that do not enable req.hash_always_miss keep getting the old and untouched cached content.

Two important use cases for using req.hash_always_miss are when you want to: 1) control who takes the penalty for waiting around for the updated content (e.g. a script you control), and 2) ensure that content is not evicted before it is updated.
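The first use case could be implemented along the lines of the sketch below, where a script you control sends a REFRESH request and takes the waiting penalty itself. The REFRESH method name and the refreshers ACL are assumptions made for this illustration.

```vcl
# Hypothetical ACL: hosts allowed to force a refresh.
acl refreshers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "REFRESH") {
        if (!client.ip ~ refreshers) {
            return (synth(405, "Refreshing not allowed for " + client.ip));
        }
        # Treat the request as a normal GET, but force a cache miss so
        # that a fresh copy is fetched and cached. If the backend is
        # down, the existing cached object stays untouched.
        set req.method = "GET";
        set req.hash_always_miss = true;
    }
}
```

Regular client requests do not set req.hash_always_miss, so they keep being served from cache while the refresh is in progress.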

Note

Forcing cache misses does not evict old content. This means that Varnish may end up with multiple copies of the content in cache. In such cases, the newest copy is always used. Keep in mind that the duplicated objects stay in cache as long as their time-to-live is positive.

Hashtwo/Xkey (Varnish Software Implementation of Surrogate Keys)

  • Hashtwo and xkey are Varnish Software’s implementations of surrogate keys
  • Hashtwo is available in Varnish Cache Plus 3.x and 4.0 only
  • Xkey is open source and is available in Varnish Cache 4.1 or later
  • Cache invalidation based on cache tags
  • Easily adds patterns to be matched against
  • Highly scalable

The idea is that you can use any arbitrary string for cache invalidation. You can then key your cached objects on, for example, product ID or article ID. In this way, when you update the price of a certain product or a specific article, you have a key to evict all those objects from the cache.

So far, we have discussed purges and bans as methods for cache invalidation. An important distinction between them is that purges remove a single object (with its variants), whereas bans perform cache invalidation based on matching expressions. However, there are cases where neither of these mechanisms is optimal.

Hashtwo/xkey creates a second hash key to link cached objects based on cache tags. These hash keys provide the means to invalidate cached objects with common cache tags.

In practice, hashtwo/xkey creates cache invalidation patterns, which can be tested and invalidated immediately, just as with purges. In addition, hashtwo/xkey is much more efficient than bans for two reasons: 1) looking up hash keys is much more efficient than traversing ban-lists, and 2) every time a ban expression is tested, it is checked against every object in the cache that is older than the ban itself.

The hashtwo and xkey VMODs are pre-built for supported versions and can be installed using regular package managers from the Varnish Software repositories. Once your repository is properly configured, as indicated in Solution: Install Varnish, issue the following commands to install the hashtwo VMOD:

On Debian or Ubuntu:

apt-get install libvmod-hashtwo

On Red Hat Enterprise Linux:

yum install libvmod-hashtwo

Finally, you can use this VMOD by importing it in your VCL code:

import hashtwo;

Xkey is a part of varnish-modules https://github.com/varnish/varnish-modules. For installation and usage details, please refer to its own documentation https://github.com/varnish/varnish-modules/blob/master/docs/vmod_xkey.rst.

Tip

The xkey VMOD has a softpurge function as well.

Example Using Hashtwo or Xkey

  • Use case: E-commerce site

  • Same logic for hashtwo and xkey

  • HTTP response header from web page containing three products: 8155054, 166412 and 234323:

    HTTP/1.1 200 OK
    Server: Apache/2.2.15
    X-HashTwo: 8155054
    X-HashTwo: 166412
    X-HashTwo: 234323
    
  • HTTP request header to purge pages containing product 166412:

    GET / HTTP/1.1
    Host: www.example.com
    X-HashTwo-Purge: 166412
    
  • VCL example code for hashtwo:

    import hashtwo;
    
    sub vcl_recv {
      if (req.http.X-HashTwo-Purge) {
        if (hashtwo.purge(req.http.X-HashTwo-Purge) != 0) {
           return (purge);
        } else {
          return (synth(404, "Key not found"));
        }
      }
    }
    

On an e-commerce site the backend application adds the X-HashTwo HTTP header field for every product that is included in a web page. The header for a certain page might look like the one above. If you use xkey instead of hashtwo, you should rename that header so you do not get confused.

Normally the backend is responsible for setting these headers. If you were to do it in VCL, it would look something like this:

sub vcl_backend_response {
  set beresp.http.X-HashTwo = "secondary_hash_key";
}

In the VCL code above, the hashtwo key to be purged is the value in the X-HashTwo-Purge HTTP header. In order to keep the web pages in sync with the database, you can set up a trigger in your database. In that way, when a product is updated, an HTTP request towards Varnish is triggered. For example, the request above invalidates every cached object whose hashtwo header matches the key passed to hashtwo.purge(req.http.X-HashTwo-Purge), or to xkey.purge(req.http.X-Key-Purge) for the xkey VMOD.

After purging, Varnish should respond with something like:

HTTP/1.1 200 Purged
Date: Thu, 24 Apr 2014 17:08:28 GMT
X-Varnish: 1990228115
Via: 1.1 Varnish

The objects are now cleared.

Warning

You should protect purges with ACLs from unauthorized hosts.
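For the xkey VMOD, an ACL-protected variant of the purge logic might look like the following sketch. It assumes that the xkey VMOD is installed, that xkey.purge() returns the number of purged objects as described in its documentation, and that the backend tags objects with the response header the VMOD expects (check the xkey documentation for the exact header name). The xkey_purgers ACL name is hypothetical.

```vcl
import xkey;

# Hypothetical ACL: hosts allowed to invalidate by key.
acl xkey_purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.http.X-Key-Purge) {
        if (!client.ip ~ xkey_purgers) {
            return (synth(405, "Purging not allowed for " + client.ip));
        }
        # Invalidate every object tagged with the given key.
        if (xkey.purge(req.http.X-Key-Purge) != 0) {
            return (synth(200, "Purged"));
        } else {
            return (synth(404, "Key not found"));
        }
    }
}
```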