VCL Subroutines

  • Typical subroutines to customize: vcl_recv, vcl_pass, vcl_backend_fetch, vcl_backend_response, vcl_hash, vcl_hit, vcl_miss, vcl_deliver, and vcl_synth
  • If your VCL subroutine executes a return statement, the built-in VCL subroutine is skipped
  • The built-in VCL subroutines are always appended to yours

This chapter covers the VCL subroutines where you customize the behavior of Varnish. VCL subroutines can be used to: add custom headers, change the appearance of the Varnish error message, add HTTP redirect features in Varnish, purge content, and define which parts of a request make a cached object unique. After this chapter, you should know where to add your custom policies and you will be ready to dive into more advanced features of Varnish and VCL.

Note

It is strongly advised to let the built-in subroutines run whenever possible. The built-in subroutines are designed with safety in mind, which often means that they handle any flaws in your VCL code in a reasonable manner.
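
The following fragment is a minimal sketch of how a custom subroutine and its built-in counterpart interact. The URL prefix /private/ and the header name X-Example are hypothetical and serve only as illustration.

```vcl
sub vcl_recv {
    if (req.url ~ "^/private/") {
        # Explicit return: the built-in vcl_recv is skipped for these requests.
        return (pass);
    }

    # No return on this path, so after this statement Varnish continues
    # with the built-in vcl_recv that is appended to this subroutine.
    set req.http.X-Example = "seen";
}
```

Leaving out the return on the common path keeps the safety checks of the built-in code in effect for every request that is not explicitly handled.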

Tip

Looking at the code of built-in subroutines can help you to understand how to build your own VCL code. Built-in subroutines are in the file /usr/share/doc/varnish/examples/builtin.vcl.gz or {varnish-source-code}/bin/varnishd/builtin.vcl. The first location may change depending on your distro.

VCL – vcl_recv

  • Normalize client input
  • Pick a backend web server
  • Re-write client-data for web applications
  • Decide caching policy based on client input
  • Access Control Lists (ACL)
  • Security barriers, e.g., against SQL injection attacks
  • Fixing mistakes, e.g., index.htlm -> index.html

vcl_recv is the first VCL subroutine executed, right after Varnish has parsed the client request into its basic data structure. vcl_recv has four main uses:

  1. Modifying the client data to reduce cache diversity. E.g., removing any leading “www.” in the Host: header.
  2. Deciding which web server to use.
  3. Deciding caching policy based on client data. For example: not caching POST requests, or caching only specific URLs.
  4. Executing re-write rules needed for specific web applications.
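
The first three uses can be sketched as follows, assuming Varnish 4.x. The backend names, addresses, ports, and URL prefixes are placeholders chosen for this illustration, not values from a real setup.

```vcl
vcl 4.0;

# Hypothetical backends; addresses and ports are placeholders.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend static_files {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    # Use 1: reduce cache diversity by stripping an explicit port
    # from the Host header, so "example.com:80" equals "example.com".
    set req.http.Host = regsub(req.http.Host, ":[0-9]+$", "");

    # Use 2: pick a backend based on the URL.
    if (req.url ~ "^/static/") {
        set req.backend_hint = static_files;
    } else {
        set req.backend_hint = default;
    }

    # Use 3: never cache the (hypothetical) admin area.
    if (req.url ~ "^/admin/") {
        return (pass);
    }

    # No return here: the built-in vcl_recv handles everything else.
}
```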

In vcl_recv you can perform the following terminating actions:

pass: It passes over the cache lookup, but it executes the rest of the Varnish request flow. pass does not store the response from the backend in the cache.

pipe: This action creates a full-duplex pipe that forwards the client request to the backend without looking at the content. Backend replies are forwarded back to the client without caching the content. Since Varnish no longer tries to map the content to a request, any subsequent request sent over the same keep-alive connection is also piped. Piped requests do not appear in any log.

hash: It looks up the request in cache.

purge: It looks up the request in cache in order to remove it.

synth: It generates a synthetic response from Varnish. This synthetic response is typically a web page with an error message. synth may also be used to redirect client requests.

It’s also common to use vcl_recv to apply some security measures. Varnish is not a replacement for intrusion detection systems, but can still be used to stop some typical attacks early. Simple Access Control Lists (ACLs) can be applied in vcl_recv too.
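
As a sketch of a simple ACL combined with the purge and synth actions, the fragment below restricts PURGE requests to a list of trusted addresses. The ACL name purgers and its entries are assumptions for this example. The fragment is meant to be merged into a VCL file that already declares its version and a backend.

```vcl
# Hypothetical ACL: hosts that are allowed to purge content.
acl purgers {
    "localhost";
    "192.0.2.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            # Reject clients that are not listed in the ACL.
            return (synth(405, "Not allowed"));
        }
        # Look up the object in cache in order to remove it.
        return (purge);
    }
}
```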

For further discussion about security in VCL, take a look at the Varnish Security Firewall (VSF) application at https://github.com/comotion/VSF. The VSF supports Varnish 3 and above. You may also be interested in the Security.vcl project at https://github.com/comotion/security.vcl. The Security.vcl project, however, supports only Varnish 3.x.

Tip

The built-in vcl_recv subroutine may not cache everything you want, but it is often better not to cache some content than to deliver the wrong content to the wrong user. There are exceptions, of course, but if you cannot understand why the default VCL does not let you cache some content, it is almost always worth investigating why instead of overriding it.

Revisiting built-in vcl_recv

sub vcl_recv {
    if (req.method == "PRI") {
        /* We do not support SPDY or HTTP/2.0 */
        return (synth(405));
    }
    if (req.method != "GET" &&
      req.method != "HEAD" &&
      req.method != "PUT" &&
      req.method != "POST" &&
      req.method != "TRACE" &&
      req.method != "OPTIONS" &&
      req.method != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}

Example: Basic Device Detection

One way of serving different content for mobile devices and desktop browsers is to run some simple parsing on the User-Agent header. The following VCL code shows how to create a custom header that differentiates mobile devices from desktop computers.

sub vcl_recv {
    if (req.http.User-Agent ~ "iPad" ||
        req.http.User-Agent ~ "iPhone" ||
        req.http.User-Agent ~ "Android") {

        set req.http.X-Device = "mobile";
    } else {
        set req.http.X-Device = "desktop";
    }
}

You can read more about different types of device detection at https://www.varnish-cache.org/docs/trunk/users-guide/devicedetection.html

This simple VCL will create a request header called X-Device which will contain either mobile or desktop. The web server can then use this header to determine what page to serve, and inform Varnish about it through Vary: X-Device.

It might be tempting to just send Vary: User-Agent, but that requires you to normalize the User-Agent header itself, because there are many tiny variations in the description of similar User-Agents. This normalization, however, loses detailed information about the browser. If you pass the User-Agent header without normalization, the cache size may inflate drastically, because Varnish could keep hundreds of variants of each object, one for every tiny User-Agent variation. For more information on the Vary HTTP response header, see the Vary section.

Note

If you do use Vary: X-Device, you might want to send Vary: User-Agent to the users after Varnish has used it. Otherwise, intermediary caches will not know that the page looks different for different devices.
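
One way to do this, sketched here under the assumption that the backend sends Vary: X-Device as described above, is to rewrite the Vary header in vcl_deliver, after Varnish has used it for its own variant selection.

```vcl
sub vcl_deliver {
    if (resp.http.Vary) {
        # Clients and intermediary caches never see X-Device, so
        # advertise the header the variation is really derived from.
        set resp.http.Vary = regsub(resp.http.Vary, "X-Device", "User-Agent");
    }
}
```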

Exercise: Rewrite URL and Host Header Fields

  1. Copy the Host header field (req.http.Host) and URL (req.url) to two new request headers: req.http.x-host and req.http.x-url.
  2. Ensure that www.example.com and example.com are cached as one, using regsub().
  3. Rewrite all URLs under http://sport.example.com to http://example.com/sport/. For example: http://sport.example.com/index.html to http://example.com/sport/index.html.
  4. Use HTTPie to verify the result.
  • Extra: Make sure / and /index.html are cached as one object.
  • Extra 2: Make the redirection work for any domain with sport. at the front. E.g: sport.example.com, sport.foobar.example.net, sport.blatti, etc.

For the first point, use set req.http.headername = "value"; or set req.http.headername = regsub(...);.

In point 2, change req.http.host by calling the function regsub(str, regex, sub). str is the input string, in this case, req.http.host. regex is the regular expression matching whatever content you need to change. Use ^ to anchor the match at the beginning of the string and \. to match the literal dot after www, which gives the regular expression "^www\.". sub is what you want to replace the match with; an empty string "" removes whatever matches regex.

For point 3, you can check host headers with a specific domain name, for example: if (req.http.host == "sport.example.com"). An alternative is to check for all hosts that start with sport, regardless of the domain name: if (req.http.host ~ "^sport\."). In the first case, setting the host header is straightforward: set req.http.host = "example.com";. In the second case, you can set the host header by removing the string that precedes the domain name: set req.http.host = regsub(req.http.host, "^sport\.", "");. Finally, you rewrite the URL in this way: set req.url = regsub(req.url, "^", "/sport");.

To simulate client requests, you can either use HTTPie or varnishtest. If you need help, see Solution: Rewrite URL and Host Header Fields.

Tip

Remember that man vcl contains a reference manual with the syntax and details of functions such as regsub(str, regex, sub). We recommend that you leave the default VCL file untouched and create a new file for your VCL code. Remember to update the location of the VCL file in the Varnish configuration file and reload it.

VCL – vcl_pass

  • Called upon entering pass mode

sub vcl_pass {
    return (fetch);
}

The vcl_pass subroutine is called after a previous subroutine returns the pass action. This action sets the request in pass mode. vcl_pass typically serves as an important catch-all for features you have implemented in vcl_hit and vcl_miss.

vcl_pass may return three different actions: fetch, synth, or restart. When returning the fetch action, the ongoing request proceeds in pass mode. Fetched objects from requests in pass mode are not cached, but passed to the client. The synth and restart return actions call their corresponding subroutines.

hit-for-pass

  • Used when an object should not be cached
  • hit-for-pass object instead of fetched object
  • Has TTL

Some requested objects should not be cached. A typical example is when a requested page contains the Set-Cookie response header, and therefore it must be delivered only to the client that requests it. In this case, you can tell Varnish to create a hit-for-pass object and store it in the cache, instead of storing the fetched object. Subsequent requests are processed in pass mode.

When an object should not be cached, the beresp.uncacheable variable is set to true. As a result, the cacher process keeps a hash reference to the hit-for-pass object. In this way, the lookup operation for requests translating to that hash finds a hit-for-pass object. Such requests are handed over to the vcl_pass subroutine, and proceed in pass mode.

Like any other cached object, hit-for-pass objects have a TTL. Once the object’s TTL has elapsed, the object is removed from the cache.
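
A minimal sketch of creating a hit-for-pass object, assuming Varnish 4.x, is shown below. The 120 second TTL is an arbitrary value picked for illustration.

```vcl
sub vcl_backend_response {
    if (beresp.http.Set-Cookie) {
        # Do not cache this response; remember that decision instead.
        set beresp.uncacheable = true;
        # Lifetime of the hit-for-pass object (illustrative value).
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```

While the hit-for-pass object is alive, requests for the same hash go straight to pass mode instead of waiting for each other.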

VCL – vcl_backend_fetch

sub vcl_backend_fetch {
    return (fetch);
}

vcl_backend_fetch can be called from vcl_miss or vcl_pass. When vcl_backend_fetch is called from vcl_miss, the fetched object may be cached. If vcl_backend_fetch is called from vcl_pass, the fetched object is not cached even if the obj.ttl or obj.keep variables are greater than zero.

A relevant variable is bereq.uncacheable. This variable indicates whether the object requested from the backend may be cached or not. However, objects from pass requests are never cached, regardless of the bereq.uncacheable variable.

vcl_backend_fetch has two possible terminating actions, fetch or abandon. The fetch action sends the request to the backend, whereas the abandon action calls the vcl_synth subroutine. The built-in vcl_backend_fetch subroutine simply returns the fetch action. The backend response is processed by vcl_backend_response or vcl_backend_error depending on the response from the server.

If Varnish receives a syntactically correct HTTP response, Varnish passes control to vcl_backend_response. Syntactically correct HTTP responses include HTTP 5xx error codes. If Varnish does not receive an HTTP response, it passes control to vcl_backend_error.
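
As a small sketch of customizing the backend request, assuming Varnish 4.x where bereq.http.* is writable in vcl_backend_fetch, the fragment below removes the X-Varnish request header before the request is sent to the backend.

```vcl
sub vcl_backend_fetch {
    # Hide the Varnish transaction ID from the backend server.
    unset bereq.http.X-Varnish;

    # No return: the built-in vcl_backend_fetch then returns fetch.
}
```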

VCL – vcl_hash

  • Defines what is unique about a request.
  • vcl_hash is always visited after vcl_recv or when another subroutine returns the hash action keyword.

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}

vcl_hash defines the hash key to be used for a cached object. Hash keys differentiate one cached object from another. The default VCL for vcl_hash adds the hostname or IP address, and the requested URL to the cache hash.

One usage of vcl_hash is to add a user-name in the cache hash to identify user-specific data. However, be warned that caching user-data should only be done cautiously. A better alternative might be to hash cache objects per session instead.
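
Hashing per session could look like the sketch below. The request header X-Session-ID is hypothetical; in practice it would have to be derived from your application's session mechanism, for example in vcl_recv.

```vcl
sub vcl_hash {
    # Hypothetical header carrying a session identifier.
    if (req.http.X-Session-ID) {
        hash_data(req.http.X-Session-ID);
    }

    # No return: the built-in vcl_hash still adds the URL and the
    # host (or server IP) to the hash and returns lookup.
}
```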

The vcl_hash subroutine returns the lookup action keyword. Unlike other action keywords, lookup is an operation, not a subroutine. The next state to visit after vcl_hash depends on what lookup finds in the cache.

When the lookup operation does not match any hash, it creates an object with a busy flag and inserts it in cache. Then, the request is sent to the vcl_miss subroutine. The busy flag is removed once the request is handled, and the object is updated with the response from the backend.

Subsequent similar requests that hit busy-flagged objects are sent to a waiting list. This waiting list is designed to improve response performance, and it is explained in the Waiting State section.

Note

One cache hash may refer to one or many object variations. Object variations are created based on the Vary header field. It is good practice to keep several variations under one cache hash, rather than creating one hash per variation.

VCL – vcl_hit

  • Executed after the lookup operation, called by vcl_hash, finds (hits) an object in the cache.

sub vcl_hit {
    if (obj.ttl >= 0s) {
        // A pure unadulterated hit, deliver it
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Object is in grace, deliver it
        // Automatically triggers a background fetch
        return (deliver);
    }
    // fetch & deliver once we get the result
    return (fetch);
}

The vcl_hit subroutine typically terminates by calling return() with one of the following keywords: deliver, restart, or synth.

deliver returns control to vcl_deliver if the TTL + grace time of an object has not elapsed. If the elapsed time is more than the TTL, but less than the TTL + grace time, then deliver triggers a background fetch in parallel to vcl_deliver. The background fetch is an asynchronous call that inserts a fresher copy of the requested object in the cache. Grace time is explained in the Grace Mode section.

restart restarts the transaction, and increases the restart counter. If the number of restarts is higher than the max_restarts parameter, Varnish emits a guru meditation error.

synth(status code, reason) returns the specified status code to the client and abandons the request.

VCL – vcl_miss

  • Subroutine called if a requested object is not found by the lookup operation.
  • Contains policies to decide whether or not to attempt to retrieve the document from the backend, and which backend to use.

sub vcl_miss {
    return (fetch);
}

The subroutines vcl_hit and vcl_miss are closely related. It is rare that you customize them, because modification of HTTP request headers is typically done in vcl_recv. However, if you do not wish to send the X-Varnish header to the backend server, you can remove it with unset bereq.http.x-varnish;. Note that in Varnish 4 the bereq variables are only available on the backend side, so this statement belongs in vcl_backend_fetch rather than in vcl_miss or vcl_pass.

VCL – vcl_deliver

  • Common last exit point for all request workflows, except requests through vcl_pipe
  • Often used to add and remove debug-headers

sub vcl_deliver {
    return (deliver);
}

The vcl_deliver subroutine is simple, and it is also very useful for modifying the output of Varnish. If you need to remove a header, or add one that is not supposed to be stored in the cache, vcl_deliver is the place to do it.

The variables most useful and common to modify in vcl_deliver are:

resp.http.*
Headers that are sent to the client. They can be set and unset.
resp.status
The status code (200, 404, 503, etc).
resp.reason
The HTTP status message that is returned to the client.
obj.hits
The count of cache-hits on this object. Therefore, a value of 0 indicates a miss. This variable can be evaluated to easily reveal whether a response comes from a cache hit or miss.
req.restarts
The number of restarts issued in VCL - 0 if none were made.
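
The fragment below is a sketch of typical vcl_deliver housekeeping using some of these variables. The header names X-Powered-By and X-Restarts are examples only; which headers you remove or add depends on your setup.

```vcl
sub vcl_deliver {
    # Remove a response header that should not reach the client.
    unset resp.http.X-Powered-By;

    # Add a debug header that is never stored in the cache.
    set resp.http.X-Restarts = req.restarts;
}
```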

VCL – vcl_synth

  • Used to generate content within Varnish
  • Error messages can be created here
  • Other use cases: redirecting users (301/302 redirects)

vcl/default-vcl_synth.vcl:

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    synthetic( {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"} );
    return (deliver);
}

You can create synthetic responses, e.g., personalized error messages, in vcl_synth. To call this subroutine you do:

return (synth(status_code, "reason"));

Note that synth is not a keyword, but a function with arguments.

You must explicitly pass the status code and reason arguments to synth. Setting headers on synthetic responses is done on resp.http.

Note

From vcl/default-vcl_synth.vcl, note that {" and "} can be used to make multi-line strings. This is not limited to the synthetic() function, but can be used anywhere.

Note

A vcl_synth defined object is never stored in cache, contrary to a vcl_backend_error defined object, which may end up in cache. vcl_synth and vcl_backend_error replace vcl_error from Varnish 3.

Example: Redirecting requests with vcl_synth

sub vcl_recv {
    if (req.http.host == "www.example.com") {
        set req.http.location = "http://example.com" + req.url;
        return (synth(750, "Permanently moved"));
    }
}

sub vcl_synth {
    if (resp.status == 750) {
        set resp.http.location = req.http.location;
        set resp.status = 301;
        return (deliver);
    }
}

Redirecting with VCL is fairly easy – and fast. Basic HTTP redirects work when the HTTP response is either 301 Moved Permanently or 302 Found. These responses have a Location header field telling the web browser where to redirect.

Note

The 301 response can affect how browsers prioritize history and how search engines treat the content. 302 responses are temporary and do not affect search engines as 301 responses do.

Exercise: Modify the HTTP response header fields

  • Add a header field holding the string HIT if the requested resource was found in cache, or MISS otherwise
  • “Rename” the Age header field to X-Age

Exercise: Change the error message

  • Make the default error message more friendly.

If you need help, see Solution: Change the error message.