Getting started

In this chapter, we will:

  • Install and test a backend
  • Install Varnish
  • Make Varnish and the backend-server work together
  • Cover basic configuration

You want to use packages for your operating system whenever possible.

If the computer you will be using throughout this course has Varnish 3.0.0 or more recent available through the package system, you are encouraged to use that package if you do not feel you need the exercise in installing from source.

We will be using Apache as a web server.

This course is about Varnish, but we need an operating system to test on. For the sake of keeping things simple, the course uses Debian as a platform. You will find several references to differences between Debian and Red Hat where they matter the most, but for the most part, this course is independent of the operating system in use.

Configuration

Varnish has two categories of configuration:

  • Command line configuration and tunable parameters
  • VCL

To reload the Varnish configuration, you have several commands:

service varnish restart
 Completely restarts Varnish, using the operating system mechanisms. Your cache will be flushed.
service varnish reload
 Only reloads VCL. Cache is not affected.
varnishadm vcl.load ... and varnishadm vcl.use ...
 Can be used to manually reload VCL. The service varnish reload command does this for you automatically.
varnishadm param.set ...
 Can be used to set parameters without restarting Varnish.

Using the service commands is recommended. It’s safe and fast.

Tunable parameters and command line arguments are used to define how Varnish should work with the operating system and hardware, in addition to setting some default values, while VCL defines how Varnish should interact with web servers and clients.

Almost every aspect of Varnish can be reconfigured without restarting Varnish. Notable exceptions are the cache size and location, the username and group that Varnish runs as, and the hashing algorithm.

While you can change the values, some changes might require restarting the child to take effect (modifying the listening port, for instance) or might not be visible immediately. Changes to how long objects are cached, for instance, usually only take effect after the currently cached objects expire and are fetched again. Issuing param.show <parameter> will give you a description of the parameter, when and how it takes effect and the default and current value.
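
For example, a quick way to look up a parameter on a running Varnish (default_ttl is used here purely as an example) is:

varnishadm param.show default_ttl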

Command line configuration

-a <[hostname]:port>
 listen address
-f <filename>
 VCL file
-p <parameter=value>
 set tunable parameters
-S <secretfile>
 authentication secret for the management interface
-T <hostname:port>
 management interface listen address
-s <storagetype,options>
 where and how to store objects

All the options that you can pass to the varnishd binary are documented in the varnishd(1) manual page (man varnishd). You may want to take a moment to skim over the options mentioned above.

The only option that is strictly needed to start Varnish is -f, to specify a VCL file.

Though they are not strictly required, you almost always want to specify -s to select a storage backend, -a to make sure Varnish listens for clients on the port you expect, and -T to enable a management interface, often referred to as a telnet interface.

Both for -T and -a, you do not need to specify an IP; you can use :80 to tell Varnish to listen on port 80 on all available IPs. Make sure you don’t forget the colon, as -a 80 will tell Varnish to listen to the IP with the decimal-representation “80”, which is almost certainly not what you want. This is a result of the underlying functions that accept this kind of syntax.
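
Putting these options together, a minimal manual invocation could look like the sketch below; the VCL path, storage size and ports are only examples:

varnishd -f /etc/varnish/default.vcl \
         -s malloc,256m \
         -a :80 \
         -T localhost:6082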

You can specify -p for parameters multiple times. The workflow for tuning Varnish parameters usually means that you first try the parameter on a running Varnish through the management interface to find the value you want, then store it in a configuration file that will pass it to Varnish with -p next time you start it up. We will look at these files later on.
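
As a sketch of that workflow (default_ttl and the value 3600 are just examples):

# try a new value on the running Varnish first
varnishadm param.set default_ttl 3600

# then store it in the configuration file mentioned above,
# so it is passed with -p on the next start:
#   -p default_ttl=3600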

The -S option specifies a file which contains a secret to be used for authentication. This can be used to authenticate with varnishadm -S as long as varnishadm can read the same secret file - or rather the same content: The content of the file can be copied to another machine to allow varnishadm to access the management interface remotely.

Note

It is possible to start Varnish without a VCL file using the -b option instead of -f:

-b <hostname:port>
 backend address

Since the -b option is mutually exclusive with the -f option, we will only use the -f option. You can use -b if you do not intend to specify any VCL and only have a single web server.
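
For completeness, a VCL-less invocation could look like this sketch; the backend address and listen port are examples:

varnishd -b localhost:8080 -a :80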

Configuration files

Most Varnish installations use two configuration files. One of them is used by the operating system to start Varnish, while the other contains your VCL.

/etc/default/varnish
 Used for parameters and command line arguments. When you change this file, you need to run service varnish restart for the changes to take effect. On Red Hat-based systems, this file is /etc/sysconfig/varnish instead.
/etc/varnish/default.vcl
 The VCL file. You can change the file name by editing /etc/default/varnish if you want to, but it is normal to use the default name. This file contains your VCL and backend definitions. After changing it, you can run either service varnish reload, which will not restart Varnish, or service varnish restart, which empties the cache.

There are other ways to reload VCL and make parameter changes take effect, mostly using the varnishadm tool. However, using the service varnish reload and service varnish restart commands is a good habit.
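
If you do want to reload VCL by hand, the manual equivalent of service varnish reload is roughly the following sketch; the configuration name newconf is arbitrary:

varnishadm vcl.load newconf /etc/varnish/default.vcl
varnishadm vcl.use newconf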

Note

If you want to know how the service varnish commands work, you can always look at the script that runs behind the scenes. If you are used to UNIX-like systems, it will come as no surprise that the script can be found in /etc/init.d/varnish.

Warning

The script configuration (located in /etc/sysconfig or /etc/default) is directly sourced as a shell script. Pay close attention to any backslashes (\) and quotation marks that might move around as you edit the DAEMON_OPTS environment variable.

Defining a backend in VCL

/etc/varnish/default.vcl

backend default {
   .host = "localhost";
   .port = "8080";
}

In Varnish terminology, a backend server is whatever server Varnish talks to in order to fetch content. This can be any sort of service as long as it understands HTTP. Most of the time, Varnish talks to a web server or an application frontend server.

You almost always want to use VCL so we might as well get started.

The above example defines a backend named default. The name default is not special, and the real default backend that Varnish will use is the first backend you specify.

You can specify many backends at the same time, but for now, we will only specify one to get started.

Exercise: Installation

You can install packages on Debian with apt-get install <package>. E.g: apt-get install apache2. For Red Hat, the tool would be yum install <package>.

  1. Install apache2 and verify it works by browsing to http://localhost/. You probably want to replace localhost with the hostname of the machine you are working on.
  2. Change Apache’s ports from 80 to 8080, in /etc/apache2/ports.conf and /etc/apache2/sites-enabled/000-default.
  3. Install Varnish.
  4. Modify the Varnish configuration file so Varnish listens on port 80, has a management interface on port 1234 and uses 127.0.0.1:8080 as the backend.
  5. Start Varnish using service varnish start.

The end result should be:

Service   Result                              Related config-files
Apache    Answers on port 8080                /etc/apache2/ports.conf and /etc/apache2/sites-enabled/000-default
Varnish   Answers on port 80                  /etc/default/varnish
Varnish   Talks to Apache on localhost:8080   /etc/varnish/default.vcl

Varnish Software and the Varnish community maintain a package repository for several common GNU/Linux distributions. If your system does not have sufficiently up-to-date packages, visit https://www.varnish-cache.org/releases and find a package for your distribution.

Once you have modified the /etc/default/varnish file, it should look something like this (comments removed):

NFILES=131072
MEMLOCK=82000
INSTANCE=$(uname -n)
DAEMON_OPTS="-a :80 \
             -T localhost:1234 \
             -f /etc/varnish/default.vcl \
             -s malloc,256m"

Tip

You can get an overview of the services listening on TCP ports by issuing the command netstat -nlpt.
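
If the list is long, you can narrow it down to the services used in this exercise; the process names apache2 and varnishd are assumptions based on the Debian packages:

netstat -nlpt | grep -E 'apache2|varnishd'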

Exercise: Fetch data through Varnish

  1. Install libwww-perl
  2. Execute GET -Used http://localhost:80/ (on the command line)
  3. Compare the results from multiple executions.

GET and HEAD are actually the same tool: lwp-request. An HTTP HEAD request tells the web server - or Varnish in this case - to reply with only the HTTP headers, while GET returns everything.

GET -Used tells lwp-request to do a GET request, print the request headers (U), print the response status code (s), which is typically “200 OK” or “404 File not found”, print the response headers (e) and finally not display the content of the response (d). Feel free to try removing some of the options and observe the effect.

GET is also useful to generate requests with custom headers, as you can supply extra headers with -H "Header: value", which can be used multiple times.
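
For example, a sketch of a request with a custom header; the header name and value are made up for illustration:

GET -Used -H "X-Test: foo" http://localhost/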

You may also be familiar with Firebug, an add-on for Firefox used for web development and related affairs. It too can show you the response headers.

Web browsers have their own cache, and it is not always obvious whether a response came from it. If you are in doubt whether what you are seeing comes from Varnish or from your browser cache, it is often helpful to double-check with GET or HEAD.

Log data

Varnish provides a great deal of log data in real time. The two most important tools for processing that log data are:

  • varnishlog, used to access request-specific data. It provides an extended access log with information about specific clients and requests.
  • varnishstat, used to access global counters. It provides overall statistics, e.g. the total number of requests, the number of objects and more.

If you have multiple Varnish instances on the same machine, you need to specify -n <name> both when starting Varnish and when starting the corresponding tools.

In addition, the varnishncsa tool is often used to write Apache-like log files.

If you look for logging data for Varnish you may discover that /var/log/varnish/ is either non-existent or empty. There’s a reason for that.

Varnish logs all its information to a shared memory log which is overwritten repeatedly every time it’s filled up. To use the log data, you need to use specific tools to parse the content.

The downside is that you don’t have historic data unless you set it up yourself, which is not covered in this chapter, but the upside is that you get an abundance of information when you need it.

Throughout the course you will become familiar with varnishlog and varnishstat, which are the two most important tools you have at your disposal.

Note

If you want to log to disk you should take a look at /etc/default/varnishlog or /etc/default/varnishncsa (or the sysconfig equivalents). This will allow you to run varnishncsa or varnishlog as a service in the background that writes to disk.

Keep in mind that varnishlog generates large amounts of data, though. You may not want to log all of it to disk.

Note

All log tools (and varnishadm) take a -n option. Varnish itself also takes a -n option. It is used to specify a name for varnishd, or the location of the shared memory log. On most installations -n is not used, but if you run multiple Varnish instances on a single machine, you need -n to distinguish one Varnish instance from another.
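
As an illustration (the instance name and ports below are arbitrary examples):

# start a named instance
varnishd -n test1 -f /etc/varnish/default.vcl -s malloc,256m -a :8080 -T localhost:6083

# point the tools at the same name
varnishlog -n test1
varnishstat -n test1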

varnishlog

97 ReqStart     c 10.1.0.10 50866 117511506
97 RxRequest    c GET
97 RxURL        c /style.css
97 RxProtocol   c HTTP/1.1
97 RxHeader     c Host: www.example.com
97 VCL_call     c recv lookup
97 VCL_call     c hash hash
97 Hit          c 117505004
97 VCL_call     c hit deliver
97 Length       c 3218
97 VCL_call     c deliver deliver
97 TxProtocol   c HTTP/1.1
97 TxStatus     c 200
97 TxResponse   c OK
97 TxHeader     c Content-Length: 3218
97 TxHeader     c Date: Sat, 22 Aug 2008 01:10:10 GMT
97 TxHeader     c X-Varnish: 117511501 117505004
97 TxHeader     c Age: 2
97 TxHeader     c Via: 1.1 varnish
97 ReqEnd       c 117511501 1227316210.534358978 \
     1227316210.535176039  0.035283089 0.000793934 0.000023127

The above output is a single cache hit, as processed by Varnish. When you are dealing with several thousand requests per second you need filtering.

The displayed data is categorized as follows:

  1. The number on the left is a semi-unique identifier of the request. It is used to distinguish different requests.
  2. Each piece of log information belongs to a tag, as seen in the second column: TxHeader, RxHeader, VCL_call and so on. You can use these tags for intelligent filtering.
  3. Varnishlog will try to decipher if a request is related to a client (c), a backend (b) or “misc” (-). This can be used to filter the log. The misc category will contain data related to thread-collection, object expiry and similar internal data.
  4. The tag-specific content. E.g: the actual URL, the name and content of an HTTP header and so on.

Since varnishlog displays all data in the log unless you filter it, there is a lot of data that you can safely ignore, and some data you should focus on. The following table demonstrates some tags and values that are useful. Since the tags themselves are somewhat generic, there is no “response header sent to a client” tag, only a “sent header” (TxHeader) tag, and it is up to you to interpret whether that means it was sent to a client or to a web server.

Varnishlog tag examples

RxURL (example: /index.html)
 Varnish received a URL. The only scenario where Varnish receives a URL is from a client; thus, a client sent us this URL.
TxURL (example: /index.html)
 Varnish sent a URL. The only scenario where Varnish sends a URL is to a backend; thus, this is part of a backend request.
RxHeader (example: Host: www.example.com)
 A received header, either a request header from a client or a response header from a backend. Since we know the Host header is a request header, we can assume it came from a client.
TxHeader (example: Host: example.com)
 A header Varnish sent, either a request header or a response header. Since we know the Host header is a request header, we can assume it is a header Varnish sent to a backend.
RxRequest (example: GET)
 Received request method. Varnish only receives requests from clients.
TxStatus (example: 200)
 Status code Varnish sent. Only sent to clients.
RxStatus (example: 500)
 Status code Varnish received from a backend.
ReqEnd (example: 1048725851 1352290440.688310385 1352290440.688468695 0.000107288 0.000083208 0.000075102)
 The “End of request” entry contains various timing details for debugging. The first number is the XID, the second is the time the request started and the third is when it finished. The fourth number is the time from accepting the connection until processing of the request started. The fifth number is the time from the start of request processing until delivery (e.g: VCL execution and backend fetching). The sixth and last number is how long the delivery itself took.

varnishlog options

-b
 Only show backend traffic.
-c
 Only show client traffic.
-O
 Do not group by request.
-m <tag:filter>
 Show requests where the <tag> matches <filter>. Example: varnishlog -m TxStatus:500 shows requests returned to a client with status code 500.
-n <name>
 The name of the Varnish instance, or path to the shmlog. Useful when running multiple instances of Varnish.
-i <tag[,tag][..]>
 Only show the specified tags.
-I <regex>
 Filter the tags provided by -i, using the regular expression given to -I.

Some examples of useful command-combinations:

varnishlog -c -m RxURL:/specific/url/
 Only show client requests for the URL /specific/url/.
varnishlog -O -i ReqEnd
 Only show the ReqEnd tag. Useful to spot sporadic slowdowns; watch the last three values of it.
varnishlog -O -i TxURL
 Only show the URLs sent to backend servers, e.g: cache misses and content that is not cached.
varnishlog -O -i RxHeader -I Accept-Encoding
 Show the Accept-Encoding request header.
varnishlog -b -m TxRequest:POST
 Show backend requests using the POST method.
varnishlog -O -i TxURL,TxHeader
 Only show the URLs sent to a backend server and all headers sent, either to a client or to a backend.

Warning

varnishlog sometimes accepts arguments that are technically incorrect, which can have surprising effects on filtering. Make sure you double-check the filter logic. You most likely want to specify -b or -c too.

varnishstat

0+00:44:50                                                   foobar
Hitrate ratio:       10      100      175
Hitrate avg:     0.9507   0.9530   0.9532

      574660       241.00       213.63 Client connections accepted
     2525317       937.00       938.78 Client requests received
     2478794       931.00       921.48 Cache hits
        7723         3.00         2.87 Cache hits for pass
      140055        36.00        52.07 Cache misses
       47974        12.00        17.83 Backend conn. success
      109526        31.00        40.72 Backend conn. reuses
       46676         5.00        17.35 Backend conn. was closed
      156211        41.00        58.07 Backend conn. recycles
      110500        34.00        41.08 Fetch with Length
       46519         6.00        17.29 Fetch chunked
         456         0.00         0.17 Fetch wanted close
        5091          .            .   N struct sess_mem
        3473          .            .   N struct sess
       53570          .            .   N struct object
       50070          .            .   N struct objecthead
          20          .            .   N struct vbe_conn

varnishstat gives a good representation of the general health of Varnish, including cache hit rate, uptime, number of failed backend connections and many other statistics.

There are over a hundred different counters available. To increase the usefulness of varnishstat, only counters with a value different from 0 are shown by default.

varnishstat can be executed either as a one-shot tool which simply prints the current values of all the counters, using the -1 option, or interactively. Both methods allow you to specify specific counters using -f field1,field2,... to limit the list.
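
For instance, a one-shot listing of a few selected counters might look like this sketch (counter names as found in Varnish 3):

varnishstat -1 -f client_req,cache_hit,cache_miss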

In interactive mode, varnishstat starts out by printing the uptime (45 minutes, in the example above) and the hostname (foobar).

The Hitrate ratio and Hitrate avg are related. The Hitrate average measures the cache hit rate for a period of time stated by hitrate ratio. In the example above, the hitrate average for the last 10 seconds is 0.9507 (or 95.07%), 0.9530 for the last 100 seconds and 0.9532 for the last 175 seconds. When you start varnishstat, all of these will start at 1 second, then grow to 10, 100 and 1000. This is because varnishstat has to compute the average while it is running; there is no historic data of counters available.

The bulk of varnishstat's output is the counters. The left column is the raw value, the second column is the change per second in real time and the third column is the change per second on average since Varnish started. In the example above, Varnish has accepted 574660 client connections and is currently accepting roughly 241 connections per second.

Some counters do not have ‘per second’ data. These are counters which both increase and decrease.

There are far too many counters to keep track of for non-developers, and many of the counters are only there for debugging purposes. This allows you to provide the developers of Varnish with real and detailed data whenever you run into a performance issue or bug. It allows the developers to test ideas and get feedback on how it works in production environments without creating special test versions of Varnish. In short: It allows Varnish to be developed according to how it is used.

In addition to some obviously interesting counters, like cache_hit and client_conn, some counters of note are:

client_drop
 Counts clients Varnish had to drop due to resource shortage. It should be 0.
cache_hitpass
 Hitpass is a special type of cache miss. It will be covered in the VCL chapters, but it can often be used to indicate whether something the backend sent has triggered cache misses.
backend_fail
 Counts the number of requests to backends that fail. It should be low, ideally 0, but it is not unnatural to have backend failures once in a while. Just make sure it does not become the normal state of operation.
n_object
 Counts the number of objects in the cache. You can have multiple variants of the same object, depending on your setup.
n_wrk, n_wrk_queued, n_wrk_drop
 Thread counters. During normal operation, the n_wrk_queued counter should not grow. Once Varnish is out of threads, it will queue up requests, and n_wrk_queued counts how many times this has happened. Once the queue is full, Varnish starts dropping requests without answering them. n_wrk_drop counts how many times a request has been dropped. It should be 0.
n_lru_nuked
 Counts the number of objects Varnish has had to evict from the cache before they expired, to make room for other content. If it is always 0, there is no point in increasing the size of the cache, since the cache is not full. If it is climbing steadily, a bigger cache could improve cache efficiency.
esi_errors, esi_warnings
 If you use Edge Side Includes (ESI), these somewhat hidden counters can help determine whether the ESI syntax the web server is sending is valid.
uptime
 The uptime of Varnish. Useful to spot whether Varnish has been restarted, either manually or by bugs. Particularly useful if a monitoring tool uses it.

The management interface

Varnish offers a management interface (historically called the Telnet interface), assuming it was started with a -T option. You can use the management interface to:

  • Change parameters without restarting Varnish
  • Reload VCL
  • View the most up-to-date documentation for parameters

There are a few other uses too, which you can read about by issuing the help command after you connect to the management interface with varnishadm.

The service varnish reload command uses the management interface to reload VCL without restarting Varnish.

Keep the following in mind when using the management interface:

  1. Any changes you make are done immediately on the running Varnish instance.
  2. Changes are not persistent across restarts of Varnish. If you change a parameter and you want the change to apply if you restart Varnish, you need to also store it in the regular configuration for the boot script.

Because the management interface is not encrypted, only has limited authentication and still allows almost total control over Varnish, it is important to protect it. Using the -S option offers reasonably good access control, but does not protect against more elaborate attacks, like man in the middle attacks – the interface is not encrypted.

The simplest way to protect the management interface is to only have it listen on localhost (127.0.0.1). Combined with the secret file, you can now offer access to the interface on a user-by-user basis by adjusting the read permission on the secret file. The secret file usually lives in /etc/varnish/secret. The content is not a password, but a shared secret (it is never transmitted over the interface).
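
A sketch of restricting access on a per-user basis could look like this; the group name varnish is an assumption, use whatever group your trusted users belong to:

# let only root and members of the (assumed) varnish group read the secret
chown root:varnish /etc/varnish/secret
chmod 640 /etc/varnish/secret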

Note

Newer Varnish versions will automatically detect the correct arguments for varnishadm using the shared memory log. For older versions, you always had to specify at least the -T option when using varnishadm.

This automatic detection relies on the -n option since varnishadm needs to find the shared memory log.

For remote access you will always need to specify -T and -S, since a remote varnishadm cannot read the shared memory log.
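
A remote invocation could therefore look like this sketch; the host name and port are examples, and the secret file must contain the same content as the one on the Varnish server:

varnishadm -T varnish.example.com:6082 -S /etc/varnish/secret vcl.list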

Exercise: Try out the tools

  1. Run varnishstat and varnishlog while performing a few requests.
  2. Make varnishlog only print client requests where the RxURL tag contains /favicon.ico.
  3. Use varnishadm to determine the default value of the default_ttl parameter, and what it does.

As you are finishing up this exercise, you hopefully begin to see the usefulness of the various Varnish tools. varnishstat and varnishlog are the two most used tools, and are usually what you need for sites that are not in production yet.

The various arguments for varnishlog are mostly designed to help you find exactly what you want, and filter out the noise. On production traffic, the amount of log data that Varnish produces is staggering, and filtering is a requirement for using varnishlog effectively.