Content Composition

This chapter is for the web-developer course only

This chapter teaches you how to glue content from independent sources into one web page.

  • Cookies and how to work with them
  • Edge Side Includes (ESI) and how to compose a single client-visible page out of multiple objects
  • Combining ESI and Cookies
  • AJAX and masquerading AJAX requests through Varnish

A Typical Website

Most websites follow a pattern of easily distinguishable parts:

  • A front page
  • Articles or sub-pages
  • A login-box or “home bar”
  • Static elements, like CSS, JavaScript and graphics

To use Varnish to its full potential, start by analyzing the structure of the website. Ask yourself:

  • What makes web pages on your server different from each other?
  • Do the differences apply to entire pages, or only to parts of them?
  • How can you let Varnish know about those differences?

Beginning with the static elements should be easy, since previous chapters of this book cover how to handle them. But how do you proceed with dynamic content?

An easy solution is to only cache content for users that are not logged in. For newspapers, that is probably enough, but not for web-shops.

Web-shops re-use objects frequently. If you can isolate the user-specific bits, like the shopping cart, you can cache the rest. You can even cache the shopping cart, if you tell Varnish when to change it.

The most important lesson is to start with what you know.

Cookies

  • Be careful when caching cookies!
  • Cookies are frequently used to identify unique users, or users’ choices.
  • They can be used for anything from identifying a user-session in a web-shop to opting for a mobile version of a web page.
  • Varnish can handle cookies coming from two different sources:

    • req.http.Cookie header field from clients
    • beresp.http.Set-Cookie header field from servers

By default, Varnish does not cache a page if req.http.Cookie or beresp.http.Set-Cookie is present. This is for two main reasons: 1) to avoid littering the cache with a large number of copies of the same content, and 2) to avoid delivering cookie-based content to the wrong client.

It is far better to cache multiple copies of the same content for each user, or to cache nothing at all, than to cache personal, confidential or private content and deliver it to the wrong client. In other words, the worst outcome is to jeopardize users’ privacy in order to save backend resources. Therefore, it is strongly advised to take your time to write a correct VCL program and test it thoroughly before caching cookie-based content in production deployments.

Despite cookie-based caching being discouraged, Varnish can be forced to cache content based on cookies. If a client request contains req.http.Cookie, use return (hash); in vcl_recv. If the cookie is a Set-Cookie HTTP response header field from the server, use return (deliver); in vcl_backend_response.
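
As a minimal sketch of the two forced returns described above, assume that only URLs under a hypothetical /catalog/ prefix are safe to cache when cookies are involved. The prefix and the method check are illustrative assumptions, not part of the course material:

    sub vcl_recv {
        // Assumption: content under /catalog/ (hypothetical prefix) does
        // not depend on the client's cookies.
        if (req.http.Cookie && req.url ~ "^/catalog/" &&
            (req.method == "GET" || req.method == "HEAD")) {
            // Skip the built-in vcl_recv, which would pass this request.
            return (hash);
        }
    }

    sub vcl_backend_response {
        if (beresp.http.Set-Cookie && bereq.url ~ "^/catalog/") {
            // Skip the built-in vcl_backend_response, which would mark
            // the object as uncacheable. The Set-Cookie header is stored
            // with the object and replayed to every client that gets a hit.
            return (deliver);
        }
    }

Because both returns bypass the built-in safety checks, it is probably wise to keep the conditions as narrow as possible.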

Note

If you need to handle cookies, consider using the cookie VMOD from https://github.com/lkarsten/libvmod-cookie. This VMOD handles cookies with convenient parsing and formatting functions, without the need for regular expressions.
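
The following sketch shows how the cookie VMOD might be used to keep only the cookies a site needs. It assumes the VMOD is installed and that a cookie named user is the only one the backend cares about; the cookie name is an illustrative assumption:

    import cookie;

    sub vcl_recv {
        if (req.http.Cookie) {
            // Load the Cookie header into the VMOD's internal store.
            cookie.parse(req.http.Cookie);
            // Drop every cookie except the (assumed) user cookie.
            cookie.filter_except("user");
            // Write the remaining cookies back to the request.
            set req.http.Cookie = cookie.get_string();
            if (req.http.Cookie == "") {
                // No relevant cookies left: the request can be cached
                // like any other cookie-free request.
                unset req.http.Cookie;
            }
        }
    }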

Vary and Cookies

  • Used to cache content that varies on cookies
  • By default, Varnish does not store responses when cookies are involved
  • The Vary response header field can be used to store responses that are based on the value of cookies
  • Cookies are widely used, but not Vary: Cookie

Varnish uses a different hash value for each cached resource. Resources with several representations, i.e. variations selected through the Vary response header field, share the same hash value in Varnish. Even so, caching based on the Vary: Cookie response header is not advised, because it performs poorly: Cookie headers are rarely identical between clients, so very few requests match an already stored variant. For a more detailed explanation on Vary, please refer to the Vary subsection.

Note

Consider using Edge Side Includes to let Varnish build responses that combine content with and without cookies, i.e. combining caches and responses from the origin server.

Best Practices for Cookies

  • Remove all cookies that you do not need

  • Organize the content of your web site in a way that lets you easily determine if a page needs a cookie or not. For example:

    • /common/ – no cookies
    • /user/ – has user-cookies
    • /voucher/ – has only the voucher-cookie
    • etc.
  • Add the req.http.Cookie request header to the cache hash by issuing hash_data(req.http.Cookie); in vcl_hash.

  • Never cache a Set-Cookie header. Either remove the header before caching or do not cache the object at all.

  • To ensure that all cached pages are stripped of Set-Cookie, finish vcl_backend_response with something similar to:

    if (beresp.ttl > 0s) {
        unset beresp.http.Set-Cookie;
    }
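
The practices above can be combined roughly as follows. This is a sketch that assumes the /common/ and /user/ layout from the list; the method check is an added assumption to keep non-cacheable methods out of the cache lookup:

    sub vcl_recv {
        if (req.url ~ "^/common/") {
            // These pages never need cookies, so remove them.
            unset req.http.Cookie;
        } elsif (req.url ~ "^/user/" && req.http.Cookie &&
                 (req.method == "GET" || req.method == "HEAD")) {
            // Look up in cache despite the Cookie header.
            return (hash);
        }
    }

    sub vcl_hash {
        if (req.url ~ "^/user/" && req.http.Cookie) {
            // One cached object per distinct Cookie header value.
            hash_data(req.http.Cookie);
        }
        // No return here: the built-in vcl_hash still adds URL and host.
    }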
    

Exercise: Handle Cookies with Vary and hash_data with HTTPie

In this exercise you have to use two cache techniques; first Vary and then hash_data(). The exercise uses the Cookie header field, but the same rules apply to any other field. For that, prepare the testbed and test with HTTPie:

  1. Copy the file material/webdev/cookies.php to /var/www/html/cookies.php.

  2. Send different requests with HTTPie. Vary the URL and the cookie value, for instance by replacing /cookies.php with /article.html and user=Alice with user=Bob. E.g.:

    http -p hH http://localhost/cookies.php "Cookie: user=Alice"
    

Vary: Part 1:

  1. Write a VCL program to force Varnish to cache client requests with cookies.
  2. Send two client requests for the same URL; one for user Alice and one for user Bob.
  3. Does Varnish use different backend responses to build and deliver the response to the client?
  4. Make cookies.php send the Vary: Cookie response header field, then analyze the response to the client.
  5. Remove beresp.http.Vary in vcl_backend_response and see if Varnish still honors the Vary header.

Vary: Part 2:

  1. Purge the cached object for resource /cookies.php.
  2. Check if it affects all, none or just one of the objects in cache (e.g., change the value of the cookie and see if the PURGE method has purged all of them).

hash_data(): Part 1:

  1. Write another VCL program or add conditions to differentiate requests handled by Vary and hash_data().
  2. Add hash_data(req.http.Cookie); in vcl_hash.
  3. Check how multiple values of Cookie give individual cached objects.

hash_data(): Part 2:

  1. Purge the cache again and check the result after using hash_data() instead of Vary: Cookie.

This exercise is all about the Vary and hash mechanisms. These mechanisms can also be tested and learned through varnishtest. If you have time and are curious enough, please do the Exercise: Handle Cookies with Vary and hash_data() in varnishtest. After solving these exercises, you will understand very well how Vary and hash_data() work.

Edge Side Includes

  • What is ESI?
  • How to use ESI?
  • Testing ESI without Varnish
  • ESI has a linear growth complexity
  • Serial ESI available in Varnish Cache
  • Parallel ESI in Varnish Plus only

Fig. 27 Web page assembling using ESI via Varnish

Edge Side Includes or ESI is a small markup language for dynamic web page assembly at the reverse proxy level. The reverse proxy analyses the HTML code, parses ESI specific markup and assembles the final result before flushing it to the client. Fig. 27 depicts this process.

With ESI, Varnish can be used not only to deliver objects, but to glue them together. The most typical use case for ESI is a news article with a most recent news box at the side. The article itself is most likely written once and possibly never changed, and can be cached for a long time. The box at the side with most recent news, however, changes frequently. With ESI, the article can include a most recent news box with a different TTL.

When using ESI, Varnish fetches the news article from a web server, then parses the <esi:include src="/url" /> ESI tag, and fetches the URL via a normal request. The included object is either found already cached, or fetched from a web server and inserted into the cache.

The TTL of the ESI element can be 5 minutes while the article is cached for two days. Varnish delivers the two different objects in one glued page. Thus, Varnish updates parts independently and makes it possible to combine content with different TTLs.

Basic ESI usage

Enabling ESI in Varnish is simple enough:

sub vcl_backend_response {
    set beresp.do_esi = true;
}

To include a page in another, the <esi:include> ESI tag is used:

<esi:include src="/url" />

You can also strip off cookies per ESI element. This is done in vcl_recv.
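
A minimal sketch of stripping cookies for selected ESI elements could look like this. It relies on req.esi_level, which is 0 for the client request and greater than 0 for requests generated by <esi:include>. The /fragments/public/ prefix is a hypothetical example:

    sub vcl_recv {
        // Only touch ESI sub-requests, and only for fragments that are
        // assumed to be identical for all users.
        if (req.esi_level > 0 && req.url ~ "^/fragments/public/") {
            unset req.http.Cookie;
        }
    }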

Varnish only supports three ESI tags:

  • <esi:include>: calls the page defined in the src attribute and replaces the ESI tag with the content of src.

  • <esi:remove>: removes any code inside this opening and closing tag.

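  • <!--esi (content) -->: hides ESI markup from clients when the page is not ESI-processed. With ESI processing enabled, Varnish strips the <!--esi and --> markers and still processes (content). Without ESI processing, the construct stays intact as an HTML comment. E.g., the <esi:include> tag below is processed only when ESI processing is enabled:

    <!--esi
        This ESI tag stays hidden in an HTML comment unless ESI is enabled:
        <esi:include src="example" />
    -->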
    

varnishtest is a useful tool to understand how ESI works. The subsection Understanding ESI in varnishtest contains a Varnish Test Case (VTC) using ESI.

Note

Varnish outputs ESI parsing errors in varnishstat and varnishlog.

Example: Using ESI

Copy material/webdev/esi-date.php to /var/www/html/. This file contains an ESI include tag:

<HTML>
<BODY>

<?php
header( 'Content-Type: text/plain' );

print( "This page is cached for 1 minute.\n" );
echo "Timestamp: \n"
. date("Y-m-d H:i:s");
print( "\n" );
?>

<esi:include src="/cgi-bin/esi-date.cgi"/>

</BODY>
</HTML>

Copy material/webdev/esi-date.cgi to /usr/lib/cgi-bin/. This file is a simple CGI that outputs the date of the server:

#! /bin/sh

echo "Content-Type: text/plain"
echo ""
echo "ESI content is cached for 30 seconds."
echo "Timestamp: "
date "+%Y-%m-%d %H:%M:%S"

For ESI to work, load the following VCL code:

sub vcl_backend_response {
    if (bereq.url == "/esi-date.php") {
        set beresp.do_esi = true;   // Do ESI processing
        set beresp.ttl = 1m;        // Sets a higher TTL on the main object
    } elsif (bereq.url == "/cgi-bin/esi-date.cgi") {
        set beresp.ttl = 30s;       // Sets a lower TTL on
                                    // the included object
    }
}

Then reload your VCL (see Table 6 for reload instructions) and issue the command http http://localhost/esi-date.php. The output should show you how Varnish replaces the ESI tag with the response from esi-date.cgi. Note the different TTLs of the glued objects.

Exercise: Enable ESI and Cookies

  1. Use material/webdev/esi-top.php and material/webdev/esi-user.php to test ESI.
  2. Visit esi-top.php and identify the ESI tag.
  3. Enable ESI for esi-top.php in VCL and test.
  4. Strip all cookies from esi-top.php and make it cache.
  5. Let esi-user.php cache too. It emits Vary: Cookie, but might need some help.

See the suggested solutions of Exercise: Handle Cookies with Vary and hash_data() in varnishtest to get an idea on how to solve this exercise. Try to avoid return (hash); in vcl_recv and return (deliver); in vcl_backend_response as much as you can. This is a general rule to make safer Varnish setups.

During the exercise, make sure you understand all the cache mechanisms at play. You can also try removing the Vary: Cookie header from esi-user.php.

You may also want to try PURGE. If so, you have to purge each of the objects, because purging just /esi-top.php does not purge /esi-user.php.

Testing ESI without Varnish

  • Test ESI using JavaScript to fill in the blanks.

During development of different web pages to be ESI-glued by Varnish, you might not need Varnish all the time. One important reason for this is to avoid caching during the development phase. There is a solution based on JavaScript to interpret ESI syntax without having to use Varnish at all. You can download the library at the following URL:

Once downloaded, extract it in your code base, include esiparser.js, and add the following JavaScript code to trigger the ESI parser:

$(document).ready( function () { do_esi_parsing(document); });

Masquerading AJAX requests

[Figures: ajaxok.png, captioned “This works”, and ajaxko.png, captioned “This does not work”]

With AJAX, it is not possible by default to send requests to another domain. This is a security restriction imposed by browsers. If this is an issue for your web pages, you can easily solve it by using Varnish and VCL.

Exercise: Write a VCL that masquerades XHR calls

material/webdev/ajax.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js">
        </script>
        <script type="text/javascript">
            function getNonMasqueraded()
            {
                $("#result").load( "http://www.google.com/robots.txt" );
            }

            function getMasqueraded()
            {
                $("#result").load( "/masq/robots.txt" );
            }
        </script>
    </head>
    <body>
        <h1>Cross-domain Ajax</h1>
        <ul>
            <li><a href="javascript:getNonMasqueraded();">
                Test a non masqueraded cross-domain request
            </a></li>
            <li><a href="javascript:getMasqueraded();">
                Test a masqueraded cross-domain request
            </a></li>
        </ul>

        <h1>Result</h1>
        <div id="result"></div>
    </body>
</html>

Use the provided ajax.html page. Note that the function getNonMasqueraded() fails because the origin of the page differs from the google.com domain. The function getMasqueraded() can do the job if proper VCL code handles it. Write the VCL code that masquerades the AJAX request to http://www.google.com/robots.txt.

If you need help, see Solution: Write a VCL that masquerades XHR calls.