In our first two Chalk Talk videos we explained what Varnish is and how it works. In this third Chalk Talk video, we look at how you can optimize your content for Varnish. How do you make sure that the right pages are being cached and that they are being cached the right way? Because that is the only way to provide your website's visitors with an optimal experience. (Hint: at the bottom you can easily turn on Dutch or English subtitles)
The written text of this Chalk Talk, “How to optimize content for Varnish?”
Welcome to Chalk Talk, a series of videos where we explain complex technologies using a single chalkboard. In this video we take a look under Varnish’s hood, at the heart: the caching engine.
Optimizing content for Varnish
In a previous video, you saw that there’s a caching part in the middle of the entire workflow. That is where the magic of Varnish happens. To see exactly how this happens, we’ll pretend to surf to this URL. So we’re surfing to nucleus.be/varnish. When you type that into a browser, you’ll see a page about Varnish. What does this look like to Varnish?
To start, the user's browser will make a request that carries a lot of metadata about that specific page. Metadata like: what host do you want to contact? What is the URL of the page? What protocol will you use to gain access?
In the HTTP protocol this looks like a simple list of key-value pairs. You have a key, the host header, and its value is nucleus.be. You may also have cookies, for example because you are logged in, and all your cookies are sent along.
You’re sending Varnish a request with the GET method. You also add extra headers like an Accept header, for example because you want a JSON response instead of an HTML response. All of that, usually a little over 10 headers, is sent to Varnish.
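To make that key-value structure concrete, here is a minimal sketch in Python of how such a request can be split into its parts. The raw request, the cookie names and the parse_request helper are made up for illustration; this is not how Varnish itself parses requests.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request head into method, URL, protocol and headers."""
    # Everything before the first blank line is the request head.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")

    # First line: "GET /varnish HTTP/1.1"
    method, url, protocol = lines[0].split(" ", 2)

    # Every following line is "Key: value". Header names are case-insensitive,
    # so we store them in lowercase.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, url, protocol, headers


# Hypothetical request for nucleus.be/varnish (illustrative values only).
RAW_REQUEST = (
    "GET /varnish HTTP/1.1\r\n"
    "Host: nucleus.be\r\n"
    "Accept: application/json\r\n"
    "Cookie: lang=en; PHPSESSID=abc123\r\n"
    "\r\n"
)

method, url, protocol, headers = parse_request(RAW_REQUEST)
```

After this runs, method is "GET", url is "/varnish" and headers["host"] is "nucleus.be": exactly the pieces of metadata the rest of this talk is about.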
The hashing engine
Varnish will receive everything and put these headers in a sort of hashing engine. That creates a unique hash for that request. That hash will be searched for in the cache.
The cache is an enormous database of records, where every record has a name: the hash. When the hash is found, Varnish will fetch that record and send it back to the user.
If it can’t be found we have to send a request to the webserver or the backend. What will actually happen is that your browser sends over ten different headers. But we, and Varnish, can determine which headers differentiate between me and a colleague.
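The lookup described here can be sketched in a few lines of Python. This is a simplified model, not Varnish's actual implementation: the choice of SHA-256, the in-memory dictionary and the fetch_from_backend function are assumptions for illustration, and which inputs really end up in the hash depends on your Varnish configuration. We include the cookie here because that is the situation this talk describes.

```python
import hashlib

cache = {}          # hash -> stored response (stand-in for the Varnish cache)
backend_calls = []  # records every time we had to go to the backend


def request_hash(host, url, cookie=""):
    """Build one hash from the parts of the request that make it unique."""
    h = hashlib.sha256()
    for part in (url, host, cookie):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so "a" + "bc" differs from "ab" + "c"
    return h.hexdigest()


def fetch_from_backend(host, url):
    """Hypothetical backend: in reality this is your web server."""
    backend_calls.append((host, url))
    return f"<html>page for {host}{url}</html>"


def handle(host, url, cookie=""):
    """Return (body, 'HIT' or 'MISS') for a request."""
    key = request_hash(host, url, cookie)
    if key in cache:
        return cache[key], "HIT"
    body = fetch_from_backend(host, url)
    cache[key] = body
    return body, "MISS"
```

The first call to handle("nucleus.be", "/varnish") is a miss and goes to the backend. The same call again is a hit. Change any input, such as the host or the cookie, and you get a different hash and therefore a new miss.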
Different cookies, different cache
I might be visiting nucleus.be with a certain set of cookies. When someone who has the exact same cookies and host header visits this page, Varnish will answer from the cache. If there’s a difference in the values of the host header or the cookies, for example because you visit www.nucleus.be instead of nucleus.be, Varnish will send the request through its hashing engine and you’ll get a different hash, because the input was different.
The same applies if we visit the page with Google Analytics campaign parameters, those typical UTM parameters in the URL. It becomes a different URL and that produces a different hash. This means that this page can probably be reached through ten or twenty different URLs, simply by adding a ? or other parameters.
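One way to picture the fix is a small normalization step that removes the campaign parameters before the URL is used as hash input. The sketch below uses Python's standard urllib.parse module. The strip_tracking helper and the choice to treat every parameter starting with utm_ as tracking are assumptions for illustration; in a real setup this logic would typically live in your Varnish configuration or your application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: every parameter starting with one of these prefixes is tracking only.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Return the URL without tracking parameters and without a dangling '?'."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped: browsers do not send it to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this in place, "/varnish?utm_source=newsletter&utm_campaign=spring" and "/varnish?" both come out as "/varnish", so they lead to the same hash. A parameter that really changes the page, like "q" in "/search?q=cache&utm_medium=email", is kept.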
Hints for developers
This creates all sorts of permutations: many variations of the same page in the cache. What’s important for you as a developer or website administrator? Try to keep your URL as clean as possible. Make your host header uniform: turn every variant into www.nucleus.be, or drop the www everywhere. Do this consistently across the entire site. Also redirect requests that arrive with a different host header to the one you chose.
The same goes for the URL you visit. In your code it may not be case sensitive, but it is to Varnish. Don’t start adding parameters randomly. Keep everything as clean as possible.
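As a rough illustration of a uniform host header, the Python sketch below decides whether a request should be redirected to one canonical host. The canonical host, the list of aliases and the https scheme are assumptions for this example; pick whichever variant suits your site, as long as you are consistent.

```python
# Assumptions for this example: we standardize on the www variant over HTTPS.
CANONICAL_HOST = "www.nucleus.be"
KNOWN_ALIASES = {"nucleus.be", "www.nucleus.be"}


def canonical_redirect(host, url):
    """Return the URL to redirect to, or None if the host is already canonical."""
    # Host names are case-insensitive and may carry a port ("nucleus.be:80").
    hostname = host.lower().split(":", 1)[0]
    if hostname in KNOWN_ALIASES and hostname != CANONICAL_HOST:
        # The path is passed through untouched: it is case sensitive to Varnish.
        return f"https://{CANONICAL_HOST}{url}"
    return None
```

A request for nucleus.be/varnish is sent to https://www.nucleus.be/varnish, while a request that already uses www.nucleus.be is left alone. That way only one host value ever reaches the hashing engine for this page.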
But the really annoying part is the cookies. If you use session IDs, whether in PHP or Ruby doesn’t matter, my session ID and my colleague’s session ID will always be different and unique.
My session ID will identify me as a unique visitor. My colleague’s session ID does the same for him. Because of that, session IDs that are stored in cookies will create unique hashing results in Varnish. But do you always need a session ID?
Perhaps you only need it when there’s a shopping cart or when you’re logged in? If you generate a session ID for every visitor to your site, then every visitor will look like a unique visitor to Varnish.
Only generate session IDs when they are needed, and keep your cookies clean and as limited as possible, because every cookie you add (a language cookie, a preference cookie, etc.) will create a different version of the hash and of the page in the Varnish cache.
The cache can grow very rapidly because every version is stored. Ideally it stays as small as possible as well.
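To show the effect of keeping cookies limited, here is a small Python sketch that keeps only the cookies that really change the page before they are used as hash input. The cookie names and the cookies_for_hash helper are hypothetical; which cookies matter depends entirely on your application.

```python
# Assumption: only the session cookie changes what the page looks like.
NEEDED_COOKIES = frozenset({"PHPSESSID"})


def parse_cookie_header(header):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def cookies_for_hash(header, needed=NEEDED_COOKIES):
    """Keep only the needed cookies, in a stable order, as one string."""
    cookies = parse_cookie_header(header)
    kept = sorted((name, value) for name, value in cookies.items() if name in needed)
    return "; ".join(f"{name}={value}" for name, value in kept)
```

A visitor with only a language or analytics cookie now produces an empty string, the same as a visitor without cookies, so both share one cached version. Only a visitor who actually has a session ID gets a version of their own.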
A brief summary
When you have a website with a URL, it uses a protocol. This can be HTTP or HTTPS. That doesn’t really matter to Varnish, because Varnish doesn’t deal with HTTPS itself. Something else sits in front of Varnish to handle HTTPS. Technically it’s another proxy in front of Varnish.
All Varnish sees are HTTP requests. Then you have your host header: what website are you trying to reach? And your URL with a set of cookies.
And from all the extra headers that are sent, we can choose which ones should count towards the hash and which ones shouldn’t. If your application has extra headers that are very important to determine the hash and the uniqueness of a page, we can add them, and we can remove the ones that aren’t.
Keep this as clean as possible! Keep your cookies as clean as possible! Then you should be getting the same hash for reaching the same page.
Next time on Chalk Talk?
In the next video we’ll take a look at how you can delve deeper and how you solve problems or determine why a certain page is stored and another isn’t. See you soon for a new episode of Chalk Talk!