HTTP Caches and Cookies

Cookies:

HTTP is a stateless protocol: it treats each request independently, without any knowledge of previous requests.

There is no link between two requests carried out successively on the same connection.

Statelessness becomes a problem when you are on an e-commerce site. Once you log in to Amazon or Flipkart, you need to stay logged in until your transaction is complete, and until then you won't change your browser (say, Chrome). Remember the word browser.


This is where cookies come in. They help the server recognize whether the requests it gets from the client come from the same browser.

An HTTP cookie is therefore a small piece of data that the server sends to the web browser. The browser stores the cookie and sends it back to that server with each subsequent request.


How do Cookies work?

When the server receives a request, it can send a response with the 'Set-Cookie' HTTP header.

Set-Cookie: <cookie-name>=<cookie-value>
Set-Cookie: hello=user

The browser saves the cookie and sends it back on later requests in the 'Cookie' header.

Cookie: hello=user

In this way, the server knows that requests are from the same browser.
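The round trip above can be sketched in Python with the standard library's http.cookies module. This is only an illustration: the helper name cookie_header_from_set_cookie is made up for this example, and a real browser applies many more rules (domain, path, expiry) before sending a cookie back.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value: str) -> str:
    """Return the Cookie header value a browser would send back.

    `set_cookie_value` is the text after 'Set-Cookie:' in the response.
    Attributes such as Max-Age or HttpOnly are parsed but not echoed,
    because only name=value pairs travel in the Cookie header.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# The server sent:  Set-Cookie: hello=user
# The browser then sends:  Cookie: hello=user
print("Cookie:", cookie_header_from_set_cookie("hello=user"))
```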


Lifetime of Cookies
Set-Cookie: <cookie-name>=<cookie-value>; Expires=<date>
Set-Cookie: <cookie-name>=<cookie-value>; Max-Age=<number>

Here Expires takes a date (in HTTP date format) that is interpreted relative to the client's clock, and Max-Age takes a number of seconds.
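As a rough sketch, both attributes can be produced with Python's http.cookies module. The helper name build_set_cookie and the example values are assumptions for illustration. Note that this module writes the attribute name as lowercase "expires"; cookie attribute names are case-insensitive, so browsers accept it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None, expires=None):
    """Build the value of a Set-Cookie header with an optional lifetime.

    max_age: lifetime in seconds (int).
    expires: timezone-aware UTC datetime, rendered in HTTP date format.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    if expires is not None:
        # usegmt=True yields e.g. 'Wed, 21 Oct 2015 07:28:00 GMT'
        cookie[name]["expires"] = format_datetime(expires, usegmt=True)
    return cookie[name].OutputString()


print(build_set_cookie("hello", "user", max_age=3600))
print(build_set_cookie("hello", "user",
                       expires=datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)))
```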


Since cookies are stored in the browser, they can be read from JavaScript through the document.cookie API, which creates a vulnerability. To prevent this, we use the HttpOnly attribute.

Set-Cookie: hello=user; HttpOnly
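For illustration, here is one way a Python server could emit such a header using the standard http.cookies module. The function name http_only_cookie is hypothetical; the point is only that HttpOnly is a flag added after the name=value pair, separated by a semicolon.

```python
from http.cookies import SimpleCookie


def http_only_cookie(name: str, value: str) -> str:
    """Return a full Set-Cookie header line with the HttpOnly flag set."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["httponly"] = True  # hide the cookie from document.cookie
    return cookie.output()  # default prefix is 'Set-Cookie:'


print(http_only_cookie("hello", "user"))  # Set-Cookie: hello=user; HttpOnly
```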

HTTP Caching:

The basic purpose of caching is to reduce the time between a request message and its response message.

This gives the user a faster response.

When is it useful?

Simple: whenever the response from the server doesn't change (it can be the same static web page, the same JSON, etc.).

Different requests, but the same response.


[Figure. Source: keycdn.com]

Caching can be implemented in the browser, in intermediate servers (proxy servers), and in the server itself.

Every HTTP request first goes through the browser cache, which checks whether there is a valid cached response for that particular request. This reduces network time and data costs.

When service providers like Netflix and Amazon cache responses in intermediate servers to reduce user waiting time, the result is a CDN (or at least the basis of a CDN).


If the response should be stored only in the browser, we add the HTTP header:

Cache-Control: private
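To make the effect of this directive concrete, here is a minimal sketch of how a shared cache might decide whether it is allowed to store a response. The function names are invented for this example, and the parser is deliberately simplified: it splits on commas, so it does not handle quoted directive values that themselves contain commas.

```python
def parse_cache_control(header_value: str) -> dict:
    """Parse 'private, max-age=600' into {'private': None, 'max-age': '600'}."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def shared_cache_may_store(header_value: str) -> bool:
    """A shared (proxy/CDN) cache must not store 'private' or 'no-store' responses."""
    directives = parse_cache_control(header_value)
    return "private" not in directives and "no-store" not in directives


print(shared_cache_may_store("private"))         # False
print(shared_cache_may_store("max-age=604800"))  # True
```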

Proxy caches (intermediate server caches) are caching servers implemented to reduce the network traffic to the origin server.

However, HTTPS requests and responses are encrypted. A proxy that only relays the encrypted traffic cannot read the responses, so it can't be used as a caching server for them.

To reduce user waiting time and the network load on the origin server, service developers deploy caches that they control themselves. These are called managed caches, e.g. a CDN.

You can read about these in depth in the MDN docs.


A stored (cached) response has two states: fresh and stale.

Fresh - the stored response is still valid.

Stale - the stored response is no longer valid.


How to decide whether the response is Stale or Fresh?

Two scenarios -

1) With a shared cache (intermediate server cache)

2) Without a shared cache

Cache-Control: max-age=604800 --> 1 week in seconds, counted from the time the response was generated

If more than 1 week has passed --> the response is stale.

If less than 1 week has passed --> the response is fresh.


If the response from the origin server has already been stored for one day in an intermediate or proxy server, the client needs to be told about that one day. That is what the Age header is for.

Cache-Control: max-age=604800
Age: 86400 --> One day in seconds

Therefore the remaining 604800 - 86400 = 518400 seconds (6 days) are used to determine whether the response is stale or fresh.
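The arithmetic above can be written as a small Python sketch. The function and parameter names are made up for illustration, and the age calculation is a simplification: it just adds the Age header to the time the response has spent in the local cache, whereas real caches also correct for clock skew and network delay.

```python
def current_age(age_header: int = 0, seconds_since_received: int = 0) -> int:
    """Approximate age of a stored response, in seconds."""
    return age_header + seconds_since_received


def remaining_freshness(max_age: int, age_header: int = 0,
                        seconds_since_received: int = 0) -> int:
    """Seconds left before the response becomes stale (negative if already stale)."""
    return max_age - current_age(age_header, seconds_since_received)


def is_fresh(max_age: int, age_header: int = 0,
             seconds_since_received: int = 0) -> bool:
    """A response is fresh while its freshness lifetime exceeds its current age."""
    return remaining_freshness(max_age, age_header, seconds_since_received) > 0


# Cache-Control: max-age=604800 with Age: 86400
print(remaining_freshness(604800, age_header=86400))  # 518400
print(is_fresh(604800, age_header=86400))             # True
```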


Remember:

HTTP caching works through a collection of headers and the links between them.

Some of the headers are:

Cache-Control

Last-Modified

If-Modified-Since

If-None-Match

ETag, etc.
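As one example of how these headers link together, here is a minimal sketch of ETag validation: the server labels a response with an ETag, the client sends that label back in If-None-Match, and the server answers 304 Not Modified if nothing changed. The functions make_etag and respond are hypothetical, and deriving the ETag from a hash of the body is just an assumption for this example; servers are free to generate ETags any way they like. The comparison is also simplified: a real If-None-Match header may list several ETags or "*".

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, payload) for a GET on a resource with this body."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's stored copy is still valid: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=604800"}, body


# First request: full response with an ETag.
status, headers, payload = respond(b"hello", {})
# Revalidation: the client echoes the ETag in If-None-Match.
status2, _, payload2 = respond(b"hello", {"If-None-Match": headers["ETag"]})
print(status, status2)  # 200 304
```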