The HTTP Protocol

HTTP (HyperText Transfer Protocol) is one of the underlying technologies that make the web possible. It is the protocol used to transfer any type of content on the web.

Resources and URLs

Every resource (page, image, video, etc.) that is accessible through the web is identified by a URL (Uniform Resource Locator). A URL is composed of several elements:

http://www.google.com:80/search?q=hello#something
{scheme}://{host}:{port}/{path}?{query}#{fragment}

// scheme => http
// host => www.google.com (can be an IP address; a friendly name is usually resolved into an IP address by a DNS server)
// port => 80 (the default for the http scheme, so it can be omitted)
// path => /search
// query => q=hello
// fragment => something (only used in the client)
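As a minimal sketch, the same breakdown can be reproduced with `urlsplit` from Python's standard `urllib.parse` module. Python is an assumption here, since the text does not tie itself to any language; the URL is the one from the example above.

```python
from urllib.parse import urlsplit

# The example URL from the text
url = "http://www.google.com:80/search?q=hello#something"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.google.com
print(parts.port)      # 80
print(parts.path)      # /search
print(parts.query)     # q=hello
print(parts.fragment)  # something
```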

The URI RFC (RFC 3986) defines which characters are valid in a URI/URL to maximize interoperability. All other characters must be URL encoded (percent-encoded) before they can be used as part of a URL. This includes, for instance, a space, or a / that is meant as data rather than as a path delimiter.
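A short sketch of URL encoding, assuming Python's standard `urllib.parse` module (the sample strings are made up for illustration):

```python
from urllib.parse import quote, unquote, urlencode

# A space is not a valid URL character, so it becomes %20
encoded_space = quote("hello world")
print(encoded_space)  # hello%20world

# By default quote() leaves "/" alone because it is a path delimiter;
# pass safe="" to encode it when it is data rather than a delimiter
encoded_slash = quote("a/b", safe="")
print(encoded_slash)  # a%2Fb

# urlencode() builds a query string; it encodes spaces as "+"
query = urlencode({"q": "hello world"})
print(query)  # q=hello+world

# Decoding reverses the process
decoded = unquote(encoded_slash)
print(decoded)  # a/b
```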

Resources accessed using a URL and an HTTP request can have different representations. You can use the Accept request header to specify the types of representation that you are interested in, and the server will respond with an HTTP response with a matching Content-Type header (if possible). Both headers carry media types, which are defined by the MIME types or Multipurpose Internet Mail Extensions (also known as Internet Media Types). Some very common content types are text/html, application/json, image/jpeg, etc. Matching the types listed in Accept against the representations the server can produce is known as Content Negotiation.
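To make content negotiation concrete, here is a simplified sketch of how a server might pick a representation from an Accept header. The function names (`parse_accept`, `matches`, `negotiate`) are hypothetical, and the logic is deliberately reduced: it honors `q` weights and wildcards but ignores the specificity rules and media type parameters that a real implementation would handle.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # the weight defaults to 1 when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(wanted, offered):
    """True if the offered type satisfies the wanted type or wildcard."""
    if wanted == "*/*":
        return True
    w_type, _, w_sub = wanted.partition("/")
    o_type, _, o_sub = offered.partition("/")
    return w_type == o_type and w_sub in ("*", o_sub)


def negotiate(accept_header, available):
    """Return the Content-Type to respond with, or None if nothing matches."""
    for wanted, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for offered in available:
            if matches(wanted, offered):
                return offered
    return None


chosen = negotiate("text/html, application/json;q=0.9",
                   ["application/json", "image/jpeg"])
print(chosen)  # application/json
```

When `negotiate` returns `None`, a server can answer with a 406 Not Acceptable status or fall back to a default representation.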

HTTP Requests

Communication over HTTP consists of exchanging HTTP requests and responses. The HTTP standard defines how both must be formatted. An HTTP request is composed of:

  • An HTTP Verb
    • GET: retrieve a resource. Safe (doesn’t change a representation on the server)
    • POST: create or update a resource. Unsafe (changes or creates a representation on the server)
    • PUT: create or replace a resource with the complete representation sent. Unsafe
    • DELETE: delete a resource. Unsafe
    • PATCH: update a resource sending only part of the representation (and not the whole representation as in PUT). Unsafe
    • OPTIONS: get the communication options associated with a resource (for example, the verbs it supports)
    • HEAD: like GET, but the response contains only the headers, without a body
    • etc…
  • A URL that identifies the resources we want to access
  • The HTTP version (HTTP/1.1 was the most common for many years; HTTP/2 was published in 2015, so we’ll probably be seeing much more of it)
  • A series of HTTP headers. Some of them are:
    • Accept: describes the wanted representation of a resource (as a MIME type)
    • Accept-Language: describes the wanted language
    • Accept-Encoding: describes the wanted content encoding (gzip, deflate, etc.)
    • Accept-Charset: describes the wanted charset (utf-8, ascii, iso-xxx, etc.)
    • Cookie: cookie information
    • Referer: the URL of the referring page
    • User-Agent: describes the agent sending the request (browser, OS, etc)
  • A body (based on verb)
    • GET, DELETE, OPTIONS, HEAD messages usually don’t have a body
    • POST, PUT, PATCH messages usually do
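The pieces listed above can be assembled into the text an HTTP/1.1 client actually sends. The following is a rough sketch, not a complete client: `build_request` is a hypothetical helper, and the hosts and paths are illustrative. It assumes HTTP/1.1, where the request line usually carries only the path and query while the host travels in a `Host` header.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers or {})
    if body:
        # The receiver needs the body size to know where the message ends
        all_headers["Content-Length"] = str(len(body))
    for name, value in all_headers.items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF; an empty line separates the headers from the body
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# A GET request has no body
get_request = build_request("GET", "/search?q=hello", "www.google.com",
                            {"Accept": "text/html"})
print(get_request.decode("ascii"))

# A POST request carries a body (example.com and /items are placeholders)
post_request = build_request("POST", "/items", "example.com",
                             {"Content-Type": "application/json"},
                             b'{"a": 1}')
print(post_request.decode("ascii"))
```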

HTTP Responses

An HTTP response consists of:

  • The HTTP version
  • An HTTP status code that represents the result of the request. The status codes belong to the following ranges:
    • 100-199: informational
    • 200-299: success
      • 200 OK
      • 201 Created (a resource)
    • 300-399: redirection
      • 301 Moved Permanently
      • 302 Found (temporary redirect)
      • 304 Not Modified
    • 400-499: client error
      • 400 Bad Request
      • 401 Unauthorized
      • 403 Forbidden
      • 404 Not Found
    • 500-599: server error
      • 500 Internal Server Error
  • A series of HTTP headers. Some are:
    • Cache-Control: describes the cache strategy used for this resource
    • Content-Type: describes the content type of the body (as a MIME type)
    • Content-Length: describes the size of the body in bytes
  • A body with the contents of the response
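As a sketch of how these parts fit together on the wire, the snippet below splits a raw HTTP/1.1 response into its components and maps the status code to its range. Both `parse_response` and `status_class` are hypothetical helpers, and the sample response is made up; a real client would also need to handle things such as chunked bodies and repeated headers.

```python
def status_class(code):
    """Map a status code to the range it belongs to."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")


def parse_response(raw):
    """Split a raw response into version, code, reason, headers and body."""
    # An empty line (CRLF CRLF) separates the headers from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


raw = (b"HTTP/1.1 404 Not Found\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"Not here.")
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)    # HTTP/1.1 404 Not Found
print(status_class(code))       # client error
print(headers["content-type"])  # text/html
print(body)                     # b'Not here.'
```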
