10 HTTP Protocol
Pravin Jain
Introduction
HTTP is a stateless protocol that works over TCP/IP. In this module we will talk about the HTTP protocol. Earlier we talked about networking and saw that it requires a host, a socket and a server socket: a connection is made between a client and a server, the server listens on a particular port, and the client, whenever it wants to make a connection, connects to that port number. That is the raw level, raw in the sense that the client itself is making a stream-based connection. But some kind of data needs to be exchanged on this connection, and there have to be rules according to which the client and the server exchange it. One such stateless protocol for communication between client and server over TCP/IP is HTTP. It is the most commonly used protocol on the web; when we use a browser, most of the time we are using HTTP. In this module we will look in detail at how HTTP works and at the structure of its messages.
Most protocols that follow the client-server paradigm are based on request and response: a request is initiated by a client and goes to a server, and the server gives back a response.
HTTP stands for Hypertext Transfer Protocol. We normally take hypertext to mean Hypertext Markup Language (HTML), and yes, HTTP was most commonly used for the transfer of HTML. But HTTP is not restricted to HTML; it is capable of transporting any kind of content between client and server, in either direction. Let us see in detail how this protocol manages to exchange any kind of content.
HTTP is mainly about exchanging content between client and server, so what we will be looking at here are the request and response formats. The standard port number for this protocol is TCP port 80, so HTTP works mainly on port 80. The details of the HTTP protocol are available in RFC 2616. Let us try to understand HTTP in a simpler manner.
It is a request-response protocol in which the client is the one that initiates the connection. Having made the connection, the client sends the request on it. The most commonly used HTTP client is the browser. The server receives the request, does whatever processing is required at its end, and then creates an appropriate HTTP response and sends it back to the client.
What does the browser do after receiving the response? Its normal job is to render the content received in the response; that is the browser's way of handling responses received from the server. But the client could be some other application that uses this information in its own manner and need not render or display the content at all.
HTTP Request
Let us see the format: what exactly the client sends to the server as a request, and in what format the server sends a response back to the client. To understand the request and response formats, let us start with the client side. Say you type a URL in a browser, something like http://<somehost>/<somepath>/<somefilename>. What does the browser do with this? It makes a connection. Where does it connect? It looks at the host in the URL, so it knows which host to connect to. The HTTP default port is 80, so unless you mention a port number in the URL, it connects on port 80. Assuming there is a server running on that host and port number, the connection succeeds, and on this connection the browser sends a request. Let us look at the format of this request.
You can break this request into three parts. The first line is the request line. It is followed by a number of plain text (pure ASCII) lines, which we call headers, and then by a blank line. The blank line indicates that there are no more headers. After the blank line there can possibly be content.
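The step where the client works out where to connect can be sketched in a few lines of Python. This is only an illustration: the helper name connection_target and the host example.com are made up for the example, and a real browser does considerably more.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL does not mention a port


def connection_target(url):
    """Return (host, port, resource) for an http:// URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # The resource is the path plus the query string, if there is one.
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return host, port, resource


print(connection_target("http://example.com/docs/index.html"))
print(connection_target("http://example.com:8080/docs/index.html"))
```

The first call falls back to port 80 because the URL names no port; the second uses the port given in the URL.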
<request-line> 1st line
<header-line> header lines
………..
<header-line>
<blank-line> end of headers
<optional content>
HTTP has a provision for the client to send content to the server as well. This happens, for example, when we attach a file: the client is sending a file to the server. It also happens when the client submits a form. Let us see the exact details of the format, starting with the first line, the request line. What is in the request line?
The request line starts with a method (the HTTP method). The most commonly used method is GET. The method is followed by the resource name. For example, in the http:// URL you specified a path and a file; the path and the file together identify the resource we are fetching from the specified host. This is the second part of the request line. The third part is the HTTP version the client is following, e.g. HTTP/1.0 or HTTP/1.1. The version matters because each header has a certain meaning in a given version of HTTP, and a newer version may add new headers whose meaning the server needs to know. The version tells the server which version of the protocol the request follows, and also which headers the client will be able to understand, so the response should be according to this version. So the request line has a method, a resource and a protocol version.
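To make the three fields of the request line concrete, here is a small Python sketch that splits a request line into its parts. The function name parse_request_line and the sample resource are assumptions made for illustration only.

```python
def parse_request_line(line):
    """Split a request line into (method, resource, version)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected: <method> <resource> <version>")
    method, resource, version = fields
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP version: " + version)
    return method, resource, version


method, resource, version = parse_request_line("GET /docs/index.html HTTP/1.1")
print(method)    # GET
print(resource)  # /docs/index.html
print(version)   # HTTP/1.1
```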
That is the first line of the request. It is followed by a number of headers. The headers carry additional information, for instance about the capabilities of the browser: the client may indicate that it can accept images, or a text browser may indicate that it does not accept images. Different browsers support different things; Mozilla may have certain restrictions, and so may IE, and they behave differently for certain tags. So if the server knows who the client is, it may decide to send tags accordingly. The User-Agent header specifies which browser and version the client is. There are various headers like this that carry additional information about the browser and its capabilities.
There might be information relating to the content being sent. We may call this content the payload, so in HTTP we can have a payload and information related to it. What is the length of the content? What is the type of the content, text or binary (e.g. an image)? There can also be information about the preferences of the user, for instance that this user prefers content in a particular language.
For example, that the user prefers content in Gujarati can also be set as part of the headers.
So we have various headers, and their purpose is to send additional information across and make it available to the server.
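The header lines are simple "name: value" pairs, so a rough Python sketch of reading them could look like this. The helper parse_headers and the sample values (ExampleBrowser/1.0, example.com) are invented for illustration; "gu" is the language code for Gujarati. Header names are compared case-insensitively, so the sketch lower-cases them.

```python
def parse_headers(lines):
    """Turn 'Name: value' header lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        # Split at the first colon only; values may contain colons themselves.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


sample = [
    "Host: example.com",
    "User-Agent: ExampleBrowser/1.0",
    "Accept: text/html, image/png",
    "Accept-Language: gu",
]
headers = parse_headers(sample)
print(headers["user-agent"])
print(headers["accept-language"])
```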
Very commonly used headers carry information about cookies. The cookies available on the client machine need to be sent to the server. What are these cookies? Cookies are simply pieces of information that the server has sent to a client, saying: whenever you connect to me next time, please send this information back to me.
So a cookie is information that the server wants the client to keep sending again and again until a particular point in time. Cookies have their own format, in which an expiration time can be specified.
This is an understanding between the HTTP client and the HTTP server: the server can set a cookie on the client, and from then on, whenever the client makes a connection to that particular server, it has to remember the cookies the server asked it to keep and send them.
Basically, cookies are pieces of information that are normally key-value pairs, maintained by the client on behalf of the server.
The server is the one that sends them, and the client has to manage some kind of database of cookies in which it remembers which server/URL each cookie came from. It needs to manage these cookies server-wise/URL-wise.
Whenever the client makes a connection, it looks for any cookies related to that server/URL and includes them in the headers. That is how cookies are used here.
To summarize: cookies are pieces of information set by the server on a client, which the client in turn includes in the headers whenever a connection is made to the same server/URL.
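The client-side bookkeeping described above can be sketched with Python's standard http.cookies module. This is a deliberately simplified illustration: the store is a plain dictionary keyed by host, the helper names are made up, and expiry and path rules are ignored, which a real client would have to honour.

```python
from http.cookies import SimpleCookie

cookie_store = {}  # host -> {cookie name: value}, kept by the client


def remember_cookies(host, set_cookie_value):
    """Store the cookies a server sent in a Set-Cookie header value."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    jar = cookie_store.setdefault(host, {})
    for name, morsel in parsed.items():
        jar[name] = morsel.value  # attributes such as Max-Age are ignored here


def cookie_header_for(host):
    """Build the Cookie header value to send to this host, or None."""
    jar = cookie_store.get(host)
    if not jar:
        return None
    return "; ".join(f"{name}={value}" for name, value in jar.items())


remember_cookies("example.com", "session=abc123; Path=/; Max-Age=3600")
print(cookie_header_for("example.com"))    # session=abc123
print(cookie_header_for("other.example"))  # None
```

Only the host that set the cookie gets it back; a request to any other host carries no Cookie header from this store.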
HTTP being a stateless protocol means that a client makes a connection to the server, sends a request, gets the response, and then neither side remembers the exchange.
In the simplest case one transaction takes place on one connection. Even when a connection is reused for several requests, as HTTP/1.1 allows, each request is handled independently. The client and server do not remember each other, and in that sense the protocol is stateless.
On top of this stateless protocol, cookies can bring in some kind of state: if an application wants, cookies can be used to maintain state.
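There can be headers to indicate that the client can handle compressed content: the Accept-Encoding request header indicates what kinds of compression the client can handle, while Content-Encoding indicates which compression has been applied to the content actually being sent.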
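How compression might be negotiated through headers can be sketched as follows. The sketch assumes the client advertises what it can handle in an Accept-Encoding request header and the sender labels the payload with Content-Encoding. The function encode_body is hypothetical, and it ignores quality values such as q=0, so it is a simplification rather than a complete negotiation.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (headers, payload), compressing with gzip if the client allows it."""
    # Keep only the coding names, dropping any ";q=..." parameters.
    accepted = [token.split(";")[0].strip().lower()
                for token in accept_encoding.split(",")]
    if "gzip" in accepted:
        payload = gzip.compress(body)
        return {"Content-Encoding": "gzip",
                "Content-Length": str(len(payload))}, payload
    # No compression the client understands: send the body as it is.
    return {"Content-Length": str(len(body))}, body


headers, payload = encode_body(b"hello " * 100, "gzip, deflate")
print(headers["Content-Encoding"])
print(gzip.decompress(payload) == b"hello " * 100)
```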
Who the user is, and, if the user has authenticated on the client application, what kind of authentication was carried out: all such information can be sent through the headers.
After all the headers there is a blank line. A GET request normally has no content part. A POST request has a content part, and this is the content that follows the blank line.
So the syntax of a request is:
The request line first, then the header lines, then a blank line indicating the end of the headers, and then the content, if present.
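Putting the pieces together, a request can be assembled by hand as in the Python sketch below. The function build_request, the host example.com and the form fields are assumptions for illustration. One detail not spelled out above: on the wire, each line of an HTTP/1.1 message ends with a carriage return and a line feed (CRLF).

```python
def build_request(method, resource, headers, body=b""):
    """Assemble request line, header lines, blank line and optional content."""
    lines = [f"{method} {resource} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line (just CRLF) marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


form = b"name=Asha&city=Surat"
request = build_request(
    "POST",
    "/submit",
    {
        "Host": "example.com",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    form,
)
print(request.decode("ascii"))
```

A GET request built the same way simply omits the body, so the message ends right after the blank line.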
HTTP Response
Now coming to the response part:
When the server sees the request, it gets to know all this information about the capabilities of the client. It may or may not be able to use all of it, depending on the level of intelligence of the server-side application. For example, based on the browser information, the server may decide whether or not to use certain tags.
The server side does its own processing based on the request, and ultimately it sends back the response on the same connection. The format of the response is as follows:
The first line of the response is the response line, followed by a number of header lines, then a blank line to indicate the end of the headers, and then the content being sent.
Most of the time, the client is requesting some content.
In the response line the first field is the protocol version, say HTTP/1.1. It is followed by a three-digit status code, and then by the message corresponding to the status code.
Say the client asked for some content and the server is sending it back: the most common response code then is 200, and the corresponding message is OK.
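As a small illustration, the response line can be taken apart like this in Python. The function name parse_status_line is made up for the example.

```python
def parse_status_line(line):
    """Split a response line into (version, status code, message)."""
    version, _, rest = line.strip().partition(" ")
    code, _, message = rest.partition(" ")
    # The message may itself contain spaces, e.g. "Not Found".
    return version, int(code), message


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```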
Response codes fall into the following categories and ranges:
- 100-199 – Informational – an interim response with no content part, e.g. 100 Continue.
- 200-299 – Success – the request was successfully completed.
- 300-399 – Redirection – the response does not provide the content but indicates where the content can be obtained.
- 400-499 – Client error – something is wrong with the request, e.g. 404 Not Found or 401 Unauthorized. There can be various reasons why the request has been rejected.
- 500-599 – Server error – the request seems to be fine, but something went wrong on the server side while processing it, say some component has failed.
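Since the category depends only on the first digit of the code, a client can classify a response with a lookup like the one sketched below. The function name status_category and the category labels are chosen for this example.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code):
    """Map a three-digit status code to its category by its first digit."""
    try:
        return CATEGORIES[code // 100]
    except KeyError:
        raise ValueError(f"not a valid HTTP status code: {code}") from None


for code in (200, 301, 404, 500):
    print(code, status_category(code))
```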
The response line is then followed by headers.
Headers may carry information such as which OS the server runs on. When the server sends content, metadata about the content is set in the headers: the language of the content, the type of the content, the content length, and whether or not it is compressed (Content-Encoding).
Some headers ask the client to remember something, as with cookies. Cookies are not part of the content, so they need to be set in the headers.
Headers are also used to indicate information about the server, such as its name and version. Then we have a blank line followed by the actual content.
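The response layout (response line, headers, blank line, content) can be illustrated with a short Python sketch that splits a raw response into its parts. The sample response, including the server name ExampleServer/1.0, is invented, and the helper parse_response keeps only the last value if a header is repeated, which a real client would not do for Set-Cookie.

```python
def parse_response(raw):
    """Split raw response bytes into (response line, headers, content)."""
    # The first blank line separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ExampleServer/1.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 18\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"\r\n"
    b"<html>Hello</html>"
)
response_line, response_headers, content = parse_response(raw_response)
print(response_line)
print(response_headers["content-type"])
print(content)
```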
To experiment, pick a URL that returns HTML content directly and not through some redirection. On a machine where telnet is installed, run telnet with the host name (the domain or server name) and the port number (normally 80). Once telnet is connected, you can try it out with just two lines:
GET /Resource HTTP/1.1
Host: <host or domain name>
Host is the header that is compulsory in version 1.1. Then enter a blank line, and you will get the HTML content on your screen. These are the minimum things needed to experiment.
So this is the stateless HTTP protocol.
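The same experiment can be done from a program with a plain socket. The Python sketch below sends the request line, the Host header and a blank line by hand, much as you would type them into telnet. So that it runs without network access, it starts a small stand-in server on the local machine; that server (HelloHandler), the helper fetch_raw and the extra Connection: close header (which asks the server to close the connection after responding) are additions made for this example, not part of the telnet session described above.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in local server so the example needs no real web host."""

    def do_GET(self):
        body = b"<html>Hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_raw(host, port, resource):
    """Send a minimal GET by hand and return the raw response bytes."""
    request = (
        f"GET {resource} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = fetch_raw("127.0.0.1", server.server_address[1], "/Resource")
server.shutdown()
server.server_close()
print(raw.decode("iso-8859-1"))
```

The printed text shows the response line, the headers, a blank line and then the HTML content, which is what the telnet session displays.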
Suggested Reading: