Original in fr Charles Vidal
fr to en Frédéric Raynal
fr to en Alexandre Abbes
Chairman of a gastronomical LUG (Linux User Group) in Paris. He likes the philosophy behind GNU and Open Source, because it helps to share knowledge. He would like to have time to play the saxophone.
This article about Apache, the most widely used web server, is divided into two parts. The first part briefly describes the history of the World Wide Web; the second part is an introduction to the HTTP protocol.
Apache is the name of a free
web server project. The origin of the name is slightly contested:
some say it comes from "a patchy server", because of the
numerous patches applied in the beginning (a hacker's pun again :) ), while
others have a much more serious explanation and say that the founders of the
project chose the name in memory of the Apache tribe, a tribe
known for its great adaptability to the land.
It is the most used web server on the Internet. It implements the HTTP
protocol (version 1.1), standardized by the IETF in cooperation with the
W3 Consortium.
A Netcraft survey
made in June 1999 estimates that 60.05% of all web servers are Apache
servers.
A web server is the "server" side of the client-server model. It
answers queries from "web clients" such
as the lynx web browser ;-).
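The query is one line of text divided into 3 parts: the method (GET in the examples of this article), the requested URL (here /) and the protocol version (for example HTTP/1.0).
The answer from the server consists of a header and, depending on the query type, a body.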
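As a small illustration, a query line such as GET / HTTP/1.0 can be pulled apart into its parts with a few lines of Python. This is only a sketch: the names RequestLine and parse_request_line are made up for this example, and a real server checks far more than this.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str   # e.g. "GET"
    url: str      # e.g. "/"
    version: str  # e.g. "HTTP/1.0"


def parse_request_line(line: str) -> RequestLine:
    """Split a query line such as 'GET / HTTP/1.0' into its parts."""
    parts = line.split()
    if len(parts) == 2:
        # No version given, as in a bare "GET /" query: this is the
        # old HTTP/0.9 form of the query.
        parts.append("HTTP/0.9")
    if len(parts) != 3:
        raise ValueError("expected 'METHOD URL [VERSION]', got %r" % line)
    return RequestLine(*parts)


print(parse_request_line("GET / HTTP/1.0"))
```

The trailing carriage return and line feed are dropped by split(), so the line can be passed in exactly as it was read from the network.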
    >telnet www.linuxfocus.org 80
    Trying 195.53.25.18...
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    GET / HTTP/1.0 <return>
    <return>
    HTTP/1.1 200 OK
    Date: Mon, 27 Sep 1999 21:23:20 GMT
    Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
    Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT
    ETag: "4b005-1616-37ee4c8c"
    Accept-Ranges: bytes
    Content-Length: 5654
    Connection: close
    Content-Type: text/html
    <PAGE HTML>
What does this answer say?
The first line shows the protocol used and the return
value of the server (a return value of 400 or greater indicates an
error). It is followed by the date, the version of the server and the
date of the last modification of the requested document (this allows the client to
know whether the copy in its cache is still valid). Content-Length is the
length of the body in bytes (answers generated by CGI scripts often do not provide this information) and
Content-Type tells the web client the MIME type of the answer
(plain text, HTML, images ...).
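To make the structure of the answer more concrete, here is a minimal Python sketch that splits a raw answer into its first line, its header fields and its body. The function name parse_response is invented for this example, and the sample data is a shortened copy of the answer shown above with the placeholder <PAGE HTML> as body (so Content-Length does not match the sample body).

```python
def parse_response(raw: bytes):
    """Split a raw HTTP answer into (version, status, reason, headers, body)."""
    # The header ends at the first empty line; what follows is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # First line: protocol, return value, short text ("HTTP/1.1 200 OK").
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only: dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 27 Sep 1999 21:23:20 GMT\r\n"
    b"Server: Apache/1.3.3 (Unix) (Red Hat/Linux)\r\n"
    b"Last-Modified: Sun, 26 Sep 1999 16:40:44 GMT\r\n"
    b"Content-Length: 5654\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<PAGE HTML>"
)

version, status, reason, headers, body = parse_response(SAMPLE)
print(status, reason)
print("error" if status >= 400 else "no error")
print(headers["content-type"])
```

Field names are stored in lowercase because HTTP does not distinguish between Content-Type and content-type.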
This is not a complete description: some lines are still a mystery
to me ;-)
Let's see what happens when an error occurs:
    >telnet www.linuxfocus.org 80
    Trying 195.53.25.18...
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    get / HTTP/1.0 <return>
    <return>
    HTTP/1.1 501 Method Not Implemented
    Date: Mon, 27 Sep 1999 21:22:03 GMT
    Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
    Allow: GET, HEAD, OPTIONS, TRACE
    Connection: close
    Content-Type: text/html
As you can see, the header is talkative enough ;-)
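The 501 answer comes from typing get in lowercase: method names are case-sensitive, and the Allow line lists the methods this server accepts. The check can be pictured roughly as follows. This is a hypothetical sketch, not Apache's actual code; check_method is a name invented here and the list of methods is simply copied from the Allow line.

```python
# Methods copied from the Allow line of the answer above.
ALLOWED_METHODS = ("GET", "HEAD", "OPTIONS", "TRACE")


def check_method(method: str):
    """Return None if the method is known, else the error (status, text)."""
    # The comparison is case-sensitive: "get" is not the same as "GET".
    if method not in ALLOWED_METHODS:
        return 501, "Method Not Implemented"
    return None


for m in ("GET", "get"):
    print(m, check_method(m))
```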
HTTP is a very simple protocol, as we will see in these
examples:
    >telnet www.linuxfocus.org 80
    Trying 195.53.25.18...
    Connected to nova.linuxfocus.org.
    Escape character is '^]'.
    GET / <return>
    <return>
What happens inside the Apache server?
With the telnet command you connected to port 80 of
www.linuxfocus.org (IP address 195.53.25.18); port 80 is the default
port for an HTTP server. The server was waiting for a query and you
typed GET / followed by two carriage returns.
Why two carriage returns?
The first one ends the query line; the second one produces an empty line, which signals the server that this is the
end of the query.
The server answered by sending the requested file
(index.html). The TCP/IP connection is closed at the end of the transfer.
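What was typed by hand in telnet can also be done by a short program. The sketch below, with the invented names build_query and fetch, opens a TCP connection, sends GET / followed by the empty line and reads until the server closes the connection. It assumes that the server still accepts a query without a protocol version; many present-day servers may refuse it.

```python
import socket


def build_query(url: str = "/") -> bytes:
    """Build the query typed by hand above: 'GET /' plus an empty line."""
    # Each <return> is sent as CR LF; the second one is the empty line
    # that tells the server the query is complete.
    return ("GET %s\r\n\r\n" % url).encode("ascii")


def fetch(host: str, url: str = "/", port: int = 80) -> bytes:
    """Send the query and read the answer until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_query(url))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage (needs network access, so it is left commented out):
# print(fetch("www.linuxfocus.org").decode("iso-8859-1")[:200])
```

The end of the answer is detected only by the closing of the connection, which matches the behaviour described above.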
As you can see, the language used between the client and the server is very simple, but difficulties arise when you use version 1.1 instead of 1.0 for your query. With version 1.0 everything works:
    GET / HTTP/1.0 <return>
    <return>
    HTTP/1.1 200 OK
    Date: Tue, 24 Aug 1999 22:25:11 GMT
    Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
    Last-Modified: Sun, 01 Aug 1999 11:50:52 GMT
    ETag: "4b005-1462-37a4349c"
    Accept-Ranges: bytes
    Content-Length: 5218
    Connection: close
    Content-Type: text/html
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    ....
But typing 1.1 gives this:
    GET / HTTP/1.1 <return>
    <return>
    HTTP/1.1 400 Bad Request
    Date: Tue, 24 Aug 1999 22:24:59 GMT
    Server: Apache/1.3.3 (Unix) (Red Hat/Linux)
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/html
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>400 Bad Request</TITLE>
    </HEAD><BODY>
    <H1>Bad Request</H1>
    Your browser sent a request that this server could not understand.<P>
    client sent HTTP/1.1 request without hostname (see RFC2068 section 9, and 14.23):
    </P>
    </BODY></HTML>
As the error message says, the host name is missing. A query using the new HTTP 1.1 protocol requires more information fields and is therefore built from several lines. The added lines allow more precise information to be transmitted and so improve the quality of the communication.
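The rule behind the 400 answer can be sketched in a few lines of Python. The function check_query is hypothetical and checks only this one rule (an HTTP/1.1 query must carry a Host field); it is not how Apache is actually implemented.

```python
def check_query(lines):
    """Return (status, text) for a query given as a list of text lines.

    Only the rule discussed here is checked: an HTTP/1.1 query must
    carry a Host field. Everything else is accepted.
    """
    method, url, version = lines[0].split()
    # Collect the names of the fields that follow the first line.
    fields = {line.split(":", 1)[0].strip().lower() for line in lines[1:] if line}
    if version == "HTTP/1.1" and "host" not in fields:
        return 400, "Bad Request"
    return 200, "OK"


print(check_query(["GET / HTTP/1.0"]))
print(check_query(["GET / HTTP/1.1"]))
print(check_query(["GET / HTTP/1.1", "Host: www.linuxfocus.org"]))
```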
Example:
    GET / HTTP/1.1 <return>
    Host: www.linuxfocus.org <return>
    <return>
    [...]
As with most client-server programs, when the server receives a query:
The web server is an interface between the web client asking for a URL (Uniform Resource Locator; this is not the only abbreviation in use, you will also find the closely related URI and URN) and the operating system Apache is running on. The web client sends its query and the server answers with the page that corresponds to the requested URL.
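One way to picture this interface role is the translation of the path in a URL into a file name on the server's disk. The following Python sketch is a simplification under stated assumptions: the function url_to_file and the document root /var/www are invented for the example, and the real mapping in Apache is configurable and considerably richer.

```python
import posixpath


def url_to_file(document_root: str, url_path: str) -> str:
    """Map the path of a requested URL to a file name under document_root."""
    # Normalise the path so that "/a/../b" becomes "/b" and ".." cannot
    # climb above the document root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # A directory such as "/" is answered with its index.html.
    if url_path.endswith("/"):
        clean = posixpath.join(clean, "index.html")
    return posixpath.join(document_root, clean.lstrip("/"))


print(url_to_file("/var/www", "/"))
print(url_to_file("/var/www", "/docs/apache.html"))
```

This is why the plain query GET / was answered with the file index.html.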
Some queries sent by the client cannot be answered directly by the server. The server can then start other programs to do the job and return their results: this is exactly how CGI scripts (Common Gateway Interface) work.
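The idea can be tried out with a few lines of Python: start an external program, hand it the details of the query through environment variables and collect what it prints. This is only a sketch of the principle. The helper run_cgi and the one-line stand-in script are made up for this example; REQUEST_METHOD and QUERY_STRING are two of the variables defined by the CGI convention.

```python
import os
import subprocess
import sys


def run_cgi(command, query_string=""):
    """Run an external program and return what it wrote to standard output."""
    # CGI hands the details of the query to the program through
    # environment variables; only two of them are set here.
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(command, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


# A stand-in "script": a one-line Python program that prints a tiny page.
SCRIPT = [
    sys.executable, "-c",
    "import os; print('Content-Type: text/plain'); print(); "
    "print('query =', os.environ['QUERY_STRING'])",
]

print(run_cgi(SCRIPT, "lang=en").decode())
```

The script prints its own Content-Type line followed by an empty line, and the server passes the result on to the client.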