|
This document is available in: English Castellano Deutsch Francais Nederlands Russian Turkce Korean |
by Manuel Soriano <manu(at)europa3.com> Translated to English by: Miguel A Sepulveda <sepulveda(at)linuxfocus.org> Content: |
Introduction to SQL -- Installation of PostgreSQLAbstract:
This short course consists of three parts, in the first part we
explain a few generalities about SQL using a public domain
database server called postgreSQL. In the second part we will
study in more detail the SQL commands. Finally the th ird part
will show us some of the advanced options of the SQL commands,
functions specific of PostgreSQL that might be of interest for
our projects and we will finally review a small C program
putting it all together.
|
Until recently, the access to data items was accomplished through entities that were inter-related by well-defined links of the database scheme. This type of access had advantages, mainly speed of access, but a big disadvantage: we could only access data through an existing link, for example:
country -> states -> countiesbut never :
country -> countieswhere "->" is the link.
If we wish to establish that second relationship we would have to redefine the scheme and compile it again....
In fact, in a hierarchical DB, the relationship among the various entities is static and can only be modified after alteration of the DB scheme and recompilation of the latter.
The basic idea behind relation databases is precisely to link data during the query instance, without the need for a static link, but instead using an identifier that permits to link one register with another.
What I just wrote probably needs an Aspirin :)
Relational database managers do not required static links to allow us go down the hierarchy of entities, instead they use a unique code that identify these entities while establishing a temporary relation as a result to a query.
The identification is nothing but a code. Ex: My phone number is not:
1234567
but :
34 6 1234567
Clearly my phone number is identified by the country code (34), state code (6) and the proper device number (1234567).
Let me set the foundations for the first example that illustrates what I just said.
All counties have a code, belong to
a state and a country.
All states have a code and belong to a
country.
All countries have a code.
To find all the counties in a state we relate the county with the state through the country and county codes; to find all the counties in a country we relate the county to the country by the country code. These relationships are temporary and only exist during the time of my query.
This is a bit arid and hard to understand but with the first few examples I hope to make clearer the concept of code and belonging.
When I send the first query to the DB manager it returns me all the related data items. But what data am I really receiving? The union of country and county items, for every county I am going to receive the related country items.
During my first query a new nameless entity is suddenly created, it contains a replica of countries and counties. This new entity, once again, disappears at the end of my query.
Before we used to name a set of data a "file". This are made of registers and each "register" has a "field". Well, in a relational database, a "file" is known as a table, a table contains rows and each row has columns, this is just a small cosmetic change. ;-)
It is good to mention at this point that some hierarchical DB managers introduce SQL as an access language, but this is only anecdote. SQL language is almost exclusively a property of relational managers.
To illustrate the use of SQL we will use the relational managers PostgreSQL. Although it is not fully compliant with the rules of SQL, it is sufficiently close for our purposes, it is also a very good manager for more heavy duty tasks.
Let me explain only briefly the installation process, given the goal of this article is SQL. First download the sources from www.postgresql.org, as well as any patches available. Extract the sources (tar zxvf) to a directory, cd postgresql-6.3
cd src ./configure --prefix=/the/desired/path make all >& make.log & tail -f make.log export PATH=$PATH:/the/desired/path/pgsql/bin export MANPATH=$MANPATH:/the/desired/path/pgsql/man export PGLIB=/the/desired/path/pgsql/lib export PGDATA=/the/desired/path/pgsql/data initdb createdb test psql test Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute query You are currently connected to the database: postgres test=>This is the prompt for postgres, now we can start executing commands.
mytest=>create table mytest (field1 varchar(10)); CREATE mytest=>insert into mytest values ('hello'); INSERT number 1 mytest=>commit work; NOTICE:EndTransactionBlock and not inprogress/abort state END mytest=>select * from mytest; field1 ------ hello (1 row) mytest=>drop table mytest; DROP mytest=>Ctrl-dWe have already exited the SQL console.
In case of having trouble compiling and installing Postgres95 correctly please refer to the INSTALL file in the root directory for the distribution.
Let me make another side comment; a relational database server is generally made of the following parts:
File: countries.sql create table countries (cod_country integer, name varchar(30)); insert into countries values (1, 'country 1'); insert into countries values (2, 'country 2'); insert into countries values (3, 'country 3'); commit work;
File: states.sql create table states (cod_State int, cod_country int, nam_State varchar(30)); insert into states values (1, 1, 'State 1, Country 1'); insert into states values (2, 1, 'State 2, Country 1'); insert into states values (1, 2, 'State 1, Country 2'); insert into states values (2, 2, 'State 2, Country 2'); insert into states values (1, 3, 'State 1, Country 3'); insert into states values (2, 3, 'State 2, Country 3'); commit work;
File: counties.sql create table counties (cod_country int, cod_state int, cod_county int, nam_county varchar(60)); insert into counties values (1, 1, 1, 'County 1, State 1, Country 1'); insert into counties values (2, 1, 1, 'County 2, State 1, Country 1'); insert into counties values (3, 1, 1, 'County 3, State 1, Country 1'); insert into counties values (1, 2, 1, 'County 1, State 2, Country 1'); insert into counties values (2, 2, 1, 'County 2, State 2, Country 1'); insert into counties values (3, 2, 1, 'County 3, State 2, Country 1'); insert into counties values (1, 3, 1, 'County 1, State 3, Country 1'); insert into counties values (2, 3, 1, 'County 2, State 3, Country 1'); insert into counties values (3, 3, 1, 'County 3, State 3, Country 1'); insert into counties values (1, 1, 2, 'County 1, State 1, Country 2'); insert into counties values (2, 1, 2, 'County 2, State 1, Country 2'); insert into counties values (3, 1, 2, 'County 3, State 1, Country 2'); insert into counties values (1, 2, 2, 'County 1, State 2, Country 2'); insert into counties values (2, 2, 2, 'County 2, State 2, Country 2'); insert into counties values (3, 2, 2, 'County 3, State 2, Country 2'); insert into counties values (1, 3, 2, 'County 1, State 3, Country 2'); insert into counties values (2, 3, 2, 'County 2, State 3, Country 2'); insert into counties values (3, 3, 2, 'County 3, State 3, Country 2'); insert into counties values (1, 1, 3, 'County 1, State 1, Country 3'); insert into counties values (2, 1, 3, 'County 2, State 1, Country 3'); insert into counties values (3, 1, 3, 'County 3, State 1, Country 3'); insert into counties values (1, 2, 3, 'County 1, State 2, Country 3'); insert into counties values (2, 2, 3, 'County 2, State 2, Country 3'); insert into counties values (3, 2, 3, 'County 3, State 2, Country 3'); insert into counties values (1, 3, 3, 'County 1, State 3, Country 3'); insert into counties values (2, 3, 3, 'County 2, State 3, Country 3'); insert into counties values (3, 3, 3, 'County 3, State 3, Country 3'); commit work;
A file with SQL commands can be executed from pgsql like this:
\i file_name
We could also insert the commands using a simple cut & paste.
Let us see next what counties are available:
manu=> select * from counties; cod_country|cod_state|cod_county|nam_county -----------+---------+----------+---------------------------- 1| 1| 1|County 1, State 1, Country 1 2| 1| 1|County 2, State 1, Country 1 3| 1| 1|County 3, State 1, Country 1 1| 2| 1|County 1, State 2, Country 1 2| 2| 1|County 2, State 2, Country 1 3| 2| 1|County 3, State 2, Country 1 1| 3| 1|County 1, State 3, Country 1 2| 3| 1|County 2, State 3, Country 1 3| 3| 1|County 3, State 3, Country 1 1| 1| 2|County 1, State 1, Country 2 2| 1| 2|County 2, State 1, Country 2 3| 1| 2|County 3, State 1, Country 2 1| 2| 2|County 1, State 2, Country 2 2| 2| 2|County 2, State 2, Country 2 3| 2| 2|County 3, State 2, Country 2 1| 3| 2|County 1, State 3, Country 2 2| 3| 2|County 2, State 3, Country 2 3| 3| 2|County 3, State 3, Country 2 1| 1| 3|County 1, State 1, Country 3 2| 1| 3|County 2, State 1, Country 3 3| 1| 3|County 3, State 1, Country 3 1| 2| 3|County 1, State 2, Country 3 2| 2| 3|County 2, State 2, Country 3 3| 2| 3|County 3, State 2, Country 3 1| 3| 3|County 1, State 3, Country 3 2| 3| 3|County 2, State 3, Country 3 3| 3| 3|County 3, State 3, Country 3 (27 rows) manu=>There are 27 rows and pgsql is now waiting the next command, try this one:
manu=> select * from countries, states; cod_country|name |cod_state|cod_country|nam_state -----------+---------+---------+-----------+------------------ 1|country 1| 1| 1|State 1, Country 1 2|country 2| 1| 1|State 1, Country 1 3|country 3| 1| 1|State 1, Country 1 1|country 1| 2| 1|State 2, Country 1 2|country 2| 2| 1|State 2, Country 1 3|country 3| 2| 1|State 2, Country 1 1|country 1| 1| 2|State 1, Country 2 2|country 2| 1| 2|State 1, Country 2 3|country 3| 1| 2|State 1, Country 2 1|country 1| 2| 2|State 2, Country 2 2|country 2| 2| 2|State 2, Country 2 3|country 3| 2| 2|State 2, Country 2 1|country 1| 1| 3|State 1, Country 3 2|country 2| 1| 3|State 1, Country 3 3|country 3| 1| 3|State 1, Country 3 1|country 1| 2| 3|State 2, Country 3 2|country 2| 2| 3|State 2, Country 3 3|country 3| 2| 3|State 2, Country 3 (18 rows)18 rows ??? We inserted 3 countries and 6 states, all identify a single country. How is it possible we get 18 rows?
The last command has performed a union of two tables, we have related the table of countries with the table of counties, since we have not specify any union exclusion rule, pgsql returned ALL possible rows of countries related with ALL rows of states, i.e. 3 for countries times 6 for states for a total of 18. This result is obviously illogic and useless, better if we could have done the following:
manu=> select * from countries, states manu-> where countries.cod_country = states.cod_country; cod_country|name |cod_state|cod_country|nam_state -----------+---------+---------+-----------+------------------ 1|country 1| 1| 1|State 1, Country 1 1|country 1| 2| 1|State 2, Country 1 2|country 2| 1| 2|State 1, Country 2 2|country 2| 2| 2|State 2, Country 2 3|country 3| 1| 3|State 1, Country 3 3|country 3| 2| 3|State 2, Country 3 (6 rows)Well, this begins to appear a bit more reasonable, Six rows, Correct?
Yes, there are six counties and each county is in a country. It is reasonable to get a number of rows identical to the number of counties because country is a qualifier of counties. We just related the table of countries with the table of counties via the country code. Remember that countries have a code and counties have the code of the country they belong to.
Why countries.cod_country = states.cod_country ?
The country code in the table of countries is cod_country and in the table of counties too, therefore:
cod_country = cod_countryis illogical, the interpreter will never now which of the two to use and it would return us an error:
select * from countries, states where cod_country = cod_country; ERROR: Column cod_country is ambiguousNext, we can use aliases for columns:
manu=> select * from countries a, states b manu-> where a.cod_country = b.cod_country; cod_country|name |cod_state|cod_country|nam_state -----------+---------+---------+-----------+------------------ 1|country 1| 1| 1|State 1, Country 1 1|country 1| 2| 1|State 2, Country 1 2|country 2| 1| 2|State 1, Country 2 2|country 2| 2| 2|State 2, Country 2 3|country 3| 1| 3|State 1, Country 3 3|country 3| 2| 3|State 2, Country 3 (6 rows)What does the manager return?: cod_country, name, cod_state, cod_country y nam_state.
Since we query "select * from countries, states", where the * is a wild card that stands for EVERYTHING, we obtained the two columns for countries and the three for counties. Now we would like to be more specific:
manu=> select a.cod_country, cod_state, name, nam_state manu-> from countries a, states b manu-> where a.cod_country = b.cod_country; cod_country|cod_state|name |nam_state -----------+---------+---------+------------------ 1| 1|country 1|State 1, Country 1 1| 2|country 1|State 2, Country 1 2| 1|country 2|State 1, Country 2 2| 2|country 2|State 2, Country 2 3| 1|country 3|State 1, Country 3 3| 2|country 3|State 2, Country 3 (6 rows)In the last command we explicitly asked for the country code, the state code and the name of the country and state. Notice that some column names are qualified (a.cod_country) while others are not (nam_state), this is because cod_country is repeated in both tables while nam_state exists only in states. Unique column names do not need extra qualifiers.
Let us make things more complicated:
manu=> select a.cod_country, cod_state, name, nam_state manu-> from countries a, states b manu-> where a.cod_country = b.cod_country manu-> and a.cod_country = 3; cod_country|cod_state|name |nam_state -----------+---------+---------+------------------ 3| 1|country 3|State 1, Country 3 3| 2|country 3|State 2, Country 3 (2 rows)This time we limited the search to country number 3 only.
select count(*) from states; count ----- 27 (1 row)It returns the number of rows contained in the table of counties, next:
manu=> select cod_country, count(*) from states manu-> group by cod_country; cod_country|count -----------+----- 1| 2 2| 2 3| 2 (3 rows)It returns the number of rows with IDENTICAL country code, this is the reason for using the cod_country.
An even better example:
manu=> select name, count(*) from countries a, states b manu-> where a.cod_country = b.cod_country manu-> group by name; name |count ---------+----- country 1| 2 country 2| 2 country 3| 2 (3 rows)We still obtained the same three rows but this time the returned information is more clear.
Well until now we have only given an introduction, just warming up :-)
We have seen several commands:
CREATE TABLE | This command creates a table with its columns. |
DROP TABLE | Erases a table. |
SELECT |
This command is the foundation of SQL, allows us to
create a temporal table containing the necessary data
items only. SELECT can take as parameters functions or
complex statements, as well as sub_selects:
select count(*) from states where cod_country in (select cod_country from countries); count ----- 27 (1 row) |
BEGIN WORK |
This is another fundamental command. It tells the DB
manager to commit ALL the modifications given since BEGIN
WORK. In our particular DB manager a BEGIN WORK marks the
initialisation of a transaction, in other managers the
beginning of a transaction is mark by the first command
that alters something in the database. In postgreSQL all
commands that alter data will operate directly unless
there was a previous BEGIN WORK.
NOTE: commands that modify the scheme of the database execute a COMMIT WORK, therefore if a transaction is opened and any such commands is executed our transaction will be closed immediately and it will be impossible to launch a ROLLBACK WORK. While a user has an open transaction he can declare the access type to his data by other users:
|
COMMIT WORK | Closes a transaction leaving the committing the modifications introduced. The command ROLLBACK WORK returns the data to their state previous the current transaction. |
manu=> select * from countries; cod_country|name -----------+--------- 1|country 1 2|country 2 3|country 3 (3 rows)There are three rows,
begin work;Begins a transaction
insert into countries values (5, 'Country Not True');We inserted a row, let us next verify that all the rows are there
manu=> select * from countries; cod_country|name -----------+---------------- 1|country 1 2|country 2 3|country 3 5|Country Not True (4 rows)All the rows are there. Next
rollback work;this abandons the transaction.
manu=> select * from countries; cod_country|name -----------+--------- 1|country 1 2|country 2 3|country 3 (3 rows)After checking the number of rows we see it went back to the original 3 rows.
INSERT | We have seen this command already. It inserts data in a table. | ||||||||||||||||||||
CREATE TABLE |
Another important command, the creation of a table and
its columns, let us see now the type of data that can be
handle:
|
||||||||||||||||||||
DELETE | deletes rows of a table | ||||||||||||||||||||
UPDATE | modifies the columns of a row in a table. |
SQL let us build an abstraction layer to our data and permits us manage them according to our needs.
From what we have seen so far, one could ask: How do I use SQL within an application?
The answer will come one step at a time, in our third article we will review a short C application using SQL.
Webpages maintained by the LinuxFocus Editor team
© Manuel Soriano, FDL LinuxFocus.org |
Translation information:
|
2002-11-02, generated by lfparser version 2.34