LinuxFocus July 1998: SQL Tutorial Part II

Manuel Soriano

Contents:
Introduction
Creating a table
Modifying a table
Inserting data into a table
Selecting data
Details of WHERE
UPDATE command
DELETE command
References

SQL Tutorial
Part II

Summary: In part II of the tutorial we review several basic statements of the SQL language

Introduction

This is the second part of the SQL course. In this part, we will focus on various SQL commands from the creation of a table, its modification, and/or deletion.

We will focus, above all, on the SELECT command, which is, in my judgment, the most important of all.

I hope that this second part will be enjoyable and instructive for you.

Creating a table

Like we have seen in the first installment, to create a table, the CREATE command with the TABLE qualifier is used. In effect, the CREATE command serves to create:

users: CREATE USER
tables: CREATE TABLE
indexes: CREATE INDEX
views: CREATE VIEW

The CREATE command indicates to the manager that something is going to be created. Later we will cover what and how.

What we are interested in now is the creation of a table:

Syntax

CREATE TABLE name ( column type [DEFAULT value] [NOT NULL], ...
      [INHERITS (inherits, ...)]
      [CONSTRAINT constraints CHECK (test), CHECK (test)] );

Where:

`name:`	Is the name that is given to the table and how it is referenced by any command
`Column`:	Is the name of the column
`Type`:	Is the type of data(`varchar, char, int, date, time, timestamp`), Postgres has other types of data, but they aren't compatible with ANSI SQL
`Value`:	The value that it will have by default
`Inherits`:	This is Postgres's own. It defines an inheritance from another table. This will create an entity that contains the columns of the table that we are creating and the ones inherited
`Nom_cons`:	This defines a rule of integrity to meet each time that a row is modified
`Test`:	Conditions to check

Example:

CREATE TABLE countries (
         cod_country integer NOT NULL,
         name varchar(30))
   CONSTRAINT cod_raro CHECK (cod_country > 0 AND cod_country < 154);

With this example we have created a table of countries. Each time that we insert a new row, these conditions will be meet:

The country code won't be NULL. Upon trying to put a NULL code, the manager will return the error:
```
ExecAppend: Fail to add null value in not null attribute cod_country
```
The country code will be greater then 0 and less then 154. Upon trying to insert an erroneous country code, the manager will give us an error:
```
ExecAppend: rejected due to CHECK constraint cod_raro
```

Note

What does NULL mean? In SQL there exist two states, data and no data. In effect, we can be interested that a field of rows doesn't have data, like zero (0) as well as the spaces are data. SQL introduces the concept of NULL and works with it. A practical example:

I have a table of bills, with the following fields: customer, value, date_issued, date_paid

When I create the row, I will insert the data: customer, value, date_issued

I will leave the date paid null; in this way I will know which bills need to be paid with the following command:

SELECT * FROM bills WHERE date_paid IS NULL;

One can claim that a zero (0) in the field date_paid would do the same thing. It's true, except that zero (0) is not a date, and it prevents creating date_paid of date type and applying appropriate date functions.

Examples of creating with NULL:

insert into countries values (15, NULL);

Or:

insert into countries (cod_country) values (27);

The absence of the "name" field implies that this will receive a NULL value.

Modifying a table

In PostgreSQL, modification only deals with adding new columns.

ALTER TABLE table ADD name type;

Where:

Table	Name of the table to modify
Name	Name of the column to add
Type	Type of data (see `CREATE TABLE`)

Inserting data into a table

Now we will insert data into our table:

SYNTAX:

INSERT INTO table [(column, column, ...)] VALUES (value-1, value-2,
...)

or else:

INSERT INTO table [(column, column, ...)] SELECT ....

As we have seen, there are two forms of inserting data into a table, either line by line or the result of a sub-select that can return one or more rows.

When we insert lines into a table, we will ALWAYS put data in all the columns including those that we don't mention, these are created with NULL values.

If, in the command, we don't specify which columns we are going to fill, it is understood that we are going to give data to all, example:

INSERT INTO countries VALUES (34, 'Spain');

This would be incorrect:

INSERT INTO countries VALUES (34);

But, this would be correct:

INSERT INTO countries (cod_country) VALUES (34);

I recommend that a command embedded in "C" programs or in database functions ALWAYS specify the columns that we are going to touch, in effect, if we add a new column to the table (ALTER TABLE), the next insert an error will occur, Example:

INSERT INTO countries VALUES (34, 'Spain');

INSERT 18301 1

ALTER TABLE countries add population integer

INSERT INTO countries VALUES (34, 'Spain');

This will give a parser error, given that the population data is missing

Note

PostgreSQL, doesn't generate an error. It creates the line with the "population" field NULL. This is only a particularity of PostgreSQL, any other SQL manager would give an error.

We still have another type of INSERT, which is fed from a sub-select.

This type of insert is used very often to create temporary tables or tables to carry out a concrete task of speculative calculations.

The part replaced is that which touches the data itself, this comes from the SELECT instructions that were carried out previously and the inserting of the data. The instruction SELECT can return one or more rows, this instruction SELECT has the same restrictions of the same SELECT.

Selecting data

I wanted to arrive at this point! :-))

We have covered required SQL commands, the SQL language without SELECT would be like beans without sausage.

The SELECT command allows us to access data, but with the reservation that searches, unions of tables, functions with the data, and with the search rules can be carried out.

An example:

select * from countries;

Another example:

SELECT a.name, SUM(population)
    FROM countries a, states b, counties c
    WHERE b.cod_country = a.cod_country
        AND (c.cod_country = b.cod_country
        AND c.state_code = b.state_code)
        AND population IS NOT NULL
        GROUP BY a.name
        ORDER BY sum ASC;

Let me explain, I requested the populations of all of the countries ordered by the population in an ascending order.

For this I have added a new column (population) to the counties table.

This would be like this:

create table counties (cod_country int, 
                        state_code int, 
                        county_code int,
                        county_name varchar(60),
                        population int);
insert into counties values (1, 1, 1, 'Country 1, State 1, County 1',
5435);
insert into counties values (2, 1, 1, 'Country 2, State 1, County 1',
7832);
insert into counties values (3, 1, 1, 'Country 3, State 1, County 1',
4129);
insert into counties values (1, 2, 1, 'Country 1, State 2, County 1',
76529);
insert into counties values (2, 2, 1, 'Country 2, State 2, County 1',
9782);
insert into counties values (3, 2, 1, 'Country 3, State 2, County 1',
852);
insert into counties values (1, 3, 1, 'Country 1, State 3, County 1',
3433);
insert into counties values (2, 3, 1, 'Country 2, State 3, County 1',
7622);
insert into counties values (3, 3, 1, 'Country 3, State 3, County 1',
2798);
insert into counties values (1, 1, 2, 'Country 1, State 1, County 2',
7789);
insert into counties values (2, 1, 2, 'Country 2, State 1, County 2',
76511);
insert into counties values (3, 1, 2, 'Country 3, State 1, County 2',
98);
insert into counties values (1, 2, 2, 'Country 1, State 2, County 2',
123865);
insert into counties values (2, 2, 2, 'Country 2, State 2, County 2',
886633);
insert into counties values (3, 2, 2, 'Country 3, State 2, County 2',
982345);
insert into counties values (1, 3, 2, 'Country 1, State 3, County 2',
22344);
insert into counties values (2, 3, 2, 'Country 2, State 3, County 2',
179);
insert into counties values (3, 3, 2, 'Country 3, State 3, County 2',
196813);
insert into counties values (1, 1, 3, 'Country 1, State 1, County 3',
491301);
insert into counties values (2, 1, 3, 'Country 2, State 1, County 3',
166540);
insert into counties values (3, 1, 3, 'Country 3, State 1, County 3',
165132);
insert into counties values (1, 2, 3, 'Country 1, State 2, County 3',
0640);
insert into counties values (2, 2, 3, 'Country 2, State 2, County 3',
65120);
insert into counties values (3, 2, 3, 'Country 3, State 2, County 3',
1651462);
insert into counties values (1, 3, 3, 'Country 1, State 3, County 3',
60650);
insert into counties values (2, 3, 3, 'Country 2, State 3, County 3',
651986);
insert into counties values (3, 3, 3, 'Country 3, State 3, County 3',
NULL);
commit work;

Now, we can't do it with ALTER TABLE, but should use UPDATE and I haven't explained it, so use the "cut & paste" and everyone will be happy :-))

Now we can execute the QUERY, and the results should be:

name     |    sum
- ---------+-------
country 1| 705559
country 2|1212418
country 3|2804018
(3 rows)

Now we will verify this:

select sum(population) from counties where cod_country = 1;

Results:

   sum
- ------
791986
(1 row)

!!!!!! One Difference !!!!!!

Let's look at the states table, it is missing the state 3, we do:

INSERT INTO states VALUES (3, 1, 'State 3, Country 1');
INSERT INTO states VALUES (3, 2, 'State 3, Country 2');
INSERT INTO states VALUES (3, 3, 'State 3, Country 3');

and repeat the command, resulting in:

name     |    sum
- ---------+-------
country 1| 791986
country 2|1872205
country 3|3003629

We are missing the state 3 from each country.

Now for those that have been lost, remember that the joins between tables were EXACT, it only extracts data if the condition are exact.

Let's look at the first part of the WHERE: b.cod_country = a.cod_country

This means that I join the country table with the states where the country code is equal, now remember the country data that we inserted:

Don't execute this, is only used as an illustration.

create table countries (cod_country integer, name varchar(30));
insert into countries values (1, 'country 1');
insert into countries values (2, 'country 2');
insert into countries values (3, 'country 3');
commit work;

Now the states data:

create table states (state_code int, 
                        cod_country int, 
                        state_name varchar(30));
insert into states values (1, 1, 'State 1, Country 1');
insert into states values (2, 1, 'State 2, Country 1');
insert into states values (1, 2, 'State 1, Country 2');
insert into states values (2, 2, 'State 2, Country 2');
insert into states values (1, 3, 'State 1, Country 3');
insert into states values (2, 3, 'State 2, Country 3');
commit work;

All of the states 3 from every country are missing, but in the country table the corresponding data of the states 3, so it is normal that we don't add the country data with the states with code 3 to be discarded in the second part where:

        AND (c.cod_country = b.cod_country
        AND c.state_code = b.state_code)

The state exists in the counties table but NOT in the states table.

For those that haven't understood, take an aspirin, go walk your dog (if you don't have a dog, go walk without a dog), breath a little fresh air and comeback to begin from the first exercise.

Is is very important to understand how the joining of data is carried out, without this the development that we do can have unpredictable results.

Let's shift gears and start with the SELECT command syntax.

SELECT [DISTINCT] expression1 [AS nom-attribute] {, expression-i [as
nom-attribute-i]} 
       [INTO TABLE classname] 
       [FROM from-list] 
       [WHERE where-clause] 
       [GROUP BY attr_name1 {, attr_name-i....}] 
       [ORDER BY attr_name1 [ASC | DESC ] [USING op1 ] {,
nom-attribute-i...}] 
       [UNION {ALL} SELECT ...]

Step by step:

`DISTINCT`:	This is to eliminate duplicate row returned
`expression1`:	what we want returned, normally a column for a table from the list FROM
`AS nom-attribute`:	an alias for the name of a column, example: manu=> select cod_country from countries; cod_country - ----------- 1 2 3 (3 rows) Now with the alias: manu=> select cod_country as countr from countries; countr - ------ 1 2 3 (3 rows)
`INTO TABLE`:	allow the resulting row to be inserted directly into another table (see INSERT ... SELECT...)
`FROM`:	list of input tables
`WHERE`:	selection statement (union and selection criteria).
`GROUP BY`:	group criteria, certain functions that are used in expressions might need grouping
`ORDER BY`:	Order criteria of the returned rows, ASC ascending order, DESC descending order, USING if the column that defines the order is not in the list (expression)
`UNION ALL SELECT`:	This says to add to the results of the first SELECT this second SELECT that can be different tables, but return the same number of columns.

We have seen that the commands SELECT not only returns items from the DB but it can also modify them:

SELECT SUM(salary * 1.1) - SUM(salary) AS increment FROM employees;

This will return the pay increment of a pay increase more then 10%.

Let's look at the functions that we have available:

COUNT():	returns the number of rows that aren't NULL
SUM():	returns the total sum of a column of numbers
AVG():	returns the average of a column of numbers
MIN():	returns the minimum value in a column
MAX():	returns the maximum value in a column
FLOAT(int):	returns a `FLOAT8, FLOAT(12345)`
FLOAT4(int):	returns a `FLOAT4, FLOAT4(12345)`
INT(float):	returns an `INT` from a `FLOAT/4, INT(123.456)`
LOWER(text):	returns lower-case text
UPPER(text):	returns upper-case text
LPAD(text, long, char):	fill to the left with `char` the length of `long` the column `text`
RPAD(text, long, char):	fill to the right with `char` the length of `long` the column `text`
LTRIM(text, char):	remove from the left of `text` all of the characters `char`
RTRIM(text, char):	remove from the right of `text` all of the characters `char`
POSITION(string IN text):	extract from `text` the position of `string`, but IT DOESN'T WORK
SUBSTR(text,from[,to]):	extract the substring of `text`, from the position `from` to the position of `to` or the end of the string
DATETIME(date, hour):	converts a datetime format to date (YYYY-MM-DD) and an hour (HH:MM)

These were a few of the functions that exist in SQL, these are those that are defined in ANSI SQL and are also present in Postgres95.

Details of WHERE

Until now, we have seen that the section of WHERE from the SELECT we put things like:

AND column = value

This is a small example of what we put or combine:

AND, OR, NOT, IN, IN ALL, =, !=, >, <, (SELECT....), LIKE also the parenthesis are relevant, example:

WHERE 
   column IN (SELECT DISTINCT column FROM table WHERE ....)
   column IN ('value1','value2','value3',...)
   (column = 'value' and column = 'other_value' OR column != 'value')

!= the same as saying NOT EQUAL

LIKE permits searching for a string within a column with wildcards:

WHERE column LIKE '%Pepito%'

The % is a wildcard, in the example, it's true if "Pepito" is in the string

WHERE column LIKE 'Pepito%'

will be true if "Pepito" is at the beginning of the string

WHERE column LIKE '%Pepito'

will be true if "Pepito" is at the end of the string

There's not enough time here to list all of the possible options used with WHERE, the limit is in the imagination of the programmer or the limits of the individual process parser.

Now we can leave the SELECT command and concentrate on the last two.

UPDATE command

The UPDATE command allows modification of one or more rows, depending on the defined condition in WHERE

SYNTAX:

UPDATE table SET column-1 = expression-1 
                 [, column-i = expression-i] 
       [WHERE condition]

Where:

table:	the table to modify, only one table can be modified at a time
column:	the column that will be modified
expression:	the value that the column will receive. This value can be static or the results of a function
condition:	the condition that define the modification boundaries, here the defined rules are applied for the `SELECT`

DELETE command

The DELETE command allows modification of one or more row of a table.

SYNTAX

DELETE FROM table
       [WHERE condition]

Where:

table:	the table where rows will be deleted, only one table can be deleted from at a time
condition:	the condition that define the modification boundaries, here the defined rules are applied for the `SELECT` NOTE: without the `WHERE` ALL of the rows will be deleted from the table

References

SQL: Introduction to SQL. Installation of PostgreSQL, first article of the series.

Translated by Dallas L. Gale