original in es Manuel Muriel Cordero
The previous article in this series (Basic UNIX commands) gave a general overview of Linux. It introduced the elements of the system so that the reader could acquire basic skills and manage the operating system. Beyond that, the user may want to learn the usual set of Unix commands: combined with the shell, they make file and system management very efficient. This article deals with those more advanced, yet still basic, tools.
Before describing the commands, the reader should know some facts about their history. Ken Thompson and Dennis Ritchie, when developing Unix at the beginning of the seventies, wanted an operating system that would ease the life of programmers. They decided that the best way to achieve that goal was to define a small number of simple tools, each extremely good at one specialized task. More complicated tasks could then be performed by joining those tools, using the output of one as the input of another.
Information is passed between the tools through standard input and standard output (by default the keyboard and the screen). Thanks to pipes and redirection (seen in the previous article) it is possible to combine commands.
It is very easy to demonstrate using an example. A user writes:
$ who | grep pepe
who and grep are two separate programs joined with the pipe "|" . who shows a list with every user logged on the computer at this moment. The typical output could be something like:
$ who
manolo  tty1  Dec 22 13:15
pepe    ps/2  Dec 22 14:36
root    tty2  Dec 22 10:03
pepe    ps/2  Dec 22 14:37
The output is composed of 4 fields separated by whitespace: the username (login), the login terminal, and the date and time of the connection.
"grep pepe" searches for the lines containing the string "pepe".
And the output is:
$ who | grep pepe
pepe    ps/2  Dec 22 14:36
pepe    ps/2  Dec 22 14:37
Maybe you want something simpler than the list itself, such as how many terminals are in use at the moment. You can count them with the program wc.
wc is a character, word and line counter. In this case we only need the number of lines, so we use the option -l:
$ who | wc -l
4
$ who | grep pepe | wc -l
2
Four sessions are open in total, and pepe is logged in at 2 terminals.
If we now check for antonio:
$ who | grep antonio | wc -l
0
antonio is not logged in.
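The same pipelines can be tried without depending on who is really logged in. The following is only a sketch: the variable names sessions, total and pepe are illustrative, and the sample lines are the who output shown above.

```shell
# Sample who output from the article, kept in a variable so that the
# result does not depend on the machine where this is run.
sessions='manolo tty1 Dec 22 13:15
pepe ps/2 Dec 22 14:36
root tty2 Dec 22 10:03
pepe ps/2 Dec 22 14:37'

# Count all sessions, then only those of pepe.
# tr removes the padding that some wc implementations print.
total=$(printf '%s\n' "$sessions" | wc -l | tr -d ' ')
pepe=$(printf '%s\n' "$sessions" | grep pepe | wc -l | tr -d ' ')

echo "sessions: $total, pepe: $pepe"
```

Each stage only reads lines and writes lines, which is why the tools can be chained in any order that makes sense.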
Richard Stallman, the founder of the GNU project, raised the discussion about the control that a few large software companies held over the Unix OS, which prevented computer science from developing naturally. After developing the emacs editor while working at MIT, he strongly disliked the fact that big commercial firms took his work to make proprietary versions. Confronted with this situation, he decided to begin a project in which the source code of the software would be available to everybody. That was GNU. The long-term goal was to build a complete open-source operating system. The first steps were a new open-source version of emacs and a C compiler (gcc), as well as some typical tools for Unix systems. These tools are discussed in this article.
Our first example showed the main functionality of grep. Now we will explain it in greater detail.
Basic usage of grep is:
$ grep [-options] pattern files
The most commonly used options are:
-n prints the line number before each matching line (useful when searching big files, to know exactly where the match is located)
-c prints the number of matching lines instead of the lines themselves
-v selects the non-matching lines (those where the pattern is not present)
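A small sketch of the three options, run on a hypothetical sample file created in a temporary directory (the file name and its contents are assumptions made for the illustration):

```shell
dir=$(mktemp -d)
printf '%s\n' 'Hello world' 'goodbye' 'hello again' 'Hello once more' > "$dir/file.txt"

numbered=$(grep -n 'Hello' "$dir/file.txt")   # matching lines with their numbers
count=$(grep -c 'Hello' "$dir/file.txt")      # how many lines match
others=$(grep -v 'Hello' "$dir/file.txt")     # lines without the pattern

printf '%s\n' "$numbered" "$count" "$others"
rm -r "$dir"
```

Note that the search is case sensitive: "hello again" is not counted as a match and therefore appears in the -v output.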
The pattern is any group of characters to search for. If it contains an embedded blank, the pattern must be quoted (for instance with double quotes) so that the shell does not confuse the pattern with the files to be searched. For example
$ grep "Hola mundo" file.txt
If the string we are looking for includes wildcards, apostrophes, quotes or slashes, these must be escaped (preceded by a backslash) or quoted, to avoid substitution by the shell.
$ grep \*\"\'\?\< file.txt
This finds:
Esto es una cadena chunga -> *"'?<
grep and other GNU utilities are able to perform more advanced searches, thanks to regular expressions. These are similar to shell wildcards, in the sense that they stand for characters or groups of characters. The resources at the end of the article also include a link to a separate article explaining regular expressions in detail.
Some examples:
$ grep c.n
searches for any occurrence of a string made of c, any single character, and n.
$ grep "[Bc]el"
searches for every occurrence of Bel and cel.
$ grep "[m-o]ata"
finds the lines containing mata, nata or oata.
$ grep "[^m-o]ata"
finds the lines containing "ata" preceded by any character other than m, n or o.
$ grep "^Martin come"
finds every line beginning with 'Martin come'. As ^ is outside the brackets, it means the beginning of a line, not the negation of a group as in the previous example.
$ grep "durmiendo$"
finds all the lines ending with the string 'durmiendo'. $ stands for the end of the line.
$ grep "^Caja San Fernando gana la liga$"
finds the lines that match the string exactly.
To remove the special meaning of any of these characters, precede it with a backslash. For example:
$ grep "E\.T\."
searches for the string 'E.T.'.
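The effect of brackets, negation and the backslash can be checked on a few sample words. This is a sketch only; the file words.txt and its contents are invented for the purpose.

```shell
dir=$(mktemp -d)
printf '%s\n' 'mata' 'nata' 'pata' 'E.T.' 'EXTX' > "$dir/words.txt"

in_range=$(grep '[m-o]ata' "$dir/words.txt")     # mata and nata
out_range=$(grep '[^m-o]ata' "$dir/words.txt")   # pata
any_char=$(grep -c 'E.T.' "$dir/words.txt")      # the dot matches any character
literal=$(grep -c 'E\.T\.' "$dir/words.txt")     # only a real dot matches

printf '%s\n' "$in_range" "$out_range" "$any_char" "$literal"
rm -r "$dir"
```

The unescaped pattern matches both E.T. and EXTX, while the escaped one matches only the line that really contains the dots.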
The find command is used to locate files. Another LinuxFocus article explains this command, and the best we can do is to point to it.
On Unix, information was traditionally stored in ASCII files with one record per line and fields delimited by some special character, usually a tab or a colon (:). A typical task is to take some fields from a file and join them into another one. cut and paste do this work.
Let us use as an example the file /etc/passwd, which holds the user information. It contains 7 fields, delimited by ":": login name, encrypted password, user ID, group ID, GECOS (free-form user information), home directory and preferred shell.
Here is a typical piece from this file:
root:x:0:0:root:/root:/bin/bash
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
If we want to pair the users with their shells, we must cut fields 1 and 7:
$ cut -f1,7 -d: /etc/passwd
root:/bin/bash
murie:/bin/bash
practica:/bin/ksh
wizard:/bin/bash
The option -f specifies the fields to cut, and -d defines the field separator (the tab is the default).
And it is possible to select a range of fields:
$ cut -f5-7 -d: /etc/passwd
root:/root:/bin/bash
Manuel Muriel Cordero:/home/murie:/bin/bash
Usuario de practicas para Ksh:/home/practica:/bin/ksh
Wizard para nethack:/home/wizard:/bin/bash
If we have redirected these two outputs with '>' to two different files and want to join them line by line, we can use the command paste. By default paste separates the joined lines with a tab; the option -d: makes it use a colon instead:
$ paste -d: output1 output2
root:/bin/bash:root:/root:/bin/bash
murie:/bin/bash:Manuel Muriel Cordero:/home/murie:/bin/bash
practica:/bin/ksh:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:/bin/bash:Wizard para nethack:/home/wizard:/bin/bash
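The whole round trip can be sketched on a reduced copy of the file, so that the real /etc/passwd is not needed. The temporary directory and the two-line sample are assumptions of the example.

```shell
dir=$(mktemp -d)
# A small passwd-style file (same layout as /etc/passwd, sample data).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh' \
  > "$dir/passwd"

cut -d: -f1,7 "$dir/passwd" > "$dir/output1"   # login and shell
cut -d: -f5   "$dir/passwd" > "$dir/output2"   # GECOS field only

# paste joins line N of each file; -d: uses ":" as glue instead of a tab.
joined=$(paste -d: "$dir/output1" "$dir/output2")
printf '%s\n' "$joined"
rm -r "$dir"
```

paste pairs the files purely by line number, so both inputs must list the records in the same order.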
Let's assume that we want to sort /etc/passwd by the GECOS field. To achieve this we use sort, the Unix sorting tool:
$ sort -t: +4 /etc/passwd
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
root:x:0:0:root:/root:/bin/bash
It is easy to see that the file has been sorted, but in ASCII order, where capital letters come before lowercase ones. If we do not want to distinguish between upper and lower case, we can use:
$ sort -t: +4f /etc/passwd
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
root:x:0:0:root:/root:/bin/bash
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
-t selects the field separator. +4 is the number of fields to skip before the sort key begins, and f means to sort regardless of upper and lower case. The +N notation is the historical one; POSIX-conforming versions of sort express the same key as -k5, with fields counted from 1.
Much more complicated sorts are possible. For example, we can sort by shell (in reverse order) first and then by the GECOS field:
$ sort -t: +6r +4f /etc/passwd
practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh
murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
root:x:0:0:root:/root:/bin/bash
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
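Many current versions of sort no longer accept the +N notation and expect keys to be given with -k. The three sorts above might be written as follows; this is a sketch on a copy of the sample lines, and LC_ALL=C is set as an assumption so that the comparison really follows ASCII order whatever the locale.

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'murie:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash' \
  'practica:x:501:501:Usuario de practicas para Ksh:/home/practica:/bin/ksh' \
  'wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash' \
  > "$dir/passwd"

# -k5,5 sorts on field 5 only (fields are counted from 1).
ascii=$(LC_ALL=C sort -t: -k5,5 "$dir/passwd" | cut -d: -f1)
# The f modifier ignores the difference between upper and lower case.
folded=$(LC_ALL=C sort -t: -k5,5f "$dir/passwd" | cut -d: -f1)
# Shell (field 7) in reverse order first, then GECOS ignoring case.
two_keys=$(LC_ALL=C sort -t: -k7,7r -k5,5f "$dir/passwd" | cut -d: -f1)

# Left unquoted on purpose so that each list prints on one line.
echo ascii: $ascii
echo folded: $folded
echo two keys: $two_keys
rm -r "$dir"
```

Only the login names are printed (via cut) to keep the output short; the order is the same as in the listings above.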
Suppose you have a file listing the people you have lent money to and the amount each one owes. Take 'deudas.txt' as an example:
Son Goku:23450
Son Gohan:4570
Picolo:356700
Ranma 1/2:700
If you want to know the first one to 'visit', you need a list sorted by amount.
Just type
$ sort -t: +1 deudas.txt
Son Goku:23450
Picolo:356700
Son Gohan:4570
Ranma 1/2:700
which is not the desired result, because the amounts are compared character by character as text and not as numbers. The solution is the 'n' (numeric) modifier, combined with 'r' to put the largest amount first:
$ sort -t: +1nr deudas.txt
Picolo:356700
Son Goku:23450
Son Gohan:4570
Ranma 1/2:700
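With a sort that expects keys in the -k form, the debt list can be ordered by amount as in the sketch below. The colon is given explicitly as separator with -t:, the key is restricted to field 2, and the temporary copy of the file is an assumption of the example.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Son Goku:23450' 'Son Gohan:4570' 'Picolo:356700' 'Ranma 1/2:700' \
  > "$dir/deudas.txt"

# Key = field 2 (the amount), compared as a number (n), largest first (r).
sorted=$(sort -t: -k2,2nr "$dir/deudas.txt")
printf '%s\n' "$sorted"

# The first line is the person who owes the most.
first=$(printf '%s\n' "$sorted" | head -n 1)
echo "visit first: $first"
rm -r "$dir"
```

Without the n modifier the amounts would be compared as text, and 4570 would sort after 356700 because "4" comes after "3".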
Basic options for sort are:
+n.m skips the first n fields and the next m characters before the sort key begins
-n.m ends the sort key at the m-th character of the n-th field
The following are modifiers:
-b ignores leading whitespace
-d dictionary sort (only letters, digits and whitespace are considered)
-f ignores case distinction
-n sorts numerically
-r reverses the order
As we have seen before, wc is a character, word and line counter. The default output contains the number of lines, words and characters of the input file(s).
The output can be restricted with the options:
-l shows only the number of lines
-w shows only the number of words
-c shows only the number of characters (counted as bytes)
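A quick sketch of the three counters on a two-line sample text (the variable names are illustrative; tr only removes the padding some wc versions add):

```shell
text='one two three
four five'

lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$text" | wc -w | tr -d ' ')
chars=$(printf '%s\n' "$text" | wc -c | tr -d ' ')

echo "$lines lines, $words words, $chars characters"
```

The character count includes the two newline characters that end the lines.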
Sometimes we need to know the differences between two versions of the same file. This is especially common in programming, when several people work on the same project and modify the same source code. The following tools find the variations between one version and the other.
cmp is the most basic one. It compares two files and reports where the first difference appears (character number and line number).
$ cmp old new
old new differ: char 11234, line 333
comm is a bit more advanced. It compares two sorted files and its output has 3 columns: the first contains the lines unique to the first file, the second the lines unique to the second file, and the third the lines common to both. Numeric parameters remove some of these columns: -1, -2 and -3 tell comm not to display the first, second or third column. This example shows the lines appearing only in the first file and the common ones:
$ comm -2 old new
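The column options can be combined to isolate one column at a time. A sketch with two small sorted files (their names and contents are made up for the example):

```shell
dir=$(mktemp -d)
# comm expects both inputs to be sorted.
printf '%s\n' 'apple' 'banana' 'cherry' > "$dir/old"
printf '%s\n' 'banana' 'cherry' 'damson' > "$dir/new"

only_old=$(comm -23 "$dir/old" "$dir/new")   # lines unique to old
only_new=$(comm -13 "$dir/old" "$dir/new")   # lines unique to new
common=$(comm -12 "$dir/old" "$dir/new")     # lines present in both

# cmp -s prints nothing; its exit status tells whether the files differ.
if cmp -s "$dir/old" "$dir/new"; then same=yes; else same=no; fi

printf '%s\n' "$only_old" "$only_new" "$common" "$same"
rm -r "$dir"
```

Using cmp -s inside an if is a common way to test in a script whether two files are identical.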
Last but not least there is diff, an essential tool for larger programming projects. If you have ever downloaded a kernel to compile it, you know that you can get either the full source code of the new version or a patch against the previous one, the latter being much smaller. Such a patch usually carries a diff suffix, which means it is the output of diff. diff can also write its result as editor commands (an ed script, or the format used by RCS) that turn one file into the other, and it works on whole directories as well as on single files. The use case is obvious: you download less source code (just the changes), apply the patch and compile. Without parameters, the output describes, in a notation similar to ed commands, how to change the first file so that it becomes equal to the second.
$ diff old new
3c3
< The Hobbit
---
> The Lord of the Rings
78a79,87
> Three Rings for the Elven-kings under the sky,
> Seven for the Dwarf-lords in their halls of stone,
> Nine for Mortal Men doomed to die,
> One for the Dark Lord on his dark throne
> In the Land of Mordor where the Shadows lie.
> One Ring to rule them all, One Ring to find them,
> One Ring to bring them all and in the darkness bind them
> In the Land of Mordor where the Shadows lie.
3c3 means that line 3 of the first file must be changed into line 3 of the second: "The Hobbit" is removed and replaced with "The Lord of the Rings". 78a79,87 means that after line 78 of the first file you must add the lines numbered 79 to 87 in the second file.
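The notation is easier to read on a tiny case. The sketch below builds two short files (invented for the example) that differ by one changed line and one added line:

```shell
dir=$(mktemp -d)
printf '%s\n' 'title' 'The Hobbit' 'end' > "$dir/old"
printf '%s\n' 'title' 'The Lord of the Rings' 'end' 'appendix' > "$dir/new"

# diff exits with status 1 when the files differ; "|| true" keeps a
# script running under "set -e" from stopping here.
changes=$(diff "$dir/old" "$dir/new" || true)
printf '%s\n' "$changes"
rm -r "$dir"
```

Here 2c2 reports the changed line and 3a4 says that after line 3 of the first file, line 4 of the second has to be added.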
uniq removes repeated lines. For example, if we want to know which people are currently connected to the computer, we can use the commands who and cut:
$ who | cut -f1 -d' '
root
murie
murie
practica
But the output is not quite right: we need to delete the second entry for user murie. That means:
$ who | cut -f1 -d' ' | sort | uniq
murie
practica
root
The option -d' ' means that the field separator is the space, because the output from who uses that character instead of tabs.
uniq compares only consecutive lines. In our case the two "murie" entries followed each other, but they could have appeared in a different order. It is therefore a good idea to sort the output before giving it to uniq.
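The need for sort shows up as soon as the repeated entries are not adjacent. A sketch with a made-up list of logins in which murie appears twice, separated by another user:

```shell
logins='murie
root
murie
practica'

# Without sort the two murie lines are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$logins" | uniq | wc -l | tr -d ' ')

# After sort the duplicates are neighbours and uniq collapses them.
users=$(printf '%s\n' "$logins" | sort | uniq)
count=$(printf '%s\n' "$users" | wc -l | tr -d ' ')

echo "without sort: $unsorted lines, with sort: $count users"
printf '%s\n' "$users"
```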
sed is one of the most peculiar Unix tools. Its name means stream editor. Ordinary editors apply the modifications the user requests interactively. sed instead lets us write small editing scripts, somewhat like batch files in MS-DOS, and so modify the content of a file without user interaction. The editor's capabilities are rather complete, and going deeper into the subject would make this article too long. So we will settle for a brief introduction, leaving the man and info pages for the interested user.
sed is usually invoked as:
$ sed 'command' files
Take as example a file where we want to replace every presence of "Manolo" with "Fernando". Let's do it:
$ sed 's/Manolo/Fernando/g' file
It writes the modified text to standard output; the file itself is left untouched. If you want to keep the result, just redirect it with ">".
Many users will recognize the common search & replace command of vi. In fact, most of the ":" commands of vi (those handled by ex) are also valid sed commands.
Usually, sed instructions are composed of one or two addresses (to select lines) and the command to execute. An address can be a line number, a range of lines or a pattern.
The most widely used commands are:
Command     Action
-------     ------
a\          appends the given text after the addressed lines
c\          replaces the addressed lines with the given text
d           deletes the line(s)
g           (flag of s) substitutes every occurrence of the pattern in the line instead of only the first one
i\          inserts the given text before the addressed lines
p           prints the current line, even when the -n option is used
q           quits on reaching the addressed line
r file      reads a file, appending its contents to the output
s/one/two/  replaces string "one" with string "two"
w file      writes the current line to a file
=           prints the line number
!command    applies the command to the lines NOT selected by the address
Using sed you can specify which lines or range of lines you want to act on:
$ sed '3d' file
will delete the third line of the file.
$ sed '2,4s/e/#/' file
will replace the first occurrence of the character "e" with the character "#" in each of the lines 2 to 4 (inclusive).
Lines containing a string can be selected using the regular expressions described above. For example
$ sed '/[Qq]ueen/d' songs
will delete every line containing the word "Queen" or "queen".
It's very easy to delete empty lines from a file using patterns:
$ sed '/^$/d' file
although lines containing only white space will not be deleted. To achieve this, you must use a slightly wider pattern:
$ sed '/^ *$/d' file
where the "*" character means any number of occurrences of the previous character, " " (space) in this case.
$ sed '/InitMenu/a\
> the text to append' file.txt
This example searches for the lines containing the string "InitMenu" and inserts a new line after each of them. It works as shown only with bash or sh as the shell: you type up to a\, hit return and type the rest. The > is the continuation prompt printed by the shell, not part of the command.
tcsh treats newlines inside quotes differently, so in tcsh you must use a double backslash:
$ sed '/InitMenu/a\\
? the text to append' file.txt
The ? comes from the shell, just like the > in the bash example.
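Some of the sed commands described in this section can be tried on a few lines fed through a pipe, as in the sketch below. The sample text and the variable names are assumptions of the example.

```shell
input=$(printf '%s\n' 'first line' '' '   ' 'the Queen' 'last line')

# Delete empty lines and lines made only of spaces.
no_blank=$(printf '%s\n' "$input" | sed '/^ *$/d')

# -n suppresses the default output; p prints only the addressed lines.
queen=$(printf '%s\n' "$input" | sed -n '/[Qq]ueen/p')

# Without the g flag only the first "i" of line 1 is replaced.
once=$(printf '%s\n' "$input" | sed -n '1s/i/#/p')
# With g every "i" of line 1 is replaced.
every=$(printf '%s\n' "$input" | sed -n '1s/i/#/gp')

printf '%s\n' "$no_blank" "$queen" "$once" "$every"
```

Because sed reads standard input when no file is named, it fits in a pipeline like any other filter.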
Last but not least: awk. Its peculiar name comes from the names of its original developers: Alfred Aho, Peter Weinberger and Brian Kernighan.
The awk program is one of the most interesting Unix utilities. It is a mature and complex tool that makes it possible to perform a wide variety of actions from the command line.
It should be noted that awk and sed are key pieces of the more complex shell scripts. What you can do without C or any other compiled language is really impressive. The Slackware Linux distribution setup, as well as many CGI web programs, are examples of such scripts.
Nowadays the command line tools have been somewhat left aside as too old for the current window environments, and with the arrival of Perl many shell scripts have been replaced by Perl scripts. It might look as if these command line tools will be forgotten. However, my own experience says that many applications can be written in a few lines of shell script (including a small database manager). Apart from that, you can be very productive if you know how to use these commands and the shell.
If you join the power of awk and sed, you can very quickly do things that usually require a small database manager plus a spreadsheet.
Take an invoice listing the articles you bought, how many pieces of each one, and the price per piece. Let's call this file "sales":
oranges 5 250
peras 3 120
apples 2 360
It's a file with 3 fields, with tabs as field separators. Now you want to add a fourth field with the total price per product.
$ awk '{total=$2*$3; print $0 , total }' sales
oranges 5 250 1250
peras 3 120 360
apples 2 360 720
total is the variable which will contain the product of the values stored in the second and third fields. After calculation, the whole input line and the total value are printed.
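A natural next step is a grand total for the invoice. The sketch below is one possible way: it accumulates the per-product totals in a second variable (here called sum, a name chosen for the example) and prints it in an END block, which awk runs after the last input line.

```shell
dir=$(mktemp -d)
# Same data as the "sales" file, with tabs between the fields.
printf 'oranges\t5\t250\nperas\t3\t120\napples\t2\t360\n' > "$dir/sales"

# -F '\t' sets the tab as field separator.
report=$(awk -F '\t' '{ total = $2 * $3; sum += total; print $1, total }
                      END { print "TOTAL", sum }' "$dir/sales")
printf '%s\n' "$report"
rm -r "$dir"
```

Variables in awk start out empty, which counts as zero in arithmetic, so sum needs no explicit initialization.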
awk is nearly a programming environment in itself, very well suited to automated work with information from text files. If you are interested in this tool, I encourage you to learn more from its man and info pages.
Shell scripts are sequences of system commands stored in a file to be executed.
Shell scripts are similar to batch files from DOS, but more powerful. They allow users to make their own commands just by combining existing ones.
Shell scripts are able to accept parameters, of course. They are stored in the variables $0 (for the command/script name), $1, $2, ... up to $9. All the command line parameters together can be referred to with $*.
Any text editor can create shell scripts. To execute a script just type:
$ sh shell-script
Or, much better, you can give it execution permission with
$ chmod 700 shell-script
and execute it by just typing its name (prefixed with ./ if the current directory is not in your PATH):
$ shell-script
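As a minimal sketch of parameter handling, the following writes a tiny script and runs it with sh. The script name greet and its arguments are hypothetical; $# (the number of parameters) is a standard shell variable not mentioned above.

```shell
dir=$(mktemp -d)

# Write a tiny script that reports the parameters it receives.
cat > "$dir/greet" <<'EOF'
#!/bin/sh
# $1 is the first parameter, $# their number, $* all of them.
echo "first: $1, count: $#, all: $*"
EOF

# Run it through sh, passing three parameters.
result=$(sh "$dir/greet" alpha beta gamma)
echo "$result"
rm -r "$dir"
```

The quoted 'EOF' keeps the shell from expanding $1, $# and $* while the script file is being written.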
We will end this article here; the discussion of shell scripts is postponed to a future one. The next article will introduce the most common Unix text editors: vi & emacs. Every Linux user should know them well.
This is an introductory article, and readers can learn more details from other LinuxFocus articles: