Original in Spanish: Javier Palacios Bermejo
Spanish to English translation: Javier Palacios Bermejo, Ruben Sotillo, Manuel Rodriguez
Javier is working on a Ph.D. in Astronomy at a Spanish university, where he administers a workstation cluster. The daily work in his department is done on Unix machines. After some initial problems and trials, Slackware Linux was chosen. Linux turned out to be much better than some other proprietary Unix systems.
This article gives some insight into the tricks that you can do with AWK. It is not a tutorial, but it provides real-life examples to use.
Originally, the idea to write this text came to me after reading a couple of articles published in _LF_ that were written by Guido Socher. One of them, about find and related commands, showed me that I was not the only one who used the command line. Pretty GUIs don't tell you how things are really done (that's the way Windows went years ago). The other article was about regular expressions. Although regular expressions are only touched on briefly in this article, you need to know them to get the maximum out of awk and other commands like sed and grep.
The key question is whether this awk command is really useful. The answer is definitely yes!
It can be useful for a normal user to process text files, re-format them, and so on. For a system administrator, AWK is a really important utility.
Just take a look at /var/yp/Makefile or at the initialization scripts. AWK is used everywhere.
My first encounter with AWK is old enough to be almost forgotten. I had a colleague who needed to work with some really big outputs from a small Cray. The manual page for awk on the Cray was short, but he said that AWK looked very much like the thing he needed, although he did not yet understand how to use it.
A long time later, AWK came back into my life. A colleague of mine used AWK to extract the first column from a file with the command:

awk '{print $1}' file
Once we have learned the lesson on how to extract a column, we can do things such as renaming files (appending .new to the names listed in files_list):
ls files_list | awk '{print "mv "$1" "$1".new"}' | sh
... and more:
ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
ls -l|awk '$1!~/^drwx/{print $9}'|xargs rm
ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
ls -p | grep /$ | awk '{print "rm -r "$1}'
ls -l|awk '$1~/^d.*x/{print $9}'|xargs rm -r
kill `ps auxww | grep netscape | egrep -v grep | awk '{print $2}'`
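Since awk matches patterns itself, the grep and egrep steps of the last pipeline can be folded into the awk program. A minimal sketch (with echo substituted for kill so it is safe to try; netscape is just the process name from the example above):

```shell
# Select ps lines mentioning "netscape", skip the awk process's own
# entry, and print the PID column. "echo" stands in for "kill" here.
ps auxww | awk '/netscape/ && !/awk/ { print $2 }' | xargs echo kill
```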
As you can see, AWK really helps when the same operation is repeated over and over ... and apart from that, it is much more fun to write an AWK program than to do almost the same thing 20 times manually.
awk is a little programming language, with a syntax close to C in many aspects. It is an interpreted language, and the awk interpreter processes the instructions.
About the syntax of the awk command interpreter itself:

# gawk --help
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
       gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:            GNU long options:
    -f progfile               --file=progfile
    -F fs                     --field-separator=fs
    -v var=val                --assign=var=val
    -m[fr] val
    -W compat                 --compat
    -W copyleft               --copyleft
    -W copyright              --copyright
    -W help                   --help
    -W lint                   --lint
    -W lint-old               --lint-old
    -W posix                  --posix
    -W re-interval            --re-interval
    -W source=program-text    --source=program-text
    -W traditional            --traditional
    -W usage                  --usage
    -W version                --version

Instead of simply quoting (') the program on the command line, we can, as you can see above, write the instructions into a file and call it with the option -f. Variables can be defined on the command line with -v var=value.
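A small sketch of both options together; the variable name limit and the file names are invented for the example:

```shell
# prog.awk: print lines whose first field exceeds the value
# passed in from the command line with -v.
cat > prog.awk <<'EOF'
$1 > limit { print "big:", $1 }
EOF
printf '5\n20\n' > data.txt
awk -v limit=10 -f prog.awk data.txt
# prints:
# big: 20
```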
Awk is, roughly speaking, a language oriented to handling tables: information that can be grouped into fields and records. The advantage here is that the definition of a record (and of a field) is flexible. Awk is powerful; it is designed for working with one-line records, but even that point can be relaxed. To look at some of these aspects, we will examine a few illustrative (and real) examples.
# Convert each line of a data table into a row of a LaTeX tabular
BEGIN {
    printf "LaTeX preamble"
    printf "\\begin{tabular}{|c|c|...|c|}"
}
{
    printf $1" & "
    printf $2" & "
      .
      .
      .
    printf $n" \\\\ "
    printf "\\hline"
}
END {
    print "\\end{document}"
}
# When the first field marks a header ("====>"), take the object name
# from $2 and the number of data lines from $4, then read that many
# following lines (fields separated by "|") and append a selection of
# their fields to a file named after the object.
( $1 == "====>" ) {
    NomObj = $2
    TotObj = $4
    if ( TotObj > 0 ) {
        FS = "|"
        for ( cont = 0 ; cont < TotObj ; cont++ ) {
            getline
            print $2 $4 $5 $3 >> NomObj
        }
        FS = " "
    }
}
Actually, the object name was not returned like that, and the real case was slightly more complicated, but this is meant as an illustrative example.
BEGIN {
    BEGIN_MSG = "From"
    BEGIN_BDY = "Precedence:"
    MAIN_KEY  = "Subject:"
    VALIDATION = "[MONTH REPORT]"
    HEAD = "NO"; BODY = "NO"; PRINT = "NO"
    OUT_FILE = "Month_Reports"
}
{
    if ( $1 == BEGIN_MSG ) {
        HEAD = "YES"; BODY = "NO"; PRINT = "NO"
    }
    if ( $1 == MAIN_KEY ) {
        if ( $2 == VALIDATION ) {
            PRINT = "YES"
            $1 = ""; $2 = ""
            print "\n\n"$0"\n" > OUT_FILE
        }
    }
    if ( $1 == BEGIN_BDY ) {
        getline
        if ( $0 == "" ) {
            HEAD = "NO"; BODY = "YES"
        } else {
            HEAD = "NO"; BODY = "NO"; PRINT = "NO"
        }
    }
    if ( BODY == "YES" && PRINT == "YES" ) {
        print $0 >> OUT_FILE
    }
}
Maybe we administer a mailing list and, from time to time, some special messages are submitted to the list (for example, monthly reports) with a specific format (a subject like '[MONTH REPORT] month, dept'). At the end of the year we decide to put all these messages together, setting the others aside. This can be done by processing the mail spool with the awk program above. Getting each report written to an individual file would only take three extra lines of code.
NOTE: This example assumes that the mail spool is structured the way I think it is. The program works for my mail.
I've used awk for many other tasks (automatic generation of web pages with
information from simple databases) and I know enough about awk programming
to be sure that a lot of things can be done.
Just let your imagination fly.
A problem

One problem is that awk needs perfect tabular information: no holes, and no fixed-width columns, for example. This is not a problem if we generate the awk input ourselves: we choose something uncommon to separate the fields, set FS accordingly, and we are done! If the input already exists, things can be a little more problematic. For example, a table like this:

1234 HD 13324   22:40:54 ....
1235 HD122235   22:43:12 ....

is difficult to handle with awk. Unfortunately, such tables are quite common. If only one column has this characteristic, the problem can be solved (if anybody knows how to manage more than one such column in the generic case, please let me know!). I had to face one of these tables, similar to the one described above: the second column was a name and included a variable number of spaces. As usually happens, I had to sort the table using the last column.
... and a solution

I realized that the column I wanted to sort on was the last one, and that awk knows how many fields there are in the current record. Therefore it was enough to access the last field (sometimes $4, sometimes $5, but always $NF). In the end, the desired result was obtained:

awk '{ printf $NF; $NF = "" ; printf " "$0"\n" }' | sort

This just shifts the last column to the first position, and then you can sort. Obviously, this method is easily adapted to the third field counting from the end, or to the field that comes after a control field which always has the same value. Just use your ideas and imagination.
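Applied to the table above, the trick looks like this (the two sample rows are the abbreviated ones shown earlier):

```shell
# Move the last field to the front, then sort on it.
printf '1234 HD 13324 22:40:54\n1235 HD122235 22:43:12\n' |
awk '{ printf $NF; $NF = "" ; printf " "$0"\n" }' |
sort
```

Each output line now starts with the old last column, so a plain sort orders the records by it.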
Up to now, nearly all the examples have processed every line of the input file. But, as the manual page also states, it is possible to process only some of the input lines. One simply precedes the group of commands with the condition the line should meet. The matching condition can be very flexible, varying from a simple regular expression to a check on the contents of some field, with the possibility of grouping conditions with the proper logical operators.
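A small, made-up illustration: select lines either by a regular expression or by a test on a field, combined with a logical OR:

```shell
# Print the first field of lines that start with "b"
# or whose second field is greater than 6.
printf 'alice 5\nbob 12\ncarol 7\n' |
awk '/^b/ || $2 > 6 { print $1 }'
# prints:
# bob
# carol
```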
Like any other programming language, awk implements all the necessary flow-control structures, as well as a set of operators and predefined functions for dealing with numbers and strings. It is possible, of course, to define your own functions with the keyword function. Apart from common scalar variables, awk can also manage variable-sized (associative) arrays.
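A toy sketch of both features, a user-defined function and an associative array (all names here are invented):

```shell
# Count how often each word appears, then print the counts
# through a small helper function. The final sort is only there
# because "for (w in count)" visits keys in no guaranteed order.
printf 'red\nblue\nred\n' |
awk '
function banner(s) { return "== " s " ==" }
{ count[$1]++ }
END { for (w in count) print banner(w), count[w] }' |
sort
# prints:
# == blue == 1
# == red == 2
```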
As in any programming language, some functions are needed so often that it becomes uncomfortable to cut and paste pieces of code. That is why libraries exist. With the GNU version of awk, it is possible to include them in an awk program. This, however, is a glimpse of what is possible, and lies outside the scope of this article.
AWK is very appropriate for the purposes for which it was built: reading data line by line and acting upon the strings and patterns in those lines.
Files like /etc/passwd turn out to be ideal for reformatting and processing with AWK. AWK is invaluable for such tasks.
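For instance, with -F: awk splits the colon-separated fields of a passwd-style line (the sample line here is a typical one, not taken from a real system):

```shell
# Print login name and shell from a passwd-formatted line.
printf 'root:x:0:0:root:/root:/bin/bash\n' |
awk -F: '{ print $1, $NF }'
# prints:
# root /bin/bash
```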
Of course, AWK is not alone. Perl is a strong competitor, but it is still worthwhile to know some AWK tricks.
Commands of this very basic kind are not especially well documented, but you can find something when looking around.
man awk
Usually, all books on Unix mention this command, but only some of them treat it in detail. The best we can do is browse any book that falls into our hands. You never know where useful information may be found.