Perl part III
Abstract:
Perl part I provided a general overview of Perl.
In Perl part II the first useful program was written. In part III we now take a closer look at
arrays.
Arrays
An array is a list of values which can be accessed by an
index. We have seen that "normal variables", also called scalar
variables, start their name with a dollar sign ($). Array names start with an
@-sign. The data inside the array, however, consists of individual scalar values.
You must therefore again write a dollar sign when you refer to an individual
element of the array. Let's look at an example:
#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
# declare a new array variable:
my @myarray;
# initialize it with some data:
@myarray=("data1","data2","data3");
# access the first element (at index 0):
print "the first element of myarray is: $myarray[0]\n";
As you can see we write @myarray when we refer to the whole array
and $myarray[0] when we refer to an individual element.
Perl arrays start at index 0. New indices are created automatically
as soon as you assign data to them. You do not have to know how big your array
will be at declaration time. As you can see above you can
initialize an array with several values at once by
listing them comma separated inside parentheses.
("data1","data2","data3")
is a list. A list can be indexed directly, without storing it in an array first. You can therefore write
("data1","data2","data3")[1]
to get the second element of this list:
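#!/usr/bin/perl -w
print "The second element is: ";
# the extra parentheses are needed: without them perl would treat
# ("data1","data2","data3") as the complete argument list of print
print(("data1","data2","data3")[1]);
print "\n";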
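The automatic growth of arrays described above can be illustrated with a small sketch. The array name @grow and the chosen indices are assumptions made only for this example.

```perl
#!/usr/bin/perl -w
use strict;

# start with an empty array; no size is declared
my @grow;

# assigning to an index extends the array automatically
$grow[0] = "data1";
$grow[3] = "data4";   # indices 1 and 2 now exist but are undefined

# an array in scalar context gives the number of elements
my $count = scalar(@grow);
# $#grow is the index of the last element
my $last = $#grow;

print "number of elements: $count\n";
print "last index: $last\n";
print "element 1 is ", (defined $grow[1] ? "defined" : "undefined"), "\n";
```

Assigning to index 3 makes the array four elements long even though only two of them hold data.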
Loops over arrays
The foreach loop in perl gives you the possibility to iterate over all
the elements of an array. It works as follows:
#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
my @myarray =("data1","data2","data3");
my $lvar;
my $i=0;
foreach $lvar (@myarray){
print "element number $i is $lvar\n";
$i++;
}
Running this program produces:
element number 0 is data1
element number 1 is data2
element number 2 is data3
The foreach statement takes each element of the array in turn and makes it
available as the loop variable ($lvar in the example above). It is important
to note that the values are not copied out of the array into the loop
variable. Instead the loop variable is an alias for the current element, and modifying
the loop variable modifies the element in the array.
The following program makes all elements in the array upper case.
The perl operator tr/a-z/A-Z/ is similar to the unix command "tr". In this case it
translates all lower case letters to upper case.
#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
my @myarray =("data1","data2","data3");
my $lvar;
print "before:\n";
foreach $lvar (@myarray){
print "$lvar\n";
$lvar=~tr/a-z/A-Z/;
}
print "\nafter:\n";
foreach $lvar (@myarray){
print "$lvar\n";
}
When you run the program you can see that in the second loop @myarray
contains only upper case values:
before:
data1
data2
data3
after:
DATA1
DATA2
DATA3
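If the original data must stay unchanged, one possible approach is to copy the array first and loop over the copy. The following is a minimal sketch; the name @upper is an assumption made for this example.

```perl
#!/usr/bin/perl -w
use strict;

my @myarray = ("data1", "data2", "data3");

# assigning an array to another array copies the elements
my @upper = @myarray;

# the loop variable aliases the elements of @upper only
foreach my $lvar (@upper) {
    $lvar =~ tr/a-z/A-Z/;
}

print "original: @myarray\n";
print "copy:     @upper\n";
```

Only the elements of @upper are modified because the loop variable refers to the copy.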
The command line
We have seen in Perl II that a function &getopt can be used to read
the command line and any options provided on the command line.
&getopt is like its C equivalent. It is a library function. In Perl the
command line arguments are assigned to an array called @ARGV.
&getopt simply takes this @ARGV and evaluates its elements.
Unlike in C the first element of the array is not
the program name but the first command line argument. If you want
to know the name of the perl program then you need to read $0 but
that is not the subject of this article. Here is an
example program called add. It takes 2 numbers
from the command line and adds them:
#!/usr/bin/perl -w
# check that we have exactly 2 arguments. Testing the value of
# $ARGV[1] would wrongly reject a second argument of 0:
die "USAGE: add number1 number2\n" unless (@ARGV == 2);
print "$ARGV[0] + $ARGV[1] is: ", $ARGV[0] + $ARGV[1] ,"\n";
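The handling of @ARGV can also be moved into a function, which makes the argument check easier to try out. This is only a sketch: the helper add_args is hypothetical and not part of the original add program.

```perl
#!/usr/bin/perl -w
use strict;

# add_args is a hypothetical helper, not part of the original program.
# It returns the sum of exactly two arguments and nothing otherwise.
sub add_args {
    my @args = @_;
    # an array in numeric context yields its number of elements
    return unless (@args == 2);
    return $args[0] + $args[1];
}

my $sum = add_args(@ARGV);
if (defined $sum) {
    print "$ARGV[0] + $ARGV[1] is: $sum\n";
} else {
    # $0 holds the name under which the program was started
    print "USAGE: $0 number1 number2\n";
}
```

Counting the elements of @ARGV instead of testing their values means that an argument of 0 is accepted as well.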
A stack
Perl has a number of built-in functions which let you use an array as a stack.
- push adds an element to the end of the array
- pop removes an element from the end of the array and returns it
- shift removes an element from the beginning of the array and returns it
- unshift adds an element to the beginning of the array
The following program adds two elements to an already existing
array and then removes all elements again:
#!/usr/bin/perl -w
my @myarray =("data1","data2","data3");
my $lvar;
print "the array:\n";
foreach $lvar (@myarray){
print "$lvar\n";
}
push(@myarray,"a");
push(@myarray,"b");
print "\nafter adding \"a\" and \"b\":\n";
while (@myarray){
print pop(@myarray),"\n";
}
pop removes the elements from the end of the array, so they are printed in reverse order.
The while loop runs until the array is empty.
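The program above uses only push and pop. The other two functions from the list, shift and unshift, work on the beginning of the array. Here is a short sketch with made-up element names:

```perl
#!/usr/bin/perl -w
use strict;

my @myarray = ("data1", "data2", "data3");

# unshift adds to the beginning, push adds to the end
unshift(@myarray, "first");
push(@myarray, "last");
print "array:   @myarray\n";

# shift removes from the beginning, pop removes from the end
my $head = shift(@myarray);
my $tail = pop(@myarray);
print "shifted: $head\n";
print "popped:  $tail\n";
print "array:   @myarray\n";
```

Combining push with shift instead of pop gives a queue: the element that was added first is also removed first.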
Reading directories
Perl offers the functions opendir, readdir and closedir to read the
content of a directory. In list context readdir returns a list with all the file names.
Using a foreach loop you can iterate over all the file names and
search for a given name. Here is a simple program that searches for
a given filename in the current directory:
#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
die "Usage: search_in_curr_dir filename\n" unless($ARGV[0]);
opendir(DIRHANDLE,".")||die "ERROR: can not read current directory\n";
foreach (readdir(DIRHANDLE)){
print "$_\n";
print "found $_\n" if (/$ARGV[0]/io);
}
closedir DIRHANDLE;
Let's look at the program. First we check that the user provided a
command line argument. If not then we print usage information and exit.
Next we open the current directory ("."). opendir is similar to
the open function for files. The first argument is a directory handle that
you need to pass to the readdir and closedir functions. The second
argument is the path to the directory.
Next comes the foreach loop. The first interesting thing is that
the loop variable is missing. In this case Perl does something magic
for you and uses the default variable $_ as loop
variable. readdir(DIRHANDLE) returns a list and we use foreach to
look at each element. /$ARGV[0]/io matches the regular expression
contained in $ARGV[0] against the variable $_.
The i means search case insensitively and the o means compile the regular expression
only once. The latter is an optimization which makes the program faster.
You can use it when you have a variable inside a regular expression and
you can guarantee that this variable does not change at run time.
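The matching step can be tried without reading a directory. The following sketch applies a regular expression from a variable to a fixed list of names. The helper match_names and the list @files are assumptions made for this example.

```perl
#!/usr/bin/perl -w
use strict;

# match_names is a hypothetical helper: it returns those names
# that match the regular expression given as first argument
sub match_names {
    my ($regexp, @names) = @_;
    # the i modifier makes the match case insensitive
    return grep { /$regexp/i } @names;
}

my @files = ("article.html", "array1.txt", "array2.txt");
print "matches for HTML: ", join(" ", match_names("HTML", @files)), "\n";
print "matches for ^arr: ", join(" ", match_names('^arr', @files)), "\n";
```

The sketch leaves out the o modifier because the function is meant to be called with different expressions.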
Let's try it. Assuming we have the files article.html, array1.txt
and array2.txt in the current directory then searching for "HTML"
will print:
>search_in_curr_dir HTML
.
..
article.html
found article.html
array1.txt
array2.txt
As you can see the readdir function also returned 2 more entries: "." and
"..". These are the names of the current directory and its
parent directory.
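If the two special entries are not wanted they can be filtered out before the loop. Here is a minimal sketch using the built-in grep function. The helper list_dir is hypothetical, and it uses a lexical directory handle ($dh) instead of the bareword DIRHANDLE from the program above.

```perl
#!/usr/bin/perl -w
use strict;

# list_dir is a hypothetical helper: it returns all names in a
# directory except the special entries "." and ".."
sub list_dir {
    my $dir = shift;
    opendir(my $dh, $dir) || die "ERROR: can not read directory $dir\n";
    # grep keeps only the elements for which the block is true
    my @names = grep { $_ ne "." && $_ ne ".." } readdir($dh);
    closedir $dh;
    return @names;
}

my @entries = list_dir(".");
foreach (sort @entries) {
    print "$_\n";
}
```

The result is stored in @entries before sorting because sort would otherwise take the function name as the name of a comparison function.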
A file finder
I would like to finish this article with a more complex and useful
program: a file finder. We call it pff (perl file
finder). It works basically like the program above but also searches
sub-directories. How can we design such a program? Above we have some code
that reads the current directory and searches for files in it. We need
to start with the current directory but if one of the files (except
. and ..) is again a directory then we need to search in there. This
is a typical recursive algorithm:
sub search_file_in_dir{
my $dir=shift;
...read the directory $dir ....
...if a file is again a directory
then call &search_file_in_dir(that file)....
}
You can test in perl if a file is a directory and not a symlink to a directory by using
if (-d "$dir/$_" && ! -l "$dir/$_"){....}.
Now we have all functionality
that we need and we can write the actual code (pff.gz).
#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
# written by: guido socher, copyright: GPL
#
&help unless($ARGV[0]);
&help if ($ARGV[0] eq "-h");
# start in current directory:
search_file_in_dir(".");
#-----------------------
sub help{
print "pff -- perl regexp file finder
USAGE: pff [-h] regexp
pff searches the current directory and all sub-directories
for files that match a given regular expression.
The search is always case insensitive.
EXAMPLE:
search a file that contains the string foo:
pff foo
search a file that ends in .html:
pff \'\\.html\'
search a file that starts with the letter \"a\":
pff \'^a\'
search a file with the name article<something>html:
pff \'article.*html\'
note the .* instead of just *
\n";
exit(0);
}
#-----------------------
# no () after the name: an empty prototype would declare a function without arguments
sub search_file_in_dir{
my $dir=shift;
my @flist;
if (opendir(DIRH,"$dir")){
@flist=readdir(DIRH);
closedir DIRH;
foreach (@flist){
# ignore . and .. :
next if ($_ eq "." || $_ eq "..");
if (/$ARGV[0]/io){
print "$dir/$_\n";
}
search_file_in_dir("$dir/$_") if (-d "$dir/$_" && ! -l "$dir/$_");
}
}else{
print "ERROR: can not read directory $dir\n";
}
}
#-----------------------
Let's look at the program a bit. First we test if the user has
provided an argument on the command line. If not then this
is an error and we print a little help text. We also print the help
text if the option -h was given.
Otherwise we start to search in the current directory. We use the
recursive algorithm described above: read the directory, search the files,
test if a file is a directory and if yes call search_file_in_dir() again.
In the statement where we check for directories we also check that
it is not a link to a directory. We need to do that because someone may
have created a sym-link to "..". Such a link would cause the program
to run forever if we did not have that check.
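The same recursion can be written so that it returns the matches instead of printing them. The following is a sketch of such a variant and not the original pff code. The function find_matches, its second parameter and the lexical directory handle are assumptions made for this example.

```perl
#!/usr/bin/perl -w
use strict;

# find_matches is a hypothetical variant of search_file_in_dir.
# It returns the matching paths as a list instead of printing them.
sub find_matches {
    my ($dir, $regexp) = @_;
    my @found;
    # a lexical handle ($dh) is private to each call of the function
    opendir(my $dh, $dir) || return @found;
    my @flist = readdir($dh);
    closedir $dh;
    foreach my $name (@flist) {
        # ignore . and .. :
        next if ($name eq "." || $name eq "..");
        my $path = "$dir/$name";
        push(@found, $path) if ($name =~ /$regexp/i);
        # descend into real directories only, never into symlinks
        if (-d $path && ! -l $path) {
            push(@found, find_matches($path, $regexp));
        }
    }
    return @found;
}

# search below the current directory if a regexp was given
if (@ARGV) {
    print "$_\n" foreach find_matches(".", $ARGV[0]);
}
```

Unlike pff this variant silently skips directories that can not be read. Returning a list makes it possible to sort or count the results before printing them.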
The next if ($_ eq "." || $_ eq ".."); is a statement which we have not discussed yet.
The "eq" operator is the perl string comparison operator. Here we
test if the content of the variable $_ is equal to "." or "..".
If it is, then the "next" statement is executed. "next" inside a foreach loop
means: start again at the top of the loop with the next
element in the array. It is similar to the C statement "continue".
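The difference between the string operator eq and the numeric operator == is easy to overlook. The following sketch shows both together with next. The list @names contains made-up values.

```perl
#!/usr/bin/perl -w
use strict;

my @names = (".", "..", "article.html", "10", "10.0");
my @kept;
foreach (@names) {
    # skip the rest of the loop body for . and ..
    next if ($_ eq "." || $_ eq "..");
    push(@kept, $_);
}
print "kept: @kept\n";

# eq compares strings, == compares numbers
my $same_string = ("10" eq "10.0") ? "yes" : "no";
my $same_number = ("10" == "10.0") ? "yes" : "no";
print "10 eq 10.0: $same_string\n";
print "10 == 10.0: $same_number\n";
```

The two values are different strings but the same number, so only the numeric comparison is true.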
2001-01-27, generated by lfparser version 2.8