Perl part III

ArticleCategory: [There are various article categories]

Software Development

AuthorImage:[A picture of you]

TranslationInfo:[Author and translation history]

original in en Guido Socher

AboutTheAuthor:[A short biography of the author]

Guido is a long time Linux fan and Perl hacker. His Linux home page can be found at www.oche.de/~bearix/g/.

Abstract:[Here you write a little summary]

Perl part I provided a general overview of Perl. In Perl part II we wrote the first useful program. In part III we now take a closer look at arrays.

ArticleIllustration:[This is the title picture for your article]

ArticleBody:[The article body]

Arrays

An array is a list of variables that can be accessed by an index. We have seen that "normal variables", also called scalar variables, start their name with a dollar sign ($). Array names start with an @ sign, but the data inside the array consists of several scalar variables. You must therefore again write a dollar sign when you refer to an individual element of the array. Let's look at an example:

#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
# declare a new array variable:
my @myarray;
# initialize it with some data:
@myarray=("data1","data2","data3");
# access the first element (at index 0):
print "the first element of myarray is: $myarray[0]\n";

As you can see we write @myarray when we refer to the whole array and $myarray[0] when we refer to an individual element. Perl arrays start at index 0. New indices are created automatically as soon as you assign data, so you do not have to know how big your array will be at declaration time. As shown above you can initialize an array with several values at once by listing them, separated by commas, inside parentheses.
("data1","data2","data3")
is really an anonymous list. You can therefore write ("data1","data2","data3")[1]
to get the second element from this list:

#!/usr/bin/perl -w
print "The second element is: ";
# the outer parentheses belong to print, the inner ones to the list:
print( ("data1","data2","data3")[1] );
print "\n";
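
The automatic growth of arrays mentioned above can be tried out with a small sketch like the following. The index 4 is chosen only for illustration; scalar(@myarray) and $#myarray are the standard Perl ways to get the number of elements and the last index.

```perl
#!/usr/bin/perl -w
use strict;
my @myarray = ("data1", "data2", "data3");
# assigning to an index beyond the end grows the array automatically:
$myarray[4] = "data5";
# an array in scalar context gives the number of elements:
my $count = scalar(@myarray);
# $#myarray is the index of the last element:
my $last = $#myarray;
print "the array has $count elements, the last index is $last\n";
# the skipped element at index 3 exists but is undefined:
print "element 3 is undefined\n" unless defined($myarray[3]);
```

Note that the element that was skipped over (index 3) is not filled with data. It stays undefined until you assign something to it.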

Loops over arrays

The foreach loop in Perl lets you iterate over all the elements of an array. It works as follows:

#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
my @myarray =("data1","data2","data3");
my $lvar;
my $i=0;
foreach $lvar (@myarray){
    print "element number $i is $lvar\n";
    $i++;
}

Running this program produces:

element number 0 is data1
element number 1 is data2
element number 2 is data3

The foreach statement takes each element of the array in turn and makes it available through the loop variable ($lvar in the example above). It is important to note that the values are not copied out of the array into the loop variable. Instead the loop variable is an alias, a kind of pointer to the current element, and modifying the loop variable modifies the element in the array. The following program makes all elements in the array upper case. The Perl operator tr/a-z/A-Z/ is similar to the Unix command "tr". In this case it translates all lower case letters to upper case.

#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
my @myarray =("data1","data2","data3");
my $lvar;
print "before:\n";
foreach $lvar (@myarray){
    print "$lvar\n";
    $lvar=~tr/a-z/A-Z/;
}
print "\nafter:\n";
foreach $lvar (@myarray){
    print "$lvar\n";
}

When you run the program you can see that in the second loop @myarray contains only upper case values:

before:
data1
data2
data3

after:
DATA1
DATA2
DATA3
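
If you do not want the loop to change the original data, one possible approach is to copy the array first and loop over the copy. The following is only a sketch; the name @upper is made up for this example.

```perl
#!/usr/bin/perl -w
use strict;
my @myarray = ("data1", "data2", "data3");
# copy the array first; the loop then modifies only the copy:
my @upper = @myarray;
foreach my $lvar (@upper){
    $lvar =~ tr/a-z/A-Z/;
}
print "original: @myarray\n";
print "copy:     @upper\n";
```

Assigning one array to another copies all the elements, so @myarray still holds the lower case values after the loop.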

The command line

We have seen in Perl II that the function &getopt can be used to read the command line and any options provided on it. &getopt is a library function and works like its C equivalent. In Perl the command line arguments are assigned to an array called @ARGV. &getopt simply takes this @ARGV and evaluates its elements.
Unlike in C, the first element of the array is not the program name but the first command line argument. If you want to know the name of the Perl program then you need to read $0, but that is not the subject of this article. Here is an example program called add. It takes 2 numbers from the command line and adds them:

> add 42 2
42 + 2 is:44

.... and here is the program:

#!/usr/bin/perl -w
# check if we have exactly 2 arguments (a plain test of $ARGV[1]
# would wrongly reject 0 as the second number):
die "USAGE: add number1 number2\n" unless (@ARGV == 2);
print "$ARGV[0] + $ARGV[1] is:", $ARGV[0] + $ARGV[1] ,"\n";
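
Because @ARGV is an ordinary array, you can also loop over it with foreach. The sketch below is a variation of add that accepts any number of arguments. The helper sum_list is a name invented for this example and is not part of the original program.

```perl
#!/usr/bin/perl -w
use strict;
# add up a list of numbers (hypothetical helper for this example):
sub sum_list {
    my $sum = 0;
    foreach my $n (@_){
        $sum += $n;
    }
    return $sum;
}
# an array used as a condition is true when it has at least one element:
if (@ARGV){
    print join(" + ", @ARGV), " is: ", sum_list(@ARGV), "\n";
}else{
    # $0 holds the name of the program:
    print "USAGE: $0 number1 [number2 ...]\n";
}
```

Called without arguments the program prints only the usage line.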

A stack

Perl has a number of built-in functions which let you use an array as a stack.

push adds an element to the end of the array
pop removes an element from the end of the array and returns it
shift removes an element from the beginning of the array and returns it
unshift adds an element to the beginning of the array

The following program adds two elements to an already existing array:

#!/usr/bin/perl -w
my @myarray =("data1","data2","data3");
my $lvar;
print "the array:\n";
foreach $lvar (@myarray){
    print "$lvar\n";
}
push(@myarray,"a");
push(@myarray,"b");
print "\nafter adding \"a\" and \"b\":\n";
while (@myarray){
    print pop(@myarray),"\n";
}

pop removes the elements from the end of the array, so they are printed in reverse order, and the while loop runs until the array is empty.
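
The other two functions from the list, shift and unshift, work at the beginning of the array. Here is a small sketch, with data values made up for illustration, that also shows how push together with shift processes the elements in the order in which they were added:

```perl
#!/usr/bin/perl -w
use strict;
my @myarray = ("data1", "data2", "data3");
# add an element to the beginning of the array:
unshift(@myarray, "data0");
# remove the first element again and keep it:
my $first = shift(@myarray);
print "shift returned: $first\n";
# push plus shift gives a queue (first in, first out):
push(@myarray, "data4");
my @order;
while (@myarray){
    push(@order, shift(@myarray));
}
print "processed in order: @order\n";
```

Running it prints:

```
shift returned: data0
processed in order: data1 data2 data3 data4
```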

Reading directories

Perl offers the functions opendir, readdir and closedir to read the content of a directory. When used in a foreach loop, readdir returns a list with all the file names. Using the foreach loop you can iterate over all the file names and search for a given name. Here is a simple program that searches for a given filename in the current directory:

#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
die "Usage: search_in_curr_dir filename\n" unless($ARGV[0]);
opendir(DIRHANDLE,".")||die "ERROR: can not read current directory\n";
foreach (readdir(DIRHANDLE)){
    # print every name that readdir returns:
    print "$_\n";
    print "found $_\n" if (/$ARGV[0]/io);
}
closedir DIRHANDLE;

Let's look at the program. First we check that the user provided a command line argument. If not then we print usage information and exit. Next we open the current directory ("."). opendir is similar to the open function for files. The first argument is a directory handle that you need to pass to the readdir and closedir functions. The second argument is the path to the directory.
Next comes the foreach loop. The first interesting thing is that the loop variable is missing. In this case Perl automatically uses the special variable $_ as the loop variable. readdir(DIRHANDLE) returns a list and we use foreach to look at each element. /$ARGV[0]/io matches the regular expression contained in $ARGV[0] against the variable $_. The io means search case insensitive and compile the regular expression only once. The latter is an optimization which makes the program faster. You can use it when you have a variable inside a regular expression and you can guarantee that this variable does not change at run time.
Let's try it. Assuming we have the files article.html, array1.txt and array2.txt in the current directory, searching for "HTML" will print:

>search_in_curr_dir HTML
.
..
article.html
found article.html
array1.txt
array2.txt

As you can see the readdir function found 2 more entries: "." and "..". These are the names of the current directory and the parent directory.
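
If you are not interested in these two entries, one way to drop them is to filter the list with grep before looping over it. This is only a sketch. It uses a lexical directory handle ($dirhandle) instead of the bareword DIRHANDLE from the program above, which is a matter of taste.

```perl
#!/usr/bin/perl -w
use strict;
opendir(my $dirhandle, ".") || die "ERROR: can not read current directory\n";
# keep every name except "." and "..":
my @files = grep { $_ ne "." && $_ ne ".." } readdir($dirhandle);
closedir($dirhandle);
# print the remaining names in alphabetical order:
foreach (sort @files){
    print "$_\n";
}
```

grep evaluates the block for each element of the list, with the element in $_, and returns only the elements for which the block is true.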

A file finder

I would like to finish this article with a more complex and useful program: a file finder. We call it pff (perl file finder). It works basically like the program above but also searches sub-directories. How can we design such a program? Above we have some code that reads the current directory and searches for files in it. We need to start with the current directory, but if one of the files (except . and ..) is itself a directory then we need to search in there as well. This is a typical recursive algorithm:

sub search_file_in_dir{
  my $dir=shift;
  ...read the directory $dir ....
  ...if a file is again a directory 
    then call &search_file_in_dir(that file)....
}

You can test in Perl if a file is a directory and not a symlink to a directory by using if (-d "$dir/$_" && ! -l "$dir/$_"){....}.
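
To get a feeling for these file test operators you can try a small sketch like the one below. The helper describe_path and its return strings are invented for this example.

```perl
#!/usr/bin/perl -w
use strict;
# classify a path with the file test operators (hypothetical helper):
sub describe_path {
    my $path = shift;
    # test -l first, because -d follows symlinks:
    return "symlink"   if (-l $path);
    return "directory" if (-d $path);
    return "file"      if (-e $path);
    return "missing";
}
foreach my $path (".", "..", $0){
    print "$path is a ", describe_path($path), "\n";
}
```

The order of the tests matters here: -d is also true for a symlink that points to a directory, which is exactly why the condition above combines it with ! -l.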
Now we have all functionality that we need and we can write the actual code (pff.gz).

#!/usr/bin/perl -w
# vim: set sw=8 ts=8 si et:
# written by: guido socher, copyright: GPL
#
&help unless($ARGV[0]);
&help if ($ARGV[0] eq "-h");

# start in current directory:
search_file_in_dir(".");
#-----------------------
sub help{
    print "pff -- perl regexp file finder
USAGE: pff [-h] regexp

pff searches the current directory and all sub-directories
for files that match a given regular expression.
The search is always case insensitive.

EXAMPLE:
search a file that contains the string foo:
    pff foo
search a file that ends in .html:
    pff \'\\.html\'
search a file that starts with the letter \"a\":
    pff \'^a\'
search a file with the name article<something>html:
    pff \'article.*html\'
    note the .* instead of just *
\n";
    exit(0);
}
#-----------------------
sub search_file_in_dir{
    my $dir=shift;
    my @flist;
    if (opendir(DIRH,"$dir")){
        @flist=readdir(DIRH);
        closedir DIRH;
        foreach (@flist){
            # ignore . and .. :
            next if ($_ eq "." || $_ eq "..");
            if (/$ARGV[0]/io){
                print "$dir/$_\n";
            }
            search_file_in_dir("$dir/$_") if (-d "$dir/$_" && ! -l "$dir/$_");
        }
    }else{
        print "ERROR: can not read directory $dir\n";
    }
}
#-----------------------

Let's look at the program a bit. First we test if the user has provided an argument on the command line. If not then this is an error and we print a little help text. We also print the help text if the option -h was given. Otherwise we start to search in the current directory. We use the recursive algorithm described above: read the directory, search the files, test if a file is a directory and, if yes, call search_file_in_dir() again.

In the statement where we check for directories we also check that it is not a link to a directory. We need to do that because someone may have created a symlink to "..". Such a link would cause the program to run forever if we did not have that check.

The next if ($_ eq "." || $_ eq ".."); is a statement which we have not discussed yet. The "eq" operator is the Perl string comparison operator. Here we test if the content of the variable $_ is equal to "." or "..". If it is, the "next" statement is executed. "next" inside a foreach loop means: start again at the top of the loop with the next element in the array. It is similar to the C statement "continue".
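
The effect of "next" can be seen in isolation in the following sketch. The list of names is made up and stands in for what readdir might return.

```perl
#!/usr/bin/perl -w
use strict;
my @names = (".", "..", "article.html", "array1.txt");
my @kept;
foreach (@names){
    # skip the two special directory entries:
    next if ($_ eq "." || $_ eq "..");
    # this line is only reached for the other names:
    push(@kept, $_);
}
print "kept: @kept\n";
```

This prints "kept: article.html array1.txt", because the push is never reached for "." and "..".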
