Perl provides a convenient programming interface to many features that are often awkward to use in other languages. For example, analyzing files exported from dBASE means scanning a large amount of text and cutting certain fields out of each line according to a pattern expressed as a regular expression.
Another strength of perl is its ability to cooperate with other
programs, such as the small UNIX tools. Both features are
demonstrated in a first example. Let's say we want some
statistics about the users on a certain system, based on the
information given by the UNIX command last. We are
interested in how often each user logged in and how much time each user spent logged in.
The output of the command last has the following
format:
login tty hostname date start - end (time logged in)
A sample line:
melzer tty1 turing.mathematik Sat Jun 15 21:05 - 23:35 (02:29)
The following program does exactly that. It reads the output of the
last command and prints an entry for each user found in that output,
showing the total login time and the number of logins for
each user.
1 #!/usr/local/bin/perl -w
2
3 open(DB,"last |") or die "Could not execute last :-(";
4 while (<DB>) {
5 if (/^(\S*)\s*.*\((.*):(.*)\)$/) {
6 $hours{$1} += $2;
7 $minutes{$1} += $3;
8 $logins{$1}++;
9 }
10 }
11
12 foreach $user (sort(keys %logins)) {
13 $hours{$user} += int($minutes{$user} / 60);
14 $minutes{$user} %= 60;
15 $totaltime =
sprintf("%02d:%02d", $hours{$user}, $minutes{$user});
16 write;
17 }
18
19 format STDOUT_TOP =
20 User Total login time Total logins
21 -------------- -------------------- --------------------
22 .
23 format STDOUT =
24 @<<<<<<<<<<<<< @<<<<<<<< @####
25 $user, $totaltime, $logins{$user}
26 .
First, the filehandle DB is attached to the reading end of a pipe into which the last command writes its output. Without the pipe symbol |, open would instead try to open a file named last for reading. If an error occurs, the program terminates with an error message.
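For comparison, later versions of perl also accept a three-argument form of open, in which the mode '-|' requests the reading end of a pipe and the command may be given as a list, so that no shell is involved. The following is only a sketch using a lexical filehandle instead of the bareword DB; the helper name read_command_lines is hypothetical and not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): run a command
# through a pipe and return its output as a list of lines.
sub read_command_lines {
    my @cmd = @_;
    # '-|' opens the reading end of a pipe; passing the command as a list
    # runs it directly, without a shell interpreting the arguments.
    open(my $fh, '-|', @cmd) or die "Could not execute $cmd[0]: $!";
    my @lines = <$fh>;
    # close on a piped filehandle waits for the command to finish; a false
    # return means it failed, and its exit status is left in $?.
    close($fh) or warn "$cmd[0] exited with status $?\n";
    return @lines;
}

# Usage on a UNIX system that has the last command:
# my @lines = read_command_lines('last');
```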
Line 4 is the head of a while loop that ends in line 10
and is executed as long as its condition is true. The conditional
expression ``<DB>'' may look strange at first. It reads one line
from the filehandle DB and is false only when there is nothing left
to read. In perl, many commands allow you to omit a variable and
use $_ instead, so here the line that was read is assigned to the
variable $_.
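A short sketch of the same looping idiom, assuming an in-memory filehandle (a string opened with open(..., '<', \$string)) in place of the pipe from last. Both helper names are hypothetical. They show that the implicit $_ form and an explicit variable read the same lines, and that the loop condition really tests whether a line was read, so a final line consisting of just 0 is still counted.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines with the implicit $_ form used in the program.
sub count_lines_implicit {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    local $_;              # do not clobber the caller's $_
    my $count = 0;
    while (<$fh>) {        # perl treats this as: while (defined($_ = <$fh>))
        $count++;
    }
    close($fh);
    return $count;
}

# The same loop written with an explicit variable instead of $_.
sub count_lines_explicit {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    close($fh);
    return $count;
}

# The last "line" is just "0" without a newline; it is still counted,
# because the loop condition tests definedness, not truth.
my $text = "first line\nsecond line\n0";
print count_lines_implicit($text), " ", count_lines_explicit($text), "\n";   # prints "3 3"
```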
The evil-looking mess on line 5 is just an if statement, so lines 6-8
are executed if the condition within the parentheses is true. The
strange-looking condition is a regular expression match, which is
true if the pattern matches. Again, since no string to match against
is given, the contents of the variable $_ are used. This regular
expression extracts the login name and the duration of the session
from the line produced by the last command, and stores the name in
$1, the hours in $2, and the minutes in $3. The rest of the
information is ignored. To make regular expressions easier to read,
some of the most important metacharacters and escape sequences have
been compiled in the following list:
Symbol    Meaning
\         Quote the next metacharacter
^         Match the beginning of the line
.         Match any character (except newline)
$         Match the end of the line
|         Alternation
()        Grouping
[]        Character class
*         Match 0 or more times
+         Match 1 or more times
?         Match 1 or 0 times
{n}       Match exactly n times
{n,}      Match at least n times
{n,m}     Match at least n but not more than m times
*?        Match 0 or more times, as few as possible (non-greedy)
+?        Match 1 or more times, non-greedy
??        Match 0 or 1 time, non-greedy
{n}?      Match exactly n times (the ? makes no difference here)
{n,}?     Match at least n times, non-greedy
{n,m}?    Match at least n but not more than m times, non-greedy
\w        Match a ``word'' character (alphanumeric plus ``_'')
\W        Match a non-word character
\s        Match a whitespace character
\S        Match a non-whitespace character
\d        Match a digit character
\D        Match a non-digit character
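To see the pattern from line 5 at work, the following sketch applies it to the sample line shown earlier and contrasts a greedy and a non-greedy quantifier. The stricter variant parse_last_line is a hypothetical addition, not part of the original program: it accepts only digits and an optional "days+" prefix such as "(1+02:29)", which some versions of last print for sessions longer than a day (the original pattern would misread such a line).

```perl
use strict;
use warnings;

my $sample = "melzer tty1 turing.mathematik Sat Jun 15 21:05 - 23:35 (02:29)";

# The pattern from line 5 of the program, applied to the sample line.
if ($sample =~ /^(\S*)\s*.*\((.*):(.*)\)$/) {
    print "user=$1 hours=$2 minutes=$3\n";   # prints "user=melzer hours=02 minutes=29"
}

# Hypothetical stricter variant: digits only, plus an optional "days+"
# prefix. Returns (user, hours, minutes), or an empty list if no match.
sub parse_last_line {
    my ($line) = @_;
    return () unless $line =~ /^(\S+)\s.*\((?:(\d+)\+)?(\d+):(\d+)\)\s*$/;
    my ($user, $days, $hours, $minutes) = ($1, $2, $3, $4);
    $days = 0 unless defined $days;          # no "days+" prefix present
    return ($user, $days * 24 + $hours, $minutes + 0);
}

# Greedy versus non-greedy quantifiers from the list above.
my ($greedy)     = "(a)(b)" =~ /\((.*)\)/;    # "a)(b": .* grabs as much as possible
my ($non_greedy) = "(a)(b)" =~ /\((.*?)\)/;   # "a":    .*? stops at the first ")"
print "$greedy $non_greedy\n";               # prints "a)(b a"
```

Lines that carry no duration in parentheses (for example sessions that are still open) simply do not match and are skipped, just as in the original program.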
These three pieces of information are processed in the body of the
if statement (lines 6-8) using hashes. Hashes
are similar to normal arrays, but any scalar value, not just an
integer, may be used as a subscript. In this example, the user's login
name is used to index all three hashes. For instance, the expression
$logins{'melzer'} returns how often the user melzer was logged in.
In perl, the increment operator ++ and assignment operators such as
+= behave just like their counterparts in C.
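A minimal sketch of the same accumulation pattern, with made-up session data standing in for the output of last; tally is a hypothetical helper name. It relies on the fact that += and ++ applied to a hash entry that does not exist yet start from zero, without an "uninitialized" warning even under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 6-8: sessions are given as
# [login name, hours, minutes] and accumulated in three hashes keyed by name.
sub tally {
    my (%hours, %minutes, %logins);
    for my $session (@_) {
        my ($user, $h, $m) = @$session;
        $hours{$user}   += $h;   # a missing entry starts as undef, treated as 0
        $minutes{$user} += $m;
        $logins{$user}++;
    }
    return (\%hours, \%minutes, \%logins);
}

# Made-up sessions, for illustration only.
my ($hours, $minutes, $logins) =
    tally(['melzer', 2, 29], ['plonka', 1, 10], ['melzer', 0, 45]);

print "melzer: $logins->{'melzer'} logins, $hours->{'melzer'} h $minutes->{'melzer'} min\n";
# prints "melzer: 2 logins, 2 h 74 min"
```

The minutes are deliberately left unnormalized here (74 min); carrying them into the hours is the job of lines 13-14 of the program.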
The next interesting spot is line 12: the foreach
statement. It iterates over a list of values and sets the variable
$user to each element of the list in turn. Once more, if the
variable $user were omitted, $_ would be used automatically.
The list for the foreach loop is produced by the
function keys, which returns all keys of a hash; in this case,
that is the list of all login names. This list is first
sorted by the function sort, so the program loops over the login
names in sorted (string) order, assigning each name in turn to the
variable $user.
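The following sketch shows the sort(keys ...) idiom from line 12 next to a variant that orders users by their number of logins instead. The login counts are taken from the sample output below; both helper names and the by-count ordering are hypothetical additions, not part of the original program.

```perl
use strict;
use warnings;

# Login counts taken from the sample output further below.
my %logins = (atl => 1, melzer => 11, meneghin => 1, plonka => 5, swg => 35);

# As in line 12: sort the keys using the default string comparison.
sub users_by_name {
    my ($counts) = @_;
    return sort keys %$counts;
}

# Hypothetical variant: most logins first, ties broken by name.
sub users_by_logins {
    my ($counts) = @_;
    return sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
}

print join(" ", users_by_name(\%logins)), "\n";    # atl melzer meneghin plonka swg
print join(" ", users_by_logins(\%logins)), "\n";  # swg melzer plonka atl meneghin
```

The explicit tie-break keeps the output deterministic, since keys returns the entries of a hash in no particular order.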
Lines 13-14 are probably self-explanatory: they carry the whole hours
contained in the minute total over into the hour total. Lines 15-16
demonstrate perl's output abilities. First, the total
login time is formatted nicely with sprintf, and then
another nice feature of perl is used: the write
command generates a small report using the format definitions at the
end of the program. The STDOUT_TOP definition in lines 19-22 describes
the header of the report, which is printed at the top of each page of
the output. In this case, lines 20 and 21 are printed without any
substitutions. The STDOUT format starting in line 23 describes
the look of each output line; it is used every time the
write command is executed. Like the arguments of the
printf command, it can be divided into two parts: the first
line is similar to the format string, and the second line contains the
variables to be used. Each field in the picture line starts with an
@ character, followed by characters that give its width and
justification. @<<<<, for instance, denotes a left-justified text
field five characters wide, while @#### denotes a numeric field five
characters wide whose value is displayed right-justified.
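Lines 13-16 can be sketched as two small functions: format_total reproduces the carry from minutes to hours and the sprintf call, and report_line produces the same columns with printf-style formatting instead of a format and write. Both names are hypothetical, and the column widths (14, 20 and 5) are an assumption chosen to line up with the dashed header, not taken from the original format.

```perl
use strict;
use warnings;

# Lines 13-15 of the program: carry whole hours out of the minute total,
# then format the result as HH:MM (hours may grow beyond two digits).
sub format_total {
    my ($hours, $minutes) = @_;
    $hours   += int($minutes / 60);
    $minutes %= 60;
    return sprintf("%02d:%02d", $hours, $minutes);
}

# A printf-style alternative to the STDOUT format (not the original's
# approach): left-justified name and time, right-justified login count.
sub report_line {
    my ($user, $total, $logins) = @_;
    return sprintf("%-14s %-20s %5d\n", $user, $total, $logins);
}

print report_line('melzer', format_total(10, 154), 11);
```

A format keeps the page layout visible as a picture in the source and handles page headers automatically, while sprintf keeps everything inline; for a report this small, either works.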
This little program might produce output like this:
User           Total login time     Total logins
-------------- -------------------- --------------------
atl            01:46                    1
melzer         12:34                   11
meneghin       00:01                    1
plonka         07:02                    5
swg            447:51                  35
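Some additional information: the following loops are all equivalent. Each one reads standard input line by line into $_ and prints it: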
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while $_ = <STDIN>;
print while <STDIN>;