Perl Quick Reference is designed as a reference
guide for the Perl language, rather than an introductory text. However, there
are some aspects of the language that are better summarized in a short paragraph
as opposed to a table in a reference section. Therefore, this part of the book
puts the reference material in context by giving a general overview of the
Perl language.
Running Perl
The simplest way to run a Perl program is to invoke the Perl
interpreter with the name of the Perl program as an argument:
perl sample.pl
The name of the Perl file is sample.pl, and perl
is the name of the Perl interpreter. This example assumes that Perl is in the
execution path; if not, you will need to supply the full path to Perl, too:
/usr/local/bin/perl sample.pl
This is the preferred way of invoking Perl because it eliminates
the possibility that you might accidentally invoke a copy of Perl other than
the one you intended. We will use the full path from now on to avoid any confusion.
This type of invocation is the same on all systems with a command-line
interface. The following line will do the trick on Windows NT, for example:
UNIX systems have another way to invoke an interpreter on a script
file. Place a line like
#!/usr/local/bin/perl
at the start of the Perl file. This tells UNIX that the rest
of this script file is to be interpreted by /usr/local/bin/perl. Then
make the script itself executable:
chmod +x sample.pl
You can then "execute" the script file directly and
let the script file tell the operating system what interpreter to use while
running it.
Windows NT, on the other hand, is quite different. You can use
File Manager (Explorer under Windows NT 4 or Windows 95) to create an association
between the file extension .PL and the Perl executable. Whenever a file ending
in .PL is invoked, Windows will know that Perl should be used to interpret it.
Perl takes a number of optional command-line arguments for various
purposes. These are listed in table 1. Most are rarely used but are given here
for reference purposes.
Table 1 Perl 5 Command-Line Switches
| Option | Arguments | Purpose | Notes |
|---|---|---|---|
| -0 | octal character code | Specify record separator | Default is newline (\n) |
| -a | none | Automatically split records | Used with -n or -p |
| -c | none | Check syntax only | Do not execute |
| -d | none | Run script using Perl debugger | If Perl debugging option was included when Perl was installed |
| -D | flags | Specify debugging behavior | See table 2 |
| -e | command | Pass a command to Perl from the command line | Useful for quick operations |
| -F | regular expression | Expression to split by if -a is used | Default is white space |
| -i | extension | Replace original file with results | Useful for modifying contents of files |
| -I | directory | Specify location of include files | |
| -l | octal character code | Drop newlines when used with -n and -p, and use designated character as line-termination character | |
| -n | none | Process the script using each specified file as an argument | Used for performing the same set of actions on a set of files |
| -p | none | Same as -n but each line is printed | |
| -P | none | Run the script through the C preprocessor before Perl compiles it | |
| -s | none | Enable passing of arbitrary switches to Perl | Use -s -what -ever to have the Perl variables $what and $ever defined within your script |
| -S | none | Tell Perl to look along the path for the script | |
| -T | none | Use taint checking; don't evaluate expressions supplied on the command line | |
| -u | none | Make Perl dump core after compiling your script; intended to allow for generation of Perl executables | Very messy; wait for the Perl compiler |
| -U | none | Unsafe mode; overrides Perl's natural caution | Don't use this! |
| -v | none | Print Perl version number | |
| -w | none | Print warnings about script syntax | Extremely useful, especially during development |
| Tip |
The -e option is handy for quick Perl operations
from the command line. Want to change all instances of "oldstring"
in Wiffle.bat to "newstrong"? Try
perl -i.old -p -e "s/oldstring/newstrong/g" wiffle.bat
This says: "Take each line of Wiffle.bat (-p);
store the original in wiffle.bat.old (-i.old); substitute
all instances of oldstring with newstrong (-e);
write the result (-p) to the original file (-i)."
|
You can supply Perl command-line arguments on the interpreter
invocation line in UNIX scripts. The following line is a good start to any Perl
script:
#!/usr/local/bin/perl -w -t
Table 2 shows the debug flags, which can be specified with the
-D command-line option. If you specify a number, you can simply add
all the numbers of each flag together so that 6 is 4 and 2. If you use the letter
as a flag then simply list all the options required. The following two calls
are equivalent:
#perl -d -D6 test.pl
#perl -d -Dls test.pl
Table 2 Perl Debugging Flags
| Flag Number | Flag Letter | Meaning of Flag |
|---|---|---|
| 1 | p | Tokenizing and parsing |
| 2 | s | Stack snapshots |
| 4 | l | Label stack processing |
| 8 | t | Trace execution |
| 16 | o | Operator node construction |
| 32 | c | String/numeric conversions |
| 64 | P | Print preprocessor command for -P |
| 128 | m | Memory allocation |
| 256 | f | Format processing |
| 512 | r | Regular expression parsing |
| 1024 | x | Syntax tree dump |
| 2048 | u | Tainting checks |
| 4096 | L | Memory leaks (not supported anymore) |
| 8192 | H | Hash dump; usurps values() |
| 16384 | X | Scratchpad allocation (Perl 5 only) |
| 32768 | D | Cleaning up (Perl 5 only) |
A Perl program consists of an ordinary text file containing a
series of Perl commands. Commands are written in what looks like a bastardized
amalgam of C, shell script, and English. In fact, that's pretty much what it
is.
Perl code can be quite free-flowing. The broad syntactic rules
governing where a statement starts and ends are
- Leading white space is ignored. You can start a Perl statement anywhere
you want: at the beginning of the line, indented for clarity (recommended),
or even right-justified (definitely frowned on) if you like.
- Commands are terminated with a semicolon.
- White space outside of string literals is irrelevant; one space is as
good as a hundred. That means you can split statements over several lines
for clarity.
- Anything after a pound sign (#) is ignored. Use this to pepper your code
with useful comments.
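The rules above can be seen together in a short sketch. The variable names are illustrative and not taken from the text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Leading white space is ignored, so indentation is purely for the reader.
    my $greeting = "Hello";

# One statement split over several lines; it ends at the semicolon.
my $message = $greeting
            . ", "
            . "world";

# White space inside a string literal is significant, unlike the rest.
my $spaced = "a   b";

print "$message\n";             # anything after the pound sign is a comment
print length($spaced), "\n";    # the three spaces count as characters
```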
Here's a Perl statement inspired by Kurt Vonnegut:
print "My name is Yon Yonson\n";
No prizes for guessing what happens when Perl runs this code;
it prints
My name is Yon Yonson
If the \n doesn't look familiar, don't worry; it simply
means that Perl should print a newline character after the text; in other words,
Perl should go to the start of the next line.
Printing more text is a matter of either stringing together statements
or giving multiple arguments to the print function:
print "My name is Yon Yonson,\n";
print "I live in Wisconsin,\n",
"I work in a lumbermill there.\n";
That's right, print is a function. It may not look like
it in any of the examples so far, where there are no parentheses to delimit
the function arguments, but it is a function, and it takes arguments. You can
use parentheses in Perl functions if you like; it sometimes helps to make an
argument list clearer. More accurately, in this example the function takes a
single argument consisting of an arbitrarily long list. We'll have much more
to say about lists and arrays later, in the "Data Types" section.
There will be a few more examples of the more common functions in the remainder
of this chapter, but refer to the "Functions" chapter for a complete
run-down on all of Perl's built-in functions.
So what does a complete Perl program look like? Here's a trivial
UNIX example, complete with the invocation line at the top and a few comments:
#!/usr/local/bin/perl -w
# Show warnings
print "My name is Yon Yonson,\n";
# Let's introduce ourselves
print "I live in Wisconsin,\n",
"I work in a lumbermill there.\n";
# Remember the line breaks
That's not at all typical of a Perl program though; it's just
a linear sequence of commands with no structural complexity. The "Flow
Control" section later in this overview introduces some of the constructs
that make Perl what it is. For now, we'll stick to simple examples like the
preceding for the sake of clarity.
Data Types
Perl has a small number of data types. If you're used to working
with C, where even characters can be either signed or unsigned, this makes a
pleasant change. In essence, there are only two data types: scalars and
arrays. There is also a very special kind of array called an associative
array that merits a section all to itself.
All numbers and strings are scalars. Scalar variable names start
with a dollar sign.
| Note |
All Perl variable names, including scalars, are case sensitive.
$Name and $name, for example, are two completely different
quantities.
|
Perl converts automatically between numbers and strings as required,
so that
$a = 2;
$b = 6;
$c = $a . $b; # The "." operator concatenates two strings
$d = $c / 2;
print $d;
yields the result
13
This example involves converting two integers into strings, concatenating
the strings into a new string variable, converting this new string to an integer,
dividing it by two, converting the result to a string, and printing it. All
of these conversions are handled implicitly, leaving the programmer free to
concentrate on what needs to be done rather than the low-level details of how
it is to be done.
This might be a problem if Perl were regularly used for tasks
where, for example, explicit memory offsets were used and data types were critical.
But for the type of task where Perl is normally used, these automatic conversions
are smooth, intuitive, and useful.
We can use this to develop the earlier example script using some
string variables:
#!/usr/local/bin/perl -w
# Show warnings
$who = 'Yon Yonson';
$where = 'Wisconsin';
$what = 'in a lumbermill';
print "My name is $who,\n";
# Let's introduce ourselves
print "I live in $where,\n",
"I work $what there.\n";
# Remember the line breaks
print "\nSigned: \t$who,\n\t\t$where.\n";
which yields
My name is Yon Yonson,
I live in Wisconsin,
I work in a lumbermill there.
Signed: Yon Yonson,
Wisconsin.
A collection of scalars is an array. An array variable name starts
with an @ sign, while an explicit array of scalars is written as a comma-separated
list within parentheses:
@trees = ("Larch", "Hazel", "Oak");
Array subscripts are denoted using square brackets: $trees[0]
is the first element of the @trees array. Notice that it's @trees
but $trees[0]; individual array elements are scalars, so they start
with a $.
Mixing scalar types in an array is not a problem. For example,
@items = (15, 45.67, "case");
print "Take $items[0] $items[2]s at \$$items[1] each.\n";
results in
Take 15 cases at $45.67 each.
All arrays in Perl are dynamic. You never have to worry about
memory allocation and management because Perl does all that stuff for you. Combine
that with the fact that a list may include other arrays, whose elements are
flattened into it, and you're free to say things like the following:
@A = (1, 2, 3);
@B = (4, 5, 6);
@C = (7, 8, 9);
@D = (@A, @B, @C);
which results in the array @D containing numbers 1
through 9. The power of constructs such as
@Annual = (@Spring, @Summer, @Fall, @Winter);
takes some getting used to.
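A brief sketch of this flattening, reusing the arrays above. The nested variant at the end relies on Perl 5 references, which this overview mentions only in passing, so treat it as an aside rather than part of the original example:

```perl
use strict;
use warnings;

my @A = (1, 2, 3);
my @B = (4, 5, 6);
my @C = (7, 8, 9);

# Arrays listed inside parentheses are flattened into one list.
my @D = (@A, @B, @C);
print scalar(@D), " elements: @D\n";

# To keep the three arrays distinct, store references to them instead.
my @nested = (\@A, \@B, \@C);
print scalar(@nested), " rows; first element of second row: $nested[1][0]\n";
```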
| Note |
An aspect of Perl that often confuses newcomers (and occasionally
the old hands too) is the context-sensitive nature of evaluations. Perl
keeps track of the context in which an expression is being evaluated
and can return a different value in an array context than in a scalar
context. In the following example
|
@A = (1, 2, 3, 4);
@B = @A;
$C = @A;
The array @B contains 1 through 4 while $C contains "4",
the number of values in the array. This context-sensitivity becomes more of an
issue when you use functions and operators that can take either a single argument
or multiple arguments. The results can be quite different depending on what
is passed to them.
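A small sketch of the same idea, showing how the left side of an assignment decides the context. The variable names are illustrative:

```perl
use strict;
use warnings;

my @A = (1, 2, 3, 4);

my @copy    = @A;    # array context: all four elements are copied
my $count   = @A;    # scalar context: the number of elements
my ($first) = @A;    # parentheses give array context: the first element

print "copy: @copy\n";
print "count: $count, first: $first\n";

# scalar() forces scalar context where a list would otherwise be expected.
print "Elements: ", scalar(@A), "\n";
```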
Many of Perl's built-in functions take arrays as arguments. One
example is sort, which takes an array as an argument and returns a
list of the same elements in sorted order (string order by default):
print sort ( 'Beta', 'Gamma', 'Alpha' );
prints AlphaBetaGamma.
We can make this neater using another built-in function, join.
This function takes two arguments: a string to connect with and an array of
strings to connect. It returns a single string consisting of all elements in
the array joined with the connecting string. For example,
print join ( ' : ', 'Name', 'Address', 'Phone' );
returns the string Name : Address : Phone.
Since sort returns an array, we can feed its output
straight into join:
print join( ', ', sort ( 'Beta', 'Gamma', 'Alpha' ) );
prints Alpha, Beta, Gamma.
Note that we haven't separated the initial scalar argument of
join from the array that follows it: The first argument is the string to join
things with; the rest of the arguments are treated as a single argument, the
array to be joined. This is true even if we use parentheses to separate groups
of arguments:
print join( ': ', ('A', 'B', 'C'), ('D', 'E'), ('F', 'G', 'H', 'I'));
returns A: B: C: D: E: F: G: H: I. That's because of
the way Perl treats arrays; adding an array to an array gives us one larger
array, not two arrays. In this case, all three arrays get bundled into one.
| Tip |
For even more powerful list-manipulation capabilities,
refer to the splice function in the "Functions" chapter.
|
There is a certain elegance to associative arrays that makes
experienced Perl programmers a little snobbish about their language of choice.
Rightly so! Associative arrays give Perl a degree of database functionality
at a very low yet useful level. Many tasks that would otherwise involve complex
programming can be reduced to a handful of Perl statements using associative
arrays.
Arrays of the type we've already seen are lists of values
indexed by subscripts. In other words, to get an individual element of an
array, you supply a subscript as a reference:
@fruit = ("Apple", "Orange", "Banana");
print $fruit[2];
This example yields Banana because subscripts start
at 0, so 2 is the subscript for the third element of the @fruit
array. A reference to $fruit[7] here returns the undefined value, as no
array element with that subscript has been defined.
Now, here's the point of all this: associative arrays are lists
of values indexed by strings. Conceptually, that's all there is to them. Their
implementation is more complex, obviously, as all of the strings need to be
stored in addition to the values to which they refer.
When you want to refer to an element of an associative array,
you supply a string (also called the key) instead of an integer (also
called the subscript). Perl returns the corresponding value. Consider the following
example:
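A minimal sketch consistent with the description that follows. The name %fruit is an assumption:

```perl
my %fruit = ("Green", "Apple", "Orange", "Orange", "Yellow", "Banana");  # key, value, key, value, ...
print $fruit{"Yellow"}, "\n";
```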
This prints Banana as before. The first line defines
the associative array in much the same way as we have already defined ordinary
arrays; the difference is that instead of listing values, we list key/value
pairs. The first value is Apple and its key is Green;
the second value is Orange, which happens to have the same string for
both value and key; and the final value is Banana and its key is Yellow.
On a superficial level, this can be used to provide mnemonics
for array references, allowing us to refer to $Total{'June'} instead
of $Total[5]. But that's not even beginning to use the power of associative
arrays. Think of the keys of an associative array as you might think of a key
linking tables in a relational database, and you're closer to the idea:
%Folk = ( 'YY', 'Yon Yonson',
'TC', 'Terra Cotta',
'RE', 'Ron Everly' );
%State = ( 'YY', 'Wisconsin',
'TC', 'Minnesota',
'RE', 'Bliss' );
%Job = ( 'YY', 'work in a lumbermill',
'TC', 'teach nuclear physics',
'RE', 'watch football');
foreach $person ( 'TC', 'YY', 'RE' ) {
print "My name is $Folk{$person},\n",
"I live in $State{$person},\n",
"I $Job{$person} there.\n\n";
}
The foreach construct is explained later in the "Flow
Control" section; for now, you just need to know that it makes Perl execute
the print statement once for each of the people in the list after
the foreach keyword.
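The keys and values of an associative array may be treated as
separate (ordinary) arrays as well, by using the keys and values
functions respectively. For example,
print keys %Folk;
print values %State;
prints a string such as YYRETCWisconsinBlissMinnesota. The order of the
entries is determined by Perl's internal storage rather than by the order
in which they were defined, so it may differ from what is shown. String
handling will be discussed later in this chapter.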
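Since the order in which keys come back is not something to rely on, a common pattern is to sort them before use. A sketch, reusing the %Folk data from above:

```perl
use strict;
use warnings;

my %Folk = ( 'YY', 'Yon Yonson',
             'TC', 'Terra Cotta',
             'RE', 'Ron Everly' );

# keys returns the keys in no guaranteed order, so sort them for stable output.
my @sorted = sort keys %Folk;
foreach my $initials (@sorted) {
    print "$initials => $Folk{$initials}\n";
}

# In scalar context, keys gives the number of entries.
my $count = keys %Folk;
print "$count people\n";
```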
| Note |
There is a special associative array, %ENV, that
stores the contents of all environment variables, indexed by variable
name. So $ENV{'PATH'} returns the current search path, for
example. Here's a way to print the current value of all environment
variables, sorted by variable name for good measure:
|
foreach $var (sort keys %ENV ) {
print "$var: \"$ENV{$var}\".\n";
}
| Note |
The foreach clause sets $var to each of
the environment variable names in turn (in alphabetical order), and
the print statement prints each name and value. As the symbol
" is used to specify the beginning and end of the string
being printed, when we actually want to print a " we have
to tell Perl to ignore the special meaning of the character. This is
done by prefixing it with a backslash character (this is sometimes called
quoting a character).
|
We'll finish our look at Perl data types with a look at file
handles. Really this is not a data type but a special kind of literal string.
A file handle behaves in many ways like a variable, however, so this is a good
time to cover them. Besides, you won't get very far in Perl without them.
A file handle can be regarded as a pointer to a file from which
Perl is to read or to which it will write. C programmers will be familiar with
the concept. The basic idea is that you associate a handle with a file or device
and then refer to the handle in the code whenever you need to perform a read
or write operation.
File handles are generally written in all uppercase. Perl has
some useful predefined file handles, which are listed in table 3.
Table 3 Perl's Predefined File Handles
|
File Handle
|
Points To |
|
STDIN
|
Standard input, normally the keyboard. |
|
STDOUT
|
Standard output, normally the console. |
|
STDERR
|
Device where error messages should be written, normally the
console. |
The print statement can take a file handle as its first
argument:
print STDERR "Oops, something broke.\n";
Note that there is no comma after the file handle, which helps
Perl to figure out that the STDERR is not something to be printed.
If you're uneasy with this implicit list syntax, you can put parentheses around
all of the print arguments:
print (STDERR "Oops, something broke.\n");
Note that there is still no comma after the file handle.
| Tip |
Use the standard file handles explicitly, especially in complex
programs. It is sometimes convenient to redefine the standard input
or output device for a while; make sure that you don't accidentally
wind up writing to a file what should have gone to the screen.
|
The open function may be used to associate a new file
handle with a file:
open (INDATA, "/etc/stuff/Friday.dat");
open (LOGFILE, ">/etc/logs/reclaim.log");
print LOGFILE "Log of reclaim procedure\n";
By default, open opens files for reading only. If you
want to override this default behavior, add one of the special direction symbols
from table 4 to the file name. That's what the > at the start of
the file name in the second open statement is for; it tells Perl
that we intend to write to the named file.
Table 4 Perl File Access Symbols
| Symbol | Meaning |
|---|---|
| < | Opens the file for reading. This is the default action. |
| > | Opens the file for writing. |
| >> | Opens the file for appending. |
| +< | Opens the file for both reading and writing. |
| +> | Opens the file for both reading and writing, truncating it first. |
| \| (before file name) | Treat file as command into which Perl is to pipe text. |
| \| (after file name) | Treat file as command from which input is to be piped to Perl. |
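A sketch that exercises the write, append, and read symbols in turn. The file name demo.log is hypothetical, and the "or die" checks are an addition that the examples above omit:

```perl
use strict;
use warnings;

my $file = "demo.log";    # hypothetical scratch file in the current directory

# ">" creates the file, or truncates it if it already exists.
open(LOGFILE, ">$file") or die "Cannot write $file: $!\n";
print LOGFILE "first line\n";
close(LOGFILE);

# ">>" adds to the end of the existing contents.
open(LOGFILE, ">>$file") or die "Cannot append to $file: $!\n";
print LOGFILE "second line\n";
close(LOGFILE);

# "<" (the default) opens the file for reading.
open(LOGFILE, "<$file") or die "Cannot read $file: $!\n";
my @lines = <LOGFILE>;    # read every line into an array
close(LOGFILE);

print "Read ", scalar(@lines), " lines\n";
unlink($file);            # tidy up the scratch file
```

Checking the value returned by open is worth the extra words: without it, a failed open goes unnoticed until a later read or write quietly does nothing.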
To take a more complex example, the following is one way to feed
output to the mypr printer on a UNIX system:
open (MYLPR, "|lpr -Pmypr");
print MYLPR "A line of output\n";
close MYLPR;
There is a special Perl operator for reading from files. It consists
of two angle brackets around the file handle of the file from which we want
to read, and it returns the next line or lines of input from the file or device,
depending on whether the operator is used in a scalar or an array context. When
no more input remains, the operator returns False.
For example, a construct like the following
while (<STDIN>) {
print;
}
simply echoes each line of input back to the console until the
Ctrl and D keys are pressed. That's because the print function takes
the current default argument here, the most recent line of input. Refer to the
"Special Variables" chapter later for an explanation.
If the user types
then the screen will look like
Note that in this case, <STDIN> is in a scalar
context, so one line of standard input is returned at a time. Compare that with
the following example:
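A sketch of the array-context form. An in-memory file handle (DEMO, a hypothetical name, available in Perl 5.8 and later) stands in for STDIN here so that the snippet runs without any typing; reading from STDIN itself behaves the same way:

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Open a file handle on a string instead of a real file or the keyboard.
open(DEMO, "<", \$text) or die "Cannot open in-memory file: $!\n";

# print supplies array context, so <DEMO> returns every remaining line at once.
print <DEMO>;
close(DEMO);
```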
In this case, because print expects an array of arguments
(it can be a single element array, but it's an array as far as print
is concerned), the <> operator obligingly returns all the contents
of STDIN as an array and print prints it. This means that
nothing is written to the console until the user presses the Ctrl and D keys:
This script prints out the contents of the file .signature, double-spaced:
open (SIGFILE, ".signature");
while ( <SIGFILE> ) {
print; print "\n";
}
The first print has no arguments, so it takes the current
default argument and prints it. The second has an argument, so it prints that
instead. Perl's habit of using default arguments extends to the <>
operator: if used with no file handle, it is assumed that <ARGV>
is intended. This expands to each line in turn of each file listed on the command
line.
If no files are listed on the command line, it is instead assumed
that STDIN is intended. So for example,
while (<>) {
print "more.... ";
}
keeps printing more.... as long as something other
than Ctrl+D appears on standard input.
| Note |
Perl 5 allows array elements to be references to any data
type. This makes it possible to build arbitrary data structures of the
kind used in C and other high-level languages, but with all the power
of Perl; you can, for example, have an array of associative arrays.
|
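A brief sketch of one such structure, an array whose elements are references to associative arrays. The data reuses names from earlier examples, and the layout is illustrative only:

```perl
use strict;
use warnings;

# Each element is a reference to an anonymous associative array.
my @people = (
    { name => 'Yon Yonson',  state => 'Wisconsin' },
    { name => 'Terra Cotta', state => 'Minnesota' },
);

foreach my $person (@people) {
    # The arrow dereferences the reference to reach one value.
    print "$person->{name} lives in $person->{state}\n";
}

print "Second name: $people[1]{name}\n";
```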
Flow Control
The examples we've seen so far have been quite simple, with little
or no logical structure beyond a linear sequence of steps. We managed to sneak
in the occasional while and foreach. Perl has all of the flow
control mechanisms you'd expect to find in a high-level language, and this section
takes you through the basics of each.
Let's start with two operators that are used like glue holding
Perl programs together: the || (or) and && (and) operators.
They take two operands and return either True or False depending on the operands:
$Weekend = $Saturday || $Sunday;
If either $Saturday or $Sunday is True, then
$Weekend is True.
$Solvent = ($income > 3) && ($debts < 10);
$Solvent is True only if $income is greater
than 3 and $debts is less than 10.
Now consider the logic of evaluating one of these expressions.
It isn't always necessary to evaluate both operands of either a &&
or a || operator. In the first example, if $Saturday is True,
then we know $Weekend is True, regardless of whether $Sunday
is also True.
This means that having evaluated the left side of an ||
expression as True, the righthand side will not be evaluated. Combine this with
Perl's easy way with data types, and you can say things like the following:
$value > 10 || print "Oops, low value $value ...\n";
If $value is greater than 10, the right side
of the expression is never evaluated, so nothing is printed. If $value
is not greater than 10, Perl needs to evaluate the right side to decide
whether the expression as a whole is True or False. That means it evaluates
the print statement, printing the message like
Okay, it's a trick, but it's a very useful one.
Something analogous applies to the && operator.
In this case, if the left side of an expression is False, then the expression
as a whole is False and so Perl will not evaluate the right side. This can be
used to produce the same kind of effect as our || trick but with the
opposite sense:
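A minimal sketch of the && form, with an assumed $value and made-up messages:

```perl
use strict;
use warnings;

my $value = 42;

# The print on the right runs only when the test on the left is true.
$value > 10 && print "OK, value $value is high enough\n";

# With a false left side, the right side is skipped entirely.
my $result = ($value > 100 && print "This line never appears\n");
```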
As with most Perl constructs, the real power of these tricks
comes when you apply a little creative thinking. Remember that the left and
right sides of these expressions can be any Perl expression; think of them as
conjunctions in a sentence rather than as logical operators and you'll get a
better feel for how to use them. Expressions such as
give a little of the flavor of creative Perl.
The &bust in that last line is a subroutine call,
by the way. Refer to the "Subroutines" section later in this chapter
for more information.
The basic kind of flow control is a simple branch: A statement
is either executed or not depending on whether a logical expression is True
or False. This can be done by following the statement with a modifier and a
logical expression:
open ( INFILE, "./missing.txt") if $missing;
The execution of the statement is contingent upon both the evaluation
of the expression and the sense of the operator.
The expression evaluates as either True or False and can contain
any of the relational operators listed in table 5, although it doesn't have
to. Examples of valid expressions are
Table 5 Perl's Relational Operators
| Operator | Numeric Context | String Context |
|---|---|---|
| Equality | == | eq |
| Inequality | != | ne |
| Inequality with signed result | <=> | cmp |
| Greater than | > | gt |
| Greater than or equal to | >= | ge |
| Less than | < | lt |
| Less than or equal to | <= | le |
| Note |
What exactly does "less than" mean when we're comparing
strings? It means "lexically less than." If $left
comes before $right when the two are sorted alphabetically,
$left is less than $right.
|
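Choosing an operator from the wrong column is an easy mistake, because Perl converts the operands to suit the operator rather than the other way round. A sketch with illustrative values:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

# Numeric comparison: 10 is greater than 9.
my $numeric = $x > $y;

# String comparison: "10" sorts before "9" because "1" comes before "9".
my $lexical = $x gt $y;

print "numeric: ", ($numeric ? "true" : "false"), "\n";
print "string:  ", ($lexical ? "true" : "false"), "\n";

# <=> and cmp return -1, 0, or 1, which is what sort expects.
my @by_number = sort { $a <=> $b } (10, 9, 100);
my @by_string = sort { $a cmp $b } (10, 9, 100);
print "@by_number\n";
print "@by_string\n";
```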
There are four modifiers, each of which behaves the way you might
expect from the corresponding English word:
- if The statement executes if the logical expression
is True and does not execute otherwise. Examples:
$max = 100 if $min < 100;
print "Empty!\n" if !$full;
- unless The statement does not execute if the logical
expression is True and executes otherwise. Examples:
open (ERRLOG, "test.log") unless $NoLog;
print "Success" unless $error>2;
- while The statement executes repeatedly until the
logical expression is False. Examples:
$total -= $decrement while $total > $decrement;
$n = 1000; print "$n\n" while $n-- > 0;
- until The statement executes repeatedly until the logical
expression is True. Examples:
$total += $value[$count++] until $total > $limit;
print RESULTS "Next value: $value[$n++]" until $value[$n] == -1;
Note that the logical expression is evaluated once only in the
case of if and unless but multiple times in the case of while
and until. In other words, the first two are simple conditionals, while
the last two are loop constructs.
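The four modifiers side by side, in a sketch with made-up values:

```perl
use strict;
use warnings;

my $total = 0;
my @value = (5, 10, 20, 40);
my $count = 0;

# until: keep adding values until the total passes 30.
$total += $value[$count++] until $total > 30;

# while: count down, printing after each decrement.
my $n = 3;
print "$n\n" while $n-- > 0;

# if and unless: the condition is tested exactly once.
print "Total is $total\n" if $total > 30;
print "Nothing added\n"   unless $count;
```

Note that the until loop stops after the third value, so the fourth element of @value is never touched.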
The syntax changes when we want to make the execution of multiple
statements contingent on the evaluation of a logical expression. The modifier
comes at the start of a line, followed by the logical expression in parentheses,
followed by the conditional statements contained in braces. Note that the parentheses
around the logical expression are required, unlike with the single statement
branching described in the previous section. For example,
if ( ( $total += $value ) > $limit ) {
print LOGFILE "Maximum limit $limit exceeded.",
" Offending value was $value.\n";
close (LOGFILE);
die "Too many! Check the log file for details.\n";
}
This is somewhat similar to C's if syntax, except that
the braces around the conditional statement block are required rather than optional.
The if statement is capable of a little more complexity,
with else and elsif operators:
if ( !open( LOGFILE, ">install.log") ) {
close ( INFILE );
die "Unable to open log file!\n";
}
elsif ( !open( CFGFILE, ">system.cfg") ) {
print LOGFILE "Error during install:",
" Unable to open config file for writing.\n";
close ( LOGFILE );
die "Unable to open config file for writing!\n";
}
else {
print CFGFILE "Your settings go here!\n";
}
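The same three-way shape on a smaller scale. The thresholds and variable names are invented for illustration:

```perl
use strict;
use warnings;

my $debts = 7;
my $status;

if ( $debts == 0 ) {
    $status = "debt-free";
}
elsif ( $debts < 10 ) {      # note the spelling: elsif, with no second "e"
    $status = "solvent";
}
else {
    $status = "bust";
}

print "Status: $status\n";
```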
The loop modifiers (while, until, for,
and foreach) are used with compound statements in much the same way:
until ( $total >= 50 ) {
print "Enter a value: ";
$value = scalar (<STDIN>);
$total += $value;
print "Current total is $total\n";
}
print "Enough!\n";
The while and until statements were described
in the earlier "Conditional Expressions" section. The for
statement resembles the one in C: It is followed by an initial value, a termination
condition, and an iteration expression, all enclosed in parentheses and separated
by semicolons:
for ( $count = 0; $count < 100; $count++ ) {
print "Something";
}
The foreach operator is special. It iterates over the
contents of an array and executes the statements in a statement block for each
element of the array. A simple example is the following:
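A minimal sketch of such a loop. The list contents are assumed from the description that follows, and @seen is a hypothetical addition that records each value visited:

```perl
use strict;
use warnings;

my @seen;
foreach my $num ( 'one', 'two', 'three' ) {
    print "Number $num\n";
    push @seen, $num;       # keep a record of the values visited
}
```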
The variable $num first takes on the value one,
then two, and so on. That example looks fairly trivial, but the real
power of this operator lies in the fact that it can operate on any array:
foreach $arg ( @ARGV ) {
print "Argument: \"$arg\".\n";
}
foreach $namekey ( sort keys %surnames ) {
print REPORT "Surname: $surnames{$namekey}.\n",
"Address: $address{$namekey}.\n";
}
Labels may be used with the next, last, and
redo statements to provide more control over program flow through loops.
A label consists of any word, usually in uppercase, followed by a colon. The
label appears just before the loop operator (while, for, or
foreach) and can be used as an anchor for jumping to from within the
block:
RECORD: while ( <INFILE> ) {
$even = !$even;
next RECORD if $even;
print;
}
That code snippet prints every second record in INFILE, skipping the first.
The three label control statements are
- next Jumps to the next iteration of the loop marked
by the label or to the innermost enclosing loop if no label is specified.
- last Immediately breaks out of the loop marked by
the label or out of the innermost enclosing loop if no label is specified.
- redo Jumps back to the loop marked by the specified
label or to the innermost enclosing loop if no label is specified. This
causes the loop to execute again with the same iterator value.
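Labels matter most with nested loops, where an unlabeled next or last would affect only the inner loop. A sketch with invented data, using array references to hold the rows:

```perl
use strict;
use warnings;

my @rows = ( [1, 2, 3], [4, -1, 6], [7, 8, 9] );
my @kept;

ROW: foreach my $row (@rows) {
    foreach my $cell (@$row) {
        # A negative cell abandons the rest of this row: skip to the next row.
        next ROW if $cell < 0;
        # Stop everything once a cell of 8 or more turns up.
        last ROW if $cell >= 8;
        push @kept, $cell;
    }
}

print "Kept: @kept\n";
```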
Subroutines
The basic subunit of code in Perl is a subroutine. This is similar
to a function in C and a procedure or a function in Pascal. A subroutine may
be called with various parameters and returns a value. Effectively, the subroutine
groups together a sequence of statements so that they can be re-used.
The Simplest Form of Subroutine
Subroutines can be declared anywhere in a program. If more than
one subroutine with the same name is declared, each new version replaces the
older ones so that only the last one is effective. It is possible to declare
subroutines within an eval() expression; these will not actually be
declared until the runtime execution reaches the eval() statement.
Subroutines are declared using the following syntax:
sub subroutine-name {
statements
}
The simplest form of subroutine is one that does not return any
value and does not access any external values. The subroutine is called by prefixing
the name with the & character. (There are other ways of calling
subroutines, which are explained in more detail later.) An example of a program
using the simplest form of subroutine illustrates this:
#!/usr/bin/perl -w
# Example of subroutine which does not use
# external values and does not return a value
&egsub1; # Call the subroutine once
&egsub1; # Call the subroutine a second time
sub egsub1 {
print "This subroutine simply prints this line.\n";
}
| Tip |
While it is possible to refer from a subroutine to any global
variable directly, it is normally considered bad programming practice.
Reference to global variables from subroutines makes it more difficult
to re-use the subroutine code. It is best to make any such references
to external values explicit by passing explicit parameters to the subroutine
as described in the following section. Similarly it is best to avoid
programming subroutines that directly change the values of global variables
because doing so can lead to unpredictable side-effects if the subroutine
is re-used in a different program. Use explicit return values or explicit
parameters passed by reference as described in the following section.
|
Returning Values from Subroutines
Subroutines can also return values, thus acting as functions.
The return value is the value of the last statement executed. This can be a
scalar or an array value.
| Caution |
Take care not to add seemingly innocuous statements near
the end of a subroutine. A print statement returns 1,
for example, so a subroutine which prints just before it returns will
always return 1.
|
It is possible to test whether the calling context requires an
array or a scalar value using the wantarray construct, thus returning
different values depending on the required context. For example,
wantarray ? ('a', 'b', 'c') : 0;
as the last line of a subroutine returns the list ('a', 'b',
'c') in an array context, and the scalar value 0 in a scalar context.
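A minimal sketch of wantarray in a complete subroutine follows. The subroutine name letters is hypothetical, and in scalar context this sketch returns the number of items rather than 0, purely to make the two results easy to tell apart:

```perl
use strict;
use warnings;

# Hypothetical subroutine: a list in list context, a count in scalar context.
sub letters {
    my @items = ('a', 'b', 'c');
    return wantarray ? @items : scalar @items;
}

my @list  = letters();   # list (array) context
my $count = letters();   # scalar context

print "List context gave: @list\n";
print "Scalar context gave: $count\n";
```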
It is possible to return from a subroutine before the last statement
by using the return() function. The argument to the return()
function is the returned value in this case. This is illustrated in the following
example, which is not a very efficient way to do the test but illustrates the
point:
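A minimal sketch of an early return is shown here. It is not the book's own listing; the subroutine name has_negative and its test are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical subroutine: reports whether any argument is negative.
sub has_negative {
    foreach my $n (@_) {
        return 1 if $n < 0;   # leave the subroutine as soon as one is found
    }
    return 0;                 # reached only when no argument was negative
}

print has_negative(3, -2, 7) ? "Found a negative value.\n" : "No negative values.\n";
print has_negative(3, 2, 7)  ? "Found a negative value.\n" : "No negative values.\n";
```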
Note that it is usual to make any variables used within a subroutine
local() to the enclosing block. This means that they will not interfere
with any variables that have the same name in the calling program. In Perl 5,
these may be made lexically local rather than dynamically local, using my()
instead of local() (this is discussed in more detail later).
When returning multiple arrays, the result is flattened into
one list so that, effectively, only one array is returned. So in the following
example all the return values are in @return_a1 and the second array
@return_a2 is empty.
#!/usr/bin/perl -w
# Example of subroutine which does not use
# external values returning an array
(@return_a1, @return_a2) = &egsub4; # Call the subroutine once
print "Return array a1 ",@return_a1,
" Return array a2 ",@return_a2, ".\n";
sub egsub4 {
print "This subroutine returns a1 and a2.\n";
local(@a1) = ('a', 'b', 'c');
local(@a2) = ('d', 'e', 'f');
return(@a1,@a2);
}
In Perl 4, this problem can be avoided by passing the arrays
by reference using a typeglob (see the following section). In Perl
5, you can do this and also manipulate any variable by reference directly (see
the following section).
Passing Values to Subroutines
The next important aspect of subroutines is that the call can
pass values to the subroutine. The call simply lists the variables to be passed,
and these are passed in the list @_ to the subroutine. These are known
as the parameters or the arguments. It is customary to assign each value a name
at the start of the subroutine so that it is clear what is going on. Manipulation
of these copies of the arguments is equivalent to passing arguments by value
(that is, their values may be altered but this does not alter the value of the
variable in the calling program).
#!/usr/bin/perl -w
# Example of a subroutine that is passed external values by value
$returnval = &egsub5(45,3); # Call the subroutine once
print "The (45+1) * (3+1) is $returnval.\n";
$x = 45;
$y = 3;
$returnval = &egsub5($x,$y);
print "The ($x+1) * ($y+1) is $returnval.\n";
print "Note that \$x still is $x, and \$y still is $y.\n";
sub egsub5 { # Access $x and $y by value
local($x, $y) = @_;
return (++$x * ++$y); # Pre-increment so the incremented values are multiplied
}
To pass scalar values by reference, rather than by value, the
elements in @_ can be accessed directly. This will change their values
in the calling program. In such a case, the argument must be a variable rather
than a literal value, as literal values cannot be altered.
#!/usr/bin/perl -w
# Example of a subroutine that is passed external values by reference
$x = 45;
$y = 3;
print "The ($x+1) * ($y+1) ";
$returnval = &egsub6($x,$y);
print "is $returnval.\n";
print "Note that \$x now is $x, and \$y now is $y.\n";
sub egsub6 { # Access $x and $y by reference
return (++$_[0] * ++$_[1]); # Increment the caller's variables, then multiply
}
Array values can be passed by reference in the same way. However,
several restrictions apply. First, as with returned array values, the @_
list is one single flat array, so passing multiple arrays this way is tricky.
Also, although individual elements may be altered in the subroutine using this
method, the size of the array cannot be altered within the subroutine (so push()
and pop() cannot be used).
Therefore, another method has been provided to facilitate the
passing of arrays by reference. This method is known as typeglobbing
and works with Perl 4 or Perl 5. The principle is that the subroutine declares
that one or more of its parameters are typeglobbed, which means that all the
references to that identifier in the scope of the subroutine are taken to refer
to the equivalent identifier in the namespace of the calling program. The syntax
for this declaration is to prefix the identifier with an asterisk, rather than
an @ sign; thus *array1 typeglobs @array1. In fact, typeglobbing
links all forms of the identifier, so *array1 typeglobs @array1,
%array1, and $array1 (any reference to any of these in the
local subroutine actually refers to the equivalent variable in the calling program's
namespace). It only makes sense to use this construct within a local()
list, effectively creating a local alias for a set of global variables. So the
previous example becomes the following:
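A minimal sketch of typeglobbing under Perl 5 follows. It is not the book's own listing: the subroutine append_both and all four array names are assumptions, and the arrays are declared with our because typeglobs alias package variables only.

```perl
use strict;
use warnings;

our (@first, @second);   # the caller's package (global) arrays
our (@list1, @list2);    # names used as aliases inside the subroutine

@first  = (1, 2, 3);
@second = ('a', 'b');

sub append_both {
    local (*list1, *list2) = @_;   # alias @list1 and @list2 to the caller's arrays
    push @list1, 4;                # the size of the caller's array can now change
    push @list2, 'c';
}

append_both(*first, *second);
print "first:  @first\n";
print "second: @second\n";
```

Because each typeglob stays a separate element of @_, the two arrays are not flattened into one list, and push() works on the caller's arrays.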
In Perl 4, this is the only way to use references to variables,
rather than variables themselves. In Perl 5, there is also a generalized method
for dealing with references. Although this method looks more awkward in its
syntax because of the abundance of underscores, it is actually more precise
in its meaning. Typeglobbing automatically aliases the scalar, the array, and
the hashed array form of an identifier, even if only the array name is required.
With Perl 5 references, this distinction can be made explicit; only the array
form of the identifier is referenced.
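As a sketch of the Perl 5 approach (the subroutine name append_both_by_ref and the variable names are assumptions), a backslash in front of an array takes a reference to it, and @$ref gets back to the array inside the subroutine:

```perl
use strict;
use warnings;

sub append_both_by_ref {
    my ($list1_ref, $list2_ref) = @_;   # two array references, kept separate
    push @$list1_ref, 4;                # @$ref dereferences back to the array
    push @$list2_ref, 'c';
}

my @first  = (1, 2, 3);
my @second = ('a', 'b');

append_both_by_ref(\@first, \@second);  # \@array takes a reference to the array only
print "first:  @first\n";
print "second: @second\n";
```

Unlike a typeglob, a reference works with my() variables as well as package variables.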
Subroutine Recursion
One of the most powerful features of subroutines is their ability
to call themselves. There are many problems that can be solved by repeated application
of the same procedure. However, care must be taken to set up a termination condition
where the recursion stops and the execution can unravel itself. Typical examples
of this approach are found when processing lists: Process the head item and
then process the tail; if the tail is empty do not recurse. Another neat example
is the calculation of a factorial value:
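A minimal sketch of a recursive factorial, assuming a non-negative integer argument (the subroutine name factorial is our own choice):

```perl
use strict;
use warnings;

sub factorial {
    my ($n) = @_;
    return 1 if $n <= 1;             # termination condition: stop recursing
    return $n * factorial($n - 1);   # recursive step on a smaller value
}

print "5! is ", factorial(5), "\n";
```

Each call gets its own $n because the variable is declared with my(), so the nested calls do not interfere with one another.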
Issues of Scope with my() and local()
Issues of scope are very important with relation to subroutines.
In particular all variables inside subroutines should be made lexical local
variables (using my()) or dynamic local variables (using local()).
In Perl 4, the only choice is local() because my() was only
introduced in Perl 5.
Variables declared using the my() construct are considered
to be lexical local variables. They are not entered in the symbol table for
the current package. Therefore, they are totally hidden from all contexts other
than the local block within which they are declared. Even subroutines called
from the current block cannot access lexical local variables in that block.
Lexical local variables must begin with an alphanumeric character or an underscore.
Variables declared using the local() construct are considered
to be dynamic local variables. The value is local to the current block and any
calls from that block. It is possible to localize special variables as dynamic
local variables, but these cannot be made into lexical local variables. The
following two differences from lexical local variables show the two cases in
Perl 5 where it is still advisable to use local() rather than my():
- Use local() if you want the value of the local variables to be
visible to subroutines
- Use local() if you are localizing special variables
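The difference can be sketched as follows. All the names here ($setting, show_setting, and so on) are hypothetical; $" is Perl's special list-separator variable, used as an example of a special variable that can be localized with local():

```perl
use strict;
use warnings;

our $setting = 'global';   # package variable, entered in the symbol table

sub show_setting { return $setting; }

sub with_local {
    local $setting = 'dynamic';   # temporarily replaces the package variable
    return show_setting();        # the called subroutine sees 'dynamic'
}

sub with_my {
    my $setting = 'lexical';      # a new variable, private to this block
    return show_setting();        # the called subroutine still sees 'global'
}

sub joined {
    local $" = '-';               # special variables can only be localized with local()
    my @parts = (1, 2, 3);
    return "@parts";
}

print "with_local: ", with_local(), "\n";
print "with_my:    ", with_my(), "\n";
print "afterwards: $setting\n";
print "joined:     ", joined(), "\n";
```

Once with_local returns, the package variable has its original value again.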
We'll finish this overview of Perl with a look at Perl's pattern
matching capabilities. The ability to match and replace patterns is vital to
any scripting language that claims to be capable of useful text manipulation.
By this stage, you probably won't be surprised to read that Perl matches patterns
better than any other general purpose language. Perl 4's pattern matching was
excellent, but Perl 5 has introduced some significant improvements, including
the ability to match even more arbitrary strings than before.
The basic pattern matching operations we'll be looking at are
- Matching Where we want to know if a particular string
matches a pattern
- Substitution Where we want to replace portions of a
string based on a pattern
The patterns referred to here are more properly known as regular
expressions, and we'll start by looking at them.
A regular expression is a set of rules describing a generalized
string. If the characters that make up a particular string conform to the rules
of a particular regular expression, then the regular expression is said to match
that string.
A few concrete examples usually help after an overblown definition
like that. The regular expression b. will match the strings bovine,
above, Bobby, and Bob Jones but not the strings Bell,
b, or Bob. That's because the expression insists that the
letter b must be in the string and it must be followed immediately
by another character.
The regular expression b+, on the other hand, requires
the lowercase letter b at least once. This matches b and Bob
in addition to the example matches for b.. The regular expression b*
requires zero or more bs, so it will match any string. That is fairly
useless, but it makes more sense as part of a larger regular expression; for
example, Bob*y matches Boy, Boby, and Bobby
but not Boboby.
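These claims are easy to check with a short program. The following sketch tries the three expressions against the sample strings used above and reports the result for each:

```perl
use strict;
use warnings;

# Try b. and b+ against each sample string.
foreach my $string ('bovine', 'Bob Jones', 'Bell', 'b', 'Bob') {
    my $dot  = ($string =~ /b./) ? 'matches' : 'does not match';
    my $plus = ($string =~ /b+/) ? 'matches' : 'does not match';
    print "'$string' $dot b. and $plus b+\n";
}

# Try Bob*y, where b* is part of a larger expression.
foreach my $string ('Boy', 'Boby', 'Bobby', 'Boboby') {
    my $result = ($string =~ /Bob*y/) ? 'matches' : 'does not match';
    print "'$string' $result Bob*y\n";
}
```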
Assertions
There are a number of so-called assertions that are used
to anchor parts of the pattern to word or string boundaries. The ^
assertion matches the start of a string, so the regular expression ^fool
matches fool and foolhardy but not tomfoolery or
April fool. The assertions are listed in table 6.
Table 6 Perl's Regular Expression Assertions
| Assertion | Matches | Example | Matches | Doesn't Match |
| ^ | Start of string | ^fool | foolish | tomfoolery |
| $ | End of string | fool$ | April fool | foolish |
| \b | Word boundary | be\b | be side | beside |
| \B | Non-word boundary | be\Bside | beside | be side |
Atoms
The . we saw in b. is an example of a regular
expression atom. Atoms are, as the name suggests, the fundamental building
blocks of a regular expression. A full list appears in table 7.
Table 7 Perl's Regular Expression Atoms
| Atom | Matches | Example | Matches | Doesn't Match |
| . | Any character except newline | b.b | bob | bb |
| List of characters in square brackets | Any one of those characters | ^[Bb] | Bob, bob | Rbob |
| Regular expression in parentheses | Anything that regular expression matches | ^a(b.b)c$ | abobc | abbc |
Quantifiers
A quantifier is a modifier for an atom. It can be used
to specify that a particular atom must appear at least once, for example, as
in b+. The atom quantifiers are listed in table 8.
Table 8 Perl's Regular Expression Atom Quantifiers
| Quantifier | Matches | Example | Matches | Doesn't Match |
| * | Zero or more instances of the atom | ab*c | ac, abc | abb |
| + | One or more instances of the atom | ab+c | abc | ac |
| ? | Zero or one instances of the atom | ab?c | ac, abc | abbc |
| {n} | n instances of the atom | ab{2}c | abbc | abbbc |
| {n,} | At least n instances of the atom | ab{2,}c | abbc, abbbc | abc |
| {n,m} | At least n, at most m instances of the atom | ab{2,3}c | abbc | abbbbc |
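The quantifiers can be checked with a small table-driven sketch. The @cases list and the variable names are our own; each entry pairs a pattern with one string it should match and one it should not:

```perl
use strict;
use warnings;

# Each entry: pattern, a string it should match, a string it should not match.
my @cases = (
    ['ab*c',     'ac',    'abb'],
    ['ab+c',     'abc',   'ac'],
    ['ab?c',     'abc',   'abbc'],
    ['ab{2}c',   'abbc',  'abbbc'],
    ['ab{2,}c',  'abbbc', 'abc'],
    ['ab{2,3}c', 'abbc',  'abbbbc'],
);

my $failures = 0;
foreach my $case (@cases) {
    my ($pattern, $hit, $miss) = @$case;
    my $ok = ($hit =~ /$pattern/) && ($miss !~ /$pattern/);
    $failures++ unless $ok;
    printf "%-9s matches %-6s but not %-7s: %s\n",
        $pattern, $hit, $miss, $ok ? 'as expected' : 'UNEXPECTED';
}
print "$failures unexpected results\n";
```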
Special Characters
There are a number of special characters denoted by the backslash;
\n being especially familiar to C programmers perhaps. Table 9 lists
the special characters.
Table 9 Perl's Regular Expression Special Characters
| Symbol | Matches | Example | Matches | Doesn't Match |
| \d | Any digit | b\dd | b4d | bad |
| \D | Non-digit | b\Dd | bdd | b4d |
| \n | Newline | | | |
| \r | Carriage return | | | |
| \t | Tab | | | |
| \f | Formfeed | | | |
| \s | White space character | | | |
| \S | Non-white space character | | | |
| \w | Alphanumeric character (or underscore) | a\wb | a2b | a^b |
| \W | Non-alphanumeric character | a\Wb | aa^b | aabb |
Backslashed Tokens
It is essential that regular expressions are able to use all
characters so that all possible strings occurring in the real world can be matched.
With so many characters having special meanings, a mechanism is therefore required
that allows us to represent any arbitrary character in a regular expression.
This is done using a backslash followed by a numeric quantity.
This quantity can take on any of the following formats:
- Single or double digit Matched quantities after a match. These
are called backreferences and will be explained in the later "Matching"
section.
- Two or three digit octal number The character with that number
as character code, unless it's possible to interpret it as a backreference.
- x followed by two hexadecimal digits The character
with that number as its character code. For example, \x3e is >.
- c followed by a single character This is the control
character. For example, \cG matches Ctrl+G.
- Any other character This is the character itself. For example,
\& matches the & character.
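A short sketch of each format in use follows. The sample strings are our own, and the octal example relies on the pattern having no parenthesized groups, so that \101 cannot be read as a backreference:

```perl
use strict;
use warnings;

print "\\x3e matches >\n"        if '>'     =~ /\x3e/;    # hexadecimal character code
print "\\101 matches A\n"        if 'A'     =~ /\101/;    # octal character code
print "\\cG matches Ctrl+G\n"    if "\cG"   =~ /\cG/;     # control character
print "\\& matches &\n"          if 'R&D'   =~ /R\&D/;    # escaped literal character
print "\\1 matches a repeat\n"   if 'hello' =~ /(l)\1/;   # backreference to group 1
```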
Let's start putting all of that together with some real pattern
matching. The match operator normally consists of two forward slashes with a
regular expression in between, and it normally operates on the contents of the
$_ variable. So if $_ is serendipity, then /^ser/,
/end/, and /^s.*y$/ are all True.
Matching on $_
The $_ variable is special; it is described in full
in the "Special Variables" chapter in this book. In many ways, it is the
default container for data being read in by Perl; the <> operator,
for example, gets the next line from STDIN and stores it in $_.
So the following code snippet lets you type lines of text and tells you when
your line matches one of the regular expressions:
$prompt = "Enter some text or press Ctrl-Z to stop: ";
print $prompt;
while (<>) {
/^[aA]/ && print "Starts with a or A. ";
/[0-9]$/ && print "Ends with a digit. ";
/perl/ && print "You said it! ";
print $prompt;
}
Bound Matches
Matching doesn't always have to operate on $_, although
this default behavior is quite convenient. There is a special operator, =~,
that evaluates to either True or False depending on whether its first operand
matches on its second operand. For example, $filename =~ /dat$/ is
True if $filename matches on /dat$/. This can be used in conditionals
in the usual way:
$filename =~ /dat$/ && die "Can't use .dat files.\n";
There is a corresponding operator with the opposite sense, !~.
This is True if the first operand does not match the second:
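A minimal sketch, assuming two hypothetical file names, prints a message only for the name that does not end in dat:

```perl
use strict;
use warnings;

foreach my $filename ('report.txt', 'sales.dat') {
    $filename !~ /dat$/ && print "$filename is not a .dat file.\n";
}
```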
Alternate Delimiters
The match operator can use other characters instead of //;
a useful point if you're trying to match a complex expression involving forward
slashes. A more general form of the match operator than // is m//.
If you use the leading m here, then any character may be used to delimit
the regular expression. For example,
$installpath =~ m!^/usr/local!
|| warn "The path you have chosen is odd.\n";
Match Options
A number of optional switches may be applied to the match operator
(either the // or m// forms) to alter its behavior. These
options are listed in table 10.
Table 10 Perl Match Operator's Optional Switches
| Switch | Meaning |
| g | Perform global matching |
| i | Case-insensitive matching |
| o | Evaluate the regular expression once only |
The g switch continues matching even after the first
match has been found. This is useful when using backreferences to examine the
matched portions of a string, as described in the later "Backreferences"
section.
The o switch is used inside loops where a lot of pattern
matching is taking place. It tells Perl that the regular expression (the match
operator's operand) is to be evaluated once only. This can improve efficiency
in cases where the regular expression is fixed for all iterations of the loop
that contains it.
Backreferences
As we mentioned earlier in the "Backslashed Tokens"
section, pattern matching produces quantities known as backreferences. These
are the parts of your string where the match succeeded. You need to tell Perl
to store them by surrounding the relevant parts of your regular expression with
parentheses, and they may be referred to within the regular expression as \1, \2,
and so on (and after the match as $1, $2, and so on). In this example, we check if the user has typed three consecutive
four-letter words:
while (<>) {
/\b(\S{4})\s(\S{4})\s(\S{4})\b/
&& print "Gosh, you said $1 $2 $3!\n";
}
The first four-letter word lies between a word boundary (\b)
and some white space (\s) and consists of four non-white space characters
(\S). If matched, the matching substring is stored in the special variable
\1 and the search continues. Once the search is complete, the backreferences
may be referred to as $1, $2, and so on.
What if you don't know in advance how many matches to expect?
Perform the match in an array context, and Perl returns the matches in an array.
Consider this example:
@hits = ("Yon Yonson, Wisconsin" =~ /(\won)/g);
print "Matched on ", join(', ', @hits), ".\n";
Let's start at the right side and work back. The regular expression
(\won) means that we match any alphanumeric character followed by on
and store all three characters. The g option after the //
operator means that we want to do this for the entire string, even after we've
found a match. The =~ operator means that we carry out this operation
on a given string, Yon Yonson, Wisconsin; and finally, the whole thing
is evaluated in an array context, so Perl returns the array of matches and we
store it in the @hits array. The output from this example is
Matched on Yon, Yon, son, con.
Once you get the hang of pattern matching, substitutions are
quite straightforward and very powerful. The substitution operator is s///,
which resembles the match operator but has three rather than two slashes. As
with the match operator, any other character may be substituted for forward
slashes, and the optional i, g, and o switches may
be used.
The pattern to be replaced goes between the first and second
delimiters, and the replacement pattern goes between the second and third. To
take a simple example,
$house = "henhouse";
$house =~ s/hen/dog/;
changes $house from henhouse to doghouse.
Note that it isn't possible to use the =~ operation with a literal
string in the way we did when matching; that's because you can't modify a literal
constant. Instead, store the string in a variable and modify that.
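The switches and alternate delimiters can be sketched as follows. The sample strings and variable names are assumptions; each substitution works on a copy so that the original string stays available for comparison:

```perl
use strict;
use warnings;

my $text = 'The hen and the Hen';

# Without switches, only the first match is replaced.
(my $first_only = $text) =~ s/hen/dog/;

# With g and i, every match is replaced regardless of case.
(my $all = $text) =~ s/hen/dog/gi;

# Alternate delimiters avoid having to escape forward slashes.
my $path = '/usr/local/bin';
$path =~ s!^/usr/local!/opt!;

print "$first_only\n";
print "$all\n";
print "$path\n";
```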