Mike Gossland's Perl Tutorial Course for Windows


Chapter 2: Matching and Substitution

Substituting Old Text With New

Perl can substitute text just as easily as it can match it, but instead of using the plain matching operator m//, you use the substitution operator, s///.

When a match is made, Perl knows which characters matched, and it sets up built-in variables to point at the starting position and the ending position of the match in the searched string. For example, if you had:

$text = "Here is some text";

and you did a match on the regular expression /some.*/, with:

$text =~ m/some.*/;

then Perl would know that the matched string was "some text", and it would know that the match started at the 9th character and ended at the 17th (counting the first character as 1).
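If you want to see that positional information for yourself, here is a minimal sketch. It assumes the standard built-in variables $& (the matched text) and the arrays @- and @+ (the match offsets). Perl counts offsets from 0, so the 9th character shows up as offset 8, and the end offset points just past the last matched character.

```perl
use strict;
use warnings;

my $text = "Here is some text";

# Run the match; stop if it unexpectedly fails, so the variables below are valid.
$text =~ m/some.*/ or die "no match";

my $matched = $&;      # the text that matched
my $start   = $-[0];   # offset of the first matched character, counting from 0
my $end     = $+[0];   # offset just past the last matched character

print "matched: $matched\n";      # matched: some text
print "start offset: $start\n";   # start offset: 8
print "end offset: $end\n";       # end offset: 17
```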

When you use the substitution operator, s///, Perl uses that positional information to know which characters to replace with the substitution text.

Simple substitution

The substitution operator has two parts: on the left, a regular expression, written just as it is for the matching operator, and on the right, the replacement text.

Let's say you wanted to change the first occurrence of the word dog into cat in the string variable $story. This is simple:

$story =~ s/dog/cat/;
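Here is a small sketch of that substitution in a complete script. The sample sentence is made up for illustration. It shows that only the first dog is changed, and that s/// hands back a true value when it made a replacement.

```perl
use strict;
use warnings;

# Hypothetical sample text containing the word "dog" twice.
my $story = "The dog chased another dog.";

# s/// changes $story in place and returns 1 if it replaced something.
my $replaced = ($story =~ s/dog/cat/);

print "$story\n";   # The cat chased another dog.
print "a replacement was made\n" if $replaced;
```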

Your substituted string does not have to be the same length as the matched string. You could put in more letters or fewer:

s/a short phrase/a much longer phrase/
s/1999/MCMXCIX/
s/Twentyfirst century/21st century/

You can also look for a more abstract pattern and replace it. Let's say you wanted to find 3 digits in a row and replace them with the dummy value nnn. You could use a substitution operator like this:

s/\d\d\d/nnn/;

This would take the first sequence of 3 digits it finds and replace it with the letters "nnn".

Let's say you wanted to edit a phone number and replace it with a dummy value. You could use a substitution operator like this:

s/\d{3}-\d{3}-\d{4}/123-123-1234/;

This would take the first sequence of 3 digits, hyphen, 3 digits, hyphen, and 4 digits and replace that phone number with 123-123-1234.
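To see what the phone number substitution does to a line holding two numbers, try a sketch like the one below. The sample line is invented. The second substitution adds the /g modifier, which this page has not covered; it is shown here only for contrast, because it tells Perl to replace every match rather than just the first.

```perl
use strict;
use warnings;

# Hypothetical input line with two phone numbers.
my $line = "Call 555-867-5309 or 555-123-9876.";

# Copy the line, then substitute in the copy: only the first number changes.
(my $first_only = $line) =~ s/\d{3}-\d{3}-\d{4}/123-123-1234/;

# Same pattern with /g: every number in the copy changes.
(my $all = $line) =~ s/\d{3}-\d{3}-\d{4}/123-123-1234/g;

print "$first_only\n";   # Call 123-123-1234 or 555-123-9876.
print "$all\n";          # Call 123-123-1234 or 123-123-1234.
```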

For any of the matching expressions in the table on the previous page of examples, we could just as easily have specified some text to replace it with, by using the s/// operator instead of just the m// operator. 

Deletion

You can use s/// for deleting things too. Just use an empty value for the substitution. Here's how you might delete an HTML comment consisting of everything between <!-- and -->:

s/<!--.*-->//;

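Isn't that nice and simple? Just keep in mind that it works as intended only when the string holds a single comment on one line. The .* is greedy, so with two comments it would delete everything from the first <!-- to the last -->, and the . does not match a newline.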
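Here is a short sketch of the deletion at work, using made-up HTML fragments. The second half shows what happens when a line has two comments: .* grabs as much as it can, so the text between the comments disappears too. The last line uses .*? (a non-greedy match) together with /g, neither of which has been introduced on this page, so treat it as a preview rather than part of the lesson.

```perl
use strict;
use warnings;

# One comment on one line: the simple pattern does the job.
my $html = "<p>Hello</p><!-- a note --><p>World</p>";
$html =~ s/<!--.*-->//;
print "$html\n";     # <p>Hello</p><p>World</p>

# Two comments on one line: .* runs from the first <!-- to the last -->.
my $two = "a<!-- one -->b<!-- two -->c";
(my $greedy = $two) =~ s/<!--.*-->//;
print "$greedy\n";   # ac

# Non-greedy .*? stops at the nearest -->, and /g repeats for each comment.
(my $careful = $two) =~ s/<!--.*?-->//g;
print "$careful\n";  # abc
```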

Remembering Matched Values

Suppose you wanted to match on something and modify it, but re-use part of what you matched on. Let's say you wanted to replace an occurrence of boys with boyz or girls with girlz. You could do this in separate passes like this:

s/boys/boyz/;
s/girls/girlz/;

or alternatively, you could match on either boy or girl, and remember what it was you matched on, like this:

s/(boy|girl)s/$1z/;

The $1 is a capture variable. It is a built-in variable maintained automatically by Perl to hold whatever was matched within the parentheses of the search expression. Here, we are looking for either boy or girl followed by an s. We want to replace it by whatever we find, with a z substituted for the s. The $1 variable will remember whichever word matched and will put it in the substitution.
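You can try the remembered match on a few words with a sketch like this. The word list is just an illustration; the last word is there to show that a string that does not match is left alone.

```perl
use strict;
use warnings;

my @results;

for my $word ("boys", "girls", "dogs") {
    # Copy the word, then substitute in the copy.
    # $1 holds "boy" or "girl", whichever the parentheses matched.
    (my $changed = $word) =~ s/(boy|girl)s/$1z/;
    push @results, $changed;
    print "$word -> $changed\n";
}
# Prints:
# boys -> boyz
# girls -> girlz
# dogs -> dogs
```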

You can remember more than one matching expression. Each pair of parentheses gets its own variable, numbered from $1 upward in the order the opening parentheses appear. Perl does not stop at $9 ($10, $11 and so on work too), but I have never had occasion to go past $4, so you are unlikely to run out.

As an example of remembering two matches consider this method of getting rid of potential visitors:

s/(dog|cat)s are (invited|welcome)/$1s are not $2/;

Note that $1 represents either dog or cat, whichever was found, and $2 represents either invited or welcome, whichever was found. Note also that Perl is smart enough to know that the string "$1s" means the $1 variable followed by an "s". It does not get confused into thinking you meant a variable with the name of $1s.
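Here is a sketch of the two-match example in a complete script, with an invented sample string. The second part is an extra illustration that is not from the example above: if the character after the variable is a digit, Perl would read $10 as the tenth match, so you can write ${1} with braces to mark where the variable name ends.

```perl
use strict;
use warnings;

# Hypothetical sign text.
my $sign = "dogs are welcome";
$sign =~ s/(dog|cat)s are (invited|welcome)/$1s are not $2/;
print "$sign\n";   # dogs are not welcome

# Braces keep $1 separate from the digit that follows it.
my $code = "item7";
$code =~ s/(item)(\d)/${1}0$2/;
print "$code\n";   # item07
```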

If you have absorbed this and want more then go on to the next page. Otherwise, email me with questions.