Mike Gossland's Perl Tutorial Course for Windows




Chapter 4. Handling Files in Perl

Recursive Editing

Sometimes as a webmaster or an administrator, you'd like to be able to make a particular regular expression substitution in all the files throughout an entire directory tree. This is not too hard, with a little help from a built-in Perl module called File::Find.

There will be a few new concepts introduced in this section and you might have to start digging into the documentation for more details on how these things work.

Perl can be extended beyond the core language by way of "modules". Modules are special-purpose packages of code that you import into your script when you need them. Your Perl installation comes with many modules. They are not loaded until you ask for them, since loading a module adds a small performance penalty.

One module that comes with every Perl installation is File::Find. To use this module, add the line:

use File::Find;

to your script.

File::Find provides just the directory-walking behaviour we need to traverse a directory tree. Consider this script:

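use strict;
use warnings;
use File::Find;

my $dir = "c:/web";    # starting directory
find(\&edits, $dir);

# Called once for every file and directory under $dir
sub edits {
  print "File name is $_\n\t\tFull path is $File::Find::name\n";
}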

Change the "c:/web" value of $dir to something that makes sense on your system and try running this script. You'll find it recursively lists every file and subdirectory in the tree you picked. Let's see how it works.

The line "find(\&edits, $dir)" is the line that walks through the tree specified by "$dir". The find function is provided by the File::Find module. For parameters, it takes a reference to a function (\&edits) and the directory to walk through. For each file and directory in the tree, it calls the referenced function.

So, for every file and directory, the function "edits" is called. Within the edits function, the default variable, $_, is set to the bare name of the current entry. If you want the complete path, look at $File::Find::name. Each time a new subdirectory is entered, find changes the current working directory to that subdirectory, so the short $_ names are all you need to open the files.
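To see these variables side by side, here is a minimal sketch that builds a small throwaway tree and records what the callback sees for each entry. The callback name report, the @visited array, and the file names are made up for illustration; $File::Find::dir is a third variable the module sets, holding the directory currently being visited.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree so the example does not depend on c:/web.
my $root = tempdir(CLEANUP => 1);
make_path("$root/sub");
for my $path ("$root/top.txt", "$root/sub/inner.txt") {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    print {$fh} "sample\n";
    close $fh;
}

my @visited;    # one record per callback invocation

sub report {
    # $_ is the bare name, $File::Find::dir is the directory being visited,
    # and $File::Find::name is the two joined together.
    push @visited, {
        name => $_,
        dir  => $File::Find::dir,
        full => $File::Find::name,
        type => (-d $_ ? 'dir' : 'file'),
    };
}

find(\&report, $root);

printf "%-4s %-10s %s\n", $_->{type}, $_->{name}, $_->{full} for @visited;
```

Note that the starting directory itself is reported too, with $_ set to ".", which is one reason a callback that edits files should test for plain files first.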

In the above example, we just listed the files, but we have all the tools to do a recursive edit. We just have to add some code to open the file and edit it within the edits function, as follows. Be careful, because this code has the potential to edit a lot of files!

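#!/usr/bin/perl

use strict;
use warnings;
use File::Find;

my $dir = "c:/web";
find(\&edits, $dir);

sub edits {
  # Only plain files whose names end in .htm or .html (any case)
  return unless -f and /\.html?$/i;
  my $file = $_;
  my $seen = 0;

  # Read the whole file into memory
  open my $in, '<', $file or die "Can't read $File::Find::name: $!";
  my @lines = <$in>;
  close $in;

  for my $line ( @lines ) {
    $seen++ if $line =~ s/Lesson/Chapter/;
  }
  return unless $seen;    # leave unchanged files alone

  # Write the edited lines back over the original
  open my $out, '>', $file or die "Can't write $File::Find::name: $!";
  print $out @lines;
  close $out or die "Can't close $File::Find::name: $!";

  print "Found in $File::Find::name\n";
}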
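
The file-name test is where a recursive edit most easily picks up the wrong files. The following sketch compares a loose pattern with an anchored one. The file names are made up for illustration, and the /i flag is an assumption that suits case-insensitive Windows file names.

```perl
use strict;
use warnings;

# Hypothetical file names, chosen to show where the two patterns disagree.
my @names = ('index.html', 'page.htm', 'notes.html.bak', 'xhtml-guide.txt', 'README');

# Loose: any one character followed by "htm", anywhere in the name.
my @loose  = grep { /.html?/ } @names;

# Strict: a literal dot, then "htm" or "html", then the end of the name.
my @strict = grep { /\.html?$/i } @names;

print "loose:  @loose\n";
print "strict: @strict\n";
```

The loose pattern accepts every name except README, because an unescaped dot matches any character. The strict pattern accepts only index.html and page.htm.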

This script can be used as a starting point for more useful or powerful scripts. Readers are encouraged to look up this kind of editing in the Perl Cookbook.

Here's a more full-featured script: it takes the starting directory from the command line and keeps a backup of each file it edits. How it works is left as an exercise for the student.

#!/usr/bin/perl

use strict;
use warnings;
use File::Find;

@ARGV = ('.') unless @ARGV;
my $dir = shift @ARGV;
find(\&edits, $dir);

sub edits {
  return unless -f;        #skip directories
  my $seen = 0;
  my $file = $_;
  #Uncomment next line if you want multi-line edits
  #local $/;
  local $^I=".backup";
  #Warning - heavy magic here
  local @ARGV = ($file);
  while(<>) {
    #Remember to use the s option if doing multiline edits
    $seen++ if s/Lesson/Chapter/;
    print;
  }
  print "Found in $File::Find::name\n" if $seen > 0;
  #Uncomment next line if you do not want to keep the backup
  #unlink "$file.backup";
}

To decipher the above, look up the $^I variable or the equivalent command-line option, -i. Also, look up @ARGV, and explore the special meaning of "while(<>)" when used with the @ARGV array.
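
As a starting point for that exploration, here is a minimal sketch of the same idiom applied to a single throwaway file, outside File::Find. The file name sample.txt and its contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";

open my $out, '>', $file or die "Cannot create $file: $!";
print {$out} "Lesson 1\nNothing here\nLesson 2\n";
close $out;

my $changed = 0;
{
    local $^I   = '.backup';   # same effect as -i.backup on the command line
    local @ARGV = ($file);     # <> reads from the files named in @ARGV
    while (<>) {
        $changed++ if s/Lesson/Chapter/;
        print;                 # written to the new copy of the file, not the screen
    }
}

print "Lines changed: $changed\n";
```

While the loop runs, print writes into the replacement file because in-place editing redirects the default output. Once the last file in @ARGV is exhausted, output returns to STDOUT, which is why the final message appears on the screen. The original contents remain in sample.txt.backup.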

Be careful, and make sure you've got a backup of your directory because you can mess up a lot of files quickly with this script. Don't run it twice in a row without checking that the content is preserved: a second run overwrites the .backup copies made by the first, so you can completely clobber your original work!
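
One way to reduce that risk is to refuse to touch a file that is itself a backup or that already has a backup beside it. The sketch below shows such a check on its own. The helper name safe_to_edit and the file names are hypothetical, and this is only one possible policy, not part of File::Find.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true only for plain files that are not themselves
# backups and that do not already have a backup sitting next to them.
sub safe_to_edit {
    my ($file, $suffix) = @_;
    return 0 unless -f $file;
    return 0 if $file =~ /\Q$suffix\E$/;   # never edit a backup copy
    return 0 if -e "$file$suffix";         # an earlier run already saved one
    return 1;
}

# Scratch directory with one untouched file and one already-edited pair.
my $dir = tempdir(CLEANUP => 1);
for my $name ('fresh.html', 'done.html', 'done.html.backup') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    print {$fh} "Lesson 1\n";
    close $fh;
}

for my $name ('fresh.html', 'done.html', 'done.html.backup') {
    my $verdict = safe_to_edit("$dir/$name", '.backup') ? 'edit' : 'skip';
    print "$name: $verdict\n";
}
```

Inside a File::Find callback, a test like this could be called with $_ as the file name before any editing starts.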

With care, recursive editing is a very handy addition to your bag of tricks.