Regular expression capture into named vars

Regular expressions are of course an extremely powerful tool, particularly in Perl.

A common use is capturing parts of a string, using parenthesised sections of the regular expression to 'capture' the parts we're interested in.

For example:

my $string = "Bob went fishing on 13/06/2006 at 13:30";
if ($string =~ m{went (.+) on ([0-9]{2})/([0-9]{2})/([0-9]{4})}) {
    print "activity: $1 day:$2 month:$3 year:$4\n";
}

Using $1, $2, $3 etc. for the first parenthesised block, second parenthesised block and so on is fine for small expressions, but it soon becomes hard to follow in a big expression that captures lots of information.

A better way is:

if (my($activity, $day, $month, $year) =
    $string =~ m{went (.+) on ([0-9]{2})/([0-9]{2})/([0-9]{4})}) {
    print "activity: $activity day:$day month:$month year:$year\n";
}

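As you can see, we just captured the parameters into named variables instead of $1, $2, $3 etc. This works because, in list context, a successful match returns the list of captured substrings, and a failed match returns an empty list, which makes the if condition false.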
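As a fuller illustration, here is a minimal sketch that wraps the same match in a small function. The name parse_outing and the hash keys are hypothetical, chosen only for this example; the pattern is the one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of key/value pairs on success,
# or an empty list if the string does not match.
sub parse_outing {
    my ($text) = @_;

    # List context: a successful match returns the captured substrings;
    # a failed match returns an empty list, so the condition is false.
    if (my ($activity, $day, $month, $year) =
        $text =~ m{went (.+) on ([0-9]{2})/([0-9]{2})/([0-9]{4})}) {
        return (activity => $activity, day   => $day,
                month    => $month,    year  => $year);
    }
    return;
}

my %outing = parse_outing("Bob went fishing on 13/06/2006 at 13:30");
print "activity: $outing{activity} year: $outing{year}\n" if %outing;

my %none = parse_outing("Bob stayed at home");
print "no match\n" unless %none;
```

Because the captures are copied into lexical variables straight away, they cannot be overwritten by a later match the way $1, $2 etc. can.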

Even if you only want to capture one variable, you still need to write my($var) = instead of just my $var =. The match returns the list of captures only in list context; in scalar context it returns a true/false value, so the variable would be set to 1 on a successful match rather than to the captured text.

my $string = "Bob went fishing on 13/06/2006 at 13:30";
if (my $who1 = $string =~ /(.+) went fishing/) {
    print "Fisherman 1 was: $who1\n";
}
if (my($who2) = $string =~ /(.+) went fishing/) {
    print "Fisherman 2 was: $who2\n";
}

The above code will print:

Fisherman 1 was: 1
Fisherman 2 was: Bob
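To see what each context yields when a pattern has several captures, the following sketch may help. The variable names ($re, $flag, @parts) are made up for this example.

```perl
use strict;
use warnings;

my $string = "Bob went fishing on 13/06/2006 at 13:30";
my $re     = qr{([0-9]{2})/([0-9]{2})/([0-9]{4})};

# Scalar context: the match returns a true/false value, not a capture.
my $flag = $string =~ $re;

# List context, one variable: takes the first capture, discards the rest.
my ($day) = $string =~ $re;

# List context, array: receives every capture.
my @parts = $string =~ $re;

print "flag: $flag\n";      # flag: 1
print "day: $day\n";        # day: 13
print "parts: @parts\n";    # parts: 13 06 2006
```

Note that $flag is 1 even though the pattern has three captures: it reflects only whether the match succeeded.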

Non-capturing parenthesised parts

Another useful tip, related to this one, is to capture only the parts you need. It's common to use parentheses for grouping, but you may not want to capture what the group matches.

For example, the expression /Rob(ert)?/ will match Rob or Robert, using the parentheses to group 'ert' together so that the '?' operator applies to the whole group.

However, we likely don't want it to capture, so the alternative is /Rob(?:ert)?/, which performs the same grouping but without capturing the contents of the parentheses. This also keeps the group from taking up a slot in $1, $2 etc. or in the list returned by the match.
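The difference shows up in the list the match returns. This is a small sketch using a made-up test string; only the two patterns differ.

```perl
use strict;
use warnings;

my $string = "Robert went fishing";

# Capturing group: 'ert' takes up a slot in the returned list.
my @with = $string =~ /Rob(ert)? went (\w+)/;

# Non-capturing group: only the activity is returned.
my @without = $string =~ /Rob(?:ert)? went (\w+)/;

print "capturing:     @with\n";      # capturing:     ert fishing
print "non-capturing: @without\n";   # non-capturing: fishing
```

With the capturing version and the string "Rob went fishing", the first slot would be undef because the optional group did not take part in the match, which is one more reason to avoid capturing groups you do not need.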


perl/regexpnamedcapture.txt · Last modified: 2010/02/26 10:45 (external edit)