16 Regular Expressions in PHP

Hiren Joshi

 

Introduction:

 

Regular expressions are used to search a string for text that matches a pattern. They are commonly used to validate the data that a user enters in a web form.

 

Regular expressions originated in Unix, where tools such as grep use them to search and manipulate text. A few basic rules are enough to build very complex search patterns.

 

PHP has provided two families of regular expression functions, each tied to one regular expression syntax. The two syntaxes are

  • POSIX regular expressions (POSIX stands for Portable Operating System Interface)
  • PCRE (Perl Compatible Regular Expressions)

 

A regular expression is also known as a regex or regexp.

 

The POSIX-style regular expression functions have names based on ereg, such as ereg and ereg_replace.

 

In PHP 5, both types are compiled into PHP by default. PCRE regular expressions are binary safe. The POSIX-style functions have been deprecated since PHP 5.3 and were removed in PHP 7.0, so new code should use PCRE. The PCRE-style regular expression functions start with preg_.

 

The preg_match function is used to match a regular expression in a string.

 

Function Description
preg_match($pattern, $string) Searches $string for $pattern. Returns 1 if a match is found and 0 if no match is found. If an error occurs, for example because $pattern is invalid, it returns FALSE.

 

In PHP, each regex pattern is written as a text string, and the pattern must be enclosed in delimiters. The forward slash (/) is the most common delimiter and is the one used throughout this module.
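PCRE also accepts other delimiter characters, as long as the same character opens and closes the pattern. The following is a minimal sketch of why that can be convenient; the URL is made-up sample data and the variable names are only for illustration.

```php
<?php
// Hypothetical sample data for illustration.
$url = 'http://www.php.net/manual';

// With / as the delimiter, every / inside the pattern must be escaped.
$withSlashes = preg_match('/^http:\/\/www\.php\.net\//', $url);

// With # as the delimiter, the slashes can be written as they are.
$withHashes = preg_match('#^http://www\.php\.net/#', $url);

echo "$withSlashes $withHashes"; // prints: 1 1
```

Both patterns describe the same text; only the delimiter differs.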

 

Let us write one example to understand preg_match function.

 

<?php
$pattern = '/Hiren/';
$author = 'Hiren Joshi';
$editor = 'Hardik Joshi';

// preg_match returns 1 on a match, 0 on no match and FALSE on error
$ans = preg_match($pattern, $author);
echo $ans;
if ($ans === false)
{
    echo "<br> There is error in pattern";
}
else if ($ans === 1)
{
    echo "<br> pattern $pattern matched with author $author";
}
else
{
    echo "<br> pattern $pattern is not matched with author $author";
}

// editor check
$result = preg_match($pattern, $editor);
if ($result === 1)
{
    echo "<br> pattern $pattern is matched with editor $editor";
}
else if ($result === 0)
{
    echo "<br> pattern $pattern is not matched with editor $editor";
}
else
{
    echo "<br> There is an error";
}
?>

 

Then the output will be:

 

1

 

pattern /Hiren/ matched with author Hiren Joshi

 

pattern /Hiren/ is not matched with editor Hardik Joshi

 

To make a regular expression case-insensitive, add the i modifier after the closing delimiter. The case of letters is then ignored in both the pattern and the string.

 

For example.

 

<?php
$pattern = '/hiren/';
$author = 'HIREN JOSHI';

// case-sensitive match
$ans = preg_match($pattern, $author); // $ans is 0
echo "<br> $ans";

// case-insensitive match
$pattern = '/hiren/i';
$author = 'HIREN JOSHI';
$ans = preg_match($pattern, $author); // $ans is 1
echo "<br> $ans";
?>

 

Character Matching

  1. The escape character in a regular expression is the backslash (\). The backslash gives special meaning to some characters and removes special meaning from other characters.
  2. If you want to match \ in a string, you have to code \\\\ in the pattern string. The reason is that the backslash is an escape character at two levels: PHP first reduces \\\\ to \\ when it reads the string literal, and the regex engine then interprets \\ as a single literal \.
  3. To match a forward slash you have to code \/ (backslash, forward slash) because the pattern is enclosed between forward slashes.
  4. A character is known as a metacharacter if it has a special meaning in a pattern. For example, the backslash is a metacharacter; the forward slash needs the same care because it acts as the delimiter.
  5. It is advisable to use single quotes for regular expressions. Double quotes can also be used, but they create unnecessary complications because PHP expands variables and additional escape sequences inside them.
  6. The following code snippet shows how to match a backslash and a forward slash.

 

$string = 'This is backslash \ which is used as escape character in regex';
$ans = preg_match('/\\\\/', $string); // will return 1

$string = 'This is forwardslash / which is used in regex';
$ans = preg_match('/\//', $string); // will return 1
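When the text to be matched is held in a variable, the escaping described above does not have to be done by hand. The sketch below assumes PHP's built-in preg_quote function, which is not covered in this module; the file path and sentences are made-up sample data.

```php
<?php
// Sample text containing a backslash, a forward slash and a dot.
$needle = 'C:\temp/file.txt';

// preg_quote escapes regex metacharacters; the second argument
// names the delimiter so that it is escaped as well.
$pattern = '/' . preg_quote($needle, '/') . '/';

$found = preg_match($pattern, 'Saved to C:\temp/file.txt today'); // 1
$other = preg_match($pattern, 'Saved to C:\temp/fileXtxt today'); // 0, the dot is literal

echo "$found $other";
```

Because the dot is escaped, it matches only a real dot and not any character.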

 

The following table shows a few patterns that match types of characters.

 

Pattern Matches
. The dot matches any single character except a newline. Use \. (backslash dot) to match a literal . (dot)
\w Any word character: an upper or lower case letter, a digit or _
\W Any character that is not a letter, a digit or _
\d Any digit character
\D Any non-digit character
\s Any whitespace character
\S Any non-whitespace character

 

The following code snippet shows examples of the above patterns.

 

<?php
$string = 'The product code is JTK-4702';

$ans1 = preg_match('/JT./', $string);    // . matches K
echo "<br> Answer : $ans1";

$ans2 = preg_match('/JT\d/', $string);   // K is not a digit
echo "<br> Answer : $ans2";

$ans3 = preg_match('/JTK-\d/', $string); // \d matches 4
echo "<br> Answer : $ans3";
?>

 

Then the output will be:

 

Answer : 1

Answer : 0

Answer : 1
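The remaining patterns from the table can be tried in the same way. This is a small sketch using the same product string; the variable names are only for illustration.

```php
<?php
$string = 'The product code is JTK-4702';

$ans1 = preg_match('/code\sis/', $string);        // whitespace between "code" and "is": 1
$ans2 = preg_match('/JTK\W\d/', $string);         // "-" is a non-word character: 1
$ans3 = preg_match('/JTK\w/', $string);           // "-" is not a word character: 0
$ans4 = preg_match('/\D\D\D-\d\d\d\d/', $string); // three non-digits, "-", four digits: 1

echo "$ans1 $ans2 $ans3 $ans4"; // prints: 1 1 0 1
```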

 

Character Class

 

A character class lets you match a single character from a set of characters. The set is written between square brackets ([ ]), and any one character from the set can match. For example, [xyz] matches x, y or z. To match an opening square bracket you can use \[. In the same way, \] is used to match a closing square bracket.

 

$string = 'The product code is JTK-4702';
$ans1 = preg_match('/JT[JKL]/', $string);    // returns 1
$ans2 = preg_match('/JTK-[1234]/', $string); // returns 1
$ans3 = preg_match('/JTK-[123]/', $string);  // returns 0
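Matching literal square brackets can be sketched as follows. The sample sentence is made up for illustration.

```php
<?php
// Hypothetical sample text containing square brackets.
$string = 'Array index [7] is out of range';

$ans1 = preg_match('/\[\d\]/', $string);       // literal [, one digit, literal ]: 1
$ans2 = preg_match('/index [\[(]7/', $string); // class containing [ and (: 1
$ans3 = preg_match('/\[[89]\]/', $string);     // 7 is not in the class [89]: 0

echo "$ans1 $ans2 $ans3"; // prints: 1 1 0
```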

 

Most metacharacters lose their special meaning inside a character class. However, ^ and - retain a special meaning inside a character class, and that meaning depends on their position.

 

Metacharacters inside a character class

 

Character Example Meaning
^ [^aeiou] ^ is treated as not, so it negates the set of characters inside the character class. The example matches any character except a, e, i, o or u.
- [a-z] - (the hyphen) is used to represent a range. The example matches any lower case letter from a to z.

 

Below are a few examples of metacharacters inside a character class.

 

$string = 'The product code is JTK-4702';
$ans1 = preg_match('/JT[^JKL]/', $string);  // returns 0
$ans2 = preg_match('/JTK[^^]/', $string);   // returns 1
$ans3 = preg_match('/JTK-[1-5]/', $string); // returns 1
$ans4 = preg_match('/JTK[*-_]/', $string);  // returns 1
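The position dependence of ^ and - can be illustrated with a short sketch on the same string. The patterns are only examples chosen to show the difference.

```php
<?php
$string = 'The product code is JTK-4702';

// ^ negates only when it is the first character of the class.
$ans1 = preg_match('/JTK[^-]/', $string);  // JTK followed by anything but "-": 0
$ans2 = preg_match('/JTK[-^]/', $string);  // class holds a literal - and ^: 1

// - forms a range only when it stands between two characters.
$ans3 = preg_match('/JTK[a-z]/', $string); // range a to z: 0
$ans4 = preg_match('/JTK[az-]/', $string); // trailing - is a literal hyphen: 1

echo "$ans1 $ans2 $ans3 $ans4"; // prints: 0 1 0 1
```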

 

Bracket Expression

 

Bracket expressions can be used inside a character class. A bracket expression is a named range of characters. For example, the [:digit:] expression is equivalent to the range 0-9. The following table lists bracket expressions that can be used in a character class.

Pattern Matches
[:digit:] Digits (same as \d)
[:lower:] Lower case letters
[:upper:] Upper case letters
[:alpha:] Lower and upper case letters
[:alnum:] Alphanumeric (upper and lower case letters and digits)
[:word:] Upper and lower case letters, digits and underscores (same as \w)
[:print:] All printable characters including the space
[:graph:] All printable characters excluding the space
[:punct:] All printable characters excluding letters, digits and the space

 

Example for bracket expression

 

<?php
$string = 'The product code is JTK-4702';
$ans1 = preg_match('/JTK[[:punct:]]/', $string);  // return 1
$ans2 = preg_match('/JT[[:upper:]]/', $string);   // return 1
$ans3 = preg_match('/JTK[[:digit:]]/', $string);  // return 0
$ans4 = preg_match('/JTK-[[:digit:]]/', $string); // return 1
?>
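More than one bracket expression can be placed in the same character class. The following sketch, on the same product string, shows this; the patterns are chosen only for illustration.

```php
<?php
$string = 'The product code is JTK-4702';

// An upper case letter or a digit after JT: K qualifies.
$ans1 = preg_match('/JT[[:upper:][:digit:]]/', $string);  // 1

// A lower case letter or a digit after JT: K does not qualify.
$ans2 = preg_match('/JT[[:lower:][:digit:]]/', $string);  // 0

// A punctuation character or a digit after JTK: "-" qualifies.
$ans3 = preg_match('/JTK[[:punct:][:digit:]]/', $string); // 1

echo "$ans1 $ans2 $ans3"; // prints: 1 0 1
```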

 

Match string position

 

Positional indicators and subpattern repetition allow a developer to write complex patterns. The following table lists the positional indicators.

 

Pattern Matches
^ The start of the string
$ The end of the string
^pattern$ The entire string must match the pattern
\b The start or end of a word
\B A position other than the start or end of a word

 

 

Example for matching string position

 

<?php
$string = 'The product code is JTK-4702';
$ans1 = preg_match('/^The/', $string);  // return 1
$ans2 = preg_match('/4702$/', $string); // return 1
$ans3 = preg_match('/code/', $string);  // return 1
$ans4 = preg_match('/^The product code is JTK-4702$/', $string); // return 1
$ans5 = preg_match('/JTK\b/', $string); // return 1
$ans6 = preg_match('/rod\B/', $string); // return 1
?>
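A common use of \b is to match a whole word only. The sketch below contrasts a whole word with a word fragment on the same string.

```php
<?php
$string = 'The product code is JTK-4702';

$ans1 = preg_match('/\bcode\b/', $string); // "code" is a whole word: 1
$ans2 = preg_match('/\bprod\b/', $string); // "prod" is only part of "product": 0
$ans3 = preg_match('/^product/', $string); // the string does not start with "product": 0
$ans4 = preg_match('/\Bduct\b/', $string); // "duct" ends a word but does not start one: 1

echo "$ans1 $ans2 $ans3 $ans4"; // prints: 1 0 0 1
```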

 

Group and match subpattern

 

A part of a pattern can be grouped into a subpattern. Numbered and unnumbered subpatterns are created by using parentheses.

Pattern Description
(subpattern) Creates a numbered subpattern group
(?:subpattern) Creates an unnumbered subpattern group
| Matches either the left side or the right side subpattern
\n Back-reference: matches the same text that numbered subpattern n matched (for example \1)

 

For example,

 

<?php
$string = 'The product code is JTK-4702';
$string1 = 'Raj Rajeshwar';

$ans1 = preg_match('/^(Raj)|(Kaj)/', $string1);
echo "<br> Answer1 : $ans1";

$ans2 = preg_match('/^(\w\w\w) \1/', $string1); // \1 refers back to numbered subpattern 1
echo "<br> Answer2 : $ans2";

$ans3 = preg_match('/^(The)|(That)/', $string);
echo "<br> Answer3 : $ans3";

$ans4 = preg_match('/(rod\B)|(That)/', $string);
echo "<br> Answer4 : $ans4";

// matching ode
$pattern = '/(?:ode)/';
$ans5 = preg_match($pattern, $string);
if ($ans5 === 1)
{
    echo "<br> $pattern is matched with $string";
}
else if ($ans5 === 0)
{
    echo "<br> $pattern is not matched with $string";
}
else
{
    echo "<br> Error";
}
?>

 

The output will be:

 

Answer1 : 1

Answer2 : 1

Answer3 : 1

Answer4 : 1

 

/(?:ode)/ is matched with The product code is JTK-4702
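The difference between numbered and unnumbered groups becomes visible when the matched text is captured. The sketch below relies on the optional third argument of preg_match, which receives the captured text; this argument is not otherwise used in this module, and the variable names are only for illustration.

```php
<?php
$string = 'The product code is JTK-4702';

// Two numbered groups: element 0 is the whole match,
// elements 1 and 2 are the text of the two groups.
preg_match('/(\w\w\w)-(\d\d\d\d)/', $string, $numbered);
// $numbered is ['JTK-4702', 'JTK', '4702']

// The (?: ) group is not numbered, so only the digits are captured.
preg_match('/(?:JTK|JTM)-(\d\d\d\d)/', $string, $unnumbered);
// $unnumbered is ['JTK-4702', '4702']

echo $numbered[1] . ' ' . $unnumbered[1]; // prints: JTK 4702
```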

 

Matching a repeating pattern

 

To match a repeating pattern, the following patterns are used.

Pattern Description
{n} Pattern must be repeated exactly n times
{n,} Pattern must be repeated n or more times
{n,m} Pattern must be repeated from a minimum of n times to a maximum of m times
? Zero or one of the previous subpattern. Same as {0,1}
+ One or more of the previous subpattern. Same as {1,}
* Zero or more of the previous subpattern. Same as {0,}

 

<?php
$phone = '123-456-7890';
$phonepattern = '/^\d{3}-\d{3}-\d{4}$/';
$matchphonepattern = preg_match($phonepattern, $phone);
if ($matchphonepattern === 1)
{
    echo "<br> Phone $phone is matched";
}

$fax = '(123) 456-7890';
$faxpattern = '/^\(\d{3}\) \d{3}-\d{4}$/';
$matchfaxpattern = preg_match($faxpattern, $fax);
if ($matchfaxpattern === 1)
{
    echo "<br> Fax $fax is matched";
}

// phone or fax: the area code may be written as (123) or 123-, or left out
$phoneorfax = '/^(\(\d{3}\) |(\d{3}-)?)\d{3}-\d{4}$/';
$pmatch = preg_match($phoneorfax, $phone);
if ($pmatch === 1)
{
    echo "<br> Phone is matched with phone or fax";
}

$fmatch = preg_match($phoneorfax, $fax);
if ($fmatch === 1)
{
    echo "<br> Fax is matched with phone or fax";
}
?>

 

Then the output is:

 

Phone 123-456-7890 is matched

Fax (123) 456-7890 is matched

Phone is matched with phone or fax

Fax is matched with phone or fax
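The ?, + and * patterns and the {n,m} form can be tried with a few small checks. This is a minimal sketch; the test strings are made up for illustration.

```php
<?php
// ? makes the "u" optional.
$a1 = preg_match('/^colou?r$/', 'color');   // 1
$a2 = preg_match('/^colou?r$/', 'colour');  // 1
$a3 = preg_match('/^colou?r$/', 'colouur'); // 0, two u characters

// + needs at least one digit, * allows none.
$b1 = preg_match('/^\d+$/', '4702');        // 1
$b2 = preg_match('/^\d+$/', '');            // 0
$b3 = preg_match('/^JTK-\d*$/', 'JTK-');    // 1

// {2,4} allows two to four digits.
$c1 = preg_match('/^\d{2,4}$/', '4702');    // 1
$c2 = preg_match('/^\d{2,4}$/', '47025');   // 0, five digits

echo "$a1$a2$a3 $b1$b2$b3 $c1$c2"; // prints: 110 101 10
```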

 

Look-ahead assertion

 

A look-ahead assertion is a special type of subpattern. It is an assertion that must be matched for the overall pattern to be matched, but it does not consume any characters of the string. A look-ahead assertion must be matched starting immediately after the position of the assertion in the pattern. A negative look-ahead assertion is an assertion that must not be matched for the overall pattern to be matched.

To create a look-ahead assertion, code ? followed by = at the start of the parentheses. For a negative look-ahead assertion, code ? followed by !.

A look-ahead assertion is written (?=assertion), while a negative look-ahead assertion is written (?!assertion).

The look-ahead assertion (?=[[:digit:]]) requires that the next character is a digit.

The look-ahead assertion (?=.*[[:digit:]]) requires that the rest of the string contains at least one digit.

 

The following code shows further examples of look-ahead assertions.

 

<?php
$pattern1 = '/^(?=.*[[:digit:]])[[:alnum:]]{6}$/';
$ans1 = preg_match($pattern1, 'Paresh');
echo "<br> Ans1 : $ans1";
$ans2 = preg_match($pattern1, 'Pares8');
echo "<br> Ans2 : $ans2";

// negative look-ahead assertion: the values 32 to 39 are excluded
$pattern2 = '/^(?!3[2-9])[0-3][[:digit:]]$/';
$ans3 = preg_match($pattern2, '32');
echo "<br> Ans3 : $ans3";
$ans4 = preg_match($pattern2, '31');
echo "<br> Ans4 : $ans4";

// password complexity
$pwd = '/^(?=.*[[:digit:]])(?=.*[[:punct:]])[[:print:]]{6,}$/';
//  ^                  matches the start of the string
//  (?=.*[[:digit:]])  asserts that the string contains at least one digit
//  (?=.*[[:punct:]])  asserts that the string contains at least one punctuation character
//  [[:print:]]{6,}    requires 6 or more printable characters
//  $                  matches the end of the string

$pass1 = 'super3man';
$p1 = preg_match($pwd, $pass1);
echo "<br> pass1 : $p1";

$pass2 = 'supe2m@n';
$p2 = preg_match($pwd, $pass2);
echo "<br> pass2 : $p2";
?>

 

The output :

 

Ans1 : 0

Ans2 : 1

Ans3 : 0

Ans4 : 1

pass1 : 0

pass2 : 1
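A look-ahead assertion only checks what follows; the text it checks does not become part of the match. The sketch below makes this visible by capturing the match through the optional third argument of preg_match. The product codes and variable names are only for illustration.

```php
<?php
$string = 'JTK-4702 JTM-3618';

// JT followed by K or M, but only where "-4" comes next.
preg_match('/JT[KM](?=-4)/', $string, $positive);

// JT followed by K or M, but only where "-4" does NOT come next.
preg_match('/JT[KM](?!-4)/', $string, $negative);

// The "-4" that was looked at is not included in the matched text.
echo $positive[0] . ' ' . $negative[0]; // prints: JTK JTM
```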

 

Global Regular Expression

 

A global regular expression is used to find multiple matches of a pattern in a string. To find multiple matches of a pattern in a string, preg_match_all function is used.

 

Function Description
preg_match_all($pattern, $string, $matches) Returns the number of matches found in $string and stores all the matched substrings in $matches. $matches is a multi-dimensional array; its first element, $matches[0], is an array of the matched substrings.

 

<?php
$string = 'JTK-4702 JTK-1379';
$pattern = '/JTK-[[:digit:]]{4}/';
$count = preg_match_all($pattern, $string, $matches);
foreach ($matches[0] as $match)
{
    echo "<br> $match";
}
?>

 

The output is:

 

JTK-4702

JTK-1379
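When the pattern contains numbered subpatterns, the remaining elements of $matches hold the text captured by each group. This is a minimal sketch assuming the default ordering of preg_match_all; the product codes are sample data.

```php
<?php
$string = 'JTK-4702 JTM-3618 JTK-1379';

$count = preg_match_all('/JT([KM])-(\d{4})/', $string, $matches);

// $count is 3
// $matches[0] holds the full matches: JTK-4702, JTM-3618, JTK-1379
// $matches[1] holds group 1 of each match: K, M, K
// $matches[2] holds group 2 of each match: 4702, 3618, 1379
echo $count . ' ' . implode(',', $matches[2]); // prints: 3 4702,3618,1379
```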

 

Replace a regular expression with string

 

To replace the text that matches a pattern, the preg_replace function is used. This function works similarly to str_replace and str_ireplace. However, preg_replace can replace any text that matches the specified regular expression.

 

Function Description
preg_replace($pattern, $new, $string) Returns a string which is created by replacing every match of $pattern in $string with $new.

 

 

<?php
$items = 'JTK-4702 JTM-3618';
$items = preg_replace('/JT[KM]/', 'PCode', $items);
echo $items;
?>

 

The output is:

 

PCode-4702 PCode-3618
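The replacement string can also refer to numbered subpatterns as $1, $2 and so on, and an optional fourth argument limits the number of replacements. Neither feature is described in this module, so the following is only a sketch of how they might be used.

```php
<?php
$items = 'JTK-4702 JTM-3618';

// $1 and $2 stand for the text captured by groups 1 and 2.
// Single quotes keep PHP from treating them as variables.
$swapped = preg_replace('/(JT[KM])-(\d{4})/', '$2-$1', $items);

// A limit of 1 replaces only the first match.
$limited = preg_replace('/JT[KM]/', 'PCode', $items, 1);

echo "$swapped | $limited"; // prints: 4702-JTK 3618-JTM | PCode-4702 JTM-3618
```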

 

The following code shows a few data validations using regular expressions.

 

<?php
// phone in 999-999-9999 format
$phone = '/^\d{3}-\d{3}-\d{4}$/';
$phone1 = '/^[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{4}$/';
$p1 = '559-237-8024';
$ans1 = preg_match($phone, $p1);
echo "<br> $ans1";
$ans11 = preg_match($phone1, $p1);
echo "<br> Ans11 : $ans11";

// credit card in 9999-9999-9999-9999 format
$creditcard = '/^\d{4}(-\d{4}){3}$/';
$card = '1234-5678-9012-3456';
$cardmatch = preg_match($creditcard, $card);
echo "<br> Card Match is : $cardmatch";

// zip code in 99999 or 99999-9999 format
$code = '/^\d{5}(-\d{4})?$/';
$zip1 = '12345';
$zipm1 = preg_match($code, $zip1);
if ($zipm1 === 1)
{
    echo "<br> Zip1 $zip1 is matched";
}
$zip2 = '12345-6789';
$zipm2 = preg_match($code, $zip2);
if ($zipm2 === 1)
{
    echo "<br> Zip2 $zip2 is matched";
}

// date in dd/mm/yyyy format
$date = '/^(0?[1-9]|[12][[:digit:]]|[3][01])\/(0?[1-9]|1[012])\/[[:digit:]]{4}$/';
$bday = '12/6/1985';
$bdmatch = preg_match($date, $bday);
echo "<br> Birthday : $bdmatch";
?>

 

The output will be:

 

1

Ans11 : 1

Card Match is : 1

Zip1 12345 is matched

Zip2 12345-6789 is matched

Birthday : 1
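Checks like these are often wrapped in a small function so that the three possible results of preg_match are handled in one place. The function isValid below is a hypothetical helper written for this illustration, not a PHP built-in, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// isValid() is a hypothetical helper, not part of PHP.
// It returns true on a match, false on no match,
// and throws an exception if the pattern itself is faulty.
function isValid(string $pattern, string $value): bool
{
    $result = preg_match($pattern, $value);
    if ($result === false) {
        throw new InvalidArgumentException("Invalid pattern: $pattern");
    }
    return $result === 1;
}

$zipPattern = '/^\d{5}(-\d{4})?$/';
$full = isValid($zipPattern, '12345-6789'); // true
$short = isValid($zipPattern, '1234');      // false, only four digits

var_dump($full, $short);
```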

 
