Advanced Regex: Power of lookahead and lookbehind

You have probably used regex for all sorts of tasks, such as splitting, replacing, or matching strings. But you might not know about one powerful feature that can help in many cases where you thought regex could not do the job. Today, we are going to talk about lookbehind and lookahead in regex.

  1. Lookbehind

Before going into details, let's start with an example. Let's say we have the following string:

my phone number is010292291, this is a cool phone number: 010292291. Yeah1414757474 is another phone number

Our task: find all numbers that are not separated from the preceding word by whitespace.

The naive solution is to use the following regex: [A-Za-z]\d+. This matches every substring made up of one letter followed by one or more digits, and in this case, those are s010292291 and h1414757474. Then we can iterate through them, remove the first character, and get the number.
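
Here is a minimal sketch of that naive approach. The post does not name a programming language, so Python and its built-in re module are an assumption, and the variable names are only illustrative.

```python
import re

# The sample string from the post.
text = ("my phone number is010292291, this is a cool phone number: 010292291. "
        "Yeah1414757474 is another phone number")

# Each match is one letter followed by one or more digits.
raw_matches = re.findall(r"[A-Za-z]\d+", text)
print(raw_matches)  # ['s010292291', 'h1414757474']

# Manually drop the leading letter to keep only the digits.
numbers = [m[1:] for m in raw_matches]
print(numbers)  # ['010292291', '1414757474']
```

It works, but the extra slicing step is exactly the manual cleanup we would like to avoid.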

But that does not seem to be a very good solution. Why do we have to manually remove the first letter? Why can't we just get the number we need without the letter? That's when we need to use lookbehind.

A lookbehind is also a regex, but the difference is that the match result will not include the characters that satisfy the lookbehind expression. It only checks that those characters appear right before the match. Its general form is:

(lookbehind expression)(normal regex)

Back to our example, here is the regex for finding numbers that directly follow a letter:

(?<=[A-Za-z])\d+

The expression (?<=[A-Za-z]) is called a lookbehind expression. It requires a letter to sit immediately before the digits but does not make that letter part of the match. In this case, the regex finds the digits in is010292291 and Yeah1414757474, and the match results are just the numbers 010292291 and 1414757474, without the letters s and h.
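
To see this in action, here is a short sketch using Python's re module (the language choice is an assumption, since any engine with lookbehind support should behave similarly). Note that Python's re only accepts fixed-width lookbehind patterns, which [A-Za-z] is.

```python
import re

# The sample string from the post.
text = ("my phone number is010292291, this is a cool phone number: 010292291. "
        "Yeah1414757474 is another phone number")

# The lookbehind checks for a letter but does not consume it.
numbers = re.findall(r"(?<=[A-Za-z])\d+", text)
print(numbers)  # ['010292291', '1414757474']

# The character just before each match is the letter that satisfied the lookbehind.
preceding = [text[m.start() - 1] for m in re.finditer(r"(?<=[A-Za-z])\d+", text)]
print(preceding)  # ['s', 'h']
```

The number after "number: " is skipped because it is preceded by a space, not a letter.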

In the above case, if you want to change the condition to something like "numbers that are not preceded by a letter", replace = with the negation operator (!), like this:

(?<![A-Za-z])\d+
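
One thing to watch out for, sketched here in Python as an assumption about how you might test it: a negative lookbehind is checked at every position, so it can also match in the middle of a number that does follow a letter. Adding \d to the lookbehind class is one possible way to keep only whole numbers, not the only one.

```python
import re

# The sample string from the post.
text = ("my phone number is010292291, this is a cool phone number: 010292291. "
        "Yeah1414757474 is another phone number")

# The first digit after a letter is rejected, but the match can start
# at the second digit, because that digit is preceded by a digit.
loose = re.findall(r"(?<![A-Za-z])\d+", text)
print(loose)  # ['10292291', '010292291', '414757474']

# Also forbidding a digit before the match keeps only whole numbers.
strict = re.findall(r"(?<![A-Za-z\d])\d+", text)
print(strict)  # ['010292291']
```

Whether the loose or the strict version is right depends on what you want to extract.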

  2. Lookahead

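Lookahead is basically the same as lookbehind. The only difference is that it checks the characters that come immediately after the match instead of the ones before it, and those characters are not included in the match result either. The syntax is slightly different: we use (?=), or (?!) for the negated form, and place it after the normal regex, like in this example: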

\d+(?=[A-Za-z]) #find all numbers followed by a letter 

or

\d+(?![A-Za-z]) #find all numbers which are not followed by a letter
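
Both patterns can be tried out with a short Python sketch. The sample string below is made up for illustration and is not from the example above. As with negative lookbehind, the negative lookahead can match part of a number after backtracking, so the last pattern adds \d to the class as one possible way to match whole numbers only.

```python
import re

# Hypothetical sample string, chosen only for this illustration.
sample = "100px wide, 250 tall, 42em"

# Numbers followed by a letter; the letter is not part of the match.
followed = re.findall(r"\d+(?=[A-Za-z])", sample)
print(followed)  # ['100', '42']

# Numbers not followed by a letter. The engine backtracks, so '100'
# shrinks to '10' (followed by '0') and '42' shrinks to '4'.
not_followed = re.findall(r"\d+(?![A-Za-z])", sample)
print(not_followed)  # ['10', '250', '4']

# Also forbidding a following digit prevents those partial matches.
whole_only = re.findall(r"\d+(?![A-Za-z\d])", sample)
print(whole_only)  # ['250']
```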

That is all for today's post. I just wanted to give you a very quick introduction to lookbehind and lookahead. If you want more details about how they work, or more examples, please check this link: http://www.regular-expressions.info/lookaround.html. In addition, I recommend using Rubular (http://rubular.com) for testing regular expressions.

Happy coding!