How the Use of Regular Expressions Affects Website Performance


A few months back, I stumbled upon a bug related to regular expressions. Until then, I had not known that a poorly written regular expression can cause catastrophic performance problems.

Let me first give you more details about the bug I found.

The form I was testing had a text box, “Special Note”, with a maximum length of 500 characters.
When the user submitted the form, a regular expression was used to check that the text did not contain an email address.
Everything worked fine when I entered strings made up of short words.
The problem appeared when I entered a long, unbroken string of characters (e.g., “asdfghjiklasdfsdfsdkjhdfseds”) in the Special Note field and then submitted the form. In this scenario, the software stopped responding to user input.

Performance degraded badly while the application tried to match the regex pattern against the long string I had entered.

To find out more about the issue, I researched online how regular expressions can affect performance. I found that the problem is caused by backtracking: when a pattern can match the same text in many different ways, the regex engine tries each alternative before giving up, and the number of attempts can grow very quickly with the length of the input. I also found Microsoft’s example of an email address regex that can cause performance issues. The linked articles provide more detail on backtracking and on how the issue occurs when the regular expression is not optimized.
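To make the backtracking problem concrete, here is a minimal Python sketch. The two patterns are hypothetical and chosen only for illustration; they are not the regex used in the form I tested, which I never saw. RISKY nests one quantifier inside another, so a long word with no “@” can be split up in an enormous number of ways. SAFER expresses a similar intent with a single quantifier.

```python
import re
import time

# Hypothetical patterns for illustration only; not the regex from the form.
# RISKY: "+" inside a group that is itself repeated with "*" (nested quantifiers).
RISKY = re.compile(r"^([a-z0-9]+)*@")
# SAFER: one quantifier, so there is only one way to consume the letters.
SAFER = re.compile(r"^[a-z0-9]+@")


def time_match(pattern, text):
    """Return (match object or None, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Keep the lengths small: with a backtracking engine, the time for RISKY
    # can roughly double with every extra character.
    for length in (12, 16, 20):
        text = "a" * length  # one long "word" with no "@", so neither pattern matches
        _, risky_seconds = time_match(RISKY, text)
        _, safer_seconds = time_match(SAFER, text)
        print(f"length={length:2}  risky={risky_seconds:.4f}s  safer={safer_seconds:.6f}s")
```

On a backtracking engine such as Python’s re module, the time for RISKY should climb steeply as the string gets longer, while SAFER stays almost flat. Exact timings depend on the machine and the engine, so treat the numbers as a trend rather than a benchmark.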

So, whenever you test a text field that is validated with a regular expression, always include the following test cases:

1. Variations of text that match the regular expression pattern.

2. Variations of text that do not match the regular expression pattern.

3. Variations of text that almost match the regular expression pattern, including a long string of characters with no whitespace that nearly (but not completely) matches the regular expression.
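The three kinds of test input above can be sketched in Python as follows. This is only a rough illustration: EMAIL_PATTERN is a hypothetical email detector that I made up for the example, and the sample texts are invented. The only detail taken from the form is the 500-character limit.

```python
import re
import time

# Hypothetical email detector; the form's real regex is not known.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

MAX_LENGTH = 500  # character limit of the "Special Note" field


def build_test_inputs(max_length=MAX_LENGTH):
    """Return one sample input for each of the three test-case categories."""
    near_miss_suffix = "@example"  # looks like an email address but has no ".com" part
    return {
        "matching": "Please write to jane.doe@example.com about this.",
        "non_matching": "Please call me after five in the evening.",
        # One unbroken word, padded to the field limit, that almost matches.
        "near_miss_long": "a" * (max_length - len(near_miss_suffix)) + near_miss_suffix,
    }


def timed_search(pattern, text):
    """Return (whether the pattern was found, elapsed seconds)."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start


if __name__ == "__main__":
    for name, text in build_test_inputs().items():
        found, elapsed = timed_search(EMAIL_PATTERN, text)
        print(f"{name:15} length={len(text):3} email_found={found} time={elapsed:.4f}s")
```

When running cases like these against a real form, watch the response time as well as the validation result. The near-miss input at the maximum length is the one most likely to expose a backtracking problem.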
