If you’ve worked with regular expressions at all, you know how easily they become unruly. A regular expression can be hard to decipher even while you’re writing it and know exactly what you’re trying to accomplish. Imagine how hard it will be for the poor guy who has to maintain that thing later!
There are a few things you can do to make it better for everybody in the long run.
Write Unit Tests
Unit tests are PERFECT for any code that uses regular expressions because you can write a test for each different scenario that you’re trying to match. You don’t have to worry about accidentally breaking something that you had working previously because the tests will regression test everything as you go.
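Here is a minimal sketch of what that can look like. The pattern under test (a field name, an equals sign, and a quoted value), the class name FieldPatternTests, and the little Check helper are all made up for illustration; in a real project you would probably swap Check for the assertions in whatever test framework you already use.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldPatternTests
{
    // Hypothetical pattern under test: field name, '=', quoted value.
    private static readonly Regex FieldPattern =
        new Regex(@"(\w+)\s*=\s*("".*?"")");

    // Stand-in for a test framework's assertion helper (an assumption,
    // used here only to keep the sketch free of external packages).
    private static void Check(bool condition, string scenario)
    {
        if (!condition) throw new Exception("Failed: " + scenario);
    }

    public static void RunAll()
    {
        // One check per scenario that should match.
        Check(FieldPattern.IsMatch("foo = \"bar\""), "spaces around =");
        Check(FieldPattern.IsMatch("foo=\"bar\""), "no spaces around =");

        // One check per scenario that should NOT match.
        Check(!FieldPattern.IsMatch("foo = bar"), "unquoted value");
        Check(!FieldPattern.IsMatch("foo : \"bar\""), "colon instead of =");

        // The capture groups are part of the contract too.
        Match m = FieldPattern.Match("foo = \"bar\"");
        Check(m.Groups[1].Value == "foo", "group 1 is the field name");
        Check(m.Groups[2].Value == "\"bar\"", "group 2 is the quoted value");
    }
}
```

Each scenario gets its own line, so when a later tweak to the pattern breaks one of them, the failure message tells you which case regressed.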
Include Samples
I like to include samples in the code to make it as obvious as possible what’s going on to anybody looking at the code. I don’t want developers to have to mentally process a regular expression unless they’re there to work on the regular expression itself. I like to provide simple examples like this:
// matches a field and value in quotes
// matches
//   foo = "bar"
//   foo="bar"
// doesn't match
//   foo = bar
//   foo : "bar"
var pattern = @"(\w+)\s*=\s*("".*?"")";
Include Comments in the Pattern
Another trick you can do is to include comments in the regular expression itself by using #. A # starts a comment that runs to the end of the line, so each commented chunk of the pattern needs its own line. This can be a helpful development tool, too, because it allows you to write out what you’re trying to match in isolated chunks. Note that you’ll need to use the IgnorePatternWhitespace option for this technique to work.
var pattern = @"(
    (?:"".*?"")   # anything between quotes (?: -> not-captured)
    |             # or
    \S+           # one or more non-whitespace characters
)";
Regex re = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
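To see a commented pattern like this in action, here is a small sketch that uses it to split a string into tokens, treating anything in quotes as a single token. The TokenizerSketch class and its Tokenize method are hypothetical names I'm using for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Each # comment runs to the end of its line, so the pattern
    // is kept on multiple lines inside a verbatim string.
    private const string Pattern = @"(
        (?:"".*?"")   # anything between quotes
        |             # or
        \S+           # one or more non-whitespace characters
    )";

    // IgnorePatternWhitespace is what lets the layout and comments above work.
    private static readonly Regex TokenRegex =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Returns every token found in the input, in order.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenRegex.Matches(input))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }
}
```

Calling TokenizerSketch.Tokenize("foo \"bar baz\" qux") should give three tokens: foo, "bar baz" (quotes included), and qux.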
I really, really like regular expressions, but they can definitely be maintenance land mines. So, when you use them, do future developers a solid and use tips like these to make them as maintainable as possible.