Six Ways to Parse and Reformat Using Regular Expressions

The other day, a colleague consulted me about a regular expression. For those of you who know me, this is one of my favorite kinds of consultation, so I was thrilled to help him. He was doing a simple parse-and-reformat. It warmed my insides to know that he had identified this as a perfect regular expression scenario and implemented it that way. His solution worked, but I felt that it could be simpler and more maintainable.

I’ll venture to say that, for a developer who’s not familiar with regular expressions (You call yourself a developer..!?), the most straightforward way to do a regular expression parse-and-reformat is to create a Match object and reformat its capture groups.

1. Using a Match object

var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");

var match = regex.Match(date);
// Groups: 1 = month, 2 = day, 3 = year; output is year-month-day.
var result = string.Format("{0}-{1}-{2}", 
	match.Groups[3], 
	match.Groups[1], 
	match.Groups[2]);

Console.WriteLine(result);
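
One thing to watch for with a Match object: if the input doesn’t match, Match still returns an object, just with Success set to false and empty groups, so the formatted output would be "--". Here is a minimal sketch of a guard, assuming the input is month/day/year and the output should be year-month-day. TryReformat is a hypothetical helper name, and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original example.
// Returns false instead of producing "--" when the input does not match.
static bool TryReformat(string input, out string result)
{
    var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");
    var match = regex.Match(input);

    // Match never returns null; a failed match reports Success == false
    // and every group value is an empty string.
    if (!match.Success)
    {
        result = string.Empty;
        return false;
    }

    // Groups: 1 = month, 2 = day, 3 = year (assumed input order).
    result = string.Format("{0}-{1}-{2}",
        match.Groups[3].Value,
        match.Groups[1].Value,
        match.Groups[2].Value);
    return true;
}

Console.WriteLine(TryReformat("4/18/2013", out var ok) ? ok : "no match");  // 2013-4-18
Console.WriteLine(TryReformat("April 18", out var bad) ? bad : "no match"); // no match
```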

You can accomplish the same task without creating a Match object yourself by using the Replace method. One overload accepts a MatchEvaluator, which can be a lambda expression, so you can basically take the previous solution and plug it in.

2. Using a MatchEvaluator

var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");

// Groups: 1 = month, 2 = day, 3 = year; output is year-month-day.
var result = regex.Replace(date, 
	m => string.Format("{0}-{1}-{2}", 
		m.Groups[3], 
		m.Groups[1], 
		m.Groups[2]));

Console.WriteLine(result);

That’s a little better, but it’s still verbose. Another overload of the Replace method accepts a replacement string in which ${n} stands for capture group n. This lets you skip the Match object altogether, and it results in a nice, tidy solution.

3. Using a replacement string

var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");

var result = regex.Replace(date, "${3}-${1}-${2}");

Console.WriteLine(result);
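
Keep in mind that Replace returns the input unchanged when the pattern doesn’t match, so bad input passes through silently. Here is a rough sketch of one way to guard against that with IsMatch. ReformatOrDefault and the fallback value are made up for illustration, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");

// Hypothetical helper: reformat only when the whole input matches,
// otherwise return the caller-supplied fallback.
string ReformatOrDefault(string input, string fallback)
{
    return regex.IsMatch(input)
        ? regex.Replace(input, "${3}-${1}-${2}")
        : fallback;
}

// Replace on its own hands non-matching input back untouched.
Console.WriteLine(regex.Replace("April 18", "${3}-${1}-${2}")); // April 18
Console.WriteLine(ReformatOrDefault("April 18", "invalid"));     // invalid
Console.WriteLine(ReformatOrDefault("4/18/2013", "invalid"));    // 2013-4-18
```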

I have two problems with all three of these solutions, though. First, they use hard-coded indexes to access the capture groups. If another developer comes along and adds a capture group to the regular expression, every group to its right is renumbered, which silently breaks the reformatting logic. Second, it’s hard to understand the intent of the code: I have to read the regular expression and work out what each capture group holds before I can tell what the code is trying to do. These two issues add up to poor maintainability.
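
To make the first problem concrete, here is a small sketch of what happens when a capture group is added in front of the existing ones. The optional weekday prefix is a hypothetical change invented for this illustration, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

var date = "4/18/2013";
var replacement = "${3}-${1}-${2}";

var original = new Regex(@"^(\d+)/(\d+)/(\d+)$");

// Hypothetical later edit: allow an optional weekday prefix such as "Thu, ".
// The new group becomes group 1, so month, day and year shift to 2, 3 and 4.
var modified = new Regex(@"^(\w+, )?(\d+)/(\d+)/(\d+)$");

var before = original.Replace(date, replacement);
var after = modified.Replace(date, replacement);

Console.WriteLine(before); // 2013-4-18
Console.WriteLine(after);  // 18--4
```

Nothing throws and nothing warns. The replacement string now points at the day, the (empty) weekday, and the month.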

Don’t worry, though. Regular expressions have a built-in mechanism for naming capture groups: write (?<name>...) instead of (...). You can then reference the capture groups by name instead of by index. It makes the regular expression itself a little noisier, but the rest of the code becomes much more readable and maintainable. Way better!

4. Using a Match object with named capture groups

var date = "4/18/2013";
// Name each group for what it captures: the input is month/day/year.
var regex = new Regex(
	@"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");

var match = regex.Match(date);
var result = string.Format("{0}-{1}-{2}", 
	match.Groups["year"], 
	match.Groups["month"], 
	match.Groups["day"]);

Console.WriteLine(result);

5. Using a MatchEvaluator with named capture groups

var date = "4/18/2013";
// Name each group for what it captures: the input is month/day/year.
var regex = new Regex(
	@"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");

var result = regex.Replace(date, 
	m => string.Format("{0}-{1}-{2}", 
		m.Groups["year"], 
		m.Groups["month"], 
		m.Groups["day"]));

Console.WriteLine(result);

6. Using a replacement string with named capture groups

var date = "4/18/2013";
// Name each group for what it captures: the input is month/day/year.
var regex = new Regex(
	@"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");

var result = regex.Replace(date, "${year}-${month}-${day}");

Console.WriteLine(result);
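
As a quick check of the maintainability claim, here is a sketch in which someone adds another capture group to the named version. The weekday group is a hypothetical addition, the group names assume month/day/year input, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

var replacement = "${year}-${month}-${day}";

// Hypothetical later edit: an optional weekday prefix, added as a named group.
var regex = new Regex(
    @"^(?<weekday>\w+, )?(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");

var plain = regex.Replace("4/18/2013", replacement);
var prefixed = regex.Replace("Thu, 4/18/2013", replacement);

// The replacement string did not have to change.
Console.WriteLine(plain);    // 2013-4-18
Console.WriteLine(prefixed); // 2013-4-18
```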