The other day, a colleague consulted me on a regular expression. For those of you who know me, this is one of my favorite kinds of consultation, so I was thrilled to help him. He was doing a simple parse-and-reformat. It warmed my insides to know that he had identified this as a perfect regular expression scenario and implemented it that way. It was a functional solution, but I felt that it could be simplified and made more maintainable.
I’ll venture to say that, for a developer who’s not familiar with regular expressions (you call yourself a developer..!?), the most straightforward way to do a regular expression parse-and-reformat is to create a Match object and reformat it.
1. Using a Match object
var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");
var match = regex.Match(date);
// Groups 1, 2 and 3 are month, day and year; emit year-month-day.
var result = string.Format("{0}-{1}-{2}",
    match.Groups[3],
    match.Groups[1],
    match.Groups[2]);
Console.WriteLine(result);
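One thing the snippet above glosses over is input that doesn't match at all. In that case Match.Success is false and you end up formatting empty groups. Here's a minimal sketch of how I might guard against that, assuming you want the input handed back untouched on a non-match. The helper name TryReformatDate is my own invention for illustration, and the year-month-day output order is an assumption about what the reformat is for.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original solution.
// Returns false and hands back the input unchanged when it is not digits/digits/digits.
static bool TryReformatDate(string date, out string result)
{
    var match = Regex.Match(date, @"^(\d+)/(\d+)/(\d+)$");
    result = match.Success
        ? string.Format("{0}-{1}-{2}",
            match.Groups[3].Value,  // year
            match.Groups[1].Value,  // month
            match.Groups[2].Value)  // day
        : date;
    return match.Success;
}

foreach (var input in new[] { "4/18/2013", "not a date" })
{
    Console.WriteLine(TryReformatDate(input, out var reformatted)
        ? reformatted
        : "no match: " + input);
}
```

The Replace-based versions don't need this check, since Replace simply returns the input unchanged when nothing matches.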
You can accomplish the same task without creating a Match object yourself by using the Replace method. One overload accepts a MatchEvaluator, which can be a lambda expression, so you can basically take the previous solution and plug it in.
2. Using a MatchEvaluator
var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");
// Groups 1, 2 and 3 are month, day and year; emit year-month-day.
var result = regex.Replace(date,
    m => string.Format("{0}-{1}-{2}",
        m.Groups[3],
        m.Groups[1],
        m.Groups[2]));
Console.WriteLine(result);
That’s a little bit better, but it’s still a little verbose. There’s another overload of the Replace method that accepts a replacement string. This allows you to skip the Match object altogether, and it results in a nice, tidy solution.
3. Using a replacement string
var date = "4/18/2013";
var regex = new Regex(@"^(\d+)/(\d+)/(\d+)$");
var result = regex.Replace(date, "${3}-${1}-${2}");
Console.WriteLine(result);
I have two problems with all three of these solutions, though. First, they use hard-coded indexes to access the capture groups. If another developer comes along and modifies the regular expression by adding another capture group, it could unintentionally affect the reformatting logic. The second issue I have is that it’s hard to understand the intent of the code. I have to read and process the regular expression and its capture groups in order to determine what the code is trying to do. These two issues add up to poor maintainability.
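To make the first problem concrete, here's a rough sketch of how it could go wrong. The "later edit" is purely hypothetical: I'm imagining someone adding a group at the front of the pattern to tolerate leading whitespace, and forgetting that the replacement string still points at the old indexes.

```csharp
using System;
using System.Text.RegularExpressions;

var date = "4/18/2013";
var replacement = "${3}-${1}-${2}";

// Original pattern: groups 1, 2 and 3 are month, day and year.
var original = new Regex(@"^(\d+)/(\d+)/(\d+)$");
var before = original.Replace(date, replacement);
Console.WriteLine(before); // 2013-4-18

// Hypothetical later edit: a new group in front shifts every index up by one,
// so ${1} is now the (empty) whitespace group and ${3} is the day.
var modified = new Regex(@"^(\s*)(\d+)/(\d+)/(\d+)$");
var after = modified.Replace(date, replacement);
Console.WriteLine(after); // 18--4
```

Nothing throws and nothing warns. The output is just quietly wrong, which is exactly the kind of bug I'd rather not leave lying around for the next developer.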
Don’t worry, though. Regular expressions have a built-in mechanism for naming capture groups. By modifying the regular expression, you can now reference the capture groups by name instead of index. It makes the regular expression itself a little noisier, but the rest of the code becomes much more readable and maintainable. Way better!
4. Using a Match object with named capture groups
var date = "4/18/2013";
// The input is month/day/year, so name the groups in that order.
var regex = new Regex(
    @"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");
var match = regex.Match(date);
var result = string.Format("{0}-{1}-{2}",
    match.Groups["year"],
    match.Groups["month"],
    match.Groups["day"]);
Console.WriteLine(result);
5. Using a MatchEvaluator with named capture groups
var date = "4/18/2013";
// The input is month/day/year, so name the groups in that order.
var regex = new Regex(
    @"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");
var result = regex.Replace(date,
    m => string.Format("{0}-{1}-{2}",
        m.Groups["year"],
        m.Groups["month"],
        m.Groups["day"]));
Console.WriteLine(result);
6. Using a replacement string with named capture groups
var date = "4/18/2013";
// The input is month/day/year, so name the groups in that order.
var regex = new Regex(
    @"^(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");
var result = regex.Replace(date, "${year}-${month}-${day}");
Console.WriteLine(result);
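As a quick sanity check on the maintainability claim, here's a small sketch of what happens when another developer adds a capture group to the named version. The extra leading-whitespace group is a hypothetical edit of my own, and I'm assuming the input is month/day/year and naming the groups accordingly.

```csharp
using System;
using System.Text.RegularExpressions;

var date = "4/18/2013";

// Hypothetical edit: an extra unnamed group in front to tolerate leading whitespace.
// The replacement refers to groups by name, so it does not care about the new group.
var regex = new Regex(
    @"^(\s*)(?<month>\d+)/(?<day>\d+)/(?<year>\d+)$");
var result = regex.Replace(date, "${year}-${month}-${day}");
Console.WriteLine(result); // 2013-4-18
```

The reformatting logic didn't have to change at all, and anyone reading the replacement string can tell what it produces without decoding the pattern first.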