In my business, we do a lot with addresses. Generally, we rely on third-party products from companies like ESRI for what we need, but from time to time we still need to parse an address the old-fashioned way. Something like US Address Parser would be exactly what I need, but I can’t use it because it’s licensed under the GPL. I didn’t need an exhaustive, perfect solution, so I thought I’d just whip one up with regular expressions.
Sample input:
- 100 MAIN
- 100 MAIN ST
- 100 S MAIN ST
- 100 S MAIN ST W
- 100 S MAIN ST W APT 1A
Create StreetAddress class
The first step was simply to create an address object with the properties I needed:
public class StreetAddress
{
    public string HouseNumber { get; set; }
    public string StreetPrefix { get; set; }
    public string StreetName { get; set; }
    public string StreetType { get; set; }
    public string StreetSuffix { get; set; }
    public string Apt { get; set; }
}
Build regular expression
The next thing I did was get to work on my regular expression. I built my expression with the help of RegExr and did my initial testing. Once I was satisfied, I moved it over to code. Here’s what I came up with:
private static string BuildPattern()
{
    var pattern =
        "^" +                                                         // beginning of string
        "(?<HouseNumber>\\d+)" +                                      // 1 or more digits
        "(?:\\s+(?<StreetPrefix>" + GetStreetPrefixes() + "))?" +     // whitespace + valid prefix (optional)
        "(?:\\s+(?<StreetName>.*?))" +                                // whitespace + anything
        "(?:" +                                                       // group (optional) {
            "(?:\\s+(?<StreetType>" + GetStreetTypes() + "))" +       //   whitespace + valid street type
            "(?:\\s+(?<StreetSuffix>" + GetStreetSuffixes() + "))?" + //   whitespace + valid street suffix (optional)
            "(?:\\s+(?<Apt>.*))?" +                                   //   whitespace + anything (optional)
        ")?" +                                                        // }
        "$";                                                          // end of string

    return pattern;
}
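The `.*?` on StreetName is what lets the street type be split off. A lazy quantifier makes the name consume as few characters as possible, so the engine tries the optional street-type group as early as it can. The sketch below compares lazy and greedy on the same input using a cut-down pattern; `LazyVsGreedyDemo`, `NameWith`, and the two-value type list (`ST`, `AVE`) are hypothetical, for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedyDemo
{
    // Hypothetical, cut-down version of BuildPattern: house number, street name,
    // then an optional street type (only "ST" and "AVE" are accepted here).
    // nameQuantifier is either ".*" (greedy) or ".*?" (lazy).
    public static string NameWith(string nameQuantifier, string input)
    {
        var pattern = @"^(?<HouseNumber>\d+)\s+(?<StreetName>" + nameQuantifier + @")(?:\s+(?<StreetType>ST|AVE))?$";
        return Regex.Match(input, pattern).Groups["StreetName"].Value;
    }
}
```

With the greedy `.*`, `NameWith(".*", "100 MAIN ST")` returns "MAIN ST": the name swallows everything and the optional type group simply matches nothing. With `.*?`, the name stops at "MAIN" and "ST" lands in StreetType.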
Functions for valid values
Note that several functions are called while building the regular expression; this is purely for readability and maintainability. Here are the functions, each of which just returns a pipe-delimited list of valid values:
private static string GetStreetPrefixes()
{
    return "TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N";
}

private static string GetStreetTypes()
{
    return "TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR";
}

private static string GetStreetSuffixes()
{
    return "NW|E|SE|W|SW|S|NE|N";
}
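.NET regular expressions try alternatives from left to right and keep the first one that lets the whole pattern succeed, so the order of values in these lists matters. For example, `ST` appears before `ST TER` in GetStreetTypes, so "100 MAIN ST TER" comes back with StreetType `ST` and `TER` in Apt. A minimal sketch, assuming you would rather keep the values in arrays: a hypothetical `AlternationBuilder.Build` helper that removes duplicates, puts longer values first, and escapes each one with `Regex.Escape` (needed because `ST TER` contains a space).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationBuilder
{
    // Hypothetical helper: joins values into a regex alternation.
    // Longer values come first so that e.g. "ST TER" is tried before "ST";
    // ties are broken alphabetically to keep the output stable.
    public static string Build(params string[] values)
    {
        return string.Join("|",
            values.Distinct()
                  .OrderByDescending(v => v.Length)
                  .ThenBy(v => v, StringComparer.Ordinal)
                  .Select(Regex.Escape));
    }
}
```

Matching "MAIN ST TER" against `^(?<Name>.*?)\s+(?<Type>...)(?:\s+(?<Apt>.*))?$` gives Type "ST" with the list `ST|AVE|ST TER`, but Type "ST TER" with `AlternationBuilder.Build("ST", "AVE", "ST TER")`.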
Parse the input
At this point, the work is done. All that’s left is to run the regular expression on your address string and deal with the results.
// Requires: using System.Text.RegularExpressions;
public static StreetAddress Parse(string address)
{
    if (string.IsNullOrEmpty(address))
        return new StreetAddress();

    // Trim stray whitespace and upper-case with the invariant culture so the value lists match
    var input = address.Trim().ToUpperInvariant();
    var re = new Regex(BuildPattern());

    // Run the match once and check Success, instead of calling IsMatch and then Match
    var m = re.Match(input);
    if (!m.Success)
    {
        // No match (e.g. no leading house number): treat the whole input as the street name
        return new StreetAddress { StreetName = input };
    }

    return new StreetAddress
    {
        HouseNumber = m.Groups["HouseNumber"].Value,
        StreetPrefix = m.Groups["StreetPrefix"].Value,
        StreetName = m.Groups["StreetName"].Value,
        StreetType = m.Groups["StreetType"].Value,
        StreetSuffix = m.Groups["StreetSuffix"].Value,
        Apt = m.Groups["Apt"].Value,
    };
}
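Parse calls BuildPattern and constructs a new Regex on every call, which is repeated work when you parse many addresses in a loop. A minimal sketch of one alternative, assuming you are happy to hard-code the value lists: build the pattern once into a `static readonly` field. `CachedAddressParser` is a hypothetical name; the pattern and value lists are the ones from this post, written with verbatim `@` strings so the backslashes need no doubling. `RegexOptions.Compiled` is optional and trades slower start-up for faster matching.

```csharp
using System;
using System.Text.RegularExpressions;

public class StreetAddress
{
    public string HouseNumber { get; set; }
    public string StreetPrefix { get; set; }
    public string StreetName { get; set; }
    public string StreetType { get; set; }
    public string StreetSuffix { get; set; }
    public string Apt { get; set; }
}

public static class CachedAddressParser
{
    private const string Prefixes = "TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N";
    private const string Types = "TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR";
    private const string Suffixes = "NW|E|SE|W|SW|S|NE|N";

    // Built once, the first time the class is used, instead of on every Parse call.
    private static readonly Regex Pattern = new Regex(
        @"^(?<HouseNumber>\d+)" +
        @"(?:\s+(?<StreetPrefix>" + Prefixes + @"))?" +
        @"(?:\s+(?<StreetName>.*?))" +
        @"(?:(?:\s+(?<StreetType>" + Types + @"))" +
        @"(?:\s+(?<StreetSuffix>" + Suffixes + @"))?" +
        @"(?:\s+(?<Apt>.*))?)?$",
        RegexOptions.Compiled);

    public static StreetAddress Parse(string address)
    {
        if (string.IsNullOrWhiteSpace(address))
            return new StreetAddress();

        var input = address.Trim().ToUpperInvariant();
        var m = Pattern.Match(input);

        // No match: fall back to treating the whole input as the street name.
        if (!m.Success)
            return new StreetAddress { StreetName = input };

        // Groups that did not participate in the match have Value == "".
        return new StreetAddress
        {
            HouseNumber = m.Groups["HouseNumber"].Value,
            StreetPrefix = m.Groups["StreetPrefix"].Value,
            StreetName = m.Groups["StreetName"].Value,
            StreetType = m.Groups["StreetType"].Value,
            StreetSuffix = m.Groups["StreetSuffix"].Value,
            Apt = m.Groups["Apt"].Value,
        };
    }
}
```

For the last sample input, `CachedAddressParser.Parse("100 S MAIN ST W APT 1A")` fills HouseNumber "100", StreetPrefix "S", StreetName "MAIN", StreetType "ST", StreetSuffix "W" and Apt "APT 1A".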
End product
And, finally, for those of you who love big, gnarly regular expressions, here’s my end product:
^(?<HouseNumber>\d+)(?:\s+(?<StreetPrefix>TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N))?(?:\s+(?<StreetName>.*?))(?:(?:\s+(?<StreetType>TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR))(?:\s+(?<StreetSuffix>NW|E|SE|W|SW|S|NE|N))?(?:\s+(?<Apt>.*))?)?$
Dear Adam,
This week I need to parse an address into separate fields in SQL, and I found your blog. I am trying to follow your steps with VS 2005 but get 14 errors. Could you please provide more details or a sample project? Your help would be highly appreciated.
Thanks a lot!
Can you tell me what error(s) you’re getting?
Hi Adam,
Thanks for your time and your code. If possible, would you please post some example code showing how to use it, for people who are new to C# and would like to take advantage of it?
Thanks again
Sure! Here’s the code from the console application that produces the output in the screenshot example:
static void Main(string[] args)
{
    while (true)
    {
        Console.Write("Address (x to quit): ");
        string input = Console.ReadLine();

        // Stop on "x" or at end of input (ReadLine returns null)
        if (input == null || string.Equals(input, "x", StringComparison.OrdinalIgnoreCase))
            break;

        var a = Parser.Parse(input);
        Console.WriteLine("House Number: {0}", a.HouseNumber);
        Console.WriteLine("Street Prefix: {0}", a.StreetPrefix);
        Console.WriteLine("Street Name: {0}", a.StreetName);
        Console.WriteLine("Street Type: {0}", a.StreetType);
        Console.WriteLine("Street Suffix: {0}", a.StreetSuffix);
        Console.WriteLine("Apt: {0}", a.Apt);
        Console.WriteLine();
    }
}
Very cool. I converted it to SQLCLR but it failed badly. Never returned. Works great as an app, though.
This is great. Needed it for a SharePoint custom workflow. I’m marginal at regular expressions. This saved me hours of work. Thanks.
Very nice…saved me a ton of time!
Hi, man. I really like the work you’ve done here. I wonder whether this code still works if the street name contains the word “Street” instead of “St”/“ST”. Best regards
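The street-type list contains abbreviations only, so an input like "100 MAIN STREET" still matches, but with StreetType empty and "MAIN STREET" as the StreetName. One possible workaround, sketched under the assumption that a small hand-picked word list is enough: replace full words with their abbreviations before calling Parse. `SuffixNormalizer` and its word map are hypothetical and far from complete; note that it replaces every matching word, so a street whose name is itself one of these words would be abbreviated too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixNormalizer
{
    // Hypothetical, deliberately small map from full words to the abbreviations
    // that appear in GetStreetTypes. Case-insensitive lookup.
    private static readonly Dictionary<string, string> Abbreviations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["STREET"] = "ST",
            ["AVENUE"] = "AVE",
            ["BOULEVARD"] = "BLVD",
            ["DRIVE"] = "DR",
            ["ROAD"] = "RD",
            ["LANE"] = "LN",
        };

    // Splits on whitespace, swaps any known full word for its abbreviation,
    // and rejoins with single spaces. Unknown words are left untouched.
    public static string Normalize(string address)
    {
        var words = address.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => Abbreviations.TryGetValue(w, out var abbr) ? abbr : w));
    }
}
```

Usage would be `Parser.Parse(SuffixNormalizer.Normalize(input))`, so "100 Main Street" reaches the parser as "100 Main ST".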