In my business, we do a lot with addresses. Generally, we rely on third-party products from companies like ESRI for what we need, but from time to time we still need to parse an address the old-fashioned way. Something like US Address Parser is exactly what I need, but I can’t use it because it’s GPL’d. I didn’t need an exhaustive, perfect solution, so I thought I’d just whip one up with regular expressions.
Sample input:
- 100 MAIN
- 100 MAIN ST
- 100 S MAIN ST
- 100 S MAIN ST W
- 100 S MAIN ST W APT 1A
Create StreetAddress class
The first step was simply to create an address object with the properties I needed:
    public class StreetAddress
    {
        public string HouseNumber { get; set; }
        public string StreetPrefix { get; set; }
        public string StreetName { get; set; }
        public string StreetType { get; set; }
        public string StreetSuffix { get; set; }
        public string Apt { get; set; }
    }
Build regular expression
The next thing I did was get to work on my regular expression. I built my expression with the help of RegExr and did my initial testing. Once I was satisfied, I moved it over to code. Here’s what I came up with:
    private static string BuildPattern()
    {
        var pattern =
            "^" +                                                       // beginning of string
            "(?<HouseNumber>\\d+)" +                                    // 1 or more digits
            "(?:\\s+(?<StreetPrefix>" + GetStreetPrefixes() + "))?" +   // whitespace + valid prefix (optional)
            "(?:\\s+(?<StreetName>.*?))" +                              // whitespace + anything
            "(?:" +                                                     // group (optional) {
            "(?:\\s+(?<StreetType>" + GetStreetTypes() + "))" +         //   whitespace + valid street type
            "(?:\\s+(?<StreetSuffix>" + GetStreetSuffixes() + "))?" +   //   whitespace + valid street suffix (optional)
            "(?:\\s+(?<Apt>.*))?" +                                     //   whitespace + anything (optional)
            ")?" +                                                      // }
            "$";                                                        // end of string
        return pattern;
    }
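The lazy `.*?` in StreetName is what lets the optional street-type group claim the last word. Here is a minimal sketch of that behavior, not the full parser: `LazyNameDemo`, `Split`, and the three-item type list are hypothetical names made up for this illustration, and the pattern is a cut-down version of the one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyNameDemo
{
    // Assumption: a shortened street-type list; the article's list is much longer.
    private const string Types = "ST|AVE|RD";

    // Builds a cut-down pattern. nameQuantifier is ".*?" (lazy) or ".*" (greedy).
    private static Regex Build(string nameQuantifier)
    {
        return new Regex(
            "^(?<HouseNumber>\\d+)" +                          // 1 or more digits
            "(?:\\s+(?<StreetName>" + nameQuantifier + "))" +  // whitespace + street name
            "(?:\\s+(?<StreetType>" + Types + "))?" +          // whitespace + street type (optional)
            "$");                                              // end of string
    }

    // Returns "name|type" so the two captures are easy to compare.
    public static string Split(string input, string nameQuantifier)
    {
        Match m = Build(nameQuantifier).Match(input);
        return m.Groups["StreetName"].Value + "|" + m.Groups["StreetType"].Value;
    }
}
```

With the lazy quantifier, `LazyNameDemo.Split("100 MAIN ST", ".*?")` returns `MAIN|ST`. Swap in the greedy `.*` and the name swallows the street type, giving `MAIN ST|`, because the optional group is then free to match nothing.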
Functions for valid values
Note that several functions are called while building the regular expression. This is done purely for readability and maintainability. Here are the functions, each of which simply returns a pipe-delimited list of valid values:
    private static string GetStreetPrefixes()
    {
        return "TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N";
    }

    private static string GetStreetTypes()
    {
        return "TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR";
    }

    private static string GetStreetSuffixes()
    {
        return "NW|E|SE|W|SW|S|NE|N";
    }
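One thing to watch with pipe-delimited lists: .NET tries alternatives left to right, so a short value listed before a longer one that starts the same way (for example `ST` before `ST TER`) can win, with the leftover text falling into the Apt group. The sketch below is one possible way to build the list so longer values come first. `AlternationDemo`, `ToAlternation`, and `TypeAndApt` are hypothetical helpers for this illustration, and the pattern is a cut-down version of the real one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Joins values into an alternation, longest first, escaping each value
    // (Regex.Escape also escapes the space in "ST TER").
    public static string ToAlternation(params string[] values)
    {
        return string.Join("|", values
            .OrderByDescending(v => v.Length)
            .Select(v => Regex.Escape(v)));
    }

    // Returns "type|apt" for a cut-down pattern built around the given alternation.
    public static string TypeAndApt(string input, string typeAlternation)
    {
        var re = new Regex(
            "^\\d+\\s+(?<StreetName>.*?)" +                     // house number + lazy street name
            "(?:\\s+(?<StreetType>" + typeAlternation + ")" +   // whitespace + street type
            "(?:\\s+(?<Apt>.*))?)?$");                          // whitespace + anything (optional)
        Match m = re.Match(input);
        return m.Groups["StreetType"].Value + "|" + m.Groups["Apt"].Value;
    }
}
```

With the hand-written order, `TypeAndApt("100 MAIN ST TER", "ST|ST TER")` returns `ST|TER`. Passing `ToAlternation("ST", "ST TER")` instead returns `ST TER|`.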
Parse the input
At this point, the hard work is done. All that’s left is to run the regular expression on your address string and deal with the results.
    // Requires: using System.Text.RegularExpressions;
    public static StreetAddress Parse(string address)
    {
        if (string.IsNullOrEmpty(address))
            return new StreetAddress();

        // Culture-invariant upper-casing, so the value lists match regardless of locale.
        var input = address.ToUpperInvariant();

        // Match once and check Success, rather than calling IsMatch and then Match.
        var m = new Regex(BuildPattern()).Match(input);

        // No match: return the whole input as the street name.
        if (!m.Success)
            return new StreetAddress { StreetName = input };

        // Optional groups that did not take part in the match yield an empty string.
        return new StreetAddress
        {
            HouseNumber = m.Groups["HouseNumber"].Value,
            StreetPrefix = m.Groups["StreetPrefix"].Value,
            StreetName = m.Groups["StreetName"].Value,
            StreetType = m.Groups["StreetType"].Value,
            StreetSuffix = m.Groups["StreetSuffix"].Value,
            Apt = m.Groups["Apt"].Value,
        };
    }
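To see what comes back for the sample input, here is a small self-contained sketch. It is not the code above verbatim: `SampleRun` and `Describe` are hypothetical names, the direction and street-type lists are shortened stand-ins for the full lists, and the regex is built once in a static field rather than on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SampleRun
{
    // Assumption: short illustrative lists instead of the article's full ones.
    private const string Directions = "NW|NE|SW|SE|N|S|E|W";
    private const string Types = "BLVD|AVE|ST|RD|DR";

    // Built once and reused across calls.
    private static readonly Regex AddressRegex = new Regex(
        "^(?<HouseNumber>\\d+)" +
        "(?:\\s+(?<StreetPrefix>" + Directions + "))?" +
        "(?:\\s+(?<StreetName>.*?))" +
        "(?:" +
            "(?:\\s+(?<StreetType>" + Types + "))" +
            "(?:\\s+(?<StreetSuffix>" + Directions + "))?" +
            "(?:\\s+(?<Apt>.*))?" +
        ")?" +
        "$");

    // Returns the six fields joined with '|', or "NO MATCH|<input>" when the pattern fails.
    public static string Describe(string address)
    {
        string input = address.ToUpperInvariant();
        Match m = AddressRegex.Match(input);
        if (!m.Success)
            return "NO MATCH|" + input;

        return string.Join("|",
            m.Groups["HouseNumber"].Value,
            m.Groups["StreetPrefix"].Value,
            m.Groups["StreetName"].Value,
            m.Groups["StreetType"].Value,
            m.Groups["StreetSuffix"].Value,
            m.Groups["Apt"].Value);
    }
}
```

For example, `SampleRun.Describe("100 S MAIN ST W APT 1A")` returns `100|S|MAIN|ST|W|APT 1A`, and `SampleRun.Describe("100 MAIN")` returns `100||MAIN|||`. An input with no leading house number, such as `PO BOX 12`, does not match at all, which is the case the fallback branch in Parse handles.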
End product
And, finally, for those of you who love big, gnarly regular expressions, here’s my end product:
    ^(?<HouseNumber>\d+)(?:\s+(?<StreetPrefix>TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N))?(?:\s+(?<StreetName>.*?))(?:(?:\s+(?<StreetType>TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR))(?:\s+(?<StreetSuffix>NW|E|SE|W|SW|S|NE|N))?(?:\s+(?<Apt>.*))?)?$