Tag Archives: regular expressions

Parse US Street Addresses with Regular Expressions in C#

In my business, we do a lot with addresses. Generally, we rely on third-party products from companies like ESRI for what we need, but from time to time, we still need to parse an address the old-fashioned way. Something like US Address Parser is exactly what I need, but I can’t use it since it’s GPL-licensed. I didn’t need an exhaustive, perfect solution, so I thought I’d just whip one up with regular expressions.

Sample input:

  • 100 MAIN
  • 100 MAIN ST
  • 100 S MAIN ST
  • 100 S MAIN ST W
  • 100 S MAIN ST W APT 1A

Create StreetAddress class

The first step was simply to create an address object with the properties I needed:

public class StreetAddress
{
    public string HouseNumber { get; set; }
    public string StreetPrefix { get; set; }
    public string StreetName { get; set; }
    public string StreetType { get; set; }
    public string StreetSuffix { get; set; }
    public string Apt { get; set; }
}

Build regular expression

The next thing I did was get to work on my regular expression. I built my expression with the help of RegExr and did my initial testing. Once I was satisfied, I moved it over to code. Here’s what I came up with:

private static string BuildPattern()
{
    var pattern = "^" +                                                       // beginning of string
                    "(?<HouseNumber>\\d+)" +                                  // 1 or more digits
                    "(?:\\s+(?<StreetPrefix>" + GetStreetPrefixes() + "))?" + // whitespace + valid prefix (optional)
                    "(?:\\s+(?<StreetName>.*?))" +                            // whitespace + anything
                    "(?:" +                                                   // group (optional) {
                    "(?:\\s+(?<StreetType>" + GetStreetTypes() + "))" +       //   whitespace + valid street type
                    "(?:\\s+(?<StreetSuffix>" + GetStreetSuffixes() + "))?" + //   whitespace + valid street suffix (optional)
                    "(?:\\s+(?<Apt>.*))?" +                                   //   whitespace + anything (optional)
                    ")?" +                                                    // }
                    "$";                                                      // end of string

    return pattern;
}

Functions for valid values

Note that there are several functions called while building the regular expression. This is done purely for readability and maintainability. Here are the functions, which each just return a pipe-delimited list of valid values:

private static string GetStreetPrefixes()
{
    return "TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N";
}

private static string GetStreetTypes()
{
    return "TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR";
}

private static string GetStreetSuffixes()
{
    return "NW|E|SE|W|SW|S|NE|N";
}
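One thing to watch with hand-maintained pipe-delimited lists: alternation is ordered, so the engine tries alternatives left to right and keeps the first one that lets the overall match succeed. In the street type list above, ST comes before ST TER, so an input like 100 MAIN ST TER can end up with ST as the type and TER as the apartment. A minimal sketch of one way around that, assuming you are willing to keep the values in an array: sort longest-first and escape each value before joining. BuildAlternation and AlternationSketch are hypothetical names, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // Hypothetical helper (not in the original post): turns a list of literal
    // values into a regex alternation with the longest values first.
    public static string BuildAlternation(params string[] values)
    {
        return string.Join("|",
            values
                .Distinct()
                .OrderByDescending(v => v.Length)   // longer values are tried first
                .Select(v => Regex.Escape(v)));     // escape spaces and metacharacters
    }
}
```

With this, BuildAlternation("ST", "AVE", "ST TER") puts ST TER ahead of ST, so the two-word type gets a chance to match before the shorter one does.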

Parse the input

At this point, the work is done. All that’s left is to run the regular expression on your address string and deal with the results.

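// requires: using System.Text.RegularExpressions;

// Build the Regex once and reuse it instead of constructing it on every call.
private static readonly Regex AddressRegex = new Regex(BuildPattern());

public static StreetAddress Parse(string address)
{
    if (string.IsNullOrWhiteSpace(address))
        return new StreetAddress();

    // Normalize once: trim stray whitespace and upper-case to match the value lists.
    var input = address.Trim().ToUpperInvariant();

    // Match once and check Success rather than calling IsMatch and then Match.
    var m = AddressRegex.Match(input);
    if (!m.Success)
    {
        // No match: hand the whole input back as the street name.
        return new StreetAddress { StreetName = input };
    }

    return new StreetAddress
    {
        HouseNumber = m.Groups["HouseNumber"].Value,
        StreetPrefix = m.Groups["StreetPrefix"].Value,
        StreetName = m.Groups["StreetName"].Value,
        StreetType = m.Groups["StreetType"].Value,
        StreetSuffix = m.Groups["StreetSuffix"].Value,
        Apt = m.Groups["Apt"].Value,
    };
}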
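To see how the named groups split up the sample inputs, here is a self-contained sketch of the same pattern shape. It is not the code above: the value lists are deliberately tiny stand-ins, and AddressSketch and Describe are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressSketch
{
    // Deliberately short value lists, for illustration only.
    private const string Directions = "NW|NE|SW|SE|N|S|E|W";
    private const string Types = "AVE|BLVD|DR|LN|RD|ST";

    private static readonly Regex Pattern = new Regex(
        "^(?<HouseNumber>\\d+)" +
        "(?:\\s+(?<StreetPrefix>" + Directions + "))?" +
        "(?:\\s+(?<StreetName>.*?))" +              // lazy: stops at the first valid type
        "(?:" +
        "(?:\\s+(?<StreetType>" + Types + "))" +
        "(?:\\s+(?<StreetSuffix>" + Directions + "))?" +
        "(?:\\s+(?<Apt>.*))?" +
        ")?$");

    // Returns the six named groups joined with '|' so the split is easy to eyeball.
    public static string Describe(string address)
    {
        var m = Pattern.Match(address.ToUpperInvariant());
        if (!m.Success)
            return "(no match)";

        return string.Join("|",
            m.Groups["HouseNumber"].Value,
            m.Groups["StreetPrefix"].Value,
            m.Groups["StreetName"].Value,
            m.Groups["StreetType"].Value,
            m.Groups["StreetSuffix"].Value,
            m.Groups["Apt"].Value);
    }
}
```

Describe("100 S MAIN ST W APT 1A") returns "100|S|MAIN|ST|W|APT 1A", and Describe("100 MAIN") returns "100||MAIN|||". Groups that did not participate in the match simply come back as empty strings, which is why the parser never needs to check them individually.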

End product

And, finally, for those of you who love big, gnarly regular expressions, here’s my end product:

^(?<HouseNumber>\d+)(?:\s+(?<StreetPrefix>TE|NW|HW|RD|E|MA|EI|NO|AU|SE|GR|OL|W|MM|OM|SW|ME|HA|JO|OV|S|OH|NE|K|N))?(?:\s+(?<StreetName>.*?))(?:(?:\s+(?<StreetType>TE|STCT|DR|SPGS|PARK|GRV|CRK|XING|BR|PINE|CTS|TRL|VI|RD|PIKE|MA|LO|TER|UN|CIR|WALK|CO|RUN|FRD|LDG|ML|AVE|NO|PA|SQ|BLVD|VLGS|VLY|GR|LN|HOUSE|VLG|OL|STA|CH|ROW|EXT|JC|BLDG|FLD|CT|HTS|MOTEL|PKWY|COOP|ACRES|ESTS|SCH|HL|CORD|ST|CLB|FLDS|PT|STPL|MDWS|APTS|ME|LOOP|SMT|RDG|UNIV|PLZ|MDW|EXPY|WALL|TR|FLS|HBR|TRFY|BCH|CRST|CI|PKY|OV|RNCH|CV|DIV|WA|S|WAY|I|CTR|VIS|PL|ANX|BL|ST TER|DM|STHY|RR|MNR))(?:\s+(?<StreetSuffix>NW|E|SE|W|SW|S|NE|N))?(?:\s+(?<Apt>.*))?)?$

Value Extraction with Regular Expressions in C#

Regular expressions are one of my favorite things in programming. Each time I write one, it’s like a challenging little brain teaser. One of the things that I commonly use them for is to extract data out of a string.

In the past, I’ve done this by instantiating a Regex with a pattern, checking for matches, getting a MatchCollection, iterating through its matches, and, finally, pulling my “value” out of the match’s group. That’s a whole lot of work to extract a piece of data, and I’ve always suspected there’s an easier way.

I figured out how to do this elegantly just the other day, and I was thrilled. I was working with an alphanumeric text field that was left-padded with 0s. I needed to strip the 0s, and my mind instantly went to regular expressions. The static Regex.Match method returns a Match, and that Match has a Result method that takes a replacement pattern, so you can reference capture groups in the output. So, getting my value could be done in a single statement!

// trim leading 0s 
if (value.StartsWith("0")) 
{ 
    value = Regex.Match(value, "^0+(.*)$").Result("$1"); 
}

For those of you who may not be as regular expression savvy, here’s what’s going on:

  • ^ – the beginning of the string; we use this so that we don’t match on a subset of the string
  • 0+ – one or more 0s
  • (.*) – zero or more characters; the parentheses indicate that this is a capture group
  • $ – the end of the string; we again use this so that we don’t match on a subset of the string
  • $1 – $n can be used to output the value of a capture group
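For comparison, here is a small sketch of three ways to get the same result. The class and method names are hypothetical. The first variant checks Match.Success instead of StartsWith, since Result throws a NotSupportedException on a failed match; the other two are alternatives I'd consider, not what the snippet above does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingZeros
{
    // Same pattern as the snippet above, guarded by Success instead of StartsWith.
    public static string StripWithResult(string value)
    {
        var m = Regex.Match(value, "^0+(.*)$");
        return m.Success ? m.Result("$1") : value;
    }

    // Replace the leading run of 0s with nothing; no capture group needed.
    public static string StripWithReplace(string value)
    {
        return Regex.Replace(value, "^0+", "");
    }

    // No regular expression at all.
    public static string StripWithTrimStart(string value)
    {
        return value.TrimStart('0');
    }
}
```

All three turn "000123ABC" into "123ABC" and leave "ABC" alone. Note that a value made up entirely of 0s comes back as an empty string, which may or may not be what you want.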

Wonderful!

Regular expressions with ASCII values

I was writing some unit tests today to check a format. The format used the ASCII control characters FS, GS, RS, and US (the file, group, record, and unit separators) as delimiters.

A sample format might look like this:

1.03:1[us]00[rs]2[us]01[rs]10[us]01[gs]

So, in my test, I wanted to verify that my string started with “1.03:” and ended with “[rs]10[us]someValue[gs]”. I didn’t know how to check for those pesky control characters, though! After a bit of Googling, I found the answer, and it’s actually pretty simple. In a regular expression, an escaped u followed by four hexadecimal digits (\uXXXX) matches the character with that code. The ASCII separators map directly to Unicode: FS is \u001C, GS is \u001D, RS is \u001E, and US is \u001F. With those values in hand, I came up with this regular expression:

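// ^ and $ anchor the start and end; \. matches a literal dot; Regex.Escape guards the expected value
Regex.IsMatch(contents, @"^1\.03:.*?\u001E10\u001F0*" + Regex.Escape(expected) + @"\u001D$")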
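Here is a self-contained sketch of that check against the sample string from earlier, with the [us], [rs], and [gs] placeholders written as real control characters. The class and method names are hypothetical, and this version assumes the intent is a strict starts-with and ends-with test, so it anchors the pattern, escapes the dot, and escapes the expected value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorSketch
{
    // 1.03:1[us]00[rs]2[us]01[rs]10[us]01[gs] with real separator characters.
    public const string Sample =
        "1.03:1\u001F00\u001E2\u001F01\u001E10\u001F01\u001D";

    // True when contents starts with "1.03:" and ends with
    // [rs]10[us]<optional zero padding><expected>[gs].
    public static bool EndsWithField10(string contents, string expected)
    {
        return Regex.IsMatch(
            contents,
            @"^1\.03:.*?\u001E10\u001F0*" + Regex.Escape(expected) + @"\u001D$");
    }
}
```

SeparatorSketch.EndsWithField10(SeparatorSketch.Sample, "1") is true because the 0* absorbs the zero padding, while passing "2" as the expected value gives false.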

Thanks for being so awesome, regular expressions!