The Access Wizard Newsletter
Tips, Tricks and Traps for Access Users and Developers.

Left Right and Center – Parsing to Pull Things Apart

With the coming of the presidential election in just a few months, we are in the heart of our political season. Clearly, we have President Obama on the left and former Massachusetts Governor Romney on the right. Every once in a while, we will get a more centrist candidate toward the middle – John Anderson in 1980, for example. It is usually pretty easy to separate 2 or 3 candidates to figure out who is left middle and center.
 
But what if we’re talking about the House of Representatives? There are clearly candidates on the left (Socialist Bernie Sanders from Vermont) and on the right (Libertarian Ron Paul). But where is the middle of these all these congressmen and how do we determine how to split them up?
 
This month’s Wizard will do just that. We’ll use parsing techniques to pull apart the left, right, and center, albeit with data rather than congressmen.
 

The Problem

 
Frequently, if your database has to deal with other sources of information, you will come across a set of data that contains multiple data items in a single field. For instance, let’s say that you are given the following list of names in a single Excel column or a single field.
 
Kit T. Litter
Al Dente
Pearl E. Gates
I.D. Clair
 
Being a good database designer, you would have never put people’s full names in a single field; you would have separate fields for first, middle, and last names.
 

Logic Overview

 
For our incoming list above, in all four cases, the last name is everything to the right of the last space in the name.  So we will use the following rule for Last Name:
 
Determine where the last space is (excluding any trailing spaces) and declare that everything to the right of that space as the last name.
 
The first name is a bit tougher. At first glance, we might just say that everything before the first space is the first name. This works for the first three names but falls apart when we get to I.D. Clair, the first couple characters are clearly initials rather than a name.
 
We could just plow ahead the put I.D. into the first name field and in many cases that works just fine. But we could get more sophisticated and parse out those two characters.
 
We know that I.D. is probably not the first name, but rather initials. Software is not nearly as smart as you and I, so would have to tell it how to make that distinction. But since we have a logical indicator that we have used for ourselves, we can do the same thing with software. We will tackle this particular problem next month. For now, we will just declare that I.D. is a first name and fix it manually after the fact if necessary.
 
That leaves us with teasing out the middle name or initial. We can see that some of the names have a middle initial because it is followed by a period. In addition, for those two names, there are two spaces in the full name. So we can declare that, if there are two or more spaces in a name, there must be a middle name.
 
But be careful here, because the person who typed these into a computer may have hit a double space when typing and we don't want our software to conclude there is a middle name when there is none. We will solve this by eliminating any double spaces.
 
So now we have logic that will take our names and split them. This works for our particular data, but think about the complexity of real names. What happens if we get John Robert Louis Stevenson? Here there are four names rather than three. Well, depending on our application, we could adjust our code. We have the option of adding a fourth field to our database for a second middle name. However, in my 20 years plus of database management, I've never run across an instance where a database needs a fourth name, so we will just declare for our purposes that everything between the first space and the last space is a middle name. Using this rule, John Robert Louis Stevenson has a middle name of Robert Louis. If you have an application that needed to parse those out, that is certainly possible; it just makes life a bit more complicated. For our purposes, we will stick with the easy stuff.
 

Deciding Where to Make This Happen

 
There are two ways we can parse this information. The first is in a query using a series of immediate if statements - see the  May 2010 Wizard for information about how to use the immediate if statement.
 
The second way you can do this is in code. In almost all circumstances, doing this in code rather than with queries is the way to go because code has the following advantages:
  • Faster to develop
  • Easier to understand
  • Easier to debug
  • Easier to enhance

 

The Code to Make it Happen

Below is the code that makes this all happen. Before we dive in, a bit of philosophy on coding: Life is better when your code is easy to read and logical. If you have any experience in coding at all, you know that writing clear logical code will make debugging and future changes much easier.
 
Our overall approach is to do all the hard work up front and then just zip through the requested option as needed. Although we could have set up a complicated array to handle the return variables, or passed variables back and forth, I find it easier and clearer to send back the variable first, middle, or last as needed
 
Now, on to the code:
 
Function ParseName(strName As String, strFirstMiddleOrLast As String) As String
    Dim intFirstSpace As Integer
    Dim intLastSpace As Integer
    Dim intLenName As Integer
    Dim rv As String
  
    rv = ""
    strName = Trim(strName)
    strName = Replace(strName, "  ", " ")
    intLenName = Len(strName)
    intFirstSpace = InStr(1, strName, " ")
    intLastSpace = InStrRev(strName, " ")
    Select Case strFirstMiddleOrLast
        Case Is = "First"
            rv = Left(strName, intFirstSpace)
        Case Is = "Last"
            rv = Right(strName, intLenName - intLastSpace)
        Case Is = "Middle"
            rv = Mid(strName, intFirstSpace, intLastSpace - intFirstSpace) & ""
         Case Else
            rv = "unexpected Request for Name to Return - use First, Middle or Last Name"
    End Select
    rv = Trim(rv)
    ParseName = rv
End Function
 
Let’s pull this apart piece by piece.
 
Function ParseName(strName As String, strFirstMiddleOrLast As String) As String
 
You will note that there are two parameters that must be passed in, strName, the name to parse, and strFirstMiddleOrLast, the name to return. In other words, the calling routine has to supply the full name and then request which name, first, middle, or last, to get back.
 
Next we have our variables:
 
   intFirstSpace As Integer       - the position of the first space we encounter
   intLastSpace As Integer        - the position of the last space we encounter
   intLenName As Integer         - the length of the name passed in (after cleanup)
   rv As String                           - a variable to save us from typing
 
A note about the rv variable: rv stands for return variable. Its sole purpose is to save us keystrokes. We could have used “ParseName” everyplace we use “rv”, but by using rv we save several keystrokes every time we have to reference ParseName. This not only speeds typing but also reduces the opportunity for errors. 
 
     rv = ""
We start by filling our return variable with an empty string “”. This guarantees that no matter what happens we have something for our function to return.
 
    strName = Trim(strName) – This removes both leading and trailing spaces.
 
    strName = Replace(strName, "  ", " ")  – This replaces any double spaces with a single space.
 
At this point, we have a clean name with no extra spaces.
 
    intLenName = Len(strName) – This gives us the length of the name.
 
    intFirstSpace = InStr(1, strName, " ") – This gives us the position of the first space by using the inStr function, a function that returns the first instance of whatever character string is sought.
 
    intLastSpace = InStrRev(strName, " ") – This gives us the position of the last space. The function InStrRev works the same way as Instr, except that it starts from the end and works backward.
 
    Select Case strFirstMiddleOrLast
The select case routine does a series test on our variable strFirstMiddleOrLast. It will exit the select Case routine once one of the tests is true.
 
        Case Is = "First"
            rv = Left(strName, intFirstSpace)
If the first name is requested, we return everything up to the position of the first space.
 
        Case Is = "Last"
            rv = Right(strName, intLenName - intLastSpace)
If the last name is requested, we return everything to the right of the last space.
 
        Case Is = "Middle"
            rv = Mid(strName, intFirstSpace, intLastSpace - intFirstSpace) & ""
If the middle name is requested, we use the mid function. This function returns the middle of our text string beginning at the first space and returns a string the length of the last space minus the first space.
 
         Case Else
            rv = "unexpected Request for Name to Return - use First, Middle or Last Name"
If we get here, something went wrong. Rather than having the function explode on us, we return a message providing guidance to the person calling the function.
 
    End Select
This closes out the Select Case routine.
 
    rv = Trim(rv)
We clean up our return variable by stripping out any leading or trailing spaces. This saves us from having to do more precise measurement of our first, last, and middle name lengths.
 
    ParseName = rv
And finally we load our return function with our return variable.
 
End Function  - and close out the function
 
 

Putting it into Action and Conclusion

 
The easiest way to test this is in the immediate window. From the code window press Control-G. Then type:
 
? ParseName(“John  Q Public”, “First”)
And you should get back “John”.
 
Do a bit more experimentation in the immediate window to test the middle and last names, to make sure things work as expected.
 
Once you have done the testing in the immediate window, put your function into a query similar to the one below.



And if your code works properly when you run this, you should get:



So from now on when you get the list of names that are all lumped into a single cell or are in a single field, you can now pull them apart.

 

   Trap of the Month – Names Are Not Straightforward

This month, we have been dealing with names and I’ve shown you how to pull these apart into neat little boxes of first, middle, and last using the space characters as our lever between the names. Most of the time we get easy names like Barak Obama or Mitt Romney; but in real life we also get names like Kay Bailey Hutchinson (current Texas Senator) and Chris Van Hollen (current Representative from Maryland). In both cases, the last name has a space in it.  So our code wouldn’t work in the above instances.

Better yet, consider physicist Robert J. Van de Graaff and actor James Van Der Beek. Here we have three separate words that make up the last name. You could write code until the cows come home and still encounter problems that would have to be resolved manually.
 
Even more interesting, what happens when you encounter a name like performers Madonna or Eminem?  Are those first names or last names?
 
So if you’re parsing names, allocate some time to review the results and hope that you never have to deal with the artist formerly known as Prince.

 

Tip of the Month: Know When to Hold Them, Know When to Fold Them
 

If you are a hard core developer, like me, when you see a problem like parsing names and begin running into unusual situations: Robert J. Van de Graaff, Prince, and the like, you realize that, yes, there is a solution! A solution you want to implement by writing very clever code with supporting reference tables. When you do this, it feels really good!
 
Keep in mind though that you can spend hours developing a solution that a human being at a key board can fix in a matter of seconds. If you’re doing this for your own entertainment, that’s fine. However, if you’re working on a project that someone else is paying for (be it an employer or a customer) resist the urge to code your way through a problem just because you can. Think of the issue from a business point of view and do a cost-benefit analysis before reaching for the keyboard.


Quick Links:

Custom Software Home

Access Wizard Archives

Our Services


© Copyright Custom Software
All Rights Reserved | Westford, MA 01886