Custom Software

The Access Wizard Newsletter

Tips Tricks and Traps for Access Users and Developers

January 2005

 

The Plague of Redundant Mail or How to Avoid Killing Trees and Wasting Money by Eliminating Near Duplicates

I'm a big fan of mail order. I get most of my hardware and software through mail order houses and I also use them for food and gift baskets. Not surprisingly, I'm on a whole bunch of mail order lists. I think that I alone could save some trees if I could convince these mail order folks that I don't use their flyers, catalogs and "special pricing" sheets. I just recycle them and continue to order off the net.

Just after the beginning of the New Year, I got three identical catalogs all from the same company; they were all addressed to me, but with slightly different addresses. This company, we'll call them the Tree Killing Catalog Company, was wasting considerable time and resources sending me the same piece of mail. Their problem is that in their database they had slightly different addresses for me and didn't have a way to tell that I was the same person in all three.

This month we'll solve the problem of how to avoid sending mail to nearly identical addresses.

In this issue

 Tip of the Month - Efficiency in Grouping

 The Problem of Near Identical Addresses

 The Solution

 Conclusion

 Trap of the Month



The Problem of Near Identical Addresses

Frequently when users enter addresses into databases they enter the data inconsistently.

For instance the Tree Killing Catalog Company sent mail to me at the following addresses:

3 Stonebolt St Westford, MA
3 Stonebolt St. Westford, MA
3 Stonebolt Street Westford, MA

Notice that there is no period after the St in the first address, a period after St. in the second address, and in the third address the word Street is spelled out.

For us humans, we know that the three addresses above are the same. For computers, however, these are three different addresses. Regardless of what you may have been told, computers are truly stupid. They do exactly what they are told. Their power lies in the fact that they do it very fast and (more or less) reliably. A computer does a quick comparison of these three addresses, detects differences between them (no matter how slight), and treats them as three different and distinct addresses.

So when we tell a database to send mail to every different address in the database, it generates three separate mailing labels for the addresses above. This is what happened when the company sent me three identical catalogs. When you consider the mailing and materials costs, this type of problem can be costly. It also makes the organization sending the information look non-professional since it appears to be wasting money and hurting the environment.

Typically, if a single mailing is to go to each person and address that may have multiple identical records in a database, a "group by" query is used (create this type of query in the query design view by pressing the "∑" symbol on the menu bar) or by selecting unique values in a query (see this month's tip). This is the most common way to group identical records together and eliminate exact duplicates. But that approach won't work when there are near matches like the ones above.



The Solution

To solve this problem, we need to identify the addresses that are almost, but not quite the same, and then send a single mailing to each address. We do that by using the Left function (introduced in the December 2004 AccessWizard) to determine the duplicates, and then take the first item that occurs.

Let's start with the following table

This table, like many I've seen, contains repetitive data. This is often caused by importing data from different sources or poor control of data entry -- almost certainly the case with the Tree Killing Catalog Company.

Our goal is to send out only 5 mailings, one each to Sally Barnes, Rocco Connell, Sam Jones, Mary Lawrence, and Patricia Quigley

If we try the standard approach of either a "group by" or "selecting unique values" in a query, we get:

This query eliminated the exact duplicate record for Mary Lawrence, but we are still left with the near exact match of Sally Barnes (10 Vinton Dr and 10 Vinton Drive) and Rocco Connell with his variations on Street.

To solve this problem, we'll use a built-in Access function called Left to get the first 6 characters of the Street Address. We will then use the isolated string from the address field to select the first record in the database with that address.

There's no magic in the number 6, I just scanned the data and figured it would that it was sufficient to make a match, but it depends on your database. For example, in Maryland, street addresses typically have 4-6 digits in the address, so you'd need to choose a higher number.

Back to our left function -- it takes two arguments:

1. The string that contains the characters
2. The number of characters to return.

We construct it as follows in the query window:
AddressFirst6: Left ([Address], 6)

When we put this into our "group by" query we get:

Note that this is a "group by" or Totals query. Do you see the highlighted ∑ (in some Access versions it will be depressed) button at the top of the screen in the menu bar? That provides the "group by" - it's a way of telling Access that the records that are exactly the same should be grouped together. We've accepted the default of "group by" for all fields except our primary key field, Customer Key. For this field we've chosen First, which means that it will not group on that field, but rather select the first one it comes to for the rest of the grouped fields. This is an important issue since this is the lever that we'll use to return the full field info from the table.

When we run this we get:

We're now down to 5 records, which is what we're looking for, however we still don't have the full address. We do, however, have a result called "First of Customer key", our query above. We'll use this to get the unique records with the complete address.

Our plan will be to save this query as qryAddressFirst6, then use that query to join with our address table to get only the unique addresses.

In design view it looks like this:

Notice that we've joined the query (qryAddressFirst6) to our address table (tblAddress). We've joined on the Customer Key on both sides. In our qryAddressFirst6, it's the Customer Key that we came to in our group by query. In this type of join (called an inner join - See the October 2004 issue for more information on joins and relationships), only those records which are exactly alike in the joined fields are returned.

When we run this we get:

Here we have our 5 addresses with no duplicates that we can use to create our mailing list.



Conclusion

This exercise was fairly trivial since we were dealing with just a handful of records. However, this same technique will work just as well on a table with thousands of records. If the Tree Killing Catalog Company had used this technique of determining the truly unique records, they would have saved time, money and trees.



Trap of the Month

Do you ever run a query and see row after row of exactly the same data, even though you know that your source tables do not have duplicates?

Check your design view to make sure that you've actually linked your tables together. If you have more than one table in a query and you don't join them, you'll get repeats for every row in your results set. The total number of repeating records will be equal to the product of the number of rows in the unjoined tables. Solve this problem (called a Cartesian query) by joining the appropriate fields in your query design window.

Tip of the Month - Efficiency in Grouping

In this issue we've used a technique called "group by". Although it's very easy to use (you just press a button in the design view of a query) it's also overkill if you just need to get to the unique records in a database. As an alternative, you can use the "unique values" option of a query. To use this take the following steps:

  • Open any query in design view
  • Right click in the grey area in the top half of the grid
  • Choose properties
  • Change the Select Unique Values option to yes.

Selecting "Unique Values" will make sure that you don't get multiple rows of the same result, without needing to resort to the power and the processing needed for a group by query.

 

Quick Links...

Custom Software Home

Access Wizard Archives

Our Services




Join our mailing list!

email: jim@custom-software.biz

phone: 978-392-3462

web: http://www.custom-software.biz

 

Forward email

This email was sent to jim@custom-software.biz, by jim@custom-software.biz

Update Profile/Email Address | Instant removal with SafeUnsubscribe™ | Privacy Policy.

Powered by


Custom Software | - | Westford | MA | 01886