Custom Software
The Access Wizard Newsletter Tips Tricks and Traps for Access Users and Developers
Febuary 2005

Normalization 103 – The Importance of Eliminating Redundancy or the Case of the Clever Camper

When I was in my early twenties, I worked at a summer camp. Billy, an 11-year-old camper, came up to me one day and said, "Hey Jim, if Pete and repeat were in a canoe, and Pete fell out, who would be left."

I pondered a moment and said "repeat?"

He said, "If Pete and repeat were in a canoe, and Pete fell out, who would be left."

I said, more confidently this time "repeat!"

He again said, "If Pete and repeat were in a canoe, and Pete fell out, who would be left."

Finally the light shone through, I figured out his little trick. Billy's joke is the perfect illustration of this month's topic, redundancy. Redundancy in a database occurs when the same data appears in the same column of a table again and again and again, just like the question Billy continued to ask back at summer camp.

We've touched on this topic in the past. In this issue, we'll dive deeper and take a look at why redundancy is so common,, the inherent problems it causes, and how to fix them.

In this issue
  • Tip of the Month - Recovering from Existing Problems
  • The Problem
  • The Solution
  • Advantages of Eliminating Redundancy
  • Conclusion
  • Trap of the Month

  • The Problem

    Staying with the theme, let's analyze a simple example from summer camp. Take a look at the following table, designed to keep track of camper-cabin assignments:

    The table contains five fields: a primary key, name of the cabin, first and last name of the camper, and the camper's date of birth.

    If you were to run a query to show you all the campers in the "Cherokee" cabin, how many campers would it show?

    Did you guess 5? If so, you'd be wrong. That query would only return 4 records. Upon closer examination of record 3, you see that the cabin name "Cheroke" is misspelled, ending with a single "e" rather than a double "ee".

    It's not hard to imagine a problem like this cropping up in table with repeating values in the same field. This is usually due to human error while inputting the data.

    At issue here is the fact that the same value in the CabinName field appears over and over in the table, which is known as "redundancy". As we've discussed previously in this forum, any given value for a field should appear once, and only once, in the database. This is an important part of a concept known as "normalization", which is one of the major keys to a well-designed database.


    The Solution

    The best way to solve the problem of redundancy is to eliminate it during the design of the database, when the tables and relationships are established. The key is to create the tables so that a field value that relates to multiple records (e.g. the name of a cabin, which is a one-to-many relationship with the campers) appears only once in the entire database.

    In our case we would lay out the tables as follows:

    Notice that we've established separate tables for cabins and campers, and linked them in a one-to-many relationship. The value for the field CabinName exists once and only once in the database. We assign campers to cabins by entering the primary key for the assigned cabin into the record for each camper in the Camper table.

    These tables are populated as follows:

    The Cabin Key in the Camper table is called a foreign key. It's a pointer to the Primary key (see June 2004 for details on Primary Keys) in the Cabin table.


    Advantages of Eliminating Redundancy

    Eliminating redundancy sounds wonderful, but there are some downsides. The database is a bit more complex and provides less immediate information when looking at a single table. Instead of CabinName (in our first example), the table without redundancy shows only a number in the CabinKey field. You'd have to do a query or open more than one table to get the complete picture. So given the disadvantages, why should we eliminate redundancy?

    There are several important advantages:

    A single change will ripple through the database
    In a normalized database where redundancy has been eliminated, the cabin name appears only once in the database. If it happens to be misspelled, it must to be changed in only one place. Any future queries or reports will show the correct updated value.

    Increased data accuracy
    When users are entering data into the system, it is more difficult for them to make a typing or spelling mistake. This helps to keep the bad data out. As the adage "Garbage In, Garbage Out" tells us, if we can avoid the "garbage in", we'll have better results when we extract information.

    Reduced data entry
    In a well-designed database where redundancy has been eliminated, the user populating the database could choose from a list (most likely using a combo box) rather than type in the full name for a data element. This list would be generated from the single table holding all valid values.

    Faster processing for complex analysis
    If you have to do grouping or mathematical analysis of the data in a large database, you'll find that manipulating numeric keys is considerably faster than manipulating text.

    Changes are easier
    Suppose the PC (that's "politically correct") police stop by and declare that naming cabins after American Indigenous People is offensive. Assuming that you are of sound mind, you'll tell them to get a life; however if you do decide to change the cabin names i.e. naming the cabins after trees (hopefully there are no arboreal rights people out there who'll take offense) then you simply have to change the values in a single table. If you had not normalized your database, then every single instance of a cabin name in your database would have to be hunted down and fixed.


    Conclusion

    Once again, taking the time to set up your tables in the beginning is a crucial step to building the best database possible. If you're populating your tables and you notice that the same value shows up again and again in the same field, you've likely committed the sin of redundancy. You can make amends by breaking your data into separate tables and establishing appropriate relationships between them, using foreign keys to link tables


    Trap of the Month

    Do you ever find that the rows in your tables have mysteriously grown taller or shorter? This happens when you've accidentally changed the row height - most likely through inadvertent mousing (something our parents would never have done).

    To fix this problem:

    • Open the table
    • Move your mouse to the left hand edge (on top of the grey boxes) and hover between two rows. You're cursor will change to a double arrow with a black line in the middle.
    • Hold down the left mouse key and drag up or down to make the rows taller or shorter.

    You may also use the Format menu in the Table View. Choose Row Height and enter a number that gives your desired row height (12.75 is the standard for all the conformists out there.)


    Tip of the Month - Recovering from Existing Problems

    If you have fallen into the trap of redundant data and notice a spelling error that occurs again and again, you're not stuck with manually changing every occurrence of the error. You can use an update query to make massive changes all at once. Search the help system (F1) for details under the topic "update query".

    Quick Links...

    Custom Software Home

    Access Wizard Archives

    Our Services



    Join our mailing list!

    Forward email

    This email was sent to jim@custom-software.biz, by jim@custom-software.biz
    Powered by

    Custom Software | - | Westford | MA | 01886