Normalization 103 – The
Importance of Eliminating Redundancy or the Case of the Clever
Camper
When I was in my early twenties, I worked at a summer camp.
Billy, an 11-year-old camper, came up to me one day and said,
"Hey Jim, if Pete and repeat were in a canoe, and Pete fell
out, who would be left."
I pondered a moment and said "repeat?"
He said, "If Pete and repeat were in a canoe, and Pete fell
out, who would be left."
I said, more confidently this time "repeat!"
He again said, "If Pete and repeat were in a canoe, and
Pete fell out, who would be left."
Finally the light shone through, I figured out his little
trick. Billy's joke is the perfect illustration of this
month's topic, redundancy. Redundancy in a database occurs
when the same data appears in the same column of a table again
and again and again, just like the question Billy continued to
ask back at summer camp.
We've touched on this topic in the past. In this issue,
we'll dive deeper and take a look at why redundancy is so
common,, the inherent problems it causes, and how to fix them.
The Problem |
 |
Staying with the theme, let's analyze a simple
example from summer camp. Take a look at the following
table, designed to keep track of camper-cabin
assignments:
The table contains five fields: a primary key, name
of the cabin, first and last name of the camper, and the
camper's date of birth.
If you were to run a query to show you all the
campers in the "Cherokee" cabin, how many campers would
it show?
Did you guess 5? If so, you'd be wrong. That query
would only return 4 records. Upon closer examination of
record 3, you see that the cabin name "Cheroke" is
misspelled, ending with a single "e" rather than a
double "ee".
It's not hard to imagine a problem like this cropping
up in table with repeating values in the same field.
This is usually due to human error while inputting the
data.
At issue here is the fact that the same value in the
CabinName field appears over and over in the table,
which is known as "redundancy". As we've discussed
previously in this forum, any given value for a field
should appear once, and only once, in the database. This
is an important part of a concept known as
"normalization", which is one of the major keys to a
well-designed database.
|
The Solution |
 |
The best way to solve the problem of redundancy is to
eliminate it during the design of the database, when the
tables and relationships are established. The key is to
create the tables so that a field value that relates to
multiple records (e.g. the name of a cabin, which is a
one-to-many relationship with the campers) appears only
once in the entire database.
In our case we would lay out the tables as follows:
Notice that we've established separate tables for
cabins and campers, and linked them in a one-to-many
relationship. The value for the field CabinName exists
once and only once in the database. We assign campers to
cabins by entering the primary key for the assigned
cabin into the record for each camper in the Camper
table.
These tables are populated as follows:
The Cabin Key in the Camper table is called a foreign
key. It's a pointer to the Primary key (see June
2004 for details on Primary Keys) in the Cabin
table.
|
Advantages of Eliminating
Redundancy |
 |
Eliminating redundancy sounds wonderful, but there
are some downsides. The database is a bit more complex
and provides less immediate information when looking at
a single table. Instead of CabinName (in our first
example), the table without redundancy shows only a
number in the CabinKey field. You'd have to do a query
or open more than one table to get the complete picture.
So given the disadvantages, why should we eliminate
redundancy?
There are several important advantages:
A single change will ripple through the
database In a normalized database where
redundancy has been eliminated, the cabin name appears
only once in the database. If it happens to be
misspelled, it must to be changed in only one place. Any
future queries or reports will show the correct updated
value.
Increased data accuracy When users are
entering data into the system, it is more difficult for
them to make a typing or spelling mistake. This helps to
keep the bad data out. As the adage "Garbage In, Garbage
Out" tells us, if we can avoid the "garbage in", we'll
have better results when we extract information.
Reduced data entry In a well-designed
database where redundancy has been eliminated, the user
populating the database could choose from a list (most
likely using a combo box) rather than type in the full
name for a data element. This list would be generated
from the single table holding all valid values.
Faster processing for complex analysis If
you have to do grouping or mathematical analysis of the
data in a large database, you'll find that manipulating
numeric keys is considerably faster than manipulating
text.
Changes are easier Suppose the PC (that's
"politically correct") police stop by and declare that
naming cabins after American Indigenous People is
offensive. Assuming that you are of sound mind, you'll
tell them to get a life; however if you do decide to
change the cabin names i.e. naming the cabins after
trees (hopefully there are no arboreal rights people out
there who'll take offense) then you simply have to
change the values in a single table. If you had not
normalized your database, then every single instance of
a cabin name in your database would have to be hunted
down and fixed.
|
Conclusion |
 |
Once again, taking the time to set up your tables in
the beginning is a crucial step to building the best
database possible. If you're populating your tables and
you notice that the same value shows up again and again
in the same field, you've likely committed the sin of
redundancy. You can make amends by breaking your data
into separate tables and establishing appropriate
relationships between them, using foreign keys to link
tables
|
Trap of the Month |
 |
Do you ever find that the rows in your tables have
mysteriously grown taller or shorter? This happens when
you've accidentally changed the row height - most likely
through inadvertent mousing (something our parents would
never have done).
To fix this problem:
- Open the table
- Move your mouse to the left hand edge (on top of
the grey boxes) and hover between two rows. You're
cursor will change to a double arrow with a black line
in the middle.
- Hold down the left mouse key and drag up or down
to make the rows taller or shorter.
You may also use the Format menu in the Table View.
Choose Row Height and enter a number that gives your
desired row height (12.75 is the standard for all the
conformists out there.)
|
|
Tip of the Month - Recovering from
Existing Problems |
|
If you have fallen into the trap of redundant data
and notice a spelling error that occurs again and again,
you're not stuck with manually changing every occurrence
of the error. You can use an update query to make
massive changes all at once. Search the help system (F1)
for details under the topic "update query".
|
|