When and Why to normalise?
View Single Post
12-01-2007, 04:18 PM
Join Date: Apr 2005
Location: Kent, UK
You're misunderstanding what I meant, due in part to the example being a very simple one. Repetative data is easier to see in this example (clearly a VERY bad design that needs normalising)
id, name, address, age, employer_name, empl_address, emp_phone
2, fred, 1 low st, 34, jones and co, 10 main road, 123456
22, john, 10 high st, 49, jones and co, 10 main road, 123456
717,Jane, 12 high st, 21, smith ltd, 15 big avenue, 654321
123, Carole, 43 skimmingdish lane, 30, smithltd, 15 big avenue, 654321
98, james, 12 high st, 55, windrushers ltd, 1 skimmingdish lane, 321321
221, john, 19 high st, 29, jones and co, 10 main road, 123456
(assuming that one big company owns several little ones and has a centralised database, perhaps)
Here in the unnormalised table, we are repeatedly entering the same address and same phone number for some of the employers. Time after time.
We should normalise the table to have worker and employer as separate entities.
employer table (empID, employer_name, empl_address, emp_phone) PK= empId
worker table (workID, name, address, age, empID) PK = workID, FK = empID
Now, the attributes that are functionally dependant on the workID and those dependant on the empID (which wasn't included in the original table as there it wasn't necessary) are now grouped together.
Now we need only enter the empID next to the worker's data, avoiding the repetative entry of the employer's other data, with, as you will notice, the risk of a typo ocurring (it was a real error as well). With hundreds of workers, the scope for typos is huge, and a search on "smith ltd" would miss out some people.
The simple example used didn't have scope to show what was meant by repetative redundant data. I do accept that the link table was probably missed out to shorten the post. But I've always found that examples which are very simple can give a beginner the wrong impression.
The original example is actually an example of a look-up table, which is usually used on important searchable fields to avoid misspellings causing problems eg comedy and comedies and comeddy. (Look-up tables are not fully required by normalisation, they are a ocnvenience.) But the link table is necessary as a film can be in several different categories, which wasn't immediately obvious (matrix, action) and (matrix, sci-fi) for example. And thus normalisation was required when it was added.
(Addresses are a field that are often left unnormalised for convenience.)
Hope this shows things a bit more clearly. Functional dependancies are always the best way to normalise things, they are part of the early research into relationships.
The Following User Says Thank You to Dr John For This Useful Post:
View Public Profile
Send a private message to Dr John
Visit Dr John's homepage!
Find More Posts by Dr John