Home > BI > Database Design and Modeling I

Database Design and Modeling I

Throughout my experience I come across so many database designing problems. And today I want to share some of the common influences and impacts of most common designing mistakes that bring big disaster to the whole project.

One of the most common designing mistakes is the “Poor Planning / Architecture” of the database or mart.

CDDM1

Let me start with an example. Suppose you brought a piece of land for construction of house today and today itself you hired a contractor. Now tell me, do you want the contractor to start putting foundations from the same day or the next day itself. Do you want him to build your house without any sketch or plans or Blue Prints? Or even without knowing that the land you got is not going to sink or if it is legal to build house on that land. Or even without thinking how much should be the thickness of the wall so as to prevent any unwanted foreign activities. I guess you would say “NO” but in case you have YES in your mind then I don’t think reading further will help you anymore.




CDDM2A proper design and planning is needed in order to make sure that at the end of the day, your dream house get built without any hassles or obstacles.

The same way a proper planning and designing is required for a robust database to be built. For any business to become success and sustain in the profitable margin, databases plays a vital role, it’s the pillar stone for any project or business. It has to be under proper care and attention and not to be like to get it done in one-night.

If proper care was not taken at the beginning stage for planning and mapping of all the resources, project requirements, etc.; it will become worse to bring any changes at the later stages resulting in more instability down the line. Moreover chances are there that it may divert the project to the wrong direction. It’s more like hunting in dark night expecting dears to be everywhere.

In most cases, just for the sake of getting it done, the databases got deployed and by the time it is realized that it would have done under proper techniques and guidance, no more time is there to go back and fix it.

Mapping is a big problem in many cases for complex applications. Hence most of the designers prefer to put all the columns into one table and not preferring normalization techniques as a good tool for design. But a good design should cover all the databases functionalities and qualities like:

  • Performance with respect of ETL operation (loading); and especially for query extraction (reporting).
  • Database Extensibility to cover future needs.CDDM3
  • Perfect balance of normalized (we may lose more space i.e. hard drive cost) and de-normalized (may need more processing i.e. CPU and RAM cost) tables to feed every requirement of the project.
  • Proper testing analysis for both known and unknown data set.

One of the common situations in designing is the proper implementation of 1-to-Many relationship that we usually do to avoid repeated fields. But in real world scenario, how much this is

CDDM4 significant. Let me give one example on this.  In your customer database, some customers may have more than one phone numbers. But the question is; do all the systems or applications really need to track all the phone numbers of a particular customer.

Let’s diagnose the problem little deeply. How many customers do have more than one phone numbers? May be 1 in 100th or 2 in 1000th.  That means on an average of only 0.5 percent of customers have more than one phone numbers and those also at most 2 numbers for each customer. Now if we provide PhoneNo1 and PhoneNo2 as two fields in our database to meet the requirement. Then is it good to waste the rest 99.5 percent as NULL values.

CDDM5

Well after little diagnostics, let’s decide to take one child table to keep information about phone numbers, resulting in better space management; like the one that shown below:

CDDM6

But the real question here is that “Is it required to have more than one phone numbers for a customer?” If the answer is YES, then we have to go for child table; but if it is NO then it is always good to avoid such complex design process and the effort needed to achieve this. Instead we can use the same effort in other productiveness. Moreover, in above case, the real issue is not about the wasted space but rather of wasted effort. What I mean to say is that while designing, main focus should be there on business critical measures and requirements.




Another question is about the sustainability and stability of the database i.e. is the above design good enough to answer the growth and expansion of the business or not. In order to have a foresight of your designing process to keep it sustainable over long period, a little analysis about the client business and the market behavior will be much helpful. Business requirements and hence the data structure may change from time to time, may be sooner or later. The model that is designed today may support tomorrow or maybe not. If we know that the data structure is changing eventually, then it’s always better to wait till the requirement becomes stable; instead of rebuilding the design which may or may not be nearer to the actual requirements.

Well for the above problem, if we think little out of box, then we can see that although we have taken child table to keep information about the phone numbers. But it can be optimized further to a better design.

Recall once again the requirement, what it says, a customer may have more than one phone numbers but at the same time, it’s very true that a customer will have at least one phone number (not having a phone number is very rare case). Hence, if I keep first phone number (customers with only single phone number) inside the main table and will move the second one (customers with double phone numbers) to the child table, it not only solves the space problem as well as increases the efficiency and performance of the extraction queries. As shown below:

CDDM7

All phone numbers has its particular pattern and it will never match with the CustID.  As we already concluded that 0.5% of customers have more than one phone numbers, then for the rest of the customers, we can directly fetch their phone numbers from the main table itself. Only for 0.5% customers, we need to map with the child table to get their phone numbers. And it can be easily done by conditional statements (i.e. case statement) for phone numbers where it ties to the Customer IDs.

Although, it’s true that it’s practically impossible to get around all the requirements and problems; and plan accordingly in the design phase. Possibilities are there that some unknown issues may arise at the later stage of deployment itself but it is important to militate against potential problems as much as possible, by careful planning.

Register yourself or Subscribe to get the updates for better designing practices.

Categories: BI
  1. Maria
    August 3rd, 2010 at 09:11 | #1

    Thanks, I really enjoyed this post. It’s interesting to see different approaches to tackle a database problem.
    waiting for your upcoming posts on database design enhancements.

  2. Bob Themlon
    August 3rd, 2010 at 11:15 | #2

    Wow! Very impressive, I really like the insight. Is there any tools available to track the client requirements for change request analysis.
    Thanks

  3. August 3rd, 2010 at 11:34 | #3

    Arun,

    Nice article. I suggest that you should post it to the SQL TechNet Wiki.

    http://social.technet.microsoft.com/wiki/contents/articles/sql-server-overview.aspx

    As this wiki grows, you will be able to contribute your articles, all the while gaining a ‘reputation’. And since the wiki is a Microsoft website, it will automatically have a bit of credibility…

    -Arnie

  4. Sauravh
    August 3rd, 2010 at 13:59 | #4

    Thanks for the great quick tip!

    Please explain more about responsibilities behind the scene.

    Anyway, very useful tip! Thanks!

  5. Celwin
    August 3rd, 2010 at 14:20 | #5

    This is amazing! Nicely written and very well illustrated !

  6. federal grants
    August 6th, 2010 at 02:12 | #6

    this post is very usefull thx!

  7. Lisas agents
    August 8th, 2010 at 21:02 | #7

    Table and database design should be logical enough to minimise the need for many tables and make use of functions like JOIN.

  8. DH
    August 11th, 2010 at 04:01 | #8

    Great information! I’ve been looking for something like this for a while now. Thanks!

  9. pradeep
    December 14th, 2010 at 10:41 | #9

    it helped me to understand the need of child table now. Thanx a lot !

  10. sandeep boora
    January 10th, 2011 at 10:16 | #10

    Thanks a lot for this beautiful article that covered the necessity of a child table in real time.

  11. Kakembo
    June 5th, 2011 at 21:55 | #11

    Thanks for the article, i also remember when it usually killed me to know what to do as a beginner. As a beginner, someone should keep a lot of thing in mind and relate real life examples and do real life examples as well. Test every Design by asking yourself a lot of question in planning before proceeding to the Database design phase.
    Waiting for the Many to many relationships.
    Thanks again.

  12. THeron
    January 23rd, 2013 at 19:26 | #12

    The point of a Normalized database is that data is not repeated. A de-normalized database repeats data for the purpose of speed of queries. The line:
    “Perfect balance of normalized (we may lose more space i.e. hard drive cost) and de-normalized (may need more processing i.e. CPU and RAM cost) tables to feed every requirement of the project.”
    makes this seem opposite.
    Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.

  13. RT
    July 27th, 2013 at 02:21 | #13

    To me database designing is more of an art than science. If we are using the same column for storing the phone number and logical foreign key, then first of all we are breaking referential integrity and for every read operation, as mentioned in the article, we need to make extra check to determine if customer has one phone number or more than one. Also, readability of the database will not be really clear.

    Thoughts?

  1. No trackbacks yet.
You must be logged in to post a comment.