Home » Performance » MySQL » To normalise or not to normalise?
To normalise or not to normalise? [message #733] Fri, 16 February 2007 09:38 Go to next message
al.james  is currently offline al.james
Messages: 16
Registered: October 2006
Junior Member
I am designing a quite simple web app, but I am trying to design it so that it will scale well.

Basically I have an item table and a user table. A user may have many items and each item has exactly one user.

Every time an item is retrieved the corrisponding user must also be found. Currently, when listing items, the users username is retrieved using an inner join on the user_id field that is in both tables.

Reading some stuff on high-performance mysql and website design, it suggests de-normalization is something you should consider.

So I could de-normalise this schema to insert a 'cached copy' of the user's username in the item table.

This will save me a join in the most common query in the application. This is good.

However, I will face the issue of keeping the item.username field upto date. So, if a user changes their username, all their items will need to be updated. Given mysql's table locking rules (and the fact that I want the app to scale well in a replicated setup), is this a good idea?

[Updated on: Fri, 16 February 2007 09:38]

Re: To normalise or not to normalise? [message #734 is a reply to message #733 ] Fri, 16 February 2007 09:40 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
I do not think you need denormalization in this case.

Usually you can do separate lookup for the user which later can be cached with good hit ratio.

denormalization helps in other cases.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: To normalise or not to normalise? [message #736 is a reply to message #733 ] Fri, 16 February 2007 09:47 Go to previous messageGo to next message
al.james  is currently offline al.james
Messages: 16
Registered: October 2006
Junior Member
Interesting idea. So istead of an inner join, might I be better to pull out the user_id's from the selected items and do a second query using the IN operator on the list of user_id's?

Although, I guess a list of user_id's and the IN operator might not generate cache hits so much?

[Updated on: Fri, 16 February 2007 09:48]

Re: To normalise or not to normalise? [message #737 is a reply to message #736 ] Fri, 16 February 2007 09:57 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Well. This sounds like a bit more complicated query than I would expect.

Basically I'd suggest to try both layounts with simple test and see. It is the case than trying is better than theoritizing :)


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: To normalise or not to normalise? [message #738 is a reply to message #733 ] Fri, 16 February 2007 11:05 Go to previous messageGo to next message
al.james  is currently offline al.james
Messages: 16
Registered: October 2006
Junior Member
Good point...

I have quickly written a benchmark script in PHP and thought I would post the results.

3 methods:

1) De-normalised: SELECT SQL_NO_CACHE * FROM item ORDER BY score DESC LIMIT <LIMIT>;

2) Normalised: SELECT SQL_NO_CACHE * FROM item INNER JOIN user ORDER BY score DESC LIMIT <LIMIT>;

3) Split: SELECT SQL_NO_CACHE * FROM item ORDER BY score DESC LIMIT <LIMIT>; SELECT username FROM user WHERE user_id IN (<USER ID LIST TAKEN FROM PREVIOUS>).

Results are against varying LIMIT value;


Limit:	20	40	60	80	100	120	140
1	0.08	0.13	0.19	0.25	0.29	0.34	0.40	
2	0.12	0.23	0.33	0.42	0.52	0.62	0.73
3	0.15	0.27	0.37	0.48	0.6	0.71	0.81



So, de-normalised is a bit faster, but its not really going to make THAT much difference. Method 3 is slower than 2, suggesting, once again, its best to throw it all at mysql and let it worry about optimisation...

Database setup: There are 100,000 items, 10,000 users. All records randomly generated with evenly distributed random values. Times are in seconds to run the query 100 times. Query cache disabled (I hope).

[Updated on: Fri, 16 February 2007 11:09]

Re: To normalise or not to normalise? [message #739 is a reply to message #738 ] Fri, 16 February 2007 11:20 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Thanks for sharing results.

Your benchmarks however oversimplify things. Normally in Web application login step is separated - so you read and check user access before you start doing other things, this means you should already have user_id available before you start the query.

But anyway in your application it may be different.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: To normalise or not to normalise? [message #740 is a reply to message #739 ] Fri, 16 February 2007 11:32 Go to previous message
al.james  is currently offline al.james
Messages: 16
Registered: October 2006
Junior Member
No, its not quite like that. I am listing the top X items by ALL users of the service. Not just the current user. So if limit=20 I am picking the top 20 (as ordered by the score field above) items which could result in 20 distinct user records.

Sorry for not making myself clear earlier.
Previous Topic:Some tablespace understanding
Next Topic:InnoDB backup - using per-table tablespaces
Goto Forum:
  



Current Time: Fri Dec 5 11:20:24 EST 2008

Total time taken to generate the page: 0.03920 seconds
.:: Contact :: Home :: MySQL Support by Percona.com ::.

Powered by: FUDforum 2.7.5.
Copyright ©2001-2006 FUD Forum Bulletin Board Software

MySQL Performance | Forum authority Badge