Home » Performance » MySQL » How to import large datasets
How to import large datasets [message #433] Thu, 30 November 2006 16:39 Go to next message
nappin  is currently offline nappin
Messages: 8
Registered: November 2006
Junior Member
Hi,
I'm very new to mysql performance and have some questions regarding a project I'm working on.

I guess before I say anything, the project is an e-commerce store.

I have two tables right now I'm concerned about:
1. Product Information - 1 record per product, contains name of product, description, etc.
There will be roughly 3million records in here (myisam at the moment)

2. Inventory Table - quantity information for current products at different warehouses (innodb)
There will be rougly 1.5million records in here


I receive a ~1gb flat file (~3.5m records) from my distributor every 3 days which contains all the product information and another flatfile ~200mb from my distributor every day that contains the entire inventory (~1.5 records) at the distributor's warehouses.

These flat files aren't merely updates though, they contain all the data over again, just with modifications or new records. So I'm struggling with the best way to import this data quickly, and so the import doesnt affect the front end shoppers...

I first tried to tackle this problem by just writing scripts that performed LOAD DATA INFILE REPLACE... . It took about 22 minutes to import 3.3million records into my products table, but my table was locked the entire time. I tried adding the CONCURRENT option but it still was locked from read queries (I was researching and there was a bug report saying CONCURRENT was broken in 5.0.19-5.0.30?). I guess I'll just make sure the server I put this on doesnt have that bug, or should I be doing this whole thing a different way?


As for the inventory table, I also need to do something similar, but as far as I understand because this table is innodb, the CONCURRENT flag won't help and the entire table will be read locked while I run the LOAD DATA command. What's the best way to replace/add records to the inventory table (innodb) without locking the entire table while I'm importing it? Do I just do it in a batch sql file? Should I do it in small bursts? Are there any tricks to doing these mass inserts or updates?

thank you guys so much your help. Hopefully the above is not too vague.

Cheers,
Ray

Re: How to import large datasets [message #434 is a reply to message #433 ] Thu, 30 November 2006 17:30 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Hi,

You're saying it is new data all over again ?
Why do not you load it to the table_new and then rename this table.

That should be most efficient way


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: How to import large datasets [message #436 is a reply to message #433 ] Thu, 30 November 2006 20:41 Go to previous messageGo to next message
nappin  is currently offline nappin
Messages: 8
Registered: November 2006
Junior Member
Ok I like that idea. thank you so much for your help. A few other questions, how do I drop the old table and rename the shadow table to the old one's name in one command?

Also, I was just reading about myisampack, would it make sense to run this on the shadow table before I rename it to increase performance since this table will never be edited. If so, do I need to optimize before or after packing it or does that not make a difference?

Thanks
Ray
Re: How to import large datasets [message #437 is a reply to message #436 ] Thu, 30 November 2006 21:22 Go to previous messageGo to next message
nappin  is currently offline nappin
Messages: 8
Registered: November 2006
Junior Member
Oh ignore my first question, I didn't realize that rename was already atomic.
I'm still a little foggy on myisampack though. I'd like to know if it makes sense to run that on my table before renaming it to increase the read speed. And also if I need to run optimize at all either.

Thanks!
Re: How to import large datasets [message #438 is a reply to message #437 ] Fri, 01 December 2006 08:52 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
No need to run Optimize on it but you need to rebuild indexes,
Also make sure table is closed when you run mysampack, so it should be

FLUSH TABLES tbl;
myisampack tbl.MYI
myisamchk -rq tbl.MYI
RENAME TABLE ...

Assuming no one else may touch it in the process.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: How to import large datasets [message #439 is a reply to message #433 ] Sat, 02 December 2006 23:49 Go to previous messageGo to next message
nappin  is currently offline nappin
Messages: 8
Registered: November 2006
Junior Member
Hey Peter, thanks again, finally wrote some scripts to do the importing and it is running very fast!
I noticed in one of your articles you mentioned the LOAD INDEX INTO CACHE command. I'd like to use it on my shadow table right before renaming but unfortunately the indexes are different block sizes. I'm a little unclear on what block size is defined by. Is it a combination of col type and size? If so, would it worth increasing the size or changing the type for one of my index columns to match the other in order to take advantage of this command? The two indexes on the table are a varchar(13) (has a blocksize of 4096) and a tinyint(1) (blocksize 1024).

Thanks again for all your help,
Ray
Re: How to import large datasets [message #440 is a reply to message #439 ] Sun, 03 December 2006 05:57 Go to previous message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Right,

Different types have different block sizes.

If LOAD INDEX INTO CACHE does not work you can use set of queries which scan indexes instead, which is not going to be much slower for sorted indexes.

You can do it for example by using select sum(col) from tbl where col!=const;

Such query typically would do full index scan.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Previous Topic:Innodb 'Free buffers' going to 0 and system crashing
Next Topic:MySQL Manager
Goto Forum:
  



Current Time: Wed Jan 7 16:17:40 EST 2009

Total time taken to generate the page: 0.03655 seconds
.:: Contact :: Home :: MySQL Support by Percona.com ::.

Powered by: FUDforum 2.7.5.
Copyright ©2001-2006 FUD Forum Bulletin Board Software

MySQL Performance | Forum authority Badge