| How to import large datasets [message #433] |
Thu, 30 November 2006 16:39  |
nappin Messages: 8 Registered: November 2006 |
Junior Member |
Hi,
I'm very new to MySQL performance tuning and have some questions about a project I'm working on.
I guess before I say anything, the project is an e-commerce store.
I have two tables right now I'm concerned about:
1. Product Information - 1 record per product, contains name of product, description, etc.
There will be roughly 3 million records in here (MyISAM at the moment)
2. Inventory Table - quantity information for current products at different warehouses (InnoDB)
There will be roughly 1.5 million records in here
I receive a ~1 GB flat file (~3.5m records) from my distributor every 3 days, which contains all the product information, and another ~200 MB flat file from my distributor every day, which contains the entire inventory (~1.5m records) at the distributor's warehouses.
These flat files aren't merely updates, though: they contain all the data over again, just with modifications or new records. So I'm struggling to find the best way to import this data quickly and without the import affecting the front-end shoppers...
I first tried to tackle this problem by writing scripts that ran LOAD DATA INFILE REPLACE... It took about 22 minutes to import 3.3 million records into my products table, but the table was locked the entire time. I tried adding the CONCURRENT option, but read queries were still blocked (while researching I found a bug report saying CONCURRENT was broken in 5.0.19-5.0.30?). I guess I'll just make sure the server I put this on doesn't have that bug, or should I be doing this whole thing a different way?
As for the inventory table, I need to do something similar, but as far as I understand, because this table is InnoDB, the CONCURRENT flag won't help and the entire table will be read-locked while I run the LOAD DATA command. What's the best way to replace/add records in the inventory table (InnoDB) without locking the entire table while I'm importing? Do I just do it in a batch SQL file? Should I do it in small bursts? Are there any tricks to doing these mass inserts or updates?
Thank you guys so much for your help. Hopefully the above is not too vague.
Cheers,
Ray
|
| Re: How to import large datasets [message #438 is a reply to message #437 ] |
Fri, 01 December 2006 08:52   |
Peter Messages: 405 Registered: August 2006 |
Senior Member Super Guru |
No need to run OPTIMIZE on it, but you do need to rebuild the indexes.
Also make sure the table is closed when you run myisampack, so it should be:
FLUSH TABLES tbl;
myisampack tbl.MYI
myisamchk -rq tbl.MYI
RENAME TABLE ...
Assuming no one else may touch it in the process.
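For anyone scripting this, here is a minimal dry-run sketch of how the SQL side of the sequence above could be generated around a shadow table. It assumes the import goes into a separate copy that is swapped in at the end; that is an assumption about the approach, since the RENAME TABLE step above is elided. The table names, file path, and database name are hypothetical, and the functions only print statements so they can be reviewed before being piped to the mysql client.

```shell
#!/bin/sh
# Dry-run sketch: each function only prints statements; nothing runs against a server.
# All table and file names passed in are hypothetical placeholders.

# SQL to build and fill a shadow copy while the live table keeps serving reads,
# then close it so myisampack / myisamchk can work on the files.
load_sql() {
    live="$1"; shadow="$2"; file="$3"
    printf 'CREATE TABLE %s LIKE %s;\n' "$shadow" "$live"
    printf "LOAD DATA INFILE '%s' INTO TABLE %s;\n" "$file" "$shadow"
    printf 'FLUSH TABLES %s;\n' "$shadow"
}

# SQL to swap the shadow copy in; a multi-table RENAME TABLE is atomic in MySQL.
swap_sql() {
    live="$1"; shadow="$2"
    printf 'RENAME TABLE %s TO %s_old, %s TO %s;\n' "$live" "$live" "$shadow" "$live"
}
```

A possible run, assuming a database called shopdb: `load_sql products products_new /tmp/products.txt | mysql shopdb`, then myisampack and myisamchk -rq on products_new.MYI in the data directory, then `swap_sql products products_new | mysql shopdb`. The old copy stays around as products_old until it is dropped.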
Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
|
| Re: How to import large datasets [message #440 is a reply to message #439 ] |
Sun, 03 December 2006 05:57  |
Peter Messages: 405 Registered: August 2006 |
Senior Member Super Guru |
Right,
Different types have different block sizes.
If LOAD INDEX INTO CACHE does not work, you can use a set of queries that scan the indexes instead, which is not going to be much slower for sorted indexes.
You can do it, for example, with SELECT SUM(col) FROM tbl WHERE col != const;
Such a query would typically do a full index scan.
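A small dry-run sketch of both warm-up options, in case it helps to script them. The table, column, and constant are hypothetical placeholders, and the column is assumed to be indexed. The EXPLAIN line is an addition for checking that the optimizer really reads the index (look for "Using index" in the Extra column), since that depends on the schema.

```shell
#!/bin/sh
# Dry-run sketch: prints index warm-up statements; nothing is executed here.
# Table, column, and constant are hypothetical placeholders.

# Preload a MyISAM table's indexes into the key cache.
preload_sql() {
    printf 'LOAD INDEX INTO CACHE %s;\n' "$1"
}

# Fallback: a query that can be answered from the index on the column alone.
# The EXPLAIN output should be checked before relying on it as a warm-up.
scan_sql() {
    tbl="$1"; col="$2"; const="$3"
    printf 'EXPLAIN SELECT SUM(%s) FROM %s WHERE %s != %s;\n' "$col" "$tbl" "$col" "$const"
    printf 'SELECT SUM(%s) FROM %s WHERE %s != %s;\n' "$col" "$tbl" "$col" "$const"
}
```

For example, `scan_sql products_new id 0 | mysql shopdb` would run the check and the scan against a hypothetical shopdb database; one such query is needed per index to be warmed.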