|
| Re: Fastest Way to Remove Duplicates [message #1350 is a reply to message #1342 ] |
Fri, 01 June 2007 05:32   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
Here's what I'd do:
Create a new column, Sentence_checksum BINARY(16), with a regular INDEX.
A binary MD5 checksum, 16 bytes in length rather than HEX alternative of 32... You could always put an index on the Sentence column, but to it would have to be full column length to correct match dupes - meaning a large index.
Then I would perform:
UPDATE table SET Sentence_checksum=UNHEX(MD5(Sentence))
Then I would create a table exactly like the old table, but this time Sentence_checksum would be altered to a UNIQUE index type.
I would then perform:
INSERT INTO table_new SELECT * FROM table_old GROUP BY Sentence_checksum
Then DROP the old table, and rename the new one.
Martin Gallagher | Speeple: The latest news
|
|
|
| Re: Fastest Way to Remove Duplicates [message #1354 is a reply to message #1342 ] |
Fri, 01 June 2007 18:53  |
GeoffreyF67 Messages: 19 Registered: May 2007 |
Junior Member |
|
|
That's a great idea!
Thanks!
G-Man
|
|
|