| Script to analyze all entries in a table and suggest optimal column data types. [message #1494] |
Wed, 27 June 2007 16:27  |
RichardBronosky Messages: 20 Registered: June 2007 |
Junior Member |
|
|
I've been working on a bash script that does this. I thought it was going to be really short and simple, but it is turning out to be quite complex. Before I continue with it, I want to know if such a thing already exists.
Basically I do a lot of queries to test each column like:
"select 'false.', \`$field\` from $TABLE where lpad(cast(0+\`$field\` as binary), length(\`$field\`), '0') != \`$field\` limit 1;"
Which tests to see if the data is an INT ZEROFILL.
Is there already a solution for this? If not, would anyone else be interested in helping to develop this? I intended to create a source forge project for it once I got a working version. But, I'm not finding the time to complete it.
.+# Richard Bronosky #+.
MySQL DBA, MythTV User
|
|
|
| Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1504 is a reply to message #1494 ] |
Fri, 29 June 2007 13:44   |
RichardBronosky Messages: 20 Registered: June 2007 |
Junior Member |
|
|
| Whould you be interested in the tool described here?[ 0 vote(s) ] |
| 1. | Yes, sounds cool. I'd like to help. | 0 / 0% | | 2. | Yes, sounds cool. I'd like to try it when you get it working. | 0 / 0% | | 3. | No, I do think it would work. | 0 / 0% | | 4. | No, your are a moron. Can't you just do a select * and do that in your head while 2 million rows of a 500 column wide table scroll by?  | 0 / 0% | | 5. | No, there is already a good tool for doing this. (please leave a link) | 0 / 0% | | 6. | Not unless you write it in lanuage... (leave a comment) | 0 / 0% |
Okay, since no one is interested in helping me with it.... Is anyone interested in using it when it is done?
Here is the basic outline. For each column in the table, analyze the values:
- Decide if the column is currently BINARY.
- Get the length of the longest value.
- If the values are numeric, then...
- Decide whether they require zerofill.
- Decide if they are unsigned.
- If they are all integers, decide what size is the best fit. Else...
- If they are [fixed point] DECIMAL, decide their significant digits and number of decimal places. Else...
- Decide if DOUBLE or single precision FLOAT is required.
- (optionally) Decide the significant digits and number of decimal places that best fit.
- If the values are temporal
- Decide which is the best fit. (YEAR, DATE, TIME, TIMESTAMP, or DATETIME in that order of precedence)
- If an ENUM would be appropriate, suggest it.
- If a CHAR would be most efficient, suggest it (BINARY is needed).
- If the length you came up with first is < 255 (or 65535 post-MySQL 5.0.3) use VARCHAR (VARBINARY is needed). Else...
- Find the TEXT/BLOB size that best fits.
What's missing:
- BIT type (I don't use it)
- SET type (I don't want to code it)
- Spatial Types (I hope I never need it)
I hope that draws some interest. I'll take your continued silence to mean that I'm an idiot and the only person in the community who would like a tool for doing this.
.+# Richard Bronosky #+.
MySQL DBA, MythTV User
|
|
|
| Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1505 is a reply to message #1494 ] |
Sun, 01 July 2007 14:03   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
The path I would take here is parsing the table schema, then contrasting the data type definitions with the actual data in the table.
Martin Gallagher | Speeple: The latest news
|
|
|
| Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1507 is a reply to message #1505 ] |
Sun, 01 July 2007 19:54   |
RichardBronosky Messages: 20 Registered: June 2007 |
Junior Member |
|
|
This is a good idea. It would be worth adding.
My original purpose for the script was to serve my own need. I'm currently working for a company whose idea of data storage was delimited text files or excel files. I've loaded dozens of tables into MySQL from comma/tab delimited files. In the interest of speed, I generally create the tables with all varchar fields. These tables are currently being used by individuals in the company. When it is decided that an application is going to be built against data in one of these tables, I either optimize the table or I pull the needed columns out of many tables and build one streamlined table.
I'd like to be able to take all these nasty tables that I have (and continue to create) and clean them up. So, as it stands, i don't much care about the current schema. I know what it is, and I know it is bad.
.+# Richard Bronosky #+.
MySQL DBA, MythTV User
|
|
|
|
| Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1556 is a reply to message #1539 ] |
Wed, 25 July 2007 11:30  |
RichardBronosky Messages: 20 Registered: June 2007 |
Junior Member |
|
|
Unfortunately that almost always just suggests that I make everything an ENUM. I think that should be a REALLY powerful tool, but turns out to be basically useless.
.+# Richard Bronosky #+.
MySQL DBA, MythTV User
|
|
|