| performance with varchar vs tinytext [message #1318] |
Fri, 25 May 2007 10:28  |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
Hey guys, I have tried searching for difference between varchar(255) and tinytext in terms of performance but I haven't found a straight answer to this. Is it because there is no significant difference in performance? I know the difference in terms of spaces being trimmed and case sensitivity.
I have a table with about 200+ columns. Most of the small text types are tinytext rather than varchar. Is there a good reason to be using one type over the other in terms for performance? Would the order by be faster with varchar than tinytext when the table has a big row length? I read somewhere that varchar is stored right along with the row data on disk while tinytext is stored else where on disk and such(I'm not sure about that).
Any thoughts on this? Thanks.
|
|
|
| Re: performance with varchar vs tinytext [message #1321 is a reply to message #1318 ] |
Sat, 26 May 2007 06:20   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
The problem is with TMP tables those with TEXT/BLOB type columns will always be disk based, and not fast memory based, thus taking a performance hit.
In MySQL versions 5.0.3+ you can have CHAR types upto 65535 in length.
[Updated on: Sat, 26 May 2007 06:23] Martin Gallagher | Speeple: The latest news
|
|
|
| Re: performance with varchar vs tinytext [message #1334 is a reply to message #1321 ] |
Tue, 29 May 2007 09:36   |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
Thanks for the response. I have been messing around this a bit more. Have a column called "region" and it was tinytext. Also created an index for it region(100). But I couldn't get mysql to use the index for a simple order by query (even with force index):-
select * from table order by region
The explain for it always used filesort. Am i missing something here? Is there a way you can force mysql to use the index that I created for tinytext type?
So then I duplicated that column and made it a varchar(100) instead and created another index region_varchar(100) on that new column. and issued the query again:-
select * from table order by region_varchar
Again it did not use the index and resorted to filesort. But it did work when i specified force index. Then I modified the index to use only first 10 characters region_varchar(10). Now again I couldn't get mysql to use the index even with force index.
The only other way I could force it to use the index was to put in a where clause on that column.
So I'm missing something in the way mysql uses indexes and am not sure what that is...I'm kind of new to this stuff so if anyone could point out what I'm missing, its much appreciated.
Thanks.
[Updated on: Tue, 29 May 2007 09:49]
|
|
|
| Re: performance with varchar vs tinytext [message #1335 is a reply to message #1318 ] |
Tue, 29 May 2007 10:03   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
Sounds to me like a cardinality problem.
Basically meaning there's not enough variance in the index justifying MySQL to use it.
For example, you have a region column and 99% of the values is "England" then MySQL will chose full table scan.
By using a INDEX prefix of only 10 CHARS you're further decreasing variables in the index tree.
Example (using INDEX(6) to make the point easier):
'United Kingdom' INDEX(6) => 'United',
'United States' INDEX(6) => 'United',
'United Arab Emirates' INDEX(6) => 'United'
No variance there, why use the index?
Martin Gallagher | Speeple: The latest news
|
|
|
|
| Re: performance with varchar vs tinytext [message #1348 is a reply to message #1318 ] |
Fri, 01 June 2007 05:19   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
Hmm,
This doesn't really make sense to me then, if there is high cardinality then it would be very efficient to resort to an index lookup over that of a full table scan.
Posting the table schema, a sample result and a sample query would help give a better responce.
Martin Gallagher | Speeple: The latest news
|
|
|
| Re: performance with varchar vs tinytext [message #1352 is a reply to message #1318 ] |
Fri, 01 June 2007 09:44   |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
First of all thanks for looking into this.
Alright, to keep things simple here's the sample table I'm using to test :-
CREATE TABLE `foo` (
`id` int(11) unsigned NOT NULL auto_increment,
`name` tinytext NOT NULL,
`name_varchar` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `name_index` (`name`(100)),
KEY `name_varchar_index` (`name_varchar`(7))
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;
name and name_varchar contains the same data which are 7 digit numbers(in the form of strings).
mysql> select * from foo limit 5;
+----+---------+--------------+
| id | name | name_varchar |
+----+---------+--------------+
| 1 | 1550620 | 1550620 |
| 2 | 1554011 | 1554011 |
| 3 | 1554864 | 1554864 |
| 4 | 1571814 | 1571814 |
| 5 | 1582609 | 1582609 |
+----+---------+--------------+
5 rows in set (0.00 sec)
mysql> select count(*) from foo;
+----------+
| count(*) |
+----------+
| 131951 |
+----------+
1 row in set (0.00 sec)
mysql> show index from foo;
+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| foo | 0 | PRIMARY | 1 | id | A | 131951 | NULL | NULL | | BTREE | |
| foo | 0 | name_index | 1 | name | A | 131951 | 100 | NULL | | BTREE | |
| foo | 1 | name_varchar_index | 1 | name_varchar | A | 131951 | 7 | NULL | | BTREE | |
+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> explain select * from foo order by name;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
mysql> explain select * from foo order by name_varchar;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
mysql> explain select * from foo force index(name_varchar_index) order by name_varchar;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
I have no idea why its using filesort.
Also like I mentioned earlier the only way i got it to use the index is by using a where clause
mysql> explain select * from foo force index(name_varchar_index) where name_varchar='blah' order by name_varchar;
+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+
| 1 | SIMPLE | foo | ref | name_varchar_index | name_varchar_index | 7 | const | 1 | Using where |
+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
[Updated on: Fri, 01 June 2007 10:01]
|
|
|
| Re: performance with varchar vs tinytext [message #1388 is a reply to message #1352 ] |
Fri, 08 June 2007 17:27   |
|
short answer from what i understand is:
mysql sorts by creating tmp tables
tmp tables dont support text/blob columns
mysql will then use a filesort for those queries
using a where does not require a tmp table to reorder the data, and therefor would not trigger a filesort.
That's my understanding. milage may vary
|
|
|
|
|
| Re: performance with varchar vs tinytext [message #1398 is a reply to message #1395 ] |
Mon, 11 June 2007 14:17   |
Speeple Messages: 91 Registered: August 2006 |
Member |
|
|
In that case is it absolutely neccessary to use a CHAR type over int type?
Int type will require 4 bytes for each int.
Say you have 123456789, as int type, 4 bytes, as varchar 9 bytes.
Also, problems occur on comparisons, e.g.
WHERE id=3
Will cause problems for the SQL interpreter, you have to be aware of the required use of quotes on VAR/CHAR columns, e.g.
WHERE id='3'
Martin Gallagher | Speeple: The latest news
|
|
|
|
| Re: performance with varchar vs tinytext [message #1400 is a reply to message #1318 ] |
Mon, 11 June 2007 16:01   |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
Update: Ignore the above post about the id column. I created another column of type int and created an index on that and it worked ok.
mysql> explain select * from foo force index(name_int_index) order by name_int;
+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+
| 1 | SIMPLE | foo | index | NULL | name_int_index | 5 | NULL | 131951 | |
+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+
1 row in set (0.00 sec)
So...it would seem the indexes aren't being used for simple order by when dealing with columns of type VARCHAR and TEXT. I still dont understand why it doesn't use it...since text type cols are quite common.
[Updated on: Mon, 11 June 2007 16:02]
|
|
|
| Re: performance with varchar vs tinytext [message #1402 is a reply to message #1318 ] |
Mon, 11 June 2007 17:31   |
sterin Messages: 323 Registered: March 2007 Location: Sweden |
Senior Member |
|
|
The problem is basically a combination of
1. Choosing all columns with a "SELECT * "
2. Choosing all rows since you don't have a WHERE clause.
Compare these:
mysql> explain select * from foo order by id;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
mysql> explain select id from foo order by id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 2000 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
Notice that the second query with only id column uses index while the first don't.
And these:
mysql> explain select * from foo order by id;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
mysql> explain select * from foo where id < 100 order by id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 4 | NULL | 98 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
Notice how when we limit the amount of rows returned that it starts to use a index even though I still have SELECT *.
The reason for this is that using an index is actually expensive in it's own way.
If you are scanning an index and all columns that you want to return is part of the index it means that it can return the data right away and you have no extra cost (performance wise).
But if you are selecting columns that are _not_ part of the index that is used. It means that the DBMS has to read the index and then jump to the correct row in the table to return the data from the missing columns.
So each row read = one jump to the table.
And the big problem with this is that random reads against disk devices is very expensive compared to sequential reads (the whole table in one go).
So somewhere there is a trade off where there is more expensive to use an index than it is to read in the entire table, sort it in RAM memory and return it.
It's hard to really say where this limit is depending on your DB size and hardware. But as a rule of thumb I usually say that when you return more than somewhere around 5-10% of the rows you should start to think about it.
So that is why you get strange results with your explain.
BTW filesort doesn't necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
Only when the sort_buffer is full it will start to write temporary files on disk.
[Updated on: Mon, 11 June 2007 17:37]
|
|
|
| Re: performance with varchar vs tinytext [message #1407 is a reply to message #1318 ] |
Tue, 12 June 2007 09:42   |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
| Quote: |
The problem is basically a combination of
1. Choosing all columns with a "SELECT * "
|
That is actually the first thing I tried (specifying the reqd cols only) after I started seeing this issue but:-
mysql> explain select name_varchar FROM foo force index (name_varchar_index) order by name_varchar;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
It doesn't seem to matter if I specify the columns or use *, I get the same result for explain and its using filesort. However it does seem to work for INT columns which is what "id" is. As before mysql seems to use the index for int cols and not varchar and text.
| Quote: |
BTW filesort doesn't necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
Only when the sort_buffer is full it will start to write temporary files on disk.
|
| Quote: |
From the ini:-
# This buffer is allocated when MySQL needs to rebuild the index in
# REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE
# into an empty table. It is allocated per thread so be careful with
# large settings.
sort_buffer_size=64M
From the mysql manual:-
If you want to increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
Increase the size of the sort_buffer_size variable.
Increase the size of the read_rnd_buffer_size variable.
Change tmpdir to point to a dedicated filesystem with large amounts of empty space. This option accepts several paths that are used in round-robin fashion. Paths should be separated by colon characters (‘:’) on Unix and semicolon characters (‘;’) on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: The paths should be for directories in filesystems that are located on different physical disks, not different partitions on the same disk.
|
From what I understood, that variable is only used if mysql can't use the index that I created and thats whats puzzling me as to why it isn't using the index for varchar and text cols. And if my table is really big, it cant quite fit all the indexes in memory and it will most likely start writing to disk and in any case the index rebuilding will take time until its cached and hence the first query will always be slow.
If you have more suggestions on why this is happening please post away. I'm all ears at this point to understand how this thing works.
|
|
|
|
|
|
| Re: performance with varchar vs tinytext [message #1415 is a reply to message #1318 ] |
Wed, 13 June 2007 09:44  |
myrddin Messages: 12 Registered: April 2007 |
Junior Member |
|
|
| Quote: |
How will it be able to use the index to sort that data properly if the index only contains the first 4 characters?
|
I figured if you issue a force index it will use the index to sort granted in cases like you mentioned those parts of the sort will be undefined(i.e. it will do a sort based on those 4 chars only, after 4 chars is undefined). But apparently thats not how it works.
| Quote: |
I'm curious why you want it to use the index for a "select * from foo order by"?
Because since you are selecting all rows it will impose a lot of random reads of the table and they are very expensive.
So I don't think that you will gain anything.
|
My actual table has about 200+ cols and has about 200K rows right now and will be increasing. I will be selecting approximately 15 cols(of both numeric and text types) from that and will be doing an order by and limit(x,100). Are you saying that using the index in such a case will be slower than just sticking with filesort?
| Quote: |
So the recommendation is that you avoid BLOB/TEXT columns unless you really need them.
|
I know I got sidetracked there and thanks for getting back on topic. I will be following that recommendation.
I appreciate all the help/suggestions everyone. Thanks!
|
|
|