Home » Performance » MySQL » $100 Paypal if you can optimize this query
|
| Re: $100 Paypal if you can optimize this query [message #2611 is a reply to message #2610 ] |
Mon, 11 February 2008 14:23   |
|
You might try something along these lines:
SELECT Main.B_Number
FROM BB_Posts AS Main
WHERE Main.B_Number IN
(
SELECT Reply.Number
FROM BB_Posts AS Reply
WHERE Reply.B_PosterId = 654
AND Reply.B_Board IN ('1', '2', '3')
)
ORDER BY Main.B_Last_Post DESC;
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2612 is a reply to message #2611 ] |
Mon, 11 February 2008 16:20   |
Ythan Messages: 4 Registered: February 2008 Location: NY, USA |
Junior Member |
|
|
Hi James,
Thank you so much for your prompt response! I really appreciate your taking the time to look at this for me. I actually did try a similar approach using subqueries... although it eliminates the temporary table, it must perform a filesort on the entire posts table which unfortunately is significantly slower. For reference, the original query usually takes between 2 - 4 seconds. The new query hangs on "preparing" for 30+ seconds and I have to kill it before it completes.
+----+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-----------------------------+
| 1 | PRIMARY | Main | ALL | NULL | NULL | NULL | NULL | 5079927 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | Reply | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+-----------------+---------------+---------+---------+------+---------+-----------------------------+
I messed around with different index setups, but even using FORCE INDEX the main table always examines all 5 million rows. I may be missing something, since I don't have a lot of experience with subqueries (we used mySQL 3 until recently). However, at least as written, this approach seems too slow to be practical.
Thanks again for your reply though!
Regards,
-Y
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2614 is a reply to message #2610 ] |
Mon, 11 February 2008 18:08   |
|
Well, my query was wrong anyway, it should have been:
SELECT Main.B_Number
FROM BB_Posts AS Main
WHERE Main.B_Number IN
(
SELECT DISTINCT Reply.B_Main
FROM BB_Posts AS Reply
WHERE Reply.B_PosterId = 654
AND Reply.B_Board IN ('1', '2', '3')
)
ORDER BY Main.B_Last_Post DESC;
I don't think that will help with your filesort, though.
I'll think about it, though.
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2615 is a reply to message #2610 ] |
Tue, 12 February 2008 13:03   |
jonstjohn Messages: 7 Registered: February 2008 |
Junior Member |
|
|
Here is a solution:
Create a new key:
alter table BB_Posts add index new_index3 (B_Last_Post, B_PosterId, B_Board);
Now slightly modify your query:
SELECT BB_Posts.B_Number FROM BB_Posts FORCE INDEX (new_index3) WHERE BB_Posts.B_Last_Post = 1 and BB_Posts.B_PosterId = 654 and BB_Posts.B_Board in (1,2,3) ORDER BY BB_Posts.B_Last_Post desc;
Note the 'FORCE INDEX'
Now explain:
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | BB_Posts | range | new_index3 | new_index3 | 9 | NULL | 3 | Using where |
+----+-------------+----------+-------+---------------+------------+---------+------+------+-------------+
Do I win? : )
[Updated on: Tue, 12 February 2008 13:19] Get the weather, where you want it! Weather for over 250 climbing areas throughout the U.S.
http://www.climbingweather.com
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2617 is a reply to message #2615 ] |
Tue, 12 February 2008 14:10   |
Ythan Messages: 4 Registered: February 2008 Location: NY, USA |
Junior Member |
|
|
Thanks so much guys for your continued assistance. Jon unfortunately your example isn't exactly what I'm trying to do, but it's my fault for not explaining it better. I've been working with this stupid BB for so long that I forgot people aren't born with an inherent knowledge of its data structure. 
The complication arises from the fact that the BB supports threaded discussion, and each branch will have different last post info. Please consider these two threads:
+ Test Post
| B_Number = 2, B_PosterId = 666, B_Posted = 1200000000, B_LastPost = 1200000999
|
+-- Re: Test Post
| B_Number = 3, B_PosterId = 420, B_Posted = 1200000001, B_LastPost = 1200000002
| |
| +-- Re: Re: Test Post
| B_Number = 4, B_PosterId = 123, B_Posted = 1200000002, B_LastPost = 1200000002
|
+-- Re: Test Post BUMP
B_Number = 6, B_PosterId = 987, B_Posted = 1200000999, B_LastPost = 1200000999
+ Another Thread
| B_Number = 1, B_PosterId = 808, B_Posted = 1100000000, B_LastPost = 1200000888
|
+-- Re: Another Thread BUMP
B_Number = 5, B_PosterId = 123, B_Posted = 1200000888, B_LastPost = 1200000888
If I'm user #123 and I want to retrieve the main post # for threads I've posted in ordered by most recent activity, the correct order is:
B_Main = 2 (Last post = 1200000999)
B_Main = 1 (Last post = 1200000888)
However if I do something like "SELECT B_Main FROM Posts WHERE B_PosterId = 123 ORDER BY B_LastPost DESC", I'll end up with this:
B_Main = 1 (Last post = 1200000888)
B_Main = 2 (Last post = 1200000002)
(Note that in this example the main post ID order corresponds with the last post stamp order, but that can't be assumed.)
Anyway that's why my query joins the posts table on itself, so I can get the true last post stamp from the thread's parent, not the last post stamp for whatever particular branch of the thread the user posted in. The alternative is to add a field like B_ThreadLastPost or something, which would contain the same value for each post in a thread: the stamp of the very last post made. However, unless I have no choice I hate to throw out a working query which is almost good enough, duplicate the same data unnecessarily, and deal with populating this field + modifying the forum to use it.
I hope maybe this is a little more clear?
Anyway I get the sinking feeling this may not be possible with the existing structure and I'll have to add an additional field. But I hope I'm wrong and I'm still interested in any fresh insight into the problem. I'll still pay $25 to you guys who are making a good faith effort to help, even if the solutions aren't exactly what I need.
Thanks again!!
-Y
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2618 is a reply to message #2617 ] |
Tue, 12 February 2008 14:48   |
jonstjohn Messages: 7 Registered: February 2008 |
Junior Member |
|
|
Okay, here is another solution, sort of (of course). It doesn't use filesort, but it uses temporary table:
Add this index:
alter table BB_Posts add index index7(B_Last_Post, B_Number);
And then use the subquery proposed above:
SELECT Main.B_Number FROM BB_Posts AS Main WHERE Main.B_Number IN ( SELECT DISTINCT Reply.B_Main FROM BB_Posts AS Reply WHERE Reply.B_PosterId = 654 AND Reply.B_Board IN ('1', '2', '3')) ORDER BY Main.B_Last_Post DESC;
Finally, the explain:
explain SELECT Main.B_Number FROM BB_Posts AS Main WHERE Main.B_Number IN ( SELECT DISTINCT Reply.B_Main FROM BB_Posts AS Reply WHERE Reply.B_PosterId = 654 AND Reply.B_Board IN ('1', '2', '3')) ORDER BY Main.B_Last_Post DESC;
+----+--------------------+-------+-------+--------------------------------------------------------------------------+--------+---------+-------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+--------------------------------------------------------------------------+--------+---------+-------+------+------------------------------+
| 1 | PRIMARY | Main | index | NULL | index7 | 8 | NULL | 100 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | Reply | ref | board_topic_index,B_Board,w3t_Postsindex3,ID_ndx,new_index,index5,index8 | ID_ndx | 4 | const | 1 | Using where; Using temporary |
+----+--------------------+-------+-------+--------------------------------------------------------------------------+--------+---------+-------+------+------------------------------+
What do you think?
Get the weather, where you want it! Weather for over 250 climbing areas throughout the U.S.
http://www.climbingweather.com
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2619 is a reply to message #2618 ] |
Tue, 12 February 2008 16:31   |
Ythan Messages: 4 Registered: February 2008 Location: NY, USA |
Junior Member |
|
|
Oh man... looking at the EXPLAIN I thought you had it for sure... yeah it says it examines all the rows for 'Main', but it's using an index and I'm limiting the results to 25 at a time so no big deal, right? With high hopes I added an index on B_Last_Post, B_Number and tried the query on the full table, and...
+-----+-----------+--------------------+------------+-------------+------+------------+--------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+-----------+--------------------+------------+-------------+------+------------+--------------------------------------------------------------------------------------+
| 34 | root | 192.168.0.30:50525 | WWWThreads | Query | 120 | preparing | SELECT SQL_CALC_FOUND_ROWS Main.B_Number FROM posts AS Main WHERE Main.B_Number IN ( |
+-----+-----------+--------------------+------------+-------------+------+------------+--------------------------------------------------------------------------------------+
I had to kill it after two minutes of "preparing". 
It seems hard to evaluate the query performance without the full database of 5 million records, since it really looked like it would work based on the EXPLAIN output.
I was worried I was missing something obvious but maybe that's not the case... I'm gradually making peace with the idea that I may just need a table structure better tailored to this particular functionality. Any time you want to cut your losses and take the 25 bucks, I'll gladly pay up for the thought-provoking suggestions. I'm sure this technique will come in handy in the future, even if it doesn't work in this specific case. Just send me your Paypal address, my e-mail is ythan at shroomery.org.
Of course if you're having fun and want to keep trying that's fine too. 
Thanks again guys.
-Y
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2620 is a reply to message #2619 ] |
Tue, 12 February 2008 16:43   |
jonstjohn Messages: 7 Registered: February 2008 |
Junior Member |
|
|
Ythan -
Another way to approach it might be to split up the single query into two queries.
The first might get the last x number of replies from the particular user.
select B_Main from BB_Posts where B_PosterId = 654 and B_Board in (1,2,3) order by B_Last_Post desc limit 25;
put that into an array, then join it on commas and query for the main posts:
select B_Main from BB_Posts where B_Main in ([list]) order by B_Last_Post desc;
I know that it is not *quite* what you're looking for, but it might be a good approximation that performs significantly better than the single query.
Just some ideas -
Good luck!
Jon
Get the weather, where you want it! Weather for over 250 climbing areas throughout the U.S.
http://www.climbingweather.com
|
|
|
| Re: $100 Paypal if you can optimize this query [message #2621 is a reply to message #2610 ] |
Tue, 12 February 2008 19:15  |
|
Hierarchical data is a bitch to query efficiently.
The problem is that you can only use an index to select the subset of records that you want to sort OR to perform the sort, not both. If there are a large number of records returned, you're stuck with having to sort them all sans index.
I can't accept payment for the little assistance I might have provided. I've been helped several times on these forums myself, so I guess I'm just trying to give back.
|
|
|
Goto Forum:
Current Time: Fri Jan 9 00:13:34 EST 2009
Total time taken to generate the page: 0.03224 seconds |