Home » Performance » FullText » Setup Sphinx
Setup Sphinx [message #868] Mon, 05 March 2007 19:54 Go to next message
sroussey  is currently offline sroussey
Messages: 5
Registered: March 2007
Junior Member

I have a problem domain where there are lots of messages to be indexed, and I want to be able to limit them by a set of groupids.

Essentially, each group is a subgroup of another group. Something like this:

(((WebsiteID,AreaOfWebSiteID),SubAreaOfWebSiteID),ForumID)

And an alternate:

(WebsiteID,UserID)

All in addition to the keywords.

I'm fine with just using Sphinx for now, but want to use its MySQL integration at some point as well.

Just not sure how to get started....
Re: Setup Sphinx [message #871 is a reply to message #868 ] Tue, 06 March 2007 04:34 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Just define several groups.

WebSiteID
AreaofWebSiteID

etc

and then you can apply filters as you need to.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: Setup Sphinx [message #880 is a reply to message #871 ] Tue, 06 March 2007 11:16 Go to previous messageGo to next message
sroussey  is currently offline sroussey
Messages: 5
Registered: March 2007
Junior Member

Thanks, I wasn't sure if there was a composite type index for groups.

I guess my other issue is that I don't have a unique document ID. My unique id is a pair (ForumID,MessageID). Do I need to create a translation table to map this tuple to a single unique integer?

Thanks again!
Re: Setup Sphinx [message #881 is a reply to message #880 ] Tue, 06 March 2007 11:37 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Right. You need to make it unique.

If you have no more than 256 forums and 16M messages per forums you could combine these two to get unique integer value.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: Setup Sphinx [message #883 is a reply to message #881 ] Tue, 06 March 2007 11:51 Go to previous messageGo to next message
sroussey  is currently offline sroussey
Messages: 5
Registered: March 2007
Junior Member

Nope, that won't work. Confused Bummer. A mapping table to global uniques makes it harder to plan for future partitioning, but I'll muddle my way through. Thanks!
Re: Setup Sphinx [message #884 is a reply to message #883 ] Tue, 06 March 2007 11:59 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Well. We have bunch of partitioned sphinx installation it is not the problem. We use technique as I described. We reserve X partitions and them use high bits as partition ID.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: Setup Sphinx [message #885 is a reply to message #884 ] Tue, 06 March 2007 12:04 Go to previous messageGo to next message
sroussey  is currently offline sroussey
Messages: 5
Registered: March 2007
Junior Member

Yeah, I was thinking of that. I can use a 24-bit unique for each 8-bit partition number. The partition # just can't be ForumID -- we have hundreds of thousands of them, and hundreds of millions of messages.

Thanks again. I can't wait to pull this from inside mysql some day. Cool

[Updated on: Tue, 06 March 2007 12:06]

Re: Setup Sphinx [message #886 is a reply to message #885 ] Tue, 06 March 2007 12:11 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
Right. In this case you need other form of partition and you can have forums mapped to partitions.

At hundreds of millions of posts you likely need distributed sphinx setup.

Let us know if you need any help with that.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: Setup Sphinx [message #887 is a reply to message #868 ] Tue, 06 March 2007 12:15 Go to previous messageGo to next message
sroussey  is currently offline sroussey
Messages: 5
Registered: March 2007
Junior Member

One other thing... since adding new documents is slow, I was thinking of creating files sort of like this:

Main Index
Delta Index for current month
Delta Index for current day

Then at the end of each day, merge things into the month, and at the end of each month, merge things into the main index. I want the searching to seem as up-to-date or "live" as possible.

Any estimate on how many documents are in an index before adding one seems slow? The idea of splitting by day or month is unscientific. It really should be based on document count (for example, what if I ended up needing to split off on a per hour basis?). I know only testing will tell me, but if there are some basic guidelines, it would be very helpful.

Thanks!
Re: Setup Sphinx [message #888 is a reply to message #887 ] Tue, 06 March 2007 12:20 Go to previous message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
We usually use two indexes global one and incremental index, which is usually per day. There is number of these index pairs used for various reasons.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Previous Topic:Sphinx as built-in SE causes server crashes?
Next Topic:Fulltext search with order by static field
Goto Forum:
  



Current Time: Fri Dec 5 11:35:18 EST 2008

Total time taken to generate the page: 0.06784 seconds
.:: Contact :: Home :: MySQL Support by Percona.com ::.

Powered by: FUDforum 2.7.5.
Copyright ©2001-2006 FUD Forum Bulletin Board Software

MySQL Performance | Forum authority Badge