Creating a custom stoplist from actual data [message #3634] Thu, 16 October 2008 05:27
myraz
I have some huge fulltext indexes created using ft_min_word_len=2 and no stoplist. The data is not in English, so the default stoplist cannot be used.

My idea is this: if you could 'look' at the existing index (using a custom tool, a MySQL patch, or whatever), you should be able to determine which words would be good candidates for a custom stoplist. Does that make sense? Is it even theoretically possible? If it is, and if someone were to create such a tool, I guess it would benefit many users with non-English (but space-delimited) data.

As an aside, I did try having a Perl script extract all words from the table (not the index) and count their frequencies. It works fine, but it is very slow and tedious. If my idea is doable, I imagine it would be blazingly fast and usable on huge existing tables.
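
For reference, the table-scan approach looks roughly like the sketch below. It is written in Python rather than the original Perl and is only an illustration: the function name stoplist_candidates, the min_len and max_doc_ratio parameters, and the 50% cutoff are all hypothetical choices, and the rows are assumed to have already been fetched from the table as plain strings.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters/digits/underscores.
# \w is Unicode-aware in Python 3, so non-English letters are kept.
WORD_RE = re.compile(r"\w+")


def stoplist_candidates(rows, min_len=2, max_doc_ratio=0.5):
    """Return words found in more than max_doc_ratio of the rows.

    rows          -- iterable of text strings, one per table row
    min_len       -- shortest word to count (mirrors a minimum word length of 2)
    max_doc_ratio -- hypothetical cutoff; tune it for your own data
    """
    doc_freq = Counter()  # word -> number of rows containing it
    total = 0
    for text in rows:
        total += 1
        # Use a set so each word is counted once per row.
        words = {w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len}
        doc_freq.update(words)
    if total == 0:
        return []
    cutoff = max_doc_ratio * total
    # Most frequent first, ties broken alphabetically.
    return sorted(
        (w for w, n in doc_freq.items() if n > cutoff),
        key=lambda w: (-doc_freq[w], w),
    )
```

Called as stoplist_candidates(rows, max_doc_ratio=0.9), it returns only the words that occur in more than 90% of the rows. It still has to read every row, which is exactly the slowness described above.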

Ideas? Comments? Thanks.