What's New in 4: Fast full-text searching
RT is commonly used to manage a lot of data. Unfortunately, prior to RT 4, it didn't support effective indexing of the content of comments or correspondence. RT 4 makes use of the full-text indexing capabilities of PostgreSQL and Oracle, or the external Sphinx library for MySQL, to efficiently index this content, allowing for high-speed searching of textual content. Due to bugs in version 4.0.0, you'll need version 4.0.1 or higher to take advantage of the full-text indexing detailed below.
By default, with no full-text indexes enabled, RT 4 does not allow
searching of the "Content" field at all, owing to how resource-intensive
unindexed full-text searches are. If you wish to enable non-indexed
full-text searching, like previous versions of RT allowed, you will need
to add the following to your
Set(%FullTextSearch, Enable => 1, Indexed => 0, );
The real magic comes from enabling your database's full-text indexes, however. For the section below, we'll be showing the configuration for PostgreSQL; we'll briefly address MySQL, which is slightly more complicated, at the end of this post.
If you intend to enable full-text indexed searches, you should also read
docs/full_text_indexing.pod file which shipped with your RT
install, which will provide roughly the same information as below, but
is additionally guaranteed to be up-to-date with your version of RT.
To start off with, you'll need to configure your database to store the
index. On PostgreSQL, this consists of adding a new column and creating
an index on it. To do this, run
$ /opt/rt4/sbin/rt-setup-fulltext-index Enter the name of a DB table that will be used to store the Pg tsvector. You may either use the existing Attachments table, or create a new table. [Attachments]: Enter the name of a column that will be used to store the Pg tsvector: [ContentIndex]:
Generally, you'll want to store the new column in the Attachments table, right next to the data it is indexing. If you want to back up the index separately, or have other storage constraints, you may wish to create a new table to store it, but the default is almost certainly correct. Similarly, the default column name is also most likely correct.
You may choose between GiST or GIN indexes; the former is several times slower to search, but takes less space on disk and is faster to update. [GiST]:
You can read more about the differences between the index types in the PostgreSQL manual.
Going to run the following in the DB: ALTER TABLE Attachments ADD COLUMN ContentIndex tsvector Going to run the following in the DB: CREATE INDEX ContentIndex_idx ON Attachments USING gist(ContentIndex) You can now configure RT to use the newly-created full-text index by adding the following to your RT_SiteConfig.pm: Set( %FullTextSearch, Enable => 1, Indexed => 1, Column => 'ContentIndex', Table => 'Attachments', );
At this point, your database has been configured. Ensure that you add
Set(...) block that
rt-setup-fulltext-index output (which may be
different from the above!) to your
RT_SiteConfig.pm before continuing.
Your index now exists, but is empty. In order to fill it, we must find
and index every textual attachment in RT using
--all. This can be a very time-consuming task, so be prepared for it
to take a while. It can be interrupted and resumed safely, however.
Once it has finished, you will need to schedule
run at regular intervals to pick up new additions to the database. This
is easiest done by running
ln -s /opt/rt4/sbin/rt-fulltext-indexer
/etc/cron.hourly, which will ensure that it runs once an hour, but many
other alternatives are possible. Depending on how fast tickets are
created in your system, running it as frequently as once every two to
five minutes may be possible.
The story on MySQL is unfortunately considerably more complicated, as
MySQL does not natively support a full-text index format. It integrates
with the external Sphinx search library, but
this requires recompiling MySQL to include the SphinxSE engine which
allows MySQL queries to retrieve data from the Sphinx
While many vendors package the external Sphinx tools, none of them ships
a MySQL package which has been compiled with SphinxSE at the time of
writing. As such, this is most likely not for the faint of heart, or
those only just getting their feet wet with system administration.
Despite that, the process of recompiling MySQL with SphinxSE is fairly straightforward. Full instructions are provided in the Sphinx documentation. Sphinx 2.0.1 has been tested to work with both MySQL 5.0 and 5.1; it is not compatible with MySQL 5.5 at this time.
Once a SphinxSE-enabled MySQL has been installed, the configuration
proceeds roughly as for PostgreSQL, above. First, you must run
rt-setup-fulltext-index, which will ensure that your MySQL is
correctly configured with SphinxSE, then prompt you for additional
$ /opt/rt4/sbin/rt-setup-fulltext-index Enter name of a new MySQL table that will be used to connect to the Sphinx server: [AttachmentsIndex]: Enter URL of the sphinx search server; this should be of the form sphinx://<server>:<port>/<index name> [sphinx://localhost:3312/rt]:
In order to communicate with the Sphinx
searchd server, we create a
virtual SphinxSE table in the database. This table hence needs to know
where your Sphinx
searchd server is running. The default is usually
correct if you are planning to run
searchd on your database server on
the default port.
Maximum number of matches to return; this is the maximum number of attachment records returned by the search, not the maximum number of tickets. Both your RT_SiteConfig.pm and your sphinx.conf must agree on this value. Larger values cause your Sphinx server to consume more memory and CPU time per query. :
This is an unfortunate limitation imposed by the design of Sphinx; 10000
should be sufficient for most purposes, but you may need to increment
this number if you have tickets with an extremely large number of
transactions which match your common search criteria. If you wish to
change this number later, you must alter it in both
Going to run the following in the DB: CREATE TABLE AttachmentsIndex ( id INTEGER UNSIGNED NOT NULL, weight INTEGER NOT NULL, query VARCHAR(3072) NOT NULL, INDEX(query) ) ENGINE=SPHINX CONNECTION="sphinx://localhost:3312/rt" CHARACTER SET utf8 You can now configure RT to use the newly-created full-text index by adding the following to your RT_SiteConfig.pm: Set( %FullTextSearch, Enable => 1, Indexed => 1, Table => 'AttachmentsIndex', MaxMatches => '10000', );
It will also output a sample
sphinx.conf file. At this point, your
database has been configured. Ensure that you add the
rt-setup-fulltext-index output (which may be different from the
above!) to your
RT_SiteConfig.pm before continuing.
Your index now exists, but is empty. In order to fill it, we must run the Sphinx indexer:
$ indexer rt Sphinx 2.0.1-beta (r2792) Copyright (c) 2001-2011, Andrew Aksyonoff Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx.conf'... indexing index 'rt'... collected 133809 docs, 1015.6 MB sorted 137.2 Mhits, 100.0% done total 133809 docs, 1015645236 bytes total 144.967 sec, 7006036 bytes/sec, 923.02 docs/sec total 400 reads, 6.474 sec, 637.6 kb/call avg, 16.1 msec/call avg total 582 writes, 2.473 sec, 959.3 kb/call avg, 4.2 msec/call avg
Finally, you will need to start the
$ searchd Sphinx 2.0.1-beta (r2792) Copyright (c) 2001-2011, Andrew Aksyonoff Copyright (c) 2008-2011, Sphinx Technologies Inc (http://sphinxsearch.com) using config file '/etc/sphinx.conf'... listening on all interfaces, port=3312 precaching index 'rt' precached 1 indexes in 0.033 sec
You will need to re-index the data using
indexer --rotate rt at
regular intervals. Since Sphinx does not do incremental indexing, but
rather re-indexes all content each time it is run, a "main+delta"
split of the
data may be necessary to reduce index update time sufficiently to allow
close to real-time indexing.