Solr is an open source enterprise search server based on the Lucene Java search
library, with XML/HTTP and JSON APIs, hit highlighting, faceted search,
caching, replication, and a web administration interface.
Screenshots
Installation
The installation procedure below assumes that you install Solr and Foswiki on the same Linux server.
Foswiki plugin installation
You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.
Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab → "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button.
Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will not show up in the
search results.
You can also install from the shell by running the extension installer as the web server user: (Be sure to run as the webserver user, not as root!)
cd /path/to/foswiki
perl tools/extension_installer <NameOfExtension> install
cd /var/solr/data
cp -r <foswiki-dir>/solr_9/cores .
mkdir configsets
cd configsets
ln -s <foswiki-dir>/solr_9/configsets/foswiki_configs
chown -R solr:solr /var/solr
Updating from a previous configuration set
An updated SolrPlugin might come with a newer configuration set, i.e. newer schema.xml or solrconfig.xml files. Make sure that the files coming with an update are installed on
the solr server as well. This is taken care of when the foswiki_configs directory is linked into the solr server's configsets directory. Note however that any local changes
you made to these files will be overwritten by the update. You might either create a config set of your own and adjust the core definition accordingly to make use of the newly
created config set, or merge your changes into the standard foswiki_configs set of files.
Increasing the security limits
You may get a warning when starting the solr service in the next step, along the lines of
[WARN] *** Your open file limit is currently 1024.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
To increase the file limit create a file /etc/security/limits.d/solr.conf with
solr soft nofile 65000
solr hard nofile 65000
solr soft nproc 65000
solr hard nproc 65000
The warning should be gone when starting the service.
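To confirm that the new limit is picked up, the check can be sketched in shell as follows. This is a minimal sketch and not part of Solr or SolrPlugin: the helper name limit_ok is made up for illustration, and the script has to be run in a session of the solr user for the result to be meaningful.

```shell
#!/bin/sh
# limit_ok is a hypothetical helper, not part of Solr or SolrPlugin.
# It succeeds if the current limit meets the required minimum;
# the special value "unlimited" always passes.
limit_ok() {
  current=$1
  required=$2
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# ulimit -n prints the open file limit of the current shell.
if limit_ok "$(ulimit -n)" 65000; then
  echo "open file limit is sufficient: $(ulimit -n)"
else
  echo "open file limit is too low: $(ulimit -n)"
fi
```

The threshold of 65000 is the value named in the warning; pass a different second argument to check against another minimum.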
Start solr service
service solr start
Test
cd <foswiki-dir>/tools
./solrindex topic=Main.WebHome
… should produce Indexing Main.WebHome
cd <foswiki-dir>/bin
./rest /SolrPlugin/search
… should return a JSON response from Solr showing the recently indexed topic
Skin integration
SolrPlugin comes with a skin overlay - called solr - that will replace the upper left search boxes in PatternSkin with a solr-driven auto-suggest search box. To switch that on use
* Set SKIN = solr, pattern
in your SitePreferences.
Note that you won't need to enable the solr skin overlay in case you are using NatSkin as it comes with support for SolrPlugin out of the box.
Preference settings
There are a couple of preference settings that you may set in your SitePreferences in order to customize some basic parameters of the solr search user interface:
date format for search results, see JQMomentContrib for documentation
dddd, Do, MMMM YYYY, HH:mm
SOLR_DEFAULTSORT
default sorting order of search results
score desc
SOLR_DEFAULTWEB
the web to search for, e.g. %BASEWEB%. all defaults to a global search
all
SOLR_EXACTSEARCH
boolean switch to select between two kinds of search; set this to true to get a sharper result set based on your query
false
SOLR_EXTRAFILTER
solr query filter added on top of the user-specified query
SOLR_INSTANTSEARCH
boolean switch to fire up search while you type
false
SOLR_TOPICSEARCH
boolean switch to enable topic search by default
false
SOLR_NUMROWS
default number of search results returned per page
10
SOLR_QUERYFIELDS
specify the qf solr parameter; note that this will disable SOLR_EXACTSEARCH settings
see solrconfig.xml
SOLR_INCLUDEWEB
regular expression of webs to be listed in the web facet
SOLR_EXCLUDEWEB
regular expression of webs not to be listed in the web facet
Commandline scripts
There is a set of tools to interact with the Solr index from the commandline. These can be
used to index Foswiki manually - as we did in the above tests - as well as to search for or delete specific documents in the index.
The set of tools comes in two variants, one for normal single-host Foswiki installations and one for virtual hosting using VirtualHostingContrib.
The virtual-hosting aware scripts have a prefix virtualhost-... and take an optional host=<domain> parameter to specify the virtual domain to interact with.
When it is not specified, the script is executed for each domain in turn as configured in VirtualHostingContrib. The only exception is solrjob (see below).
the web to be indexed; if undefined all webs will be indexed
all
topic="<web>.<topic>"
the topic to be indexed; use this parameter to index one specific topic
mode="full/delta"
mode of operation: full will unconditionally index all content as specified by web or topic; delta will only index content that has changed since the last time the script was run
delta
optimize="on/off"
optimize the Solr database by de-fragmenting its internal segments for better performance; this is normally not required unless a full indexing of larger chunks of content is performed; note that optimizing the Solr index might require considerable time and I/O resources on the filesystem of the server
This tool is a wrapper around solrindex and will use either solrindex or virtualhost-solrindex
depending on the host commandline parameter. It is mainly used in cronjobs.
In contrast to solrindex a locking & throttling strategy is used to prevent multiple indexers from being started simultaneously.
specifies the virtual domain to operate on (only makes sense when running VirtualHostingContrib); specify all to perform the operation on all known virtual hosts
-m / --mode full/delta
mode of operation (see solrindex above)
delta
-t / --throttle <seconds>
number of seconds to wait until the indexing process is started; note that any other calls to solrjob are prevented from entering the indexing loop as well
5
Using Solr search on the commandline
cd <foswiki-dir>/bin
./rest /SolrPlugin/search ...
Before you can use SolrSearch and get back results, you need to index your content completely, and do so repeatedly to keep up with changes in the Foswiki content base.
This can be achieved in various ways:
full indexing: index all of the content from start to end
delta indexing: index topics that changed since the last time (delta) indexing was performed
realtime indexing: monitor changes in the Foswiki store and fire up indexing as close to the actual change event as possible
online indexing: index content changes as part of the content being saved
We will discuss these strategies and outline their advantages. A combination of several of them then makes up the recommended indexing strategy for Foswiki content.
Full indexing
./solrindex mode=full optimize=on
This will crawl all webs, topics and attachments and submit them to the Solr server, which will build up the search index. This can take a considerable amount of time
depending on the amount of content and number of users registered to your site, so you may prefer to do it at a quiet time.
Note that full indexing is required the first time you install SolrPlugin. From then on you will be able to use delta indexing to update the index incrementally as
content changes in Foswiki.
It is recommended to perform a full indexing only once a week, or preferably at longer intervals.
Delta indexing
./solrindex
This will inspect all of the content base and check for changes since the last time the content was added to Solr. Any updated content will be added to the index
as required. The delta indexing procedure will also scan the whole index and delete those documents whose original topic in the Foswiki content base
has been removed.
Delta indexing is a relatively fast operation that is best performed every 15 minutes or so. Don't shorten the intervals of delta indexing too much as that would
create additional load on the server where no content is found to be delta-indexed.
Realtime indexing
This mode of operation requires a separate service called foswiki-watch to be installed. This is a perl script that monitors any actions in Foswiki's events.log.
Note that this is only a "near-realtime" indexing behavior, as the script that performs the indexing is configured to throttle the procedure for a given amount of time, defaulting to 5 seconds. So any change to the content will show up within about 5 seconds after the event.
Assuming you are running Foswiki on a Linux server that uses systemd, use the following commands to install the foswiki-watch service
Configure /etc/default/foswiki-watch to match your installation. Available settings:
FOSWIKI_ROOT: the path to your foswiki, e.g. /var/www/foswiki
FOSWIKI_WATCH_EVENTS_LOG: file to watch, e.g. /var/www/foswiki/working/logs/events.log
FOSWIKI_WATCH_PARALLEL: number of indexers to start in parallel at max, default 1
FOSWIKI_WATCH_THROTTLE: number of seconds to wait before starting an indexer, default 1 second
FOSWIKI_WATCH_VHOSTING: boolean switch to enable operation in a virtual hosting setting, default 0
FOSWIKI_WATCH_DEBUG: boolean switch to enable debugging
If you are running Foswiki using VirtualHostingContrib and all your vhosts are located in /var/www/vhosts then set the FOSWIKI_WATCH_EVENTS_LOG to a glob path such as /var/www/vhosts/*/working/logs/events.log to watch all event logs of all vhosts. Don't forget to enable FOSWIKI_WATCH_VHOSTING.
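For illustration, a /etc/default/foswiki-watch for the virtual hosting case described above might look like the following sketch. It assumes that the file uses plain KEY=value shell syntax; all values are examples based on the settings listed above, not required settings.

```shell
# /etc/default/foswiki-watch (example values, adjust to your installation)

# path to the Foswiki installation
FOSWIKI_ROOT=/var/www/foswiki

# glob covering the events.log of every virtual host below /var/www/vhosts;
# the pattern is kept literally in the variable
FOSWIKI_WATCH_EVENTS_LOG='/var/www/vhosts/*/working/logs/events.log'

# start at most one indexer at a time
FOSWIKI_WATCH_PARALLEL=1

# wait 5 seconds before starting an indexer
FOSWIKI_WATCH_THROTTLE=5

# operate in a virtual hosting setting
FOSWIKI_WATCH_VHOSTING=1
```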
Finally enable and start the service with
Indexing is then reported to the system's log service.
Online indexing
Not recommended, however …
This mode of operation refers to a way to update the search index immediately as part of the save operation performed by Foswiki on behalf of the user.
The biggest advantage here is that changes to the content base will immediately show up in the search index reflecting the exact changes being made to the
content base. Note however that this can cause significant performance issues when interacting with Foswiki, as indexing a topic can take quite some time.
There are a couple of flags to switch on/off online indexing in your configuration.
Enable / disable indexing content as part of a save operation:
Add --host all to index all virtual hosts, or --host <hostname> to index a single virtual host.
Recommendations
By now we have a couple of ways to keep up with changes in Foswiki while indexing it into an external database such as Solr.
There are pros and cons inherent in each of the above methods. Also, your own business requirements might significantly shift any decision
on how and when to schedule crawling the content. Some of the criteria to keep in mind are:
size of content base
speed of indexing content determined by server resources
interactive performance as perceived by the user
real-time requirements for updates in search results
changes in access control structures such as:
new users being registered to Foswiki,
changing membership in user groups,
changing clearance of user groups for specific content
What to keep in mind for full indexing
Changes in access control structures in particular might affect clearance to content on a broader scale. As the indexing procedure caches the current authorization for a specific
piece of content along with it, a change to access control -- independent of any change to the content itself -- renders the access control cached in the Solr index incorrect
until this content is indexed again. This is not a problem when the ACL of a single document is altered, as this document is re-indexed as part of the change event.
No such re-indexing is triggered automatically when the membership of a user group changes or a group is granted more or less authorization for content. This will only be reflected the next
time a full indexing is performed.
Access control structures might change entirely outside of Foswiki when using LdapContrib or PluggableAuthContrib, where users and groups are provided by an external identity provider.
These user and group records immediately affect how Foswiki grants access to documents (there is some caching involved here as well, but let's ignore this for now). Only after
the affected documents have been indexed again will a search on the index include or exclude content in line with what users can access when visiting the page directly.
Therefore a regular full indexing is required, presumably once a week or once a day during off times.
The runtime of a full indexing run depends on the size of your content base as well as the size of the user base. Both directly affect the indexing throughput.
It is strongly recommended to schedule full indexing during off times when the system isn't otherwise in use. Also, make sure that two full indexing runs don't overlap, as that
would keep increasing the load on the involved servers.
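One way to guard against overlapping runs when calling solrindex directly from cron is an exclusive lock, sketched here with flock(1) from util-linux. This is only an illustration: the helper name run_exclusive and the lock file path are assumptions, and the solrjob wrapper described earlier already brings its own locking.

```shell
#!/bin/sh
# run_exclusive is a hypothetical helper: it runs the given command only
# if nobody else holds the lock file, and fails immediately otherwise.
run_exclusive() {
  lockfile=$1
  shift
  # -n: do not wait for the lock, fail right away if it is taken
  flock -n "$lockfile" "$@"
}

# Possible use in a weekly cron script (the lock file path is an assumption):
#   cd <foswiki-dir>/tools
#   run_exclusive /tmp/solr-full-index.lock ./solrindex mode=full optimize=on
```

With this in place, a second full indexing run started while the first one is still busy exits with a non-zero status instead of adding load.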
In those cases where a full indexing run over all of the content base exceeds off times (e.g. it starts Friday night and hasn't finished by Monday morning) you will need to
add more server resources. There are multiple ways to do so. Step one would be to use separate servers for Foswiki and Solr. Please read up on how to scale Solr beyond
the single-node installation outlined in the configuration above.
Correctness of search index
A search index might show "incorrect" results for example when the content it indexes doesn't actually exist anymore. So users get a positive search hit but won't be able
to access the content anymore: content base and search index are out of sync. Keeping the search index "correct" is important for any indexing strategy.
A search index might also be "incorrect" when it doesn't reflect the access rights a user has on the content itself. That is: the search engine shall only return
search results for content that the user has clearance for. No search result shall ever be returned for content that the user isn't allowed to access or even
know exists.
In SolrPlugin any Foswiki ACLs are added to the Solr database while content is indexed. So ACLs are checked as an additional filter on any search operation that an
authenticated user might perform.
Correctness of the search index as we discuss it now is more concerned with the time it takes for a content change in Foswiki to be indexed and
added to the Solr database, i.e. with how long the two are out of sync.
There are two general categories for indexing content that we want to compare now:
online indexing: index content as part of the interaction performed by the user
offline indexing: perform content indexing independent from the user interacting with the system online
Offline indexing is performed by the solrindex script as well as the solrjob wrapper. Both might be used in a cronjob or by the foswiki-watch service as described above.
Online indexing comes at a price that we should keep in mind before switching it on.
Indexing will be part of a save, delete or rename operation performed by the user
and thus directly increases the time the user perceives when interacting with the system while applying content changes.
You have to decide for yourself how to trade interactive performance against the negative side effects of an "incorrect" search index. It is recommended to accept
a short period of time during which the search index is not quite up to date rather than to slow down the interactive performance of the system by hooking the indexing procedure
into the online operations of Foswiki.
It is recommended to replace Foswiki's default AutoViewTemplatePlugin with AutoTemplatePlugin. This will allow you to replace the default WebSearch, WebChanges and SiteChanges as well as WikiUsers with a Solr-driven interface for better usability and performance.
Configure AutoTemplatePlugin by adding the following {ViewTemplateRules}
Any topic that has got a UserForm attached to it will participate in the person search interface at %USERWEB%.WikiUsers. Note that the value at {SolrPlugin}{PersonDataForm} specifies a Solr filter query
that might be customized and extended as required. For example, to also include any topic that has got a PersonTopicDataForms attached to it use:
$Foswiki::cfg{SolrPlugin}{PersonDataForm} = '(*PersonTopic OR *UserForm)';
Finally, you'll need to make this configuration accessible in wiki applications such as the WikiUsers view template. Add '{SolrPlugin}{PersonDataForm}' to the {AccessibleCFG} list as in
SolrPlugin comes with a set of search macros tailored to the extensive capabilities of Solr's responses to search queries.
All of them make use of the same set of options to render a response as listed in SOLRSEARCH.
SOLRSEARCH
This is the most important macro. It allows you to interact with the Solr server and display results within wiki applications.
An example search looks like this:
This will list the 10 most recently changed topics that match the string "test".
To list the 20 most recently changed topics that have the string "test" in their name use:
SOLRSEARCH allows you to use the full power of the Lucene query language. This
works with syntactically correct boolean queries like "title:foo OR body:foo".
Consult the Lucene Query Syntax guide to learn more about how to form more complicated queries.
SOLRSEARCH also allows you to run a query in dismax mode. The dismax query parser only supports a subset of the Lucene syntax, but is highly tolerant of all sorts of strange user input. The query syntax it uses is familiar to many search engine users, and supports +/- and quotes for grouping words. The edismax mode adds several more powerful features, though still short of what is offered by the full Lucene syntax.
a search can be cached optionally for the time of the current request, for example using id="solr1". further calls to %SOLRFORMAT can make use of the cached solr response to render it independent from the location of the %SOLRSEARCH call on the wiki page
search
query string: depending on the search type this can either be a free-form text (type=dismax), a valid lucene query (type=standard) or a combination of both (edismax)
*:*
type
dismax/edismax/standard: query type
standard
fields
list of fields to be returned in the result; by default all fields in solr documents are returned; communication between Foswiki and the solr search can be optimized by specifying only those fields that you are interested in while rendering the response
*, score
Flags:
jump
on/off: jump to the topic specified explicitly in the search string
on
lucky
on/off: jump to the first result found
off
highlight
switch on/off highlighting of found terms
off
spellcheck
switch on/off spellchecking to propose alternative spellings in case no search result was found
off
Pagination:
start
integer index within the result from where to start listing results
0
rows
maximum number of documents to return
10
Filter parameters:
web
filter by web: this can be any webname
all
contributor
filter by contributor to a topic
filter
lucene query to filter results
extrafilter
additional lucene filter query (see SolrSearchBaseTemplate for the difference between filter and extrafilter)
reverse
on/off - reverses sorting if switched on; note: this overrides sorting order specified in sort
off
sort
sorting expression; examples: score desc, date desc, createdate, topic_sort
checktopics
on/off - if enabled found topics that don't exist anymore are excluded
off
Dismax Parameter:
boostquery
a raw query string (in the solr query syntax) that will be included with the user's query to influence the score. example: type:topic^1000 will boost results of type topic
list of fields and their boosts giving each field a significance when a term was found in them. the format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 … this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree
list of fields and their boosts similar to queryfields. this parameter may contain fields and boosts that phrases (specified in quotes) are matched against. boosting those fields higher than their counterpart in queryfields allows you to prefer phrase matches over separate word matches
list of facets to be rendered during search; each facet can be a title=name pair specifying the facet name and the title label used to display it in the result; example: %MAKETEXT{"Webs"}%=web, %MAKETEXT{"Topic type"}%=field_TopicType_first_s
facetquery
query to be used for a facet query
facetoffset
used to page through a list of facets being returned by a search
facetlimit
maximum number of values to be displayed per facet; this is a list of pairs name=integer specifying a per-facet limit; example: 50, tag=100, contributor=10, category=10 will constrain the global limit of facet values to be returned to 50, tags to 100, and list the top 10 contributors in the hit set as well as the 10 most used categories
100
facetmincount
minimum frequency of a facet to be included in the result
1
facetprefix
prefix string of a facet to be included
facetdatestart
part of a date facet describing the start of a time interval
NOW/DAY-7DAYS
facetdateend
part of a date facet describing the end of a time interval
NOW/DAY+1DAYS
facetdateother
part of a date facet describing the time intervals excluding the one specified with facetdatestart and facetdateend
before
hidesingle
comma separated list of facets to be hidden if there's only one choice left
disjunctivefacets
list of facets that are queried using OR; so searching within one facet will expand the search instead of drilling down
facet values are combined using AND
combinedfacets
list of facets where values are queried in each of them using OR; for example listing field_ProjectMembers_lst and field_ProjectManager_s will result in a lucene filter of the form field_ProjectMembers_lst:WikiGuest OR field_ProjectManager_s:WikiGuest
Formatting results:
correction
format string for corrections proposed by the spellchecker
Did you mean <a href='$url'>$correction</a>
header
format string prepended to the result
format
format string used to render each hit in the result set
nullformat
format string used when no results were found
separator
format string used to separate hit results rendered using format
footer
format string appended to the result
header_interesting
format string prepended to more-like-this queries (see %SOLRSIMILAR)
format_interesting
format string used to render more-like-this results
separator_interesting
format string used to separate hit results in more-like-this queries
footer_interesting
format string appended to more-like-this queries
include_interesting
regular expression terms must match in a more-like-this result
exclude_interesting
regular expression terms must not match in a more-like-this result
header_group
format string for grouped results
format_group
format string for grouped results
separator_group
format string to separate results in grouped results
footer_group
format string for grouped results
include_group
regular expression groups must match
exclude_group
regular expression groups must not match
header_<facet>
format string prepended to a facet result
format_<facet>
format string used to render a facet value
separator_<facet>
format string used to separate facet values
footer_<facet>
format string appended to facet results
include_<facet>
regular expression facet values must match to be displayed
exclude_<facet>
regular expression facet values must not match to be displayed
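As an illustration of the combinedfacets parameter listed above, the shape of the filter it produces for one selected value can be sketched in shell. The function name combined_filter is made up for this example, and the real filter is built inside SolrPlugin, so treat this as a sketch of the resulting query shape only.

```shell
#!/bin/sh
# combined_filter is a hypothetical helper: it joins "field:value" terms
# for one value and any number of fields with OR.
combined_filter() {
  value=$1
  shift
  filter=""
  for field in "$@"; do
    if [ -n "$filter" ]; then
      filter="$filter OR "
    fi
    filter="$filter$field:$value"
  done
  printf '%s\n' "$filter"
}

# prints: field_ProjectMembers_lst:WikiGuest OR field_ProjectManager_s:WikiGuest
combined_filter WikiGuest field_ProjectMembers_lst field_ProjectManager_s
```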
SOLRFORMAT
When a solr response has been cached using the id parameter to SOLRSEARCH, it can be reused by subsequent calls to %SOLRFORMAT.
SolrPlugin comes with a custom schema to index general Foswiki data as defined
in the <solr-home-dir>/conf/schema.xml file. It offers support for generic
DataForm values, so adding any new DataForm definition lets you use
its formfields for faceting directly without changing configurations or having to reindex
the content.
The process of indexing content is configured on the Foswiki side which will crawl all webs, topics
and their attachments thus creating lucene documents which are then sent over to the solr server.
A lucene document is made up of fields of a certain type which defines the way the document should be processed
by the solr server. This is configured in the schema.xml file.
While the schema is able to cover all Foswiki related data it is still kept generic enough to be used for non-wiki
content as well.
Field types
This is the list of the most common field types used in the default schema.
See the schema.xml for more exotic field types like point and location,
useful for spatial search.
the data should be sent/retrieved in as Base64 encoded strings
int, float, long, double
default numeric field types. for faster range queries, consider the tint/tfloat/tlong/tdouble types
date
the format for this date field is of the form 1995-12-31T23:59:59Z, and is a more restricted form of the canonical representation of dateTime. The trailing "Z" designates UTC time and is mandatory. Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z All other components are mandatory. Note: for faster range queries, consider the tdate type
text_ws
a text field that only splits on whitespace for exact matching of words
text
a general text field that has reasonable, generic cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from case-insensitive "stopwords.txt", and down cases. At query time only, it also applies synonyms.
text_std
same as text but without processing stopwords and synonyms
a general unstemmed text field - good if one does not know the language of the field. this field type is useful when searching for parts of a WikiWord
text_generic
same as text but also splits words on case change while generating word parts.
text_substr
general substring decomposition
text_prefix
substring decomposition starting at the front of the string
text_suffix
substring decomposition starting at the back of the string
text_spell
generic text analysis for spell checking
text_sort
this is a text field suitable for sorting alphabetically
text_rev
a general unstemmed text field that indexes tokens normally and also reversed, to enable more efficient leading wildcard queries.
type
a list of strings used to analyse different media types. these are analysed using the system's mime types table, generating meaningful values; for example a gif image would be of type "gif", "image" and "attachment"
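The timestamp format described for the date field type above can be produced on the commandline, for example when building date filters by hand. The sketch assumes GNU date (as found on most Linux systems); the helper name to_solr_date is made up for illustration.

```shell
#!/bin/sh
# to_solr_date is a hypothetical helper: it converts epoch seconds into
# the UTC timestamp format expected by Solr date fields.
# Requires GNU date for the "-d @<seconds>" syntax.
to_solr_date() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

to_solr_date 820454399   # prints 1995-12-31T23:59:59Z
```

The -u switch matters here: without it the local time zone would be used while the trailing "Z" still claims UTC.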
this field controls view access of users to this topic or attachment in the search index; every query is augmented with an ACL check against this field; only users listed in this field are allowed view rights; special value is "all" when there are no view restrictions
edit_granted
string
multivalued
field holding the change rights of a user on this topic or attachment
attachment
string
multivalued
stored
list of all attachment names of this topic
author
string
stored
the name of the person that changed the document most recently
author_title
string
stored
title name of the person that changed the document most recently
catchall
text_generic
multivalued
stored
copy-field that gathers content from (almost) all fields; this is the default search field for the "standard" query parser; note that fields to be queried can be configured per request using the "dismax" handler
category
string
multivalued
stored
list of categories this document is in; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed; it will populate it with the list of all categories up to TopCategory; content of this field is copied to category_search as well (see generic fields below)
comment
text_generic
stored
comment field of an attachment
concept
string
multivalued
stored
support for uima processing chain
container_id
string
stored
id of containing document, e.g. the topic this is a comment or attachment for
container_title
string
stored
title name of containing document
container_topic
string
stored
topic of containing document
container_url
string
stored
url of containing document
container_web
string
stored
web of containing document
contributor
string
multivalued
stored
list of users that contributed to this topic at some point in time
createauthor
string
stored
author of the initial version of this document
createauthor_title
string
stored
title name of the initial author of this document
createdate
tdate
stored
date when the initial version of this document was created
date
tdate
stored
time the document was last changed
form
string
stored
name of the form attached to the current topic
icon
string
stored
icon to identify the rendition for this document
id
string
stored
unique identifier for each document; this is the external id usable in applications; there's an internal solr document id not related to this field
language
string
stored
language of the current document; this may be specified explicitly using the CONTENT_LANGUAGE preference, or set to "detect" to let the solr update chain detect the language automatically
macro
string
multivalued
list of wiki macros being used in this topic
name
string
stored
filename of an attachment
outgoing
string
multivalued
stored
list of all outgoing links; this information is used to detect backlinks
parent
string
stored
parent topic of the current topic
phonetic
phonetic
multivalued
holds the phonetic analysis of the most important search fields
| charnorm | text_charnorm | multivalued | result of the character normalization analysis |
| preference | string | multivalued, stored | this field catches all topic preferences. each preference is captured in a dynamic field as well (see dynamic fields below) |
| sentence | text_generic | multivalued, stored | support for uima processing chain |
| size | tint | stored | size of an attachment in bytes |
| spell | text_spell | multivalued | used for spellchecking |
| state | string | | used by comments or any other application that tracks specific states of a document, such as "new", "unapproved", "approved", "draft", "unpublished", "published", … |
| text_prefix | text_text_prefix | multivalued | holds substring analysis of the most important search fields, starting at the front |
| text_suffix | text_text_suffix | multivalued | holds substring analysis of the most important search fields, starting at the back |
| summary | text_generic | stored | this is a plainified summary of the topic text |
| tag | string | multivalued, stored | list of tags assigned to this document; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed; content of this field is copied to category_search as well (see generic fields below) |
| text | text_generic | | document text |
| thumbnail | string | stored | url to thumbnail representation of this document; mostly used for images |
| timestamp | tint | stored | epoch time when the document was added to the index |
| title | string | stored | title of a document; a topic title is read from a TopicTitle formfield, a TOPICTITLE preference variable or defaults to the topic name itself; for attachments this is the filename with the extension stripped off |
| topic | string | stored | name of the topic |
| type | type | stored | holds the type facet of the document; this is "image" for all kinds of images, "video" for all kinds of videos, "topic" for Foswiki topics and the verbatim file extension for everything else; note: plugins like Foswiki:Extensions/MetaCommentPlugin might use specific types as well (like "comment" in this case) |
| url | string | stored | url used to access the document being indexed |
| version | float | | current version of the topic |
| webcat | string | stored | combined web-category facet |
| web | string | stored | name of the web this document is located in |
| webtopic | string | stored | concatenation of the web and topic part |
Dynamic fields
Dynamic fields are generated based on the content properties of the document to
be indexed. Fields are specified using a wildcard pattern in schema.xml.
When a document is indexed, the wildcard is expanded to create a proper
field name. Dynamic fields make it possible to analyze fields in specific ways
based on their name, and to cover fields that aren't known in advance, such as
the names of all formfields of any DataForm that could ever be defined.
When SolrPlugin is about to index a DataForm attached to a topic, it tries to
guess the data type of each formfield. Normally, Foswiki does not specify any
type information within a DataForm definition. The exceptions are:
(1) date: these are mapped to a *_dt field for the ISO date and a *_i field for the epoch seconds
(2) checkbox, select, radio, textboxlist: these are potentially multi-valued fields and are thus indexed in a *_lst field.
Every other formfield is stored in a *_s field as well as in *_search, *_prefix, *_substr, *_sort and *_std fields.
The *_s field captures the exact content, while each of the others applies a slightly different analysis to the text.
DataForm formfields are mapped to lucene document fields by prepending the field_
prefix, which prevents name clashes with other dynamic fields generated on the fly.
For example, a formfield ProjectManager will be stored in field_ProjectManager_s
and field_ProjectManager_search. Likewise, a select+multi formfield ProjectMembers
will be stored in field_ProjectMembers_lst, as it is a multivalued field.
If a formfield name already ends in one of the suffixes below (_i, _l, _f, _dt, etc.),
this suffix is used instead of any heuristics that try to derive the best
field type for the lucene field. That way, DataForm fields can be indexed
type-specifically even though Foswiki leaves them untyped.
Similarly topic preferences are indexed using a preference_* prefix.
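The mapping rules above can be summarized in a short sketch. The following Python function is an illustration only and not the plugin's actual (Perl) implementation. The function name solr_field_names, the constant names, and the exact list of recognized suffixes are assumptions made for this example; only suffixes named in this documentation are listed.

```python
# Illustrative sketch of the formfield-to-Solr-field naming rules described above.
# This is NOT the plugin's code; names and the suffix list are assumptions.

# Suffixes assumed to be honoured verbatim when a formfield name ends in one.
KNOWN_SUFFIXES = ("_i", "_l", "_f", "_d", "_b", "_dt")

# Formfield types that are potentially multi-valued.
MULTI_VALUE_TYPES = {"checkbox", "select", "radio", "textboxlist"}

# Fields generated for every other (untyped) formfield.
TEXT_SUFFIXES = ("_s", "_search", "_prefix", "_substr", "_sort", "_std")


def solr_field_names(name, formfield_type=""):
    """Return the Solr field names a DataForm formfield would be indexed into."""
    base = "field_" + name
    # An explicit suffix in the formfield name wins over any heuristics.
    if name.endswith(KNOWN_SUFFIXES):
        return [base]
    # Strip modifiers such as "+multi" from types like "select+multi".
    kind = formfield_type.split("+")[0].strip().lower()
    if kind == "date":
        # ISO date plus epoch seconds
        return [base + "_dt", base + "_i"]
    if kind in MULTI_VALUE_TYPES:
        return [base + "_lst"]
    return [base + suffix for suffix in TEXT_SUFFIXES]


print(solr_field_names("ProjectMembers", "select+multi"))  # ['field_ProjectMembers_lst']
print(solr_field_names("ProjectManager", "text"))
```

Topic preferences would follow the same pattern with a preference_ prefix in place of field_.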
| *_i | | | fields with a _i suffix are indexed as an integer number |
| *_l | tlong | stored | fields with a _l suffix are indexed as a long integer |
| *_f | tfloat | stored | fields with a _f suffix are indexed as a float |
| *_d | tdouble | stored | fields with a _d suffix are indexed as a double precision float |
| *_b | boolean | stored | true, false |
| *_s | string | stored | dynamic field for unanalyzed text |
| *_std | string | not stored | dynamic field for standard analysis, i.e. stopwords not being removed |
| *_t | text_generic | stored | generic text |
| *_dt | tdate | stored | a dateTime value |
| *_lst | string | multivalued, stored | this field is used for any multi-valued formfield in DataForms like select, radio, checkbox, textboxlist |
| preference_* | string | stored | preference values such as preference_NAMEOFPREFERENCE_t |
| *_search | text_generic | stored | generic text, optimized for searching |
| *_sort | text_sort | stored | text optimized for sorting alphabetically |
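Given these suffix conventions, a search application can address a formfield directly in a filter query. The sketch below only builds the request URL for Solr's standard select handler and does not contact a server. The host, port and core name (localhost:8983, foswiki) as well as the helper names formfield_filter and select_url are assumptions for illustration; adjust them to the local installation.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: Solr on localhost:8983, core named "foswiki".
SOLR_SELECT_URL = "http://localhost:8983/solr/foswiki/select"


def formfield_filter(name, value, suffix="_s"):
    """Build a filter query matching a formfield value exactly.

    The default suffix _s targets the unanalyzed string field;
    pass "_lst" for multi-valued formfields.
    """
    # Escape backslashes and double quotes inside the quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return 'field_{}{}:"{}"'.format(name, suffix, escaped)


def select_url(query, filters=()):
    """Assemble a select URL with one fq parameter per filter."""
    params = [("q", query), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = select_url("*:*", [formfield_filter("ProjectManager", "JohnDoe")])
print(url)
```

Filtering on the *_s field gives exact matches, whereas querying the corresponding *_search field would go through the text analysis chain.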
Copy fields
Finally, after all field types have been defined, some fields are created by copying a
source field to a destination field using the copyField feature of solr. So while most of a lucene document
to be indexed is created explicitly by the crawler and indexer, some more fields are created automatically to facilitate
specific search applications. The destination fields are then analysed using the dynamic field definitions given above.
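To make the copy mechanism more concrete, here is a small Python simulation of how a copy rule with a wildcard expands. It mimics the behaviour of copyField on a plain dictionary and does not talk to solr. The rules in COPY_RULES and the function name apply_copy_fields are hypothetical and chosen for illustration; the rules actually in effect are the ones defined in schema.xml.

```python
import re

# Hypothetical copy rules (source pattern, destination pattern) for illustration;
# the real rules are defined in schema.xml.
COPY_RULES = [("*_s", "*_search"), ("tag", "tag_search")]


def apply_copy_fields(doc, rules):
    """Return a copy of doc with destination fields added, mimicking copyField."""
    result = dict(doc)
    for source, dest in rules:
        # Turn the glob into a regex; the wildcard becomes a capture group.
        pattern = re.compile("^" + re.escape(source).replace(r"\*", "(.+)") + "$")
        # Only fields of the original document are used as sources.
        for name, value in doc.items():
            match = pattern.match(name)
            if not match:
                continue
            # The part matched by the wildcard is substituted into the destination.
            target = dest.replace("*", match.group(1)) if "*" in source else dest
            values = value if isinstance(value, list) else [value]
            existing = result.get(target, [])
            if not isinstance(existing, list):
                existing = [existing]
            result[target] = existing + values
    return result


doc = {"field_ProjectManager_s": "JohnDoe", "tag": ["solr", "search"]}
indexed = apply_copy_fields(doc, COPY_RULES)
print(sorted(indexed))
```

In this example the generated field_ProjectManager_search field would then be picked up by the *_search dynamic field definition and analysed as generic text.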
Templates
Structure of SolrSearchBaseTemplate
Replacing WebSearch and WebChanges
Creating custom search interfaces
Dependencies
| *Name* | *Version* | *Description* |
| Foswiki::Plugins::MultiLingualPlugin | >=4.10 | Required |
| Foswiki::Contrib::JQMomentContrib | >=1.0 | Required |
| Foswiki::Contrib::JQPhotoSwipeContrib | >=1.0 | Required |
| Foswiki::Contrib::JQSerialPagerContrib | >=2.0 | Required |
| Foswiki::Contrib::JQTwistyContrib | >=1.0 | Required |
| Foswiki::Contrib::StringifierContrib | >=6.00 | Required |
| Foswiki::Plugins::AutoTemplatePlugin | >=1.0 | Optional |
| Foswiki::Plugins::ClassificationPlugin | >=1.0 | Optional |
| Foswiki::Plugins::DBCachePlugin | >=1 | Optional |
| Foswiki::Plugins::FilterPlugin | >=2.0 | Required |
| Foswiki::Plugins::FlexWebListPlugin | >=1.91 | Required |
| Foswiki::Plugins::ImagePlugin | >=3.0 | Required |
| Foswiki::Plugins::JQueryPlugin | >=6.00 | Required |
| Foswiki::Contrib::CacheContrib | >=0 | Required |
| Linux::Inotify2 | >=2 | Required |
| HTML::Entities | >=3.64 | Required |
| JSON::XS | >=2.231 | Required |
| LWP::UserAgent | >=5.820 | Required |
| Moo | >=2.00 | Required |
| Types::Standard | >=1.00 | Required |
| XML::Easy | >0 | Required |
| Foswiki::Plugins::TopicTitlePlugin | >1.00 | Required for Foswiki < 2.2 |
Change History
11 Mar 2025:
fixed processing of query fields thus fixing people search
24 Feb 2025:
improved default weights for query scores; fixed indexing of autofill formfields with a type cast; improved css vars for highlighting search results; fixed cmdline args for solrjob script
27 Jan 2025:
shorten highlight fragment; use nobody.png from JQueryPlugin instead of NatSkin
17 Jan 2025:
added support for solr-9
14 Mar 2023:
replaced iwatch with foswiki-watch service
26 Jan 2022:
gave up on stopwords: removed stopwords filter from the solr schema
26 Sep 2019:
performance improvements of indexer; implemented instant search; new crawler interface to index not only wiki content but also external sources; new filesystem crawler; eased configuring the search interface with preference variables; extended solr schema to cope with multiple data sources; improved handling of substring searches in text fields; require validation and authentication in rest handlers; removed hardcoded WorkflowPlugin support (plugins need to hook into the indexing api instead); added support for formfields of type number, percent, currency and bytes; improved indexing of macros used in a page; improved detection of outgoing links while indexing; changed handling of admin rights on content, i.e. not granting admin rights on external sources; improved autosuggestion dropdown search; added api to iterate over facet values while ignoring access rights
31 Jan 2019:
reduce amount of presumably unrelated search results; improved language detection in solr; added fields name_std and name_search for better searchability of attachments; don't display wiki markup in search result summaries; added field macro to capture use of wiki macros
10 Oct 2018:
mime types are now multivalued, e.g. an image is now tagged type: ["gif", "image", "attachment"]; better support for attachments listed in the autosuggest drop down box; the rudimentary type mapping is now based on the system mime types table and not using a typemap file in solr's config anymore; removed dependency on Image::Magick; fixed error exceeding the max string length in solr; the form name will now be used when no TopicType field is present to construct the TopicType facet; fixed support for ALLOWWEBVIEW = *
13 Aug 2018:
new alphabetical navigation for wiki users; fixed searching for summary; replaced jquery.scrollto with native scroll api; make number of items suggested configurable in jquery.autosuggest drop-down box
07 Jun 2018:
new index fields author_title, createauthor_title, title_first_letter; added support for indexing arbitrary meta data; added support for ListyPlugin; added toggle "exact search" to search interface; depending on new TopicTitlePlugin now; fixed keyboard interaction of autosuggest box; fixed sorting facet values by title; much improved relevancy sorting
09 Jan 2018:
added support for jquery.i18n; improved solr schema for better findability; fixed solr sidebar in subwebs
18 Sep 2017:
replacing text_substring with text_prefix and text_suffix to improve substring matching; truncate document values larger than 32k to prevent solr from crashing; use flexbox for people search interface; fixed creating urls to ImagePlugin rest interface to generate thumbnail previews
23 Jan 2017:
converted WebServices::Solr to Moo; fixed documentation for iwatch realtime indexing; documentation of SOLRSCRIPTURL macro; using jquery.i18n for javascript translations now; new facet filter to search in facet values; improved indexing of user profile pages and their thumbnail image; indexing image geometry now; improved jquery.autosuggest widget; improved ToggleFacetWidget; improved boosting of query ingredients; mapping all office documents to a combined attachment type (document, presentation, spreadsheet, chart, …); better support for plenv in system services and cron jobs
18 Oct 2015:
fixed backwards compatibility with pre-unicode Foswiki; bring back solr::queryfields in SolrSearchBaseTemplate; fixed language facet to properly match language tags to their name; improved layout of search results as well as autosuggestion widget; removed workflow facet from default search; fixed icon mapping for topics that don't come with an icon defined in their TopicType; don't try to encode html entities without a code point in utf8; don't remove all macros from topic text, just some; removed dependency on MimeIconsPlugin as we are using fontawesome now; improved formula for sorting results by reference; fixed sorting in ajax-solr; fixed exposing/hiding parameters in ajax-solr; improved findability of content, i.e. when containing stop words only in the title; removed unused /browse search handler from solr config
01 Oct 2015:
improve default layout of search results; moved unsafe inline-javascript into a js file of its own
21 Sep 2015:
cache stringified attachments using Cache::FileCache now and added api to purge/clear cache regularly; removed IndexExtensions config parameter to let the stringifier decide on supported file formats; added support for Foswiki:Extensions/LikePlugin boosting search results by social preferences
17 Jul 2015:
added support for Foswiki-2.0; indexing workflow and state facets supporting Foswiki:Extensions/WorkflowPlugin; added author_url to solr schema; added google image and video mime types mapping them to "image" while indexing
27 Feb 2015:
upgraded to solr-5.0.0
29 Sep 2014:
moved to jsrender for templating, replacing the deprecated jquery.tmpl
29 Aug 2014:
fix mailto links in WikiUsers view template; fully specify rest security; fixed creating of working area for timestamps db; improved indexing of list values; fixed encoding error in SOLRSEARCH/FORMAT; use SOLR_EXTRAFILTER preference setting in auto-suggest widget as well; fixed applying strings and defaults in solrDictionary class; fixed applying extra-filters in SolrSearch; harvest facet headings for translations
28 May 2014:
implemented new ACL style compatible with Foswiki >= 1.2
improved indexing performance; added configurable http timeouts talking to the solr backend; fixed language mappings for multilingual content; fixes due to latest changes in jquery.moment
17 Oct 2011:
fixed WebServices::Solr to only encode to utf8 if needed; fixed handling character encoding on a pure utf8 foswiki; fixed schema for spell correction
29 Sep 2011:
improved schema.xml: replaced StandardTokenizer with WhitespaceTokenizer, using new ClassicTokenizer and ClassicFilter to feed the spellchecker, switched spellchecker to JaroWinklerDistance and lowered the frequency threshold for a term to be added to the spellchecker; building the spellchecker when optimizing the index now; fixed detecting the content language
28 Sep 2011:
added multilanguage support per document; fixed default values in %SOLRSIMILAR; speeding up indexing by better caching ACLs; implemented mapping facet values to any other label during query time; added Language facet to default search interface
26 Sep 2011:
improved default boosting in dismax to prefer topic hits a lot stronger than attachments; improved default cache settings for better default performance; added support to distribute updates and search in a master-slave setup; added boostquery, queryfields, phrasefields parameter to customize boosting and sorting; improved default schema while documenting it
21 Sep 2011:
upgrading to solr-3.4.0; fixed utf8 handling; added jump and i-feel-lucky options; made hidesingle configurable per facet; added disjunctivefacets and combinedfacets; fixed handling of date fields; support new ui::autocomplete in JQueryPlugin; using type-specific icons in Foswiki:Extensions/MimeIconPlugin if installed; fixed quoting lucene queries; indexing outgoing links to support fast backlinks; adding fields createauthor, language and collection to schema; disabling phonetic boost in schema by default; be more robust in case of malformed DataForm definitions; copying every string field into a search field also to allow exact as well as fuzzy search; enhancing normalizeWebTopicName to create uniform web names using dots, not slashes everywhere; fixed parsing inline topic permissions; externalized sidebar pager into a new plugin of its own: Foswiki:Extensions/JQSerialPagerContrib; upgrading to WebService::Solr-0.14 … which now requires CPAN:XML::Easy instead of CPAN:XML::Generator; lots of improvements to SolrSearchBaseTemplate; now supporting Foswiki:Extensions/InfiniteScrollContrib in SolrSearch; documentation improvements
19 Apr 2011:
shipping a multicore setup by default; added support for Foswiki:Extensions/VirtualHostingContrib; fixed utf8 recoding; some usability improvements to faceted search interface; fixing illegal control characters in output (Oliver Schaub)
16 Dec 2010:
added state field to schema used for approval workflows; added solrjob to ease cronjobbing indexing; added documentation on how to use iwatch for almost-realtime indexing; fixed dependencies to include Foswiki:Extensions/FilterPlugin as well; fixed mapping facet values to their display title in search interface; fixed delta updates not properly removing outdated attachment entries when these were moved/renamed; and some minor html improvements
03 Dec 2010:
fixed solr-based WebChanges and SiteChanges using PatternSkin
01 Dec 2010:
adjustments due to changes in stringifier api; fixed removal of deleted webs from search index