xapian-core
1.4.24
Class representing a query. More...
#include <query.h>
Public Types | |
enum | op { OP_AND = 0 , OP_OR = 1 , OP_AND_NOT = 2 , OP_XOR = 3 , OP_AND_MAYBE = 4 , OP_FILTER = 5 , OP_NEAR = 6 , OP_PHRASE = 7 , OP_VALUE_RANGE = 8 , OP_SCALE_WEIGHT = 9 , OP_ELITE_SET = 10 , OP_VALUE_GE = 11 , OP_VALUE_LE = 12 , OP_SYNONYM = 13 , OP_MAX = 14 , OP_WILDCARD = 15 , OP_INVALID = 99 , LEAF_TERM = 100 , LEAF_POSTING_SOURCE , LEAF_MATCH_ALL , LEAF_MATCH_NOTHING } |
Query operators. More... | |
enum | { WILDCARD_LIMIT_ERROR , WILDCARD_LIMIT_FIRST , WILDCARD_LIMIT_MOST_FREQUENT } |
Public Member Functions | |
Query () | |
Construct a query matching no documents. | |
~Query () | |
Destructor. | |
Query (const Query &o) | |
Copying is allowed. | |
Query & | operator= (const Query &o) |
Copying is allowed. | |
Query (const std::string &term, Xapian::termcount wqf=1, Xapian::termpos pos=0) | |
Construct a Query object for a term. | |
Query (Xapian::PostingSource *source) | |
Construct a Query object for a PostingSource. | |
Query (double factor, const Xapian::Query &subquery) | |
Scale using OP_SCALE_WEIGHT. | |
Query (op op_, const Xapian::Query &subquery, double factor) | |
Scale using OP_SCALE_WEIGHT. | |
Query (op op_, const Xapian::Query &a, const Xapian::Query &b) | |
Construct a Query object by combining two others. | |
Query (op op_, const std::string &a, const std::string &b) | |
Construct a Query object by combining two terms. | |
Query (op op_, Xapian::valueno slot, const std::string &range_limit) | |
Construct a Query object for a single-ended value range. | |
Query (op op_, Xapian::valueno slot, const std::string &range_lower, const std::string &range_upper) | |
Construct a Query object for a value range. | |
Query (op op_, const std::string &pattern, Xapian::termcount max_expansion=0, int max_type=WILDCARD_LIMIT_ERROR, op combiner=OP_SYNONYM) | |
Query constructor for OP_WILDCARD queries. | |
template<typename I > | |
Query (op op_, I begin, I end, Xapian::termcount window=0) | |
Construct a Query object from a begin/end iterator pair. | |
const TermIterator | get_terms_begin () const |
Begin iterator for terms in the query object. | |
const TermIterator | get_terms_end () const |
End iterator for terms in the query object. | |
const TermIterator | get_unique_terms_begin () const |
Begin iterator for unique terms in the query object. | |
const TermIterator | get_unique_terms_end () const |
End iterator for unique terms in the query object. | |
Xapian::termcount | get_length () const |
Return the length of this query object. | |
bool | empty () const |
Check if this query is Xapian::Query::MatchNothing. | |
std::string | serialise () const |
Serialise this object into a string. | |
op | get_type () const |
Get the type of the top level of the query. | |
size_t | get_num_subqueries () const |
Get the number of subqueries of the top level query. | |
const Query | get_subquery (size_t n) const |
Read a top level subquery. | |
std::string | get_description () const |
Return a string describing this object. | |
const Query | operator&= (const Query &o) |
Combine with another Xapian::Query object using OP_AND. | |
const Query | operator|= (const Query &o) |
Combine with another Xapian::Query object using OP_OR. | |
const Query | operator^= (const Query &o) |
Combine with another Xapian::Query object using OP_XOR. | |
const Query | operator*= (double factor) |
Scale using OP_SCALE_WEIGHT. | |
const Query | operator/= (double factor) |
Inverse scale using OP_SCALE_WEIGHT. | |
Query (Query::op op_) | |
Construct with just an operator. | |
Static Public Member Functions | |
static const Query | unserialise (const std::string &serialised, const Registry &reg=Registry()) |
Unserialise a string and return a Query object. | |
Static Public Attributes | |
static const Xapian::Query | MatchNothing |
A query matching no documents. | |
static const Xapian::Query | MatchAll |
A query matching all documents. | |
Class representing a query.
anonymous enum |
enum Xapian::Query::op |
Query operators.
Enumerator | |
---|---|
OP_AND | Match only documents which all subqueries match. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_OR | Match documents which at least one subquery matches. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). |
OP_AND_NOT | Match documents which the first subquery matches but no others do. When used in a weighted context, the weight is just the weight of the first subquery. |
OP_XOR | Match documents which an odd number of subqueries match. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). |
OP_AND_MAYBE | Match the first subquery taking extra weight from other subqueries. When used in a weighted context, the weight is the sum of the weights for matching subqueries (so additional matching subqueries will mean a higher weight). Because only the first subquery determines which documents are matched, in a non-weighted context only the first subquery matters. |
OP_FILTER | Match like OP_AND but only taking weight from the first subquery. When used in a non-weighted context, OP_FILTER and OP_AND are equivalent. In 1.4.x releases before 1.4.15, the third and subsequent subqueries were ignored in some situations. |
OP_NEAR | Match only documents where all subqueries match near each other. The subqueries must match at term positions within the specified window size, in any order. Currently subqueries must be terms or terms composed with OP_OR. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_PHRASE | Match only documents where all subqueries match near and in order. The subqueries must match at term positions within the specified window size, in the same term position order as subquery order. Currently subqueries must be terms or terms composed with OP_OR. When used in a weighted context, the weight is the sum of the weights for all the subqueries. |
OP_VALUE_RANGE | Match only documents where a value slot is within a given range. This operator never contributes weight. |
OP_SCALE_WEIGHT | Scale the weight contributed by a subquery. The weight is the weight of the subquery multiplied by the specified non-negative scale factor (so if the scale factor is zero then the subquery contributes no weight). |
OP_ELITE_SET | Pick the best N subqueries and combine with OP_OR. If you want to implement a feature which finds documents similar to a piece of text, an obvious approach is to build an "OR" query from all the terms in the text, and run this query against a database containing the documents. However, such a query can contain a lot of terms and be quite slow to perform, yet many of these terms don't contribute usefully to the results. The OP_ELITE_SET operator can be used instead of OP_OR in this situation. OP_ELITE_SET selects the most important N terms and then acts as an OP_OR query with just these, ignoring any other terms. This will usually return results just as good as the full OP_OR query, but much faster. In general, the OP_ELITE_SET operator can be used when you have a large OR query, but it doesn't matter if the search completely ignores some of the less important terms in the query. The subqueries don't have to be terms. If they aren't then OP_ELITE_SET could potentially pick a subset which doesn't actually match any documents even if the full OR would match some (because OP_ELITE_SET currently selects those subqueries which can return the highest weights). This is probably rare in practice though. You can specify a parameter to the query constructor which controls the number of subqueries which OP_ELITE_SET will pick. If not specified, this defaults to 10 (Xapian used to default to `ceil(sqrt(number_of_subqueries))` if there are more than 100 subqueries, but this rather arbitrary special case was dropped in 1.3.0). For example, this will pick the best 7 terms: `Xapian::Query query(Xapian::Query::OP_ELITE_SET, subqs.begin(), subqs.end(), 7);` If the number of subqueries is less than this threshold, OP_ELITE_SET behaves identically to OP_OR. When used with a sharded database, OP_ELITE_SET currently picks the subqueries to use separately for each shard based on the maximum weight they can return in that shard. This means it probably won't select exactly the same terms, and so the results of the search may not be exactly the same as for a single database with equivalent contents. |
OP_VALUE_GE | Match only documents where a value slot is >= a given value. Similar to OP_VALUE_RANGE, but open-ended. This operator never contributes weight. |
OP_VALUE_LE | Match only documents where a value slot is <= a given value. Similar to OP_VALUE_RANGE, but open-ended. This operator never contributes weight. |
OP_SYNONYM | Match like OP_OR but weighting as if a single term. The weight is calculated combining the statistics for the subqueries to approximate the weight of a single term occurring with those statistics. |
OP_MAX | Pick the maximum weight of any subquery. Matches the same documents as OP_OR, but the weight contributed is the maximum weight from any matching subquery (for OP_OR, it's the sum of the weights from the matching subqueries). Added in Xapian 1.3.2. |
OP_WILDCARD | Wildcard expansion. Added in Xapian 1.3.3. |
OP_INVALID | Construct an invalid query. This can be useful as a placeholder - for example RangeProcessor uses it as a return value to indicate that a range hasn't been recognised. |
LEAF_TERM | Value returned by get_type() for a term. |
LEAF_POSTING_SOURCE | Value returned by get_type() for a PostingSource. |
LEAF_MATCH_ALL | Value returned by get_type() for MatchAll or equivalent. This is returned for any `Xapian::Query(std::string())` object. |
LEAF_MATCH_NOTHING | Value returned by get_type() for MatchNothing or equivalent. This is returned for any `Xapian::Query()` object. |
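The selection step described in the OP_ELITE_SET row can be modelled with the standard library alone. The sketch below is not Xapian code: `Subquery`, `max_weight` and `pick_elite_set` are hypothetical names, and it assumes the maximum weight each subquery can return is already known. It only illustrates "keep the N subqueries which can return the highest weights, then OR them".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a subquery: a label plus the highest weight
// it could contribute to any document.
struct Subquery {
    std::string label;
    double max_weight;
};

// Model of the selection step: keep the n subqueries with the highest
// maximum weight, ordered from highest to lowest. With n or fewer
// subqueries nothing is dropped, so the result is the plain OR of all
// of them.
std::vector<Subquery> pick_elite_set(std::vector<Subquery> subqs,
                                     std::size_t n = 10) {
    if (subqs.size() <= n) return subqs;
    auto middle = subqs.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(subqs.begin(), middle, subqs.end(),
                      [](const Subquery& a, const Subquery& b) {
                          return a.max_weight > b.max_weight;
                      });
    subqs.erase(middle, subqs.end());
    return subqs;
}
```

Because the choice depends only on maximum weights, running this model separately per shard can select different subqueries in each shard, which is the sharded-database caveat noted in the table.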
|
inline |
Construct a query matching no documents.
MatchNothing is a static instance of this.
When combined with other Query objects using the various supported operators, Query() works like false in boolean logic, so Query() & q is Query(), while Query() | q is q.
Referenced by operator&=(), operator^=(), and operator|=().
|
inline |
Copying is allowed.
The internals are reference counted, so copying is cheap.
Xapian::Query::Query | ( | const std::string & | term, |
Xapian::termcount | wqf = 1 , |
||
Xapian::termpos | pos = 0 |
||
) |
Construct a Query object for a term.
term | The term. An empty string constructs a query matching all documents (MatchAll is a static instance of this). |
wqf | The within-query frequency. (default: 1) |
pos | The query position. Currently this is mainly used to determine the order of terms obtained via get_terms_begin(). (default: 0) |
Xapian::Query::Query | ( | double | factor, |
const Xapian::Query & | subquery | ||
) |
Scale using OP_SCALE_WEIGHT.
factor | Non-negative real number to multiply weights by. |
subquery | Query object to scale weights from. |
Xapian::Query::Query | ( | op | op_, |
const Xapian::Query & | subquery, | ||
double | factor | ||
) |
Scale using OP_SCALE_WEIGHT.
In this form, the op_ parameter is totally redundant - use Query(factor, subquery) in preference.
op_ | Must be OP_SCALE_WEIGHT. |
factor | Non-negative real number to multiply weights by. |
subquery | Query object to scale weights from. |
|
inline |
Construct a Query object by combining two others.
op_ | The operator to combine the queries with. |
a | First subquery. |
b | Second subquery. |
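The matching and weighting rules that the operator table gives for the boolean-style operators can be written down as a small model. This is a standard-library sketch for a single document, not Xapian's matcher: `Op`, `Hit` and `combine` are hypothetical names, and weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of a subset of Xapian::Query::op.
enum class Op { And, Or, AndNot, Xor, AndMaybe, Filter, Max };

// Result of one (sub)query for one document.
struct Hit {
    bool matched;   // does the (sub)query match this document?
    double weight;  // weight contributed; only meaningful if matched
};

// Combine subquery results following the rules in the operator table.
Hit combine(Op op, const std::vector<Hit>& subs) {
    assert(!subs.empty());
    std::size_t n_matched = 0;
    double sum = 0.0, best = 0.0;
    for (const Hit& h : subs) {
        if (!h.matched) continue;
        ++n_matched;
        sum += h.weight;
        if (h.weight > best) best = h.weight;
    }
    const bool first = subs.front().matched;
    const bool all = n_matched == subs.size();
    bool matched = false;
    double weight = 0.0;
    switch (op) {
    case Op::And:      matched = all;                     weight = sum; break;
    case Op::Or:       matched = n_matched > 0;           weight = sum; break;
    case Op::AndNot:   matched = first && n_matched == 1; weight = subs.front().weight; break;
    case Op::Xor:      matched = n_matched % 2 == 1;      weight = sum; break;
    case Op::AndMaybe: matched = first;                   weight = sum; break;
    case Op::Filter:   matched = all;                     weight = subs.front().weight; break;
    case Op::Max:      matched = n_matched > 0;           weight = best; break;
    }
    if (!matched) weight = 0.0;  // a non-matching document gets no weight
    return Hit{matched, weight};
}
```

For two matching subqueries with weights 2 and 3, the model gives 5 for OP_AND, OP_OR and OP_AND_MAYBE, 2 for OP_FILTER, 3 for OP_MAX, and no match for OP_AND_NOT or OP_XOR.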
|
inline |
Construct a Query object by combining two terms.
op_ | The operator to combine the terms with. |
a | First term. |
b | Second term. |
Xapian::Query::Query | ( | op | op_, |
Xapian::valueno | slot, | ||
const std::string & | range_limit | ||
) |
Construct a Query object for a single-ended value range.
op_ | Must be OP_VALUE_LE or OP_VALUE_GE currently. |
slot | The value slot to work over. |
range_limit | The limit of the range. |
Xapian::Query::Query | ( | op | op_, |
Xapian::valueno | slot, | ||
const std::string & | range_lower, | ||
const std::string & | range_upper | ||
) |
Construct a Query object for a value range.
op_ | Must be OP_VALUE_RANGE currently. |
slot | The value slot to work over. |
range_lower | Lower end of the range. |
range_upper | Upper end of the range. |
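Range limits are passed as strings, so it helps to see what a string range test does with numbers. The sketch below is a standard-library model, not Xapian code: `in_value_range`, `value_ge` and `value_le` are hypothetical names, and it assumes values are compared as byte strings with both ends of the range inclusive.

```cpp
#include <cassert>
#include <string>

// Model of OP_VALUE_RANGE for one document: the stored value must lie
// between the two limits, compared as byte strings (assumed inclusive).
bool in_value_range(const std::string& value,
                    const std::string& range_lower,
                    const std::string& range_upper) {
    return range_lower <= value && value <= range_upper;
}

// Models of the single-ended forms OP_VALUE_GE and OP_VALUE_LE.
bool value_ge(const std::string& value, const std::string& range_limit) {
    return value >= range_limit;
}

bool value_le(const std::string& value, const std::string& range_limit) {
    return value <= range_limit;
}
```

Under this model `in_value_range("5", "2", "10")` is false, because "5" sorts after "10" as a string, while the zero-padded `in_value_range("05", "02", "10")` is true. Numeric values therefore need an encoding whose string order matches numeric order; Xapian provides `Xapian::sortable_serialise()` for this purpose.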
Xapian::Query::Query | ( | op | op_, |
const std::string & | pattern, | ||
Xapian::termcount | max_expansion = 0 , |
||
int | max_type = WILDCARD_LIMIT_ERROR , |
||
op | combiner = OP_SYNONYM |
||
) |
Query constructor for OP_WILDCARD queries.
op_ | Must be OP_WILDCARD |
pattern | The wildcard pattern - currently this is just a string and the wildcard expands to terms which start with exactly this string. |
max_expansion | The maximum number of terms to expand to (default: 0, which means no limit) |
max_type | How to enforce max_expansion - one of WILDCARD_LIMIT_ERROR (the default), WILDCARD_LIMIT_FIRST or WILDCARD_LIMIT_MOST_FREQUENT. When searching multiple databases, the expansion limit is currently applied independently for each database, so the total number of terms may be higher than the limit. This is arguably a bug, and may change in future versions. |
combiner | The Query::op to combine the terms with - one of OP_SYNONYM (the default), OP_OR or OP_MAX. |
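How max_expansion and max_type interact can be illustrated with a toy term list. This is a standard-library sketch, not Xapian code: `LimitType`, `TermFreq` and `expand_wildcard` are hypothetical names, and the behaviour attached to each limit type is inferred from the enumerator names (fail, keep the first terms in term order, keep the most frequent terms), so check it against the Xapian documentation before relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the WILDCARD_LIMIT_* values.
enum class LimitType { Error, First, MostFrequent };

// Hypothetical index entry: a term and the number of documents it is in.
using TermFreq = std::pair<std::string, unsigned>;

// Expand a prefix against terms given in ascending term order.
// max_expansion == 0 means no limit.
std::vector<std::string> expand_wildcard(
        const std::vector<TermFreq>& sorted_terms,
        const std::string& pattern,
        std::size_t max_expansion = 0,
        LimitType max_type = LimitType::Error) {
    std::vector<TermFreq> hits;
    for (const TermFreq& t : sorted_terms) {
        // Keep terms which start with exactly the pattern string.
        if (t.first.compare(0, pattern.size(), pattern) == 0) hits.push_back(t);
    }
    if (max_expansion != 0 && hits.size() > max_expansion) {
        switch (max_type) {
        case LimitType::Error:
            throw std::runtime_error("wildcard expands to too many terms");
        case LimitType::First:
            break;  // keep the first max_expansion terms in term order
        case LimitType::MostFrequent:
            // Most frequent first; ties keep term order.
            std::stable_sort(hits.begin(), hits.end(),
                             [](const TermFreq& a, const TermFreq& b) {
                                 return a.second > b.second;
                             });
            break;
        }
        hits.resize(max_expansion);
    }
    std::vector<std::string> terms;
    for (const TermFreq& t : hits) terms.push_back(t.first);
    return terms;
}
```

Applying this model once per database also shows why the total can exceed the limit when searching multiple databases: each database contributes up to max_expansion terms of its own.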
|
inline |
Construct a Query object from a begin/end iterator pair.
Dereferencing the iterator should return a Xapian::Query, a non-NULL Xapian::Query*, a std::string or a type which converts to one of these (e.g. const char*).
If begin == end then there are no subqueries and the resulting Query won't match anything.
op_ | The operator to combine the queries with. |
begin | Begin iterator. |
end | End iterator. |
window | Window size for OP_NEAR and OP_PHRASE, or 0 to use the number of subqueries as the window size (default: 0). |
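The window default and the difference between OP_NEAR and OP_PHRASE can be sketched for the simplest case. This is a standard-library model, not Xapian's matcher: the function names are hypothetical, each term is assumed to occur at exactly one position in the document, and "within the window" is assumed to mean that the span from the lowest to the highest position covers at most `window` positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Window rule from the constructor: 0 means "use the number of subqueries".
std::size_t effective_window(std::size_t window, std::size_t n_subqueries) {
    return window == 0 ? n_subqueries : window;
}

// positions[i] is the single position of subquery i's term in the document.
// OP_NEAR model: all positions fit inside the window, in any order.
bool near_matches(const std::vector<unsigned>& positions,
                  std::size_t window = 0) {
    assert(!positions.empty());
    auto mm = std::minmax_element(positions.begin(), positions.end());
    const std::size_t span = *mm.second - *mm.first + 1;
    return span <= effective_window(window, positions.size());
}

// OP_PHRASE model: as OP_NEAR, but positions must also increase in
// subquery order.
bool phrase_matches(const std::vector<unsigned>& positions,
                    std::size_t window = 0) {
    const bool in_order =
        std::adjacent_find(positions.begin(), positions.end(),
                           [](unsigned a, unsigned b) { return a >= b; })
        == positions.end();
    return in_order && near_matches(positions, window);
}
```

With the default window, three terms at positions 5, 6, 7 satisfy both models, the same terms at 7, 5, 6 satisfy only the OP_NEAR model, and positions 5, 9, 6 need a window of at least 5.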
|
inline explicit |
Construct with just an operator.
op_ | The operator to use - currently only OP_INVALID is useful. |
const Query Xapian::Query::get_subquery | ( | size_t | n | ) | const |
Read a top level subquery.
n | Return the n-th subquery (starting from 0) - only valid when 0 <= n < get_num_subqueries(). |
const TermIterator Xapian::Query::get_terms_begin | ( | ) | const |
Begin iterator for terms in the query object.
The iterator returns terms in ascending query position order, and will return the same term in each unique position it occurs in. If you want the terms in sorted order and without duplicates, see get_unique_terms_begin().
const TermIterator Xapian::Query::get_unique_terms_begin | ( | ) | const |
Begin iterator for unique terms in the query object.
Terms are sorted and terms with the same name removed from the list.
If you want the terms in ascending query position order, see get_terms_begin().
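The two orderings can be contrasted with a small model. This is a standard-library sketch, not Xapian code: `PositionedTerm` and both function names are hypothetical, and the order of different terms sharing one query position (here, alphabetical) is an assumption of the model.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical leaf of a query: (query position, term).
using PositionedTerm = std::pair<unsigned, std::string>;

// Model of get_terms_begin()/get_terms_end(): ascending query position,
// with a term repeated once for each distinct position it occurs in.
std::vector<std::string> terms_in_position_order(
        std::vector<PositionedTerm> leaves) {
    std::sort(leaves.begin(), leaves.end());
    leaves.erase(std::unique(leaves.begin(), leaves.end()), leaves.end());
    std::vector<std::string> terms;
    for (const PositionedTerm& leaf : leaves) terms.push_back(leaf.second);
    return terms;
}

// Model of get_unique_terms_begin()/get_unique_terms_end(): terms in
// sorted order with duplicates removed.
std::vector<std::string> unique_terms(
        const std::vector<PositionedTerm>& leaves) {
    std::vector<std::string> terms;
    for (const PositionedTerm& leaf : leaves) terms.push_back(leaf.second);
    std::sort(terms.begin(), terms.end());
    terms.erase(std::unique(terms.begin(), terms.end()), terms.end());
    return terms;
}
```

For leaves (2, apple), (1, pear), (3, pear), (2, apple), the first function yields pear, apple, pear and the second yields apple, pear.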
|
inline |
Scale using OP_SCALE_WEIGHT.
factor | Non-negative real number to multiply weights by. |
|
inline |
Inverse scale using OP_SCALE_WEIGHT.
factor | Positive real number to divide weights by. |
Copying is allowed.
The internals are reference counted, so assignment is cheap.
|
static |
Unserialise a string and return a Query object.
serialised | the string to unserialise. |
reg | Xapian::Registry object to use to unserialise user-subclasses of Xapian::PostingSource (default: standard registry). |
|
static |
A query matching all documents.
This is a static instance of Xapian::Query(std::string()). If you construct Query objects which use MatchAll in different threads, concurrent access can corrupt the reference count of the static object, so use Xapian::Query(std::string()) directly instead.
|
static |
A query matching no documents.
This is a static instance of a default-constructed Xapian::Query object. It is safe to use concurrently from different threads, unlike MatchAll (this is because MatchNothing has a NULL internal object so there's no reference counting happening).
When combined with other Query objects using the various supported operators, MatchNothing works like false in boolean logic, so MatchNothing & q is MatchNothing, while MatchNothing | q is q.