Discussion:
Running query against a single document
Aurélien MAZOYER
2018-09-21 12:57:52 UTC
Hi,

We would like to know whether there is a way to test a query against a document
without creating an index. We thought we could use the Lucene highlighter
component to achieve this, but it does not seem to work as expected with
complex queries.
For example, we created a span query, (+spanFirst(field:saint, 1)
+spanNear([field:saint, field:quentin], 0, true)), and tested it against
two documents:
D1={field=eglise saint quentin}
D2={field=saint quentin deladadoupa}
We expect to get these entries from the highlighter :
D1 eglise saint quentin
D2 <B>saint</B> <B>quentin</B> deladadoupa
But for D1 we got
eglise <B>saint</B> <B>quentin</B>
which is unexpected from our perspective, because D1 does not match our span query.
Is this approach correct, or would we be better off using some other way to
achieve this functionality?
FYI we use Lucene 6.5.1.
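
To make the expectation concrete, here is a minimal plain-Java sketch (no Lucene involved) of the positional logic we have in mind. The class and method names (SpanExpectation, spanFirst, spanNearOrdered, matches) are hypothetical, and lowercased whitespace tokenization is an assumption standing in for the real analyzer. It only illustrates why D1 should be rejected while D2 should be accepted.

```java
import java.util.Arrays;
import java.util.List;

public class SpanExpectation {

    // Lowercased whitespace tokenization; an assumption, not the real analyzer.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Mimics spanFirst(term, end): the term's span [i, i + 1) must end at or before `end`.
    static boolean spanFirst(List<String> toks, String term, int end) {
        for (int i = 0; i < toks.size() && i + 1 <= end; i++) {
            if (toks.get(i).equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Mimics spanNear([first, second], 0, true): `second` immediately follows `first`.
    static boolean spanNearOrdered(List<String> toks, String first, String second) {
        for (int i = 0; i + 1 < toks.size(); i++) {
            if (toks.get(i).equals(first) && toks.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    // Both clauses are required (the leading '+' on each clause).
    static boolean matches(String text) {
        List<String> toks = tokens(text);
        return spanFirst(toks, "saint", 1) && spanNearOrdered(toks, "saint", "quentin");
    }

    public static void main(String[] args) {
        System.out.println("D1 matches: " + matches("eglise saint quentin"));
        System.out.println("D2 matches: " + matches("saint quentin deladadoupa"));
    }
}
```

In D1 the adjacency clause is satisfied on its own, but "saint" sits at position 1, so the spanFirst clause fails and the document as a whole should be rejected.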

Thank you for your help,

Regards,

Aurelien and Andrey
Tchiota GMBH
Tom Mortimer
2018-09-21 13:16:21 UTC
Hi,

Have you considered using MemoryIndex?
<https://lucene.apache.org/core/6_5_1/memory/org/apache/lucene/index/memory/MemoryIndex.html>
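
A minimal sketch of that idea, assuming Lucene 6.5.1 with the lucene-core and lucene-memory jars on the classpath. The class name MemoryIndexMatch and its helper methods are hypothetical, and StandardAnalyzer is only a stand-in for whatever analyzer you actually use. It rebuilds the query from the original post and runs it against each document through a throwaway MemoryIndex; MemoryIndex.search returns a score, and a score of 0 means no match. We would expect D1 to be rejected and D2 accepted.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class MemoryIndexMatch {

    // +spanFirst(field:saint, 1) +spanNear([field:saint, field:quentin], 0, true)
    static Query buildQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        SpanQuery first = new SpanFirstQuery(saint, 1);
        SpanQuery near = new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true);
        return new BooleanQuery.Builder()
                .add(first, BooleanClause.Occur.MUST)
                .add(near, BooleanClause.Occur.MUST)
                .build();
    }

    // Index one text into a fresh in-memory index and test the query against it.
    static boolean matches(Query query, String field, String text, Analyzer analyzer) {
        MemoryIndex index = new MemoryIndex();
        index.addField(field, text, analyzer);
        return index.search(query) > 0.0f;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            Query query = buildQuery("field");
            String[] docs = {"eglise saint quentin", "saint quentin deladadoupa"};
            for (String doc : docs) {
                System.out.println(doc + " -> " + matches(query, "field", doc, analyzer));
            }
        }
    }
}
```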

cheers,
Tom


tel +44 8700 118334 : mobile +44 7876 741014 : skype tommortimer
Erick Erickson
2018-09-21 14:56:19 UTC
bq. We would like to know if there is a way to test a query against a document
without creating an index. We were thinking that maybe we could use lucene
highlighter component to achieve this,

I don't really understand this at all. How are you using the
highlighter component without creating an index? Custom code?

But that aside, there are dozens, if not hundreds of examples of this
in the Solr test code. You could write a Solr junit test, which
is "just some Java code" and run that.

To execute this within the test framework, you have two options:
1> from the top level "ant -Dtestcase=custom_test test", which takes a
long time to run
2> from solr/core "ant -Dtestcase=custom_test test-nocompile". You
have to have compiled your code of course for this to work.

BTW, if you skip all that and just use a Solr instance, one very
useful trick is to use &debug=true&explainOther=<query>
(https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html),
where the explainOther query identifies the doc you are interested in.
That will show you exactly how the doc was
scored _whether or not_ it would have been returned by the primary query.

Best,
Erick
Aurélien MAZOYER
2018-09-27 08:58:28 UTC
Hi Tom and Erick,
Thank you very much for your answers.

@Tom: Yes, we have considered MemoryIndex. But as far as I understand, we
would have to create a MemoryIndex containing a single document every
time we want to test our query against a document. I think we'll have
to run some tests to be sure that this is efficient.
@Erick:
We use this piece of code to run the highlighter directly on a TokenStream
created from a text string (fieldTextValue):

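// Score tokens against the query, then highlight a TokenStream built from the raw text.
QueryScorer queryScorer = new QueryScorer(luceneQuery);
TokenStream stream =
    TokenSources.getTokenStream(fieldName, fieldTextValue, analyzer);
Highlighter highlighter =
    new Highlighter(new SimpleHTMLFormatter(), queryScorer);
// Arguments: mergeContiguousFragments = true, maxNumFragments = 1000
TextFragment[] frag =
    highlighter.getBestTextFragments(stream, fieldTextValue, true, 1000);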

It seems to work pretty well for some queries, but I am afraid it works on
a per-token basis and doesn't consider the context (I mean the
adjacent terms) when deciding whether a term is involved in the match.
The Lucene explainer could fully address our needs, but as far as I know
it is not very efficient in terms of performance. We will test it as
well.
We could combine it with Tom's suggestion: load each document into a
MemoryIndex and then run the explainer on that index.
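
A rough sketch of that combination, assuming Lucene 6.5.1 with lucene-core and lucene-memory on the classpath. The class name MemoryIndexExplain and its helpers are hypothetical, and StandardAnalyzer is a placeholder for our real analyzer. To limit the per-document cost, it reuses a single MemoryIndex and calls reset() between documents instead of allocating a new one each time; we have not measured whether this is fast enough.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class MemoryIndexExplain {

    // +spanFirst(field:saint, 1) +spanNear([field:saint, field:quentin], 0, true)
    static Query buildQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
                .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
                .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                        BooleanClause.Occur.MUST)
                .build();
    }

    // Clear the shared index, load one document, and explain the query for it.
    // A MemoryIndex holds a single document, whose doc ID is 0.
    static Explanation explain(MemoryIndex index, Query query, String field,
                               String text, Analyzer analyzer) throws IOException {
        index.reset();
        index.addField(field, text, analyzer);
        IndexSearcher searcher = index.createSearcher();
        return searcher.explain(query, 0);
    }

    public static void main(String[] args) throws IOException {
        MemoryIndex index = new MemoryIndex();
        Query query = buildQuery("field");
        String[] docs = {"eglise saint quentin", "saint quentin deladadoupa"};
        try (Analyzer analyzer = new StandardAnalyzer()) {
            for (String doc : docs) {
                Explanation explanation = explain(index, query, "field", doc, analyzer);
                System.out.println(doc + " -> isMatch=" + explanation.isMatch());
                System.out.println(explanation);
            }
        }
    }
}
```

Explanation.isMatch() gives the yes/no answer, and the printed explanation tree indicates which clause failed when a document is rejected.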

Aurelien and Andrey
Tchiota GMBH