Running query against a single document

Discussion:

Aurélien MAZOYER

2018-09-21 12:57:52 UTC

Hi,

We would like to know if there is a way to test a query against a document
without creating an index. We thought we might be able to use the Lucene
highlighter component
to achieve this, but it does not seem to work as expected with complex
queries.
For example, we created a SpanQuery (+spanFirst(field:saint, 1)
+spanNear([field:saint, field:quentin], 0, true)) and tested it against
two documents:
D1={field=eglise saint quentin}
D2={field=saint quentin deladadoupa}
We expect to get these entries from the highlighter :
D1 eglise saint quentin
D2 saint quentin deladadoupa
But we got
eglise saint quentin for D1, which is unexpected from our
perspective because it doesn't match our SpanQuery.
Do you have any idea whether this approach is correct, or whether we should
use some other way to achieve this functionality?
FYI, we use Lucene 6.5.1.

Thank you for your help,

Regards,

Aurelien and Andrey
Tchiota GMBH

Tom Mortimer

2018-09-21 13:16:21 UTC

Hi,

Have you considered using MemoryIndex
<https://lucene.apache.org/core/6_5_1/memory/org/apache/lucene/index/memory/MemoryIndex.html>
?

cheers,
Tom

tel +44 8700 118334 : mobile +44 7876 741014 : skype tommortimer
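
A minimal sketch of the MemoryIndex approach, assuming Lucene 6.x with the lucene-memory and analyzer modules on the classpath. The class name SingleDocMatch, the helper methods, and the choice of StandardAnalyzer are illustrative assumptions, not something taken from the thread. It rebuilds the query from the original post and checks a text in a throwaway single-document index:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Hypothetical helper class, named for illustration only.
class SingleDocMatch {

    // +spanFirst(field:saint, 1) +spanNear([field:saint, field:quentin], 0, true)
    static Query buildQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
                .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
                .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                        BooleanClause.Occur.MUST)
                .build();
    }

    // Indexes one text in memory and reports whether the query matches it.
    static boolean matches(Query query, String field, String text, Analyzer analyzer) {
        MemoryIndex index = new MemoryIndex();
        index.addField(field, text, analyzer);
        // MemoryIndex.search returns 0.0f when the document does not match.
        return index.search(query) > 0.0f;
    }

    public static void main(String[] args) {
        Analyzer analyzer = new StandardAnalyzer(); // analyzer choice is an assumption
        Query query = buildQuery("field");
        System.out.println("D1: " + matches(query, "field", "eglise saint quentin", analyzer));
        System.out.println("D2: " + matches(query, "field", "saint quentin deladadoupa", analyzer));
    }
}
```

With this query, D2 should match and D1 should not, because "saint" in D1 ends at position 2, past the spanFirst limit of 1.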

Erick Erickson

2018-09-21 14:56:19 UTC

bq. We would like to know if there is a way to test a query against a document
without creating an index. We were thinking that maybe we could use lucene
highlighter component
to achieve this,

I don't really understand this at all. How are you using the
highlighter component without creating an index? Custom code?

But that aside, there are dozens, if not hundreds of examples of this
in the Solr test code. You could write a Solr junit test, which
is "just some Java code" and run that.

To execute this within the test framework, you have two options:
1> from the top level "ant -Dtestcase=custom_test test", which takes a
long time to run
2> from solr/core "ant -Dtestcase=custom_test test-nocompile". You
have to have compiled your code of course for this to work.

BTW, if you skip all that and just use a Solr instance, one very
useful trick is to use &debug=true&debug.explainOther
(https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html).
That will show you exactly how the doc was
scored _whether or not_ it would have been returned by the primary query.

Best,
Erick
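
A small sketch of what such a debug request could look like, built with the Java standard library only. The host, core name, and document id are hypothetical placeholders, and the parameter is spelled explainOther here because that is how the linked Ref Guide page names it; treat the exact URL as an assumption to verify against your own Solr instance:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, named for illustration only.
class ExplainOtherUrl {

    // Builds a /select URL that asks Solr to explain how the documents matched by
    // otherDocQuery score against the main query q.
    static String build(String coreUrl, String q, String otherDocQuery) {
        try {
            return coreUrl + "/select"
                    + "?q=" + URLEncoder.encode(q, "UTF-8")
                    + "&debug=true"
                    + "&explainOther=" + URLEncoder.encode(otherDocQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "mycore" and "id:D1" are placeholders, not values from the thread.
        System.out.println(build("http://localhost:8983/solr/mycore",
                "field:\"saint quentin\"", "id:D1"));
    }
}
```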

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-***@lucene.apache.org
For additional commands, e-mail: java-user-***@lucene.apache.org

Aurélien MAZOYER

2018-09-27 08:58:28 UTC

Hi Tom and Erick,
Thank you very much for your answers.

@Tom: Yes, we have considered MemoryIndex. But as far as I understand, we
would have to create a MemoryIndex containing a single document every
time we want to test our query against a document. I think we will have
to run some tests to make sure this is efficient.
@Erick:
We use this piece of code to run the highlighter directly on a TokenStream
created from a text string (fieldTextValue):

QueryScorer queryScorer = new QueryScorer(luceneQuery);
TokenStream stream = TokenSources.getTokenStream(fieldName, fieldTextValue, analyzer);
Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), queryScorer);
TextFragment[] frag = highlighter.getBestTextFragments(stream, fieldTextValue, true, 1000);

It seems to work pretty well for some queries, but I am afraid it works on
a per-token basis and doesn't consider the context (I mean the
adjacent terms) when deciding whether a term is involved in the match.
The Lucene explainer could fully address our needs, but as far as I know
it is not very efficient in terms of performance. We will test it as
well.
We could combine Tom's suggestion of using MemoryIndex for the documents
with running the explainer on that index.

Aurelien and Andrey
Tchiota GMBH
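
The combination described above (MemoryIndex plus the explainer) might be sketched as follows, assuming Lucene 6.x with the lucene-memory and analyzer modules available. The class and method names and the use of StandardAnalyzer are illustrative assumptions. One MemoryIndex instance is reused across documents via reset() rather than allocated per document; whether that is fast enough for a given workload still needs measuring:

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Hypothetical helper class, named for illustration only.
class SingleDocExplain {

    // Same query as in the original post.
    static Query buildQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
                .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
                .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                        BooleanClause.Occur.MUST)
                .build();
    }

    // Clears the shared index, loads one text, and explains the query against it.
    static Explanation explain(MemoryIndex index, Query query, String field,
                               String text, Analyzer analyzer) throws IOException {
        index.reset(); // recycle internal buffers instead of creating a new index
        index.addField(field, text, analyzer);
        IndexSearcher searcher = index.createSearcher();
        // A MemoryIndex holds exactly one document, with doc id 0.
        return searcher.explain(query, 0);
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(); // analyzer choice is an assumption
        Query query = buildQuery("field");
        MemoryIndex index = new MemoryIndex();
        String[] docs = {"eglise saint quentin", "saint quentin deladadoupa"};
        for (String text : docs) {
            Explanation explanation = explain(index, query, "field", text, analyzer);
            System.out.println(text + " -> match=" + explanation.isMatch());
            System.out.println(explanation);
        }
    }
}
```

If only a yes/no answer is needed, MemoryIndex.search(query) avoids building the explanation tree at all.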
