RAMDirectory vs MemoryIndex vs MMapDirectory for In-Memory Index

Discussion:

Matthias Müller

2018-09-25 08:43:53 UTC

Hi,

Lucene provides different storage options for in-memory indexes. I
found three structures that would qualify for the task:

* RAMDirectory (which I currently use for prototyping, but I wonder if it
is the ideal choice for my task)
* MemoryIndex, which claims to have better performance and resource use
for small documents
* MMapDirectory, which should outperform RAMDirectory for huge indices
(what counts as "huge"?)

My plan is to periodically index some properties (string codes, longs,
lat/lng points) from a larger database with Lucene for quicker
lookups (compared to slow SQL queries).

What would be the most efficient (or intended) storage option for such
an index in terms of lookup speed and CPU/memory use? Below [1] is a
brief summary of the index contents and I hope these figures are
sufficient to get a recommendation. But I am also happy to study more
detailed documentation on the matter.

- Matthias

[1]: Summary of index contents and intended use
* Total documents: 500,000 to 1,000,000; may grow to 10,000,000 records
  in the medium term.
* Document fields (all of them single-valued):
  * String (9x), usually 1-10 characters long, mostly recurring
    values (5% distinct)
  * LongPoint (4x): two fields contain mostly distinct values, one
    contains mostly recurring values (5-10% distinct), and one acts as
    a primary key
  * LatLonPoint (1x), 30% distinct
* Refresh interval: 1 to 5 minutes (I currently create a fresh index
  instance on each update and discard the old one)
* Most queries are range queries and exact matches on several
  properties; sometimes I need to retrieve the property fields of a
  single document based on a primary key value.
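
The "create a fresh index on each update and discard the old one" pattern described above can be sketched roughly as follows. This is only an illustration in plain Java: `Snapshot` and `SnapshotHolder` are hypothetical names, and the sorted map is a stand-in for the real index, not a Lucene class. The point is just that the replacement is built completely before it is published, so readers never see a half-built index.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for "the index": an immutable snapshot keyed by primary key. */
final class Snapshot {
    private final NavigableMap<Long, String> byKey;

    Snapshot(Map<Long, String> rows) {
        // Copy the rows so later changes to the caller's map cannot leak in.
        this.byKey = Collections.unmodifiableNavigableMap(new TreeMap<>(rows));
    }

    /** Exact match on the primary key; returns null if the key is absent. */
    String get(long key) {
        return byKey.get(key);
    }

    /** Inclusive range lookup; returns the number of rows with min <= key <= max. */
    int countInRange(long min, long max) {
        if (min > max) {
            return 0; // subMap would throw on an inverted range
        }
        return byKey.subMap(min, true, max, true).size();
    }
}

/** Holds the current snapshot and swaps it atomically on refresh. */
final class SnapshotHolder {
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(Collections.<Long, String>emptyMap()));

    /** Readers call this once per query and keep using the returned snapshot. */
    Snapshot acquire() {
        return current.get();
    }

    /** Builds the replacement fully, then publishes it; returns the previous snapshot. */
    Snapshot refresh(Map<Long, String> freshRows) {
        Snapshot next = new Snapshot(freshRows);
        return current.getAndSet(next);
    }
}
```

A scheduled task would call `refresh(...)` every few minutes with the rows read from the database, while query threads call `acquire()` and work on whatever snapshot they received. With a real index that holds files or other resources, the old instance returned by `refresh` would additionally have to be released once no query is still using it.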

Dawid Weiss

2018-09-25 08:46:40 UTC

Use MMapDirectory on a temporary location, Matthias. If you really
need in-memory indexes, a new Directory implementation is coming
(RAMDirectory will be deprecated, then removed), but the difference
compared to MMapDirectory is typically not worth the hassle. See this
issue for more discussion.

https://issues.apache.org/jira/browse/LUCENE-8438

Dawid
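
For readers unfamiliar with what "MMapDirectory on a temporary location" means at the operating-system level, here is a minimal sketch of the underlying mechanism using only the JDK: a file is written into a fresh temporary directory and then read back through a read-only memory mapping (`FileChannel.map`). The class and method names (`TempMmapDemo`, `writeThenSumMapped`) and the file layout are made up for this illustration; this is not Lucene code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TempMmapDemo {

    /** Writes the values to a file in a fresh temp directory, maps it read-only, and sums them. */
    static long writeThenSumMapped(long[] values) throws IOException {
        Path dir = Files.createTempDirectory("mmap-demo");
        Path file = dir.resolve("values.bin");
        try {
            // Write phase: ordinary file I/O, 8 bytes per value.
            ByteBuffer out = ByteBuffer.allocate(values.length * Long.BYTES);
            for (long v : values) {
                out.putLong(v);
            }
            out.flip();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
                while (out.hasRemaining()) {
                    ch.write(out);
                }
            }

            // Read phase: map the file into memory and read it like a buffer.
            // The data is served from the OS page cache, not copied onto the Java heap.
            long sum = 0;
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer in = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                while (in.remaining() >= Long.BYTES) {
                    sum += in.getLong();
                }
            }
            return sum;
        } finally {
            try {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            } catch (IOException ignored) {
                // Best-effort cleanup: some platforms (notably Windows) refuse to
                // delete a file while a mapping of it is still alive.
            }
        }
    }
}
```

In Lucene itself, the corresponding step would presumably be to pass such a temporary directory to the `MMapDirectory(Path)` constructor and build the index there with the usual `IndexWriter`; treat that as a pointer to the API rather than a tested recipe for a particular Lucene version.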

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-***@lucene.apache.org
For additional commands, e-mail: java-user-***@lucene.apache.org

Matthias Müller

2018-09-25 12:32:01 UTC

Thanks Dawid, glad I asked!
