Recently I had to implement a search for XML tags and attributes. To achieve that, I needed a search engine that can index content and execute queries over it. After narrowing down the available options, I found that Lucene is pretty good. It is an open source Apache project that has been in development for a long time and has a great community.
A few words about how Lucene works or how I use it 🙂
Lucene is composed of two major modules: Indexer and Searcher. Both of them work over a directory called the index. As an abstraction, the indexer acts as the writer of the index and the searcher as its reader.
The search mechanism is pretty simple. First the indexer must index some content (write it to the index), and after that the searcher can run queries and return the results it finds.
Lucene works with objects called Documents. They are the units of indexing and search. An index consists of one or more Documents. Indexing involves adding Documents to an IndexWriter, and searching involves retrieving Documents via an IndexSearcher.
Documents in the Lucene context are an abstraction. For example, if I want to create an index of searchable users, every user has to be added to the index as a Document. But how will Lucene know how to find the correct user? The answer is fields. A Document consists of one or more fields, and each field is a name-value pair. To continue the example: to be able to search for a user by username and email, the Document that gets indexed must have a username field and an email field with their corresponding values.
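To make the user example concrete, here is a minimal sketch of such a Document. The helper name userDocument and the sample values are made up for illustration, and I'm assuming the lucene-core library is on the classpath. StringField is just one possible field type: it indexes the value as a single exact token, which fits usernames and emails, while free text would more likely use TextField.

```kotlin
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

// Hypothetical helper (not part of Lucene): turns one user into a Document
// with two name-value fields.
fun userDocument(username: String, email: String): Document {
    val doc = Document()
    // StringField indexes the value as one exact token, without analysis.
    // Field.Store.YES also stores the original value so it can be read back.
    doc.add(StringField("username", username, Field.Store.YES))
    doc.add(StringField("email", email, Field.Store.YES))
    return doc
}

fun main() {
    val doc = userDocument("alice", "alice@example.com")
    // Document.get returns the stored string value of the named field.
    println(doc.get("username"))
    println(doc.get("email"))
}
```

At this point nothing is searchable yet. The Document only becomes part of an index once it is handed to an IndexWriter.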
In summary, indexing involves creating Documents that contain one or more fields and adding these Documents to an IndexWriter.
Searching requires an already built index. It involves creating a Query and passing it to an IndexSearcher, which returns the hits (the found results).
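The two steps can be sketched roughly like this. It is only an illustration under a few assumptions: Lucene 8.x or 9.x with lucene-core on the classpath, an in-memory ByteBuffersDirectory instead of a directory on disk, and a made-up function name (findEmailsByUsername) with made-up users. TermQuery is one of several query types, picked here because it matches the exact value of a StringField.

```kotlin
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField
import org.apache.lucene.index.DirectoryReader
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.index.Term
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.search.TermQuery
import org.apache.lucene.store.ByteBuffersDirectory

// Hypothetical function: builds a tiny in-memory index of two users,
// then returns the stored emails of all users with the given username.
fun findEmailsByUsername(username: String): List<String> {
    val directory = ByteBuffersDirectory() // in-memory index, assumed for the sketch

    // Indexing: create Documents with fields and add them to an IndexWriter.
    // Closing the writer (via use) commits the changes.
    IndexWriter(directory, IndexWriterConfig(StandardAnalyzer())).use { writer ->
        val users = listOf("alice" to "alice@example.com", "bob" to "bob@example.com")
        for ((name, email) in users) {
            val doc = Document()
            doc.add(StringField("username", name, Field.Store.YES))
            doc.add(StringField("email", email, Field.Store.YES))
            writer.addDocument(doc)
        }
    }

    // Searching: create a Query and pass it to an IndexSearcher.
    DirectoryReader.open(directory).use { reader ->
        val searcher = IndexSearcher(reader)
        val query = TermQuery(Term("username", username))
        val topDocs = searcher.search(query, 10) // at most 10 hits
        // Each hit carries only a document id; load the stored fields from it.
        // Newer 9.x releases prefer searcher.storedFields().document(id).
        return topDocs.scoreDocs.map { searcher.doc(it.doc).get("email") }
    }
}

fun main() {
    println(findEmailsByUsername("alice")) // prints [alice@example.com]
    println(findEmailsByUsername("carol")) // prints []
}
```

Note that the reader is opened only after the writer is closed. A reader sees the index as it was at the time of opening, so Documents added later are not visible to it.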
My example is very simple: it creates an IndexWriter for indexing and an IndexSearcher for searching. It is written in Kotlin because I'm interested in this language and wanted to try it, but maybe I'll add another post about that 🙂