What is Elasticsearch?
Elasticsearch is a flexible and powerful open source, distributed, real-time
search and analytics engine. Architected from the ground up for use in
distributed environments where reliability and scalability are must-haves,
Elasticsearch gives you the ability to move easily beyond simple full-text
search. Through its robust set of APIs and query DSLs, plus clients for the
most popular programming languages, Elasticsearch delivers on the
near-limitless promises of search technology.
What is WordNet?
WordNet is a lexical database for the English language. It groups English
words into sets of synonyms called synsets, provides short, general
definitions, and records the various semantic relations between these synonym
sets. The purpose is twofold: to produce a combination of dictionary and
thesaurus that is more intuitively usable, and to support automatic text
analysis and artificial intelligence applications. The database and software
tools have been released under a BSD-style license and can be downloaded and
used freely. The database can also be browsed online.
What are Elasticsearch Analyzers?
Analyzers are composed of a single Tokenizer and zero or more TokenFilters.
The tokenizer may be preceded by one or more CharFilters. The analysis module
allows you to register Analyzers under logical names, which can then be
referenced either in mapping definitions or in certain APIs.
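The pipeline described above can be sketched in plain Python. This is only an illustration of the order of the stages (char filters, then one tokenizer, then token filters), not Elasticsearch code; the function names and the "simple_html" analyzer name are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Char filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


# Analyzers registered under logical names (the name here is made up).
ANALYZERS: Dict[str, Tuple[List[CharFilter], Tokenizer, List[TokenFilter]]] = {
    "simple_html": ([strip_html], whitespace_tokenizer, [lowercase]),
}


def analyze_with(name: str, text: str) -> List[str]:
    """Run text through the analyzer registered under the given name."""
    char_filters, tokenizer, token_filters = ANALYZERS[name]
    for char_filter in char_filters:      # zero or more char filters first
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one tokenizer
    for token_filter in token_filters:    # zero or more token filters last
        tokens = token_filter(tokens)
    return tokens


print(analyze_with("simple_html", "<b>Hello</b> World"))
```

A synonym filter, as used later in this post, would sit in the token filter stage and add extra tokens for each word it recognizes.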
What are Elasticsearch Filters?
Filters can be great candidates for caching. Caching the result of a filter
does not require a lot of memory, and it makes other queries that use the same
filter (with the same parameters) very fast.
Some filters already produce a result that is easily cacheable, so the only
difference between caching and not caching them is whether the result is
placed in the cache. These filters, which include the term, terms, prefix, and
range filters, are cached by default. They are recommended over the equivalent
query version when the same filter (with the same parameters) will be used
across multiple different queries (for example, a range filter with age higher
than 10).
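As a rough illustration of reusing one filter across different queries, the sketch below builds two request bodies that share the same range filter on age. It assumes an older Elasticsearch release that still supports the "filtered" query (it was removed in later versions); the helper name filtered_query is hypothetical, and the "name" field is borrowed from the example later in this post.

```python
import json


def filtered_query(query: dict, shared_filter: dict) -> dict:
    """Wrap a query and a filter in the older 'filtered' query form."""
    return {"query": {"filtered": {"query": query, "filter": shared_filter}}}


# The range filter from the text: age higher than 10.
age_filter = {"range": {"age": {"gt": 10}}}

# Two different queries that reuse the same filter with the same parameters,
# so the second one can benefit from the cached filter result.
body_a = filtered_query({"match": {"name": "child"}}, age_filter)
body_b = filtered_query({"match_all": {}}, age_filter)

print(json.dumps(body_a, indent=2))
print(json.dumps(body_b, indent=2))
```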
Steps to Configure WordNet in Elasticsearch
After installing Elasticsearch, you need to configure WordNet so that
Elasticsearch can access the synonyms.
Step 1: Create a directory called "analysis" in the Elasticsearch config directory.
Step 2: Download the WordNet Prolog database as a Zip file.
Step 3: Extract the Zip file.
Step 4: Copy the "wn_s.pl" file from the extracted WordNet folder and paste it into the Elasticsearch "analysis" folder.
Step 5: Start the Elasticsearch server.
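If you prefer to script the file handling, Steps 1, 3, and 4 could be sketched as below. The function name and its three path arguments are hypothetical; pass the real location of your Elasticsearch config directory and of the WordNet Zip file you downloaded.

```python
import shutil
import zipfile
from pathlib import Path


def install_wordnet_synonyms(config_dir: Path, wordnet_zip: Path, work_dir: Path) -> Path:
    """Extract the WordNet archive and copy wn_s.pl into <config_dir>/analysis."""
    # Step 1: create the "analysis" directory inside the config directory.
    analysis_dir = config_dir / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)

    # Step 3: extract the Zip file into a working directory.
    with zipfile.ZipFile(wordnet_zip) as archive:
        archive.extractall(work_dir)

    # Step 4: find wn_s.pl anywhere in the extracted tree and copy it over.
    matches = sorted(work_dir.rglob("wn_s.pl"))
    if not matches:
        raise FileNotFoundError("wn_s.pl not found in the extracted archive")
    target = analysis_dir / "wn_s.pl"
    shutil.copyfile(matches[0], target)
    return target
```

The returned path corresponds to the "analysis/wn_s.pl" value used for synonyms_path, which is given relative to the config directory.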
Elasticsearch Synonyms Filter using WordNet
The following example creates an Elasticsearch synonym filter backed by
WordNet.
PUT Requests:
Create an index with the WordNet mappings.
http://localhost:9200/projects
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/wn_s.pl"
          }
        }
      }
    }
  },
  "mappings" : {
    "_default_": {
      "properties" : {
        "name" : {
          "type" : "string",
          "analyzer" : "synonym"
        }
      }
    }
  }
}
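To get a feel for what the "wordnet" format contains, here is a small Python sketch that groups words by synset. Each line of wn_s.pl is a Prolog fact of the form s(synset_id, w_num, 'word', ss_type, sense_number, tag_count), and words that share a synset_id are synonyms. The sample lines and their synset IDs below are made up for illustration, and this is not how Elasticsearch itself parses the file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).
SENSE_LINE = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_synonyms(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the words of wn_s.pl-style lines by their synset ID."""
    synsets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = SENSE_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a sense fact
        synset_id, word = match.group(1), match.group(3)
        # Prolog escapes an apostrophe inside a quoted atom by doubling it.
        synsets[synset_id].append(word.replace("''", "'"))
    return dict(synsets)


# Hypothetical sample lines; real synset IDs in wn_s.pl will differ.
SAMPLE = [
    "s(100000001,1,'child',n,1,0).",
    "s(100000001,2,'baby',n,1,0).",
    "s(100000002,1,'project',n,1,0).",
]

print(group_synonyms(SAMPLE))
```

With lines like these, "child" and "baby" end up in the same group, which is why a search for one can match a document containing the other.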
Add values to that index:
http://localhost:9200/projects/project/1
{
  "name" : "child"
}
http://localhost:9200/projects/project/2
{
  "name" : "baby"
}
POST Request:
http://localhost:9200/projects/_search?pretty=true
{
  "query" : {
    "match": {
      "name": {
        "query": "child"
      }
    }
  }
}
Output:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.3731742,
    "hits": [
      {
        "_index": "projects",
        "_type": "project",
        "_id": "1",
        "_score": 2.3731742,
        "_source": {
          "name": "child"
        }
      },
      {
        "_index": "projects",
        "_type": "project",
        "_id": "2",
        "_score": 0.028331274,
        "_source": {
          "name": "baby"
        }
      }
    ]
  }
}
NOTE: I am using the Firefox REST client to run this example.
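If you do not have a REST client at hand, the same requests could be sent from Python's standard library. This is a minimal sketch that assumes a node listening on http://localhost:9200, as in the URLs above; build_request and send are hypothetical helper names, and the actual network calls are left commented out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # same host and port as the URLs above


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build an HTTP request with a JSON body for the given path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# The document and search requests from the example above.
index_doc = build_request("PUT", "/projects/project/1", {"name": "child"})
search = build_request(
    "POST",
    "/projects/_search?pretty=true",
    {"query": {"match": {"name": {"query": "child"}}}},
)

# Uncomment once the Elasticsearch server is running:
# send(index_doc)
# for hit in send(search)["hits"]["hits"]:
#     print(hit["_id"], hit["_score"], hit["_source"]["name"])
```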
Well explained. Thanks.
Very helpful man!!
You can check that the index contains synonyms for a given word like this:
curl -XGET 'localhost:9200/projects/_analyze?pretty&analyzer=synonym' -d 'child'
Note that the index with synonyms takes 3-4x more disk space than the one without.
Was struggling to find the WordNet file to integrate with Elasticsearch. Worked like a charm, thanks man!