2.X
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library.
Elasticsearch is document oriented, meaning that it stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable.
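To illustrate "stores the whole document and also indexes its contents", here is a minimal sketch of a toy in-memory store. It keeps each document intact and builds an inverted index (term to document ids) over its string fields. The class name, the lowercase/split analyzer and the AND-only search are simplifying assumptions. This is not how Lucene is implemented.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately simple analyzer: lowercase, then split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyDocumentStore:
    """Toy store: keeps whole documents plus an inverted index of their string fields."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.inverted: dict[str, set[str]] = defaultdict(set)

    def index(self, doc_id: str, doc: dict) -> None:
        # Store the entire object as-is...
        self.docs[doc_id] = doc
        # ...and also index the contents of every string field to make it searchable.
        for value in doc.values():
            if isinstance(value, str):
                for term in tokenize(value):
                    self.inverted[term].add(doc_id)

    def search(self, query: str) -> list[dict]:
        # Return the whole documents that contain every query term.
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.inverted.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(ids)]
```

A search returns the stored objects themselves, not just matching ids. This is the "document oriented" part. The toy ignores re-indexing an existing id, which would leave stale terms behind.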
As users, we can talk to any node in the cluster, including the master node. Every node knows where each document lives and can forward our request directly to the nodes that hold the data we are interested in. Whichever node we talk to manages the process of gathering the response from the node or nodes holding the data and returning the final response to the client. It is all managed transparently by Elasticsearch.
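Below is a minimal sketch of that request forwarding in a toy in-memory cluster. Every node holds the same shard-to-node routing table, so whichever node the client contacts can act as the coordinating node. Several pieces are hypothetical stand-ins rather than Elasticsearch internals:

- `Node` and `build_cluster`
- round-robin shard placement
- the CRC32 hash

```python
import zlib
from typing import Optional


class Node:
    """Toy node: holds some shards and knows the cluster-wide shard -> node map."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_shards: dict[int, dict[str, dict]] = {}  # shard number -> {doc_id: doc}
        self.routing_table: dict[int, "Node"] = {}  # filled in by build_cluster

    def _shard_for(self, doc_id: str) -> int:
        # Stand-in hash (CRC32); Elasticsearch uses its own hash function.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.routing_table)

    def index(self, doc_id: str, doc: dict) -> str:
        # Act as coordinating node: find the shard's owner and forward the write there.
        shard = self._shard_for(doc_id)
        owner = self.routing_table[shard]
        owner.local_shards[shard][doc_id] = doc
        return owner.name

    def get(self, doc_id: str) -> Optional[dict]:
        # Reads are routed the same way: the contacted node fetches from the owner.
        shard = self._shard_for(doc_id)
        return self.routing_table[shard].local_shards[shard].get(doc_id)


def build_cluster(names: list[str], num_primary_shards: int) -> list[Node]:
    nodes = [Node(n) for n in names]
    # Assign shards round-robin; every node shares the same routing table.
    table = {s: nodes[s % len(nodes)] for s in range(num_primary_shards)}
    for shard, owner in table.items():
        owner.local_shards[shard] = {}
    for node in nodes:
        node.routing_table = table
    return nodes
```

For example, a document indexed through the first node can be read back through any other node. The client never needs to know which node actually holds the shard.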
Shard
A shard is a low-level worker unit that holds just a slice of all the data in the index. Each shard is a single instance of Lucene.
Document
Documents in Elasticsearch are immutable; we cannot change them. Instead, if we need to update an existing document, we reindex or replace it.
Elasticsearch also provides an update API, which can be used to make partial updates to a document. This API appears to change documents in place, but internally Elasticsearch follows exactly the same process described above:
- Retrieve the JSON from the old document
- Change it
- Delete the old document
- Index a new document
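The four steps above can be sketched with a toy store whose documents are never mutated in place. `ImmutableDocStore` and its methods are hypothetical names. The merge here is shallow, which is a simplification: the real update API merges nested objects as well. The version counter mimics the idea that every reindex produces a new version of the document.

```python
import copy
from typing import Any


class ImmutableDocStore:
    """Toy store where a stored document is never modified in place."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[int, dict]] = {}  # doc_id -> (version, stored copy)

    def index(self, doc_id: str, doc: dict) -> int:
        # Indexing an existing id replaces the old document and bumps the version.
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, copy.deepcopy(doc))
        return version

    def get(self, doc_id: str) -> tuple[int, dict]:
        version, doc = self._docs[doc_id]
        return version, copy.deepcopy(doc)  # callers get a copy, never the stored object

    def update(self, doc_id: str, partial: dict[str, Any]) -> int:
        # 1. Retrieve the JSON from the old document.
        version, old = self.get(doc_id)
        # 2. Change it (shallow merge of the partial fields, a simplification).
        new = copy.deepcopy({**old, **partial})
        # 3. Delete the old document.
        del self._docs[doc_id]
        # 4. Index a new document under the same id with the next version.
        self._docs[doc_id] = (version + 1, new)
        return version + 1
```

From the caller's side, `update` looks like an in-place edit. Underneath, it is a full replace.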
Write Conflicts
- Pessimistic concurrency control
Widely used by relational databases, this approach assumes that conflicting changes are likely to happen and so blocks access to a resource in order to prevent conflicts. A typical example is locking a row before reading its data, ensuring that only the thread that placed the lock is able to make changes to the data in that row.
- Optimistic concurrency control
Used by Elasticsearch, this approach assumes that conflicts are unlikely to happen and doesn’t block operations from being attempted. However, if the underlying data has been modified between reading and writing, the update will fail. It is then up to the application to decide how it should resolve the conflict. For instance, it could reattempt the update, using the fresh data, or it could report the situation to the user.
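Here is a minimal sketch of optimistic concurrency control using version numbers, as Elasticsearch does with a document's version. Nothing is locked when reading. Instead, the write carries the version that was read and is rejected if the stored version has moved on. `VersionedStore`, `VersionConflictError` and `increment_with_retry` are hypothetical names. The retry loop shows one application-side resolution strategy: re-read the fresh data and try again.

```python
from typing import Optional


class VersionConflictError(Exception):
    """Raised when the document changed between our read and our write."""


class VersionedStore:
    def __init__(self) -> None:
        self._docs: dict[str, tuple[int, dict]] = {}

    def get(self, doc_id: str) -> tuple[int, dict]:
        version, doc = self._docs[doc_id]
        return version, dict(doc)

    def index(self, doc_id: str, doc: dict, expected_version: Optional[int] = None) -> int:
        current = self._docs[doc_id][0] if doc_id in self._docs else 0
        # Optimistic check: no lock was taken at read time, so reject the write
        # if someone else has written a newer version since we read it.
        if expected_version is not None and expected_version != current:
            raise VersionConflictError(f"expected version {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, dict(doc))
        return current + 1


def increment_with_retry(store: VersionedStore, doc_id: str, field: str, retries: int = 3) -> int:
    # Application-side conflict resolution: re-read fresh data and reattempt.
    for _ in range(retries + 1):
        version, doc = store.get(doc_id)
        doc[field] = doc.get(field, 0) + 1
        try:
            return store.index(doc_id, doc, expected_version=version)
        except VersionConflictError:
            continue
    raise VersionConflictError(f"gave up after {retries} retries")
```

Reporting the conflict to the user instead would simply mean letting the first `VersionConflictError` propagate.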
Distributed Document Store
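The shard that stores a given document is chosen by:
- shard = hash(routing) % number_of_primary_shards
The routing value defaults to the document's _id but can be set to a custom value. Because the result depends on number_of_primary_shards, the number of primary shards is fixed when an index is created: changing it would send existing documents to different shards.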
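The routing formula can be sketched in a few lines. CRC32 is an assumed stand-in for the hash. It is deterministic across processes, unlike Python's built-in `hash()` for strings, but it is not the function Elasticsearch uses. `moved_fraction` is a hypothetical helper. It shows why the number of primary shards cannot change: recomputing the formula with a different modulus relocates most documents.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    # shard = hash(routing) % number_of_primary_shards
    # CRC32 stands in for hash(); Elasticsearch uses a different hash function.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_fraction(doc_ids: list[str], old_shards: int, new_shards: int) -> float:
    # Fraction of documents whose target shard changes if the shard count changes.
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)
```

With a roughly uniform hash, going from 5 to 6 primary shards keeps a document in place only when the hash leaves the same remainder mod 5 and mod 6. That holds for about 1 in 6 values, so most documents would end up on the "wrong" shard.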
References
- https://www.youtube.com/watch?v=PpX7J-G2PEo
- https://www.elastic.co/guide/en/elasticsearch/guide/current/_document_oriented.html