Thursday, July 24, 2014

ElasticSearch, LogStash, and Kibana - Beginners guide | ElasticSearch

Part 4 of ElasticSearch, LogStash, and Kibana - Beginners guide


This is where the meat of the operation is: Elasticsearch. Elasticsearch houses your imported data, indexes it, and is the way you interface with it. You can interface with Elasticsearch via its API (essentially using curl), or via things like Kibana and other plugins.
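For example, a couple of quick curl calls against the API might look like this (a minimal sketch, assuming Elasticsearch is running on localhost at the default port of 9200):

# Check the health of the cluster
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

# Run a simple search across all indexes
curl -XGET 'http://localhost:9200/_search?q=error&pretty'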

But for now I'm just going over installing Elasticsearch and the configuration I use. One extra thing I will touch on is a tribe setup, which is awesome. I'll explain further down about tribes.


Installing:
# Get the deb file
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb

# Do some java exporting
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
# or you may need to do this one
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/

# Install it
sudo dpkg -i elasticsearch-1.2.1.deb

And that's it. Control it like any other service: service elasticsearch [start,stop,restart,status]
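If you want to make sure it actually came up, a quick curl to the default port should return some basic node info (assuming you kept the default port of 9200):

# Should return the node name, the version, and a 200 status
curl -XGET 'http://localhost:9200/?pretty'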


Configuration:

There is a lot you can configure in Elasticsearch. But here is my config file (minus the comments and empty lines). Very basic, but I'll go over the parts.

File: /etc/elasticsearch/elasticsearch.yml
cluster.name: elastic-arsh
node.name: "Elastic Arlin"
node.master: true
node.data: true
index.number_of_shards: 10
index.number_of_replicas: 1
discovery.zen.ping.unicast.hosts: ["10.100.129.31"]



And that's it! Simple. Now I'll break it down.


cluster.name: es-cluster
This is the name of your Elasticsearch cluster. It is the same cluster name you give Logstash in its output, and it can be whatever you want.
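For reference, the matching piece of the Logstash output looks something like this (just a sketch; your host and protocol options may differ):

output {
  elasticsearch {
    # Must match cluster.name in elasticsearch.yml
    cluster => "es-cluster"
  }
}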

node.name: "Elastic Server 1"
Another arbitrary name. This one is the name of the individual server running elasticsearch.

node.master: true
A master node keeps track of which node has what data and where that node is located. If you have a server just hosting data, it doesn't need to be a master. If you have multiple master-eligible nodes, they will elect the active master automatically.

node.data: true
This decides whether the server keeps any of the index data on it. You can have a server act as a master without holding any data.
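For example, a dedicated master that holds no data, and a data-only node that can never be elected master, would look something like this (a sketch of the two yml files):

# Dedicated master - coordinates the cluster but stores no indexes
node.master: true
node.data: false

# Data-only node - stores indexes but is never elected master
node.master: false
node.data: true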

index.number_of_shards: 10
index.number_of_replicas: 1
This is where things can get complicated. Shards and replicas are closely related to each other. A shard can be considered a chunk of your data. Let's say you have the default of 5 shards. This means that for a day's index, Elasticsearch has 5 shards (each one a Lucene instance) to divide your data among. More shards means slower indexing, but the data is easier to divide among the nodes in a cluster. A replica is an exact copy of a shard. Because it is just a copy, Elasticsearch doesn't write data to it directly; instead it writes to the shard, which then gets copied to its replica.

This seems redundant but it's great for clustering. Let's say you have 2 nodes in a cluster.

Node 1 has shards 1-3 and node 2 has 4-5. 
Node 1 has replicas 4-5 and node 2 has 1-3.

So each node has a full set of data on it in case one node is lost, BUT you have the processing and indexing power of 2 separate nodes. When one node is lost, searching is still possible because Elasticsearch will rebuild the missing shards from the replica data it still has.

There are sites that go into much more detail, obviously, but the rule of thumb to remember is:
More shards = Slower Indexing
More replicas = Faster searching
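One more note here: the shard count for an index is fixed when the index is created, but the replica count can be changed on the fly. A couple of curl examples (the index name logstash-2014.07.24 is just a placeholder):

# See which node each shard and replica lives on
curl -XGET 'http://localhost:9200/_cat/shards?v'

# Bump the replica count for one index (shards can't be changed after creation)
curl -XPUT 'http://localhost:9200/logstash-2014.07.24/_settings' -d '
{
  "index": { "number_of_replicas": 2 }
}'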

discovery.zen.ping.unicast.hosts: ["10.100.10.14"]
This is the IP of the master (or a comma-separated list of masters) for the cluster.




Tribes:

Tribes can be very useful. But they can also be a little tricky to get going. Here is the situation I use them in.

Elasticsearch transfers a lot of data between nodes. But let's say you have a datacenter in London and one in Texas. In my case, no one would want the node data constantly going that far between two datacenters because of the bandwidth usage. But setting up 2 separate clusters would mean 2 sets of similar data with 2 interfaces to work with. And that sucks.

Enter Tribes. An Elasticsearch tribe is when you have multiple clusters that you need to work with but don't want to share data between. So if I'm indexing logs in London and logs in Texas, I can use a tribe setup so that both are searchable within one interface (Kibana) and they don't have to replicate their data to each other.

You do this by adding in what I call the Tribe Chief. Set up your clusters like normal. Once they are set up, you configure the tribe chief's elasticsearch.yml to communicate with those clusters. The setup is very similar to a normal Elasticsearch setup.


node.name: "Tribe Chief"

tribe.London.cluster.name: "London-Elasticsearch"
tribe.Texas.cluster.name: "Texas-Elasticsearch"
node.master: false
node.data: false
discovery.zen.ping.unicast.hosts: ["10.100.10.14", "10.200.20.24"]

This tells the Tribe Chief node that it doesn't hold data and doesn't have a cluster of its own, but to communicate with the unicast IPs listed (the masters for both clusters) and use their data combined. So the Tribe Chief is what you would want to run your Kibana interface on.
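Once the chief is up, a quick sanity check is to curl it and make sure the indexes from both clusters show up in one list (assuming the default port of 9200 on the chief):

# Indexes from both the London and Texas clusters should be listed here
curl -XGET 'http://localhost:9200/_cat/indices?v'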

There is a catch to this though: you can't have indexes that share the same name. If you do, the tribe chief will pick one index to use and ignore any others with the same name. There is a place in the Logstash output where you can specify an index name; just make sure the naming is unique per cluster.
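As a rough sketch, the Logstash outputs on each side might look something like this (the index names here are just placeholders; the point is that they differ per cluster):

# London shippers
output {
  elasticsearch {
    cluster => "London-Elasticsearch"
    index   => "london-logstash-%{+YYYY.MM.dd}"
  }
}

# Texas shippers
output {
  elasticsearch {
    cluster => "Texas-Elasticsearch"
    index   => "texas-logstash-%{+YYYY.MM.dd}"
  }
}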

Making the indexes from multiple clusters searchable from Kibana (running on the chief) is easy.

Open the Dashboard settings, click on Index at the top, and then just add in the indexes from your clusters. Use the same naming scheme you used in the Logstash output. And that's it. You will see the chief join the clusters as a node, and the chief will be able to search all specified indexes in any cluster without the need for the clusters to share data.
