I am putting various kinds of logs into Elasticsearch with daily time-based indices (somedata-YYYY.MM.DD). Recently, I started putting many other kinds of logs into ES as well, and ES began logging many ProcessClusterEventTimeoutException errors just after midnight (00:00).

ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [fluentd]) within 30s]
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:349)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have more than 200 daily indices, so when the date changes, almost all of them need a new index created for the next day.

I suspected that some node (perhaps the master node) was under high CPU load, but as shown below, CPU usage is under 5% on all nodes (servers), and it actually drops during this period (00:00-00:10).

Elasticsearch's cluster state management is single-threaded for simplicity, so I wouldn't expect the load average to spike much.

Are you using dynamic mappings? Those can cause lots of extra cluster state changes as new properties are dynamically added. It is usually much quicker to set up the mapping beforehand, either by creating each index with the mapping you want before it is needed, or by setting up index templates. Creating the indexes before they are needed is a particularly nice approach because you can stagger the creations, or just set the timeout to some very high number.
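To make that concrete, here is a minimal sketch of both approaches, using the legacy `_template` API from the 1.x/2.x era that matches the stack trace above. The template name, index pattern, and mapping fields (`@timestamp`, `message`) are placeholders for illustration, not taken from the original question:

```shell
# Register an index template so every new somedata-* index is created
# with its mapping already in place, avoiding a burst of dynamic
# put-mapping cluster events at midnight. Field names are illustrative.
curl -XPUT 'http://localhost:9200/_template/somedata_template' -d '{
  "template": "somedata-*",
  "mappings": {
    "fluentd": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "string" }
      }
    }
  }
}'

# Alternatively, pre-create tomorrow's index well before midnight
# (e.g. from cron) with a generous master timeout, so the create-index
# cluster event doesn't compete with ~200 others at 00:00.
# GNU date syntax is assumed here.
curl -XPUT "http://localhost:9200/somedata-$(date -d tomorrow +%Y.%m.%d)?master_timeout=10m"
```

If you pre-create from cron, you can spread the 200 create-index calls over an hour or so, which keeps the single-threaded cluster state queue short.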

200 daily indices sounds like a lot. What is the rationale behind having so many? How many shards does that result in on a daily basis? What is the average shard size? How long do you keep your data in the cluster?