Re: Nodejs connector high latency

Hi Andy

Thanks for the response.

I have 2 DCs of 5 nodes each. I'm writing against DC2 with LOCAL_QUORUM consistency. nodetool confirmed that all the data resides on DC2, and since I did not set any replication for DC1, this is the expected behavior.

Regarding Node.js behavior, I validated that Node is not responsible for the delay, since the application is deployed on a single server where latencies are high (25%-40% higher).

I turned on tracing and found that there is some cross-DC communication going on, which introduces this delay.

Here are the trace events for one such session:

| activity | source | source_elapsed | thread |
|---|---|---|---|
| READ message received from /DC2-node3 | DC1-node2 | 5 | MessagingService-Incoming-/DC2-node3 |
| Executing single-partition query on roles | DC1-node2 | 179 | ReadStage-2 |
| Acquiring sstable references | DC1-node2 | 212 | ReadStage-2 |
| Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | DC1-node2 | 247 | ReadStage-2 |
| Key cache hit for sstable 4 | DC1-node2 | 273 | ReadStage-2 |
| Key cache hit for sstable 6 | DC1-node2 | 301 | ReadStage-2 |
| Merged data from memtables and 2 sstables | DC1-node2 | 352 | ReadStage-2 |
| Read 1 live rows and 0 tombstone cells | DC1-node2 | 384 | ReadStage-2 |
| Read 1 live rows and 0 tombstone cells | DC1-node2 | 411 | ReadStage-2 |
| Enqueuing response to /DC2-node3 | DC1-node2 | 428 | ReadStage-2 |
| Sending REQUEST_RESPONSE message to /DC2-node3 | DC1-node2 | 475 | MessagingService-Outgoing-/DC2-node3-Small |
| REQUEST_RESPONSE message received from /DC1-node2 | DC2-node3 | 119313 | MessagingService-Incoming-/DC1-node2 |
| Processing response from /DC1-node2 | DC2-node3 | 119367 | RequestResponseStage-3 |
| reading digest from /DC2-node4 | DC2-node3 | 119489 | Native-Transport-Requests-8 |
| reading digest from /DC2-node1 | DC2-node3 | 119518 | Native-Transport-Requests-8 |
| reading digest from /DC1-node2 | DC2-node3 | 119538 | Native-Transport-Requests-8 |
| Sending READ message to /DC2-node4 | DC2-node3 | 119536 | MessagingService-Outgoing-/DC2-node4-Small |
| Executing single-partition query on roles | DC2-node3 | 119554 | ReadStage-5 |
| Sending READ message to /DC2-node1 | DC2-node3 | 119562 | MessagingService-Outgoing-/DC2-node1-Small |
| Acquiring sstable references | DC2-node3 | 119570 | ReadStage-5 |
| Sending READ message to /DC1-node2 | DC2-node3 | 119579 | MessagingService-Outgoing-/DC1-node2-Small |
| Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | DC2-node3 | 119589 | ReadStage-5 |
| Key cache hit for sstable 1 | DC2-node3 | 119609 | ReadStage-5 |
| Merged data from memtables and 1 sstables | DC2-node3 | 119651 | ReadStage-5 |
| Read 1 live rows and 0 tombstone cells | DC2-node3 | 119671 | ReadStage-5 |
| REQUEST_RESPONSE message received from /DC2-node4 | DC2-node3 | 120682 | MessagingService-Incoming-/DC2-node4 |
| Processing response from /DC2-node4 | DC2-node3 | 120714 | RequestResponseStage-10 |
| REQUEST_RESPONSE message received from /DC2-node1 | DC2-node3 | 120995 | MessagingService-Incoming-/DC2-node1 |
| Processing response from /DC2-node1 | DC2-node3 | 121088 | RequestResponseStage-10 |
| READ message received from /DC2-node3 | DC1-node2 | 5 | MessagingService-Incoming-/DC2-node3 |
| Executing single-partition query on roles | DC1-node2 | 67 | ReadStage-4 |
| Acquiring sstable references | DC1-node2 | 90 | ReadStage-4 |
| Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | | | |
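Reading the trace: source_elapsed is microseconds measured on each node from the moment it starts handling the traced request, so the columns are per-node timelines, not a shared clock. Even so, the contrast is stark: DC1-node2 finished its entire local read in under half a millisecond, yet the coordinator DC2-node3 was already ~119 ms into its own timeline when that response arrived, so nearly all the latency is the cross-DC round trip plus coordinator-side queueing, not the read itself. A quick back-of-the-envelope:

```javascript
// Values copied from the trace above (microseconds).
const replicaWorkUs = 475;        // DC1-node2: "Sending REQUEST_RESPONSE message to /DC2-node3"
const coordinatorRecvUs = 119313; // DC2-node3: "REQUEST_RESPONSE message received from /DC1-node2"

// Convert to milliseconds for comparison. The replica's local work is
// ~0.5 ms; the coordinator's wait dwarfs it by roughly 250x.
const replicaWorkMs = replicaWorkUs / 1000;
const coordinatorRecvMs = coordinatorRecvUs / 1000;

console.log(`replica local read: ${replicaWorkMs.toFixed(3)} ms`);
console.log(`coordinator timeline at response arrival: ${coordinatorRecvMs.toFixed(1)} ms`);
```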

The Cassandra native protocol supports multiple simultaneous requests per connection. By default, most drivers create only one connection to each C* host in the local data center. That said, this shouldn't be a problem, particularly if you are only executing 20 concurrent requests; that is a load both driver clients and C* handle well. The driver does do some write batching to reduce the number of system calls, but I'm reasonably confident this is not the issue.
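For reference, the relevant pooling knobs look roughly like this (option names as in the DataStax Node.js driver; the host name and values are illustrative). With native protocol v3+, a single connection can multiplex thousands of in-flight requests, so one connection per local host comfortably handles ~20 concurrent writes:

```javascript
// Illustrative client options for the DataStax Node.js driver
// (cassandra-driver). `coreConnectionsPerHost` is keyed by host
// "distance"; 0 means hosts in the local data center. Raising it is
// rarely needed, but it is the knob to turn if a single multiplexed
// socket ever became the bottleneck.
const clientOptions = {
  contactPoints: ['dc2-node1'],      // assumed hostname
  localDataCenter: 'DC2',
  pooling: {
    coreConnectionsPerHost: { 0: 2 } // local distance: 2 connections instead of the default 1
  }
};
```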

It may be worth enabling client logging to see if that sheds any light. You can also enable tracing on your requests by specifying traceQuery as a query option (example) to see whether the delay is caused by C*-side processing.
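A minimal sketch of per-request tracing, assuming the `cassandra-driver` package the poster is using (pass in a connected client and the required module; the table and column names are made up for illustration, and exact return shapes should be checked against your driver version's docs):

```javascript
// Sketch: run one write with tracing enabled, then pull the recorded
// trace session so its per-node events (like the ones pasted above) can
// be inspected from the application side.
async function tracedInsert(client, cassandra) {
  const result = await client.execute(
    'INSERT INTO my_table (id, value) VALUES (?, ?)', // hypothetical table
    ['k1', 'v1'],
    {
      prepare: true,
      traceQuery: true, // ask C* to trace this request
      consistency: cassandra.types.consistencies.localQuorum
    }
  );
  // The execution info carries the trace id for the session.
  const trace = await client.metadata.getTrace(result.info.traceId);
  trace.events.forEach((e) => console.log(e)); // activity / source / elapsed per event
  return trace;
}
```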

Also keep in mind that all user code in Node.js runs on a single thread. If the callbacks attached to your responses do non-trivial work, they can delay subsequent responses from being processed, which may give the impression that some queries are slow.
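This effect is easy to reproduce without Cassandra at all. The self-contained demo below blocks the event loop for ~50 ms of synchronous "callback work" and shows a 1 ms timer firing far late; a driver response queued behind such a callback would be delayed the same way:

```javascript
// Stand-in for an expensive response callback: spin synchronously.
function busyWaitMs(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin: the event loop can run nothing else */ }
}

const scheduled = Date.now();
setTimeout(() => {
  // The timer was due after 1 ms, but it cannot fire until the
  // synchronous work below releases the event loop (~50 ms later).
  console.log(`1 ms timer fired after ~${Date.now() - scheduled} ms`);
}, 1);

busyWaitMs(50); // heavy synchronous work keeps the loop busy
```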

I used the Cassandra driver provided by DataStax (3.5.0) in Node.js. I have a 5-node cluster. I'm writing to a table with QUORUM consistency.

I observed some spikes in writes: out of ~20 writes, 2-5 take longer (~200 ms). I debugged one of the Node processes with strace and found that the longer-latency writes are batched and use the same fd to connect to Cassandra. This may be the multiplexing.