Transcript

3.
Real Time Bidding (RTB)
● Real-time bidding is a dynamic auction process in which each
impression is bid for in (near) real time, as opposed to a static auction
● Kenshoo is engaged in Facebook Exchange (FBX)
● In FBX, each bid has a lifetime of 120 ms. All transactions have to
complete within that period, and the winning ad is presented to the
user.
● Kenshoo employs ad retargeting, where search engine campaigns
are extended to the social network, giving a much higher ROI for
our customers
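The 120 ms bid lifetime can be thought of as a time budget that every step of handling a bid request has to share. A minimal sketch of tracking such a budget follows; the `Deadline` class and its method names are hypothetical and only illustrate the idea, they are not part of FBX or of Kenshoo's code.

```python
import time

# The 120 ms FBX bid lifetime quoted on the slide.
BID_BUDGET_SECONDS = 0.120


class Deadline:
    """Hypothetical helper: tracks how much of a fixed time budget is left."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        # The clock is injectable so the helper can be tested without sleeping.
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        """True once the budget has been used up."""
        return self._clock() >= self._expires_at
```

A bid handler could create one `Deadline(BID_BUDGET_SECONDS)` per request, pass `remaining()` as the timeout to each storage call, and skip bidding once `expired()` returns true.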

11.
C* Cluster Storage
● We started with Amazon EBS:
○ With a small number of nodes (up to 4 nodes): you want persistent
storage; avoid running repairs if you lose a node
○ 4 x EBS devices in RAID10 configuration: provide up to 1000
IOPS and bursts of up to 2000 IOPS
○ Cheap in AWS
● 8 nodes with Ephemeral Devices:
○ Lower risk: if you lose a node, recovery isn’t as heavy on the
whole cluster
○ We used RAID0
○ Higher performance (double that of EBS)
○ Free, bundled with the instances

12.
C* Cluster Storage continued
● 16 nodes with Ephemeral Devices:
○ When load became heavy we grew to 16 nodes
○ Compactions and repairs harmed the cluster latency
○ We had to use Provisioned IOPS devices for C* maintenance
● C3 Instance type with SSD:
○ Came just in time, providing ephemeral SSD storage
○ They solved our performance problems and enabled
seamless compactions and repairs
○ Amazon currently has limited deployment of this hardware, and
nodes are not stable
○ Not available yet in all regions
○ Deploying C3 nodes is not always possible due to AWS
capacity issues
○ Amazon promised to resolve the C3 issues next month

14.
Monitoring
● We rely heavily on DataStax OpsCenter
● We pull metrics out of OpsCenter for graphing
● We wrote our own read/write speed test, run against a separate dedicated
keyspace on each node, to detect bottlenecks and problematic nodes
● We sample the data separately from the application to detect whether a
problem originates in C* or in the application
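The per-node speed test described above could be sketched roughly as follows. This is not the tool from the talk: the function names, the number of rounds and the "twice the cluster median" threshold are all assumptions made for illustration. The `operation` callable is expected to wrap a single read or write against the dedicated keyspace on one node, using whichever Cassandra client is in use.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_operation(operation: Callable[[], None], rounds: int = 20) -> float:
    """Return the median wall-clock latency of `operation` in milliseconds."""
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        operation()  # e.g. one read or write against the test keyspace
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def find_slow_nodes(latency_ms_by_node: Dict[str, float],
                    factor: float = 2.0) -> List[str]:
    """Flag nodes whose latency exceeds `factor` times the cluster median.

    The factor is an illustrative threshold, not a recommended value.
    """
    cluster_median = statistics.median(latency_ms_by_node.values())
    return sorted(
        node
        for node, latency_ms in latency_ms_by_node.items()
        if latency_ms > factor * cluster_median
    )
```

Comparing each node against the cluster median, rather than against a fixed number, keeps the check meaningful when the whole cluster slows down together, for example during peak load.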

15.
What have we learned
● Storage:
○ Use SSD:
■ It provides high and stable disk performance
■ Neutralizes the effects of compaction and repair on the cluster
■ Worth the money
● Network:
○ Use the highest-bandwidth VPN possible
○ GRE is great (lacks encryption, but provides the best bandwidth)
● Maintenance:
○ Run compaction daily: it does wonders for performance under heavy load
○ If you are not on SSD, disable thrift on the node before running compaction
○ Do compactions in sequence, node by node
○ On high-load systems, avoid repair as much as possible; it’s better to
decommission and recommission a node than to run repair!
○ If you have to repair, always use the “-pr” flag and, if possible, use the
incremental repair option (requires heavy scripting)
● Monitoring:
○ Write a sampler and speed tester for each node to detect bottlenecks and
the sources of performance issues
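The maintenance advice above (compact node by node, and disable thrift first when the node is not on SSD) might be scripted along these lines. This is a sketch under assumptions: the host names, function names and `dry_run` switch are hypothetical, and it only uses the `nodetool` subcommands `disablethrift`, `compact`, `enablethrift` and `repair -pr`. The incremental repair option is left out because its invocation depends on the Cassandra version.

```python
import subprocess
from typing import List


def compaction_commands(host: str, on_ssd: bool) -> List[List[str]]:
    """Build the nodetool commands for compacting a single node."""
    compact = ["nodetool", "-h", host, "compact"]
    if on_ssd:
        return [compact]
    # Not on SSD: take the node out of client (thrift) traffic while compacting.
    return [
        ["nodetool", "-h", host, "disablethrift"],
        compact,
        ["nodetool", "-h", host, "enablethrift"],
    ]


def repair_command(host: str) -> List[str]:
    """Primary-range repair for one node, using the "-pr" flag."""
    return ["nodetool", "-h", host, "repair", "-pr"]


def compact_cluster(hosts: List[str], on_ssd: bool,
                    dry_run: bool = True) -> List[List[str]]:
    """Compact nodes strictly in sequence and return the commands in order.

    With dry_run=True nothing is executed. With dry_run=False a failing
    command raises and stops the run, which can leave thrift disabled on
    that node until an operator re-enables it.
    """
    issued = []
    for host in hosts:
        for command in compaction_commands(host, on_ssd):
            if not dry_run:
                subprocess.run(command, check=True)
            issued.append(command)
    return issued
```

Calling `compact_cluster(["node1", "node2"], on_ssd=False)` with the default `dry_run=True` only returns the planned commands, which makes it easy to review the order before running anything against a live cluster.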