Scylla tries to maximize the resource usage of all system components. The shard-per-core approach allows linear scale-up with the number of cores. As you add cores, it makes sense to scale the other resources, from memory to network, to match.

Scylla is CPU-intensive. Do not run additional CPU-intensive tasks on the same server/cores as Scylla.

Scylla requires modern Intel CPUs that support the SSE4.2 instruction set and will not boot without it.
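One way to check the SSE4.2 requirement before installing is to look at the CPU feature flags the kernel reports. The sketch below assumes an x86 Linux host, where `/proc/cpuinfo` lists SSE4.2 as the flag `sse4_2`; the helper names `cpu_flags` and `supports_sse42` are illustrative and not part of any Scylla tooling.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags found in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux, the feature list sits on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if the text advertises SSE4.2 (reported as 'sse4_2')."""
    return "sse4_2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("SSE4.2 supported:", supports_sse42(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes Linux")
```

The parsing is kept separate from the file read so the same function can be run against `cpuinfo` text collected from a remote machine.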

Any number of cores will work, since Scylla scales up with the core count. A practical approach is to use as many cores as you can while the hardware price remains reasonable; 20-60 logical cores (including hyperthreading) is a good range. When using virtual machines, containers, or the public cloud, remember that each virtual CPU is mapped to a single logical core, or hyperthread.

The more memory you have, the better Scylla will perform, since Scylla can use all of it for caching. The wider the rows in your schema, the more memory you’ll need. 64GiB-256GiB is a good range for a medium or high workload.

We highly recommend local SSDs. Scylla is built for a large volume of data and large storage per node. The rule of thumb is a 30:1 disk-to-RAM ratio; for example, 30TB of storage calls for 1TB of RAM.
When there are multiple drives, we recommend a RAID-0 setup and a replication factor of 3 within the local datacenter (RF=3).
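The 30:1 rule of thumb is simple arithmetic, sketched here as two small helpers. The function names are hypothetical and the ratio is only the guideline quoted above, not a limit enforced by Scylla.

```python
DISK_TO_RAM_RATIO = 30.0  # rule of thumb from the text: 30 units of disk per unit of RAM


def ram_needed(disk: float, ratio: float = DISK_TO_RAM_RATIO) -> float:
    """RAM suggested for a given amount of storage (same unit in and out)."""
    if disk < 0 or ratio <= 0:
        raise ValueError("disk must be >= 0 and ratio must be > 0")
    return disk / ratio


def max_disk(ram: float, ratio: float = DISK_TO_RAM_RATIO) -> float:
    """Largest storage size that stays within the ratio for a given RAM size."""
    if ram < 0 or ratio <= 0:
        raise ValueError("ram must be >= 0 and ratio must be > 0")
    return ram * ratio


if __name__ == "__main__":
    print(ram_needed(30.0))  # 30 TB of storage -> 1.0 TB of RAM
    print(max_disk(256.0))   # 256 GiB of RAM -> 7680.0 GiB of storage
```

Because both helpers are unit-agnostic, the same call works for TB or GiB as long as the input and output are read in the same unit.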

HDDs are supported but may become a bottleneck. Some workloads may work with HDDs, especially if they play nice and minimize random seeks. An example of an HDD-friendly workload is a write-mostly (98% writes) workload, with minimal random reads. If you use HDDs, try to allocate a separate disk for the commit log (not needed with SSDs).

Overall, we recommend a balanced setup: if you have just 4-8 logical cores, you may not need large disks or 10Gbps networking. The reverse also holds. Here are assorted common recommended setups:

On Amazon EC2, we highly recommend i2 (High I/O) instances. This family includes the High Storage Instances that provide very fast SSD-backed instance storage optimized for very high random I/O performance, and provide high IOPS at a low cost. We recommend using enhanced networking, which exposes the physical network cards to the VM.

On Google Compute Engine, pick a zone where you can find Haswell CPUs. Local SSD performance offers, according to Google, less than 1 ms of latency and up to 680,000 read IOPS and 360,000 write IOPS. We recommend the CentOS 7.x image with the NVMe disk interface.

scylla.yaml is equivalent to the Apache Cassandra cassandra.yaml configuration file, and it is compatible for relevant parameters. Below is a subset of scylla.yaml with the parameters you are most likely to update. For the full list of parameters, look at the file itself.

    # The name of the cluster. This is mainly used to prevent machines in
    # one logical cluster from joining another.
    cluster_name: 'TestCluster'

    # This defines the number of tokens randomly assigned to this node on the ring
    # The more tokens, relative to other nodes, the larger the proportion of data
    # that this node will store. You probably want all nodes to have the same number
    # of tokens assuming they have equal hardware capability.
    #
    # If you already have a cluster with 1 token per node, and wish to migrate to
    # multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
    num_tokens: 256

    # Directory where Scylla should store data on disk.
    data_file_directories:
        - /var/lib/scylla/data

    # commit log. when running on magnetic HDD, this should be a
    # separate spindle than the data directories.
    commitlog_directory: /var/lib/scylla/commitlog

    # seed_provider class_name is saved for future use.
    # seeds address are mandatory!
    seed_provider:
        # Addresses of hosts that are deemed contact points.
        # Scylla nodes use this list of hosts to find each other and learn
        # the topology of the ring. You must change this if you are running
        # multiple nodes!
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # seeds is actually a comma-delimited list of addresses.
              # Ex: "<ip1>,<ip2>,<ip3>"
              - seeds: "127.0.0.1"

    # Address or interface to bind to and tell other Scylla nodes to connect to.
    # You _must_ change this if you want multiple nodes to be able to communicate!
    #
    # Setting listen_address to 0.0.0.0 is always wrong.
    listen_address: localhost

    # Address to broadcast to other Scylla nodes
    # Leaving this blank will set it to the same value as listen_address
    # broadcast_address: 1.2.3.4

    # port for the CQL native transport to listen for clients on
    # For security reasons, you should not expose this port to the internet. Firewall it if needed.
    native_transport_port: 9042

    # Uncomment to enable experimental features
    # experimental: true

By default, scylla.yaml is located at /etc/scylla/scylla.yaml. Note that the file will open as read-only unless you edit it as the root user or by using sudo.

All ports above need to be open to external clients (CQL), external admin systems (JMX), and other nodes (RPC). The REST API port can be kept closed to external incoming connections.

The JMX service, scylla-jmx, runs on port 7199. It is required in order to manage Scylla using nodetool and other Apache Cassandra-compatible utilities. The scylla-jmx process must be able to connect to port 10000 on localhost. The JMX service listens for incoming JMX connections on all network interfaces on the system.
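A quick way to confirm that the ports mentioned here accept TCP connections is a plain socket probe. The sketch below uses only the Python standard library; `port_open` is a hypothetical helper, and the port numbers are the ones named in this guide (9042 for CQL, 7199 for JMX, and 10000 on localhost for scylla-jmx). A successful connect only shows that something is listening, not that it is Scylla.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    checks = (
        ("CQL native transport", 9042),
        ("JMX (scylla-jmx)", 7199),
        ("port scylla-jmx connects to", 10000),
    )
    for name, port in checks:
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name} on port {port}: {state}")
```

Run the same probe from a client machine with the node's address in place of `127.0.0.1` to see what a firewall lets through.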

It is possible that a client, or another node, may need to use a different IP address to connect to a Scylla node from the address that the node is listening on. This is the case when a node is behind port forwarding. Scylla allows for setting alternate IP addresses.

Do not set any IP address to 0.0.0.0.

| Address | Content | Default |
| --- | --- | --- |
| listen_address | IP address of interface for inter-node connections, as seen from localhost | No default (required) |
| broadcast_address | IP address of interface for inter-node connections, as seen from other nodes in the cluster | listen_address |
| rpc_address | IP address of interface for client connections, as seen from localhost | No default (required) |
| broadcast_rpc_address | IP address of interface for client connections, as seen from clients | rpc_address |

If other nodes can connect directly to listen_address, then broadcast_address does not need to be set.

If clients can connect directly to rpc_address, then broadcast_rpc_address does not need to be set.
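The defaulting rules for these four addresses can be sketched as a small function. This is an illustration of the rules as described in this section, not Scylla's own configuration code; `effective_addresses` is a hypothetical name, and the example IPs are placeholders from documentation address ranges.

```python
def effective_addresses(config: dict[str, str]) -> dict[str, str]:
    """Resolve the four address settings using the defaults described above."""
    # listen_address and rpc_address have no default and must be provided.
    for key in ("listen_address", "rpc_address"):
        if not config.get(key):
            raise ValueError(f"{key} is required")

    resolved = {
        "listen_address": config["listen_address"],
        "rpc_address": config["rpc_address"],
        # Each broadcast address falls back to its non-broadcast counterpart.
        "broadcast_address": config.get("broadcast_address")
        or config["listen_address"],
        "broadcast_rpc_address": config.get("broadcast_rpc_address")
        or config["rpc_address"],
    }

    # The guide says never to set any of these addresses to 0.0.0.0.
    for key, value in resolved.items():
        if value == "0.0.0.0":
            raise ValueError(f"{key} must not be 0.0.0.0")
    return resolved


if __name__ == "__main__":
    # A node behind port forwarding: peers reach it on a different address.
    node = {
        "listen_address": "10.0.0.5",
        "rpc_address": "10.0.0.5",
        "broadcast_address": "203.0.113.7",
    }
    for key, value in effective_addresses(node).items():
        print(f"{key}: {value}")
```

In the example, `broadcast_rpc_address` is left unset and so resolves to `rpc_address`, which is the case where clients can connect to the node directly.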

Scylla places any core dumps in /var/lib/scylla/coredump. They are not visible with the coredumpctl command. See the System Configuration Guide for details on core dump configuration scripts. Check with Scylla support before sharing any core dump, as it may contain sensitive data.

Scylla is designed for high performance before tuning, for fewer layers that interact in unpredictable ways, and to use better algorithms that do not require manual tuning. The following items are found in the manuals for other data stores, but do not need to appear here.
