When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

Resilience: You can set how many OSDs are allowed to fail without losing data.
For replicated pools, this is the desired number of copies/replicas of an object.
A typical configuration stores an object and one additional copy
(i.e., size=2), but you can determine the number of copies/replicas.
For erasure-coded pools, it is the number of coding chunks
(i.e., m=2 in the erasure code profile).

Placement Groups: You can set the number of placement groups for the pool.
A typical configuration uses approximately 100 placement groups per OSD to
provide optimal balancing without using up too many computing resources. When
setting up multiple pools, be careful to ensure you set a reasonable number of
placement groups for both the pool and the cluster as a whole.
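As a rough starting point only (a rule of thumb, not a guarantee of optimal placement), the total PG count across all pools is often estimated from the OSD count and the replication factor:

total PGs = (number of OSDs * 100) / pool size, rounded up to a power of two

For example, a cluster with 30 OSDs and replicated pools of size 3 would target (30 * 100) / 3 = 1000, rounded up to 1024 placement groups in total.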

CRUSH Rules: When you store data in a pool, placement of the object
and its replicas (or chunks for erasure coded pools) in your cluster is governed
by CRUSH rules. You can create a custom CRUSH rule for your pool if the default
rule is not appropriate for your use case.
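For example, assuming a rule named ssd-rule already exists in your CRUSH map (the name here is illustrative), you could assign it to a pool with:

ceph osd pool set {pool-name} crush_rule ssd-rule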

Snapshots: When you create snapshots with ceph osd pool mksnap,
you effectively take a snapshot of a particular pool.
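For example, with a hypothetical pool named mypool:

ceph osd pool mksnap mypool mypool-snap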

To organize data into pools, you can list, create, and remove pools.
You can also view the utilization statistics for each pool.
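A minimal workflow, with an illustrative pool name and placement-group count, might look like this (deleting a pool additionally requires the monitor option mon_allow_pool_delete to be set to true):

ceph osd lspools
ceph osd pool create mypool 128
rados df
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it

The parameters described below apply to ceph osd pool create.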

{pg-num}

Description:

The total number of placement groups for the pool. See Placement
Groups for details on calculating a suitable number. The
default value 8 is NOT suitable for most systems.

Type:

Integer

Required:

Yes.

Default:

8

{pgp-num}

Description:

The total number of placement groups for placement purposes. This
should be equal to the total number of placement groups, except
for placement group splitting scenarios.

Type:

Integer

Required:

Yes. If not specified, the default or the Ceph configuration value is used.

Default:

8

{replicated|erasure}

Description:

The pool type, which may be either replicated (to
recover from lost OSDs by keeping multiple copies of the
objects) or erasure (to get a kind of
generalized RAID5 capability).
Replicated pools require more
raw storage but implement all Ceph operations.
Erasure pools require less raw storage but only
implement a subset of the available operations.

Type:

String

Required:

No.

Default:

replicated

[crush-rule-name]

Description:

The name of a CRUSH rule to use for this pool. The specified
rule must exist.

Type:

String

Required:

No.

Default:

For replicated pools it is the rule specified by the osd_pool_default_crush_rule config variable. This rule must exist.
For erasure pools it is erasure-code if the default erasure code profile is used, or {pool-name} otherwise. This
rule will be created implicitly if it doesn't exist already.

[erasure-code-profile=profile]

Description:

For erasure pools only. Use the erasure code profile. It
must be an existing profile as defined by
osd erasure-code-profile set.

Type:

String

Required:

No.
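For example, you might define a profile and create an erasure-coded pool that uses it (the profile and pool names are illustrative):

ceph osd erasure-code-profile set myprofile k=4 m=2
ceph osd pool create ecpool 128 128 erasure myprofile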

When you create a pool, set the number of placement groups to a reasonable value
(e.g., 100). Consider the total number of placement groups per OSD too.
Placement groups are computationally expensive, so performance will degrade when
you have many pools with many placement groups (e.g., 50 pools with 100
placement groups each). The point of diminishing returns depends upon the power
of the OSD host.

See Placement Groups for details on calculating an appropriate number of
placement groups for your pool.

[expected-num-objects]

Description:

The expected number of objects for this pool. By setting this value
(together with a negative filestore merge threshold), PG folder
splitting happens at pool creation time, avoiding the latency
impact of runtime folder splitting.

Pools need to be associated with an application before use. Pools that will be
used with CephFS or pools that are automatically created by RGW are
automatically associated. Pools that are intended for use with RBD should be
initialized using the rbd tool (see Block Device Commands for more
information).

For other cases, you can manually associate a free-form application name
with a pool:

ceph osd pool application enable {pool-name} {application-name}

Note

CephFS uses the application name cephfs, RBD uses the
application name rbd, and RGW uses the application name rgw.
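For example, to tag a hypothetical pool for a custom application:

ceph osd pool application enable mypool myapp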

pg_num

Description:

The effective number of placement groups to use when calculating
data placement.

Type:

Integer

Valid Range:

Greater than the current pg_num value.

pgp_num

Description:

The effective number of placement groups for placement to use
when calculating data placement.

Type:

Integer

Valid Range:

Equal to or less than pg_num.
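The two values are usually raised together. A typical sequence when growing a hypothetical pool is:

ceph osd pool set mypool pg_num 256
ceph osd pool set mypool pgp_num 256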

crush_rule

Description:

The rule to use for mapping object placement in the cluster.

Type:

Integer

allow_ec_overwrites

Description:

Whether writes to an erasure-coded pool can update part
of an object, so that CephFS and RBD can use it. See
Erasure Coding with Overwrites for more details.

Type:

Boolean

Version:

12.2.0 and above
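For example, to allow partial overwrites on an erasure-coded pool (pool name illustrative):

ceph osd pool set ec_pool allow_ec_overwrites true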

hashpspool

Description:

Set/Unset HASHPSPOOL flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

Version:

Version 0.48 Argonaut and above.

nodelete

Description:

Set/Unset NODELETE flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

Version:

Version FIXME
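Flag-style values like this one are set with ceph osd pool set. For example, to protect a hypothetical pool against accidental deletion:

ceph osd pool set mypool nodelete 1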

nopgchange

Description:

Set/Unset NOPGCHANGE flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

Version:

Version FIXME

nosizechange

Description:

Set/Unset NOSIZECHANGE flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

Version:

Version FIXME

write_fadvise_dontneed

Description:

Set/Unset WRITE_FADVISE_DONTNEED flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

noscrub

Description:

Set/Unset NOSCRUB flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

nodeep-scrub

Description:

Set/Unset NODEEP_SCRUB flag on a given pool.

Type:

Integer

Valid Range:

1 sets flag, 0 unsets flag

hit_set_type

Description:

Enables hit set tracking for cache pools.
See Bloom Filter for additional information.

Type:

String

Valid Settings:

bloom, explicit_hash, explicit_object

Default:

bloom. Other values are for testing.
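For example, on a hypothetical cache pool:

ceph osd pool set hot-storage hit_set_type bloom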

hit_set_count

Description:

The number of hit sets to store for cache pools. The higher
the number, the more RAM consumed by the ceph-osd daemon.

Type:

Integer

Valid Range:

1. The agent does not yet handle values greater than 1.

hit_set_period

Description:

The duration of a hit set period in seconds for cache pools.
The higher the number, the more RAM consumed by the
ceph-osd daemon.

Type:

Integer

Example:

3600 (1 hour)

hit_set_fpp

Description:

The false positive probability for the bloom hit set type.
See Bloom Filter for additional information.

Type:

Double

Valid Range:

0.0 - 1.0

Default:

0.05

cache_target_dirty_ratio

Description:

The percentage of the cache pool containing modified (dirty)
objects before the cache tiering agent will flush them to the
backing storage pool.

Type:

Double

Default:

.4
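For example, to begin flushing once 40% of the cache pool contains dirty objects (pool name illustrative):

ceph osd pool set hot-storage cache_target_dirty_ratio 0.4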

cache_target_dirty_high_ratio

Description:

The percentage of the cache pool containing modified (dirty)
objects before the cache tiering agent will flush them to the
backing storage pool with a higher speed.

Type:

Double

Default:

.6

cache_target_full_ratio

Description:

The percentage of the cache pool containing unmodified (clean)
objects before the cache tiering agent will evict them from the
cache pool.

Type:

Double

Default:

.8

target_max_bytes

Description:

Ceph will begin flushing or evicting objects when the
max_bytes threshold is triggered.

Type:

Integer

Example:

1000000000000  # 1 TB
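For example, to trigger flushing or eviction at roughly 1 TB on a hypothetical cache pool:

ceph osd pool set hot-storage target_max_bytes 1000000000000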

target_max_objects

Description:

Ceph will begin flushing or evicting objects when the
max_objects threshold is triggered.

Type:

Integer

Example:

1000000  # 1M objects

hit_set_grade_decay_rate

Description:

Temperature decay rate between two successive hit_sets

Type:

Integer

Valid Range:

0 - 100

Default:

20

hit_set_search_last_n

Description:

Count at most N appearances in hit_sets for the temperature calculation

Type:

Integer

Valid Range:

0 - hit_set_count

Default:

1

cache_min_flush_age

Description:

The time (in seconds) before the cache tiering agent will flush
an object from the cache pool to the storage pool.

Type:

Integer

Example:

600 (10 minutes)

cache_min_evict_age

Description:

The time (in seconds) before the cache tiering agent will evict
an object from the cache pool.

Type:

Integer

Example:

1800 (30 minutes)

fast_read

Description:

On an erasure-coded pool, if this flag is turned on, the read request
issues sub-reads to all shards and waits until it receives enough
shards to decode and serve the client. With the jerasure and isa
erasure plugins, once the first K replies return, the client's request
is served immediately using the data decoded from those replies. This
trades some resources for better performance. Currently this
flag is only supported for erasure-coded pools.

Type:

Boolean

Default:

0
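For example, on a hypothetical erasure-coded pool:

ceph osd pool set ecpool fast_read 1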

scrub_min_interval

Description:

The minimum interval in seconds for pool scrubbing when
load is low. If it is 0, the value osd_scrub_min_interval
from config is used.

Type:

Double

Default:

0

scrub_max_interval

Description:

The maximum interval in seconds for pool scrubbing
irrespective of cluster load. If it is 0, the value
osd_scrub_max_interval from config is used.

Type:

Double

Default:

0
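For example, to scrub a hypothetical pool at least once a day regardless of load:

ceph osd pool set mypool scrub_max_interval 86400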

deep_scrub_interval

Description:

The interval in seconds for pool "deep" scrubbing. If it
is 0, the value osd_deep_scrub_interval from config is used.

Type:

Double

Default:

0

To set the number of object replicas on a replicated pool, execute the following:

ceph osd pool set {poolname} size {num-replicas}

Important

The {num-replicas} includes the object itself.
If you want the object and two copies of the object for a total of
three instances of the object, specify 3.

For example:

ceph osd pool set data size 3

You may execute this command for each pool. Note: An object might accept
I/Os in degraded mode with fewer than pool size replicas. To set a minimum
number of required replicas for I/O, use the min_size setting.
For example:

ceph osd pool set data min_size 2

This ensures that no object in the data pool will receive I/O with fewer than
min_size replicas.