OCFS2 Reflink Test

Introduction

This document lays out the test plan for the refcount (reflink) support that we are currently implementing in OCFS2, and records test results so that status can be tracked during testing and bugs can be located easily. In addition, it explains how the testcases are organized and the workflow of the corresponding testing tools.

Test Limitation

The limitations defined by the design roadmap for the refcount implementation on OCFS2 should be stated here, to make clear what can be expected from our testing tools.

Testcases

We concentrate on feature, stress and boundary tests for the kernel filesystem, on both single and multiple nodes, and formalize these into a testing tool package. Currently, combination tests with EA (extended attributes) also deserve our attention.

1. Single-node Tests

All testcases are scheduled in reflink_tests_run.sh. Most of the testing scenarios are provided by reflink_tests.c, while some other cases are built in shell.

1. Basic functional test, it includes:
1) Add refcount tree to inodes
2) Set an inode's refcount tree
3) Remove an inode's refcount tree
4) Increment and decrement refcount
5) Cause CoW
The number of reflinks and the file size are tunable here to set the workload. Use write(), ftruncate(), unlink(), append, etc. to cause CoW in reflinked files. We also want to make sure that mmap() works correctly with reflinks.
How to verify: the original and the reflinked file are compared to check the sanity of the reflink operation. In addition, the original (unchanged) file is checked after CoWs have happened on its reflinks, to confirm that it still has its original content. We also borrow the verification method from fill_verify_holes.c to verify random writes against reflinked inodes.
2. Random tests:
Here we randomize almost all of the factors in the tests to find bugs, such as random original extent sizes and counts, and random write sizes and offsets to cause CoW. The tests also perform random reads among the reflinks.
3. Concurrent tests: a fixed number of processes concurrently manipulate the reflinked inodes:
1) 1/4 of the child processes do reflinks to increment the refcount.
2) 1/4 of the child processes do CoW to decrement the refcount.
3) 1/4 of the child processes do truncate and append.
4) 1/4 of the child processes verify the original to see if it is still in the right format.
5) The parent process does unlinks and verifies the original file too.
4. Boundary test:
1) Files with size 0 (for xattrs, dx, data) should not have a refcount tree. When they are truncated, the code will remove the refcount tree.
2) Files with inline data and no external xattr tree should also have no refcount tree.
3) Extents less than 1MB should be CoWd in their entirety.
4) Extents larger than 1MB should be CoWd in 1MB hunks. So if you have a 1GB extent and write a byte, the surrounding 1MB should be CoWd.
You'll want to test this in the first 1MB of the 1GB extent, the last 1MB of the extent, and somewhere in the middle of the extent.
5. Combination tests with xattr.
6. Combination tests with fill_verify_holes tests incorporated.
7. Stress tests:
1) An enormous number of refcount trees in the fs, i.e. lots of reflink pairs
2) Lots of inodes sharing one refcount tree
3) Reflinks on a HUGE file (like in OracleVM), with writes/reads performed to cause CoW from the original and the reflinks
4) Reflinks on inodes with HUGE (size) extents
5) Reflinks on inodes with MANY (number) extents in 1MB hunks or other sizes
8. Bash and tool utility tests: use the dd and reflink commands to test reflinks with a combination of extent sizes.
9. Combination test with OracleVM:
Here I personally think we just need to simulate a HUGE file (like an image file on OracleVM) and then schedule tests on it. Is there anything specific that we need to be concerned about?
10. Destructive test: fill up the volume with an enormous number of shared reflink inodes and refcount trees.
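
The expected CoW range for boundary test items 3) and 4) can be worked out ahead of time and compared with what the filesystem actually does. The following is only a rough sketch in Python: the helper name cow_range is hypothetical, and it assumes that "the surrounding 1MB" means a 1MB chunk aligned relative to the start of the extent and clamped to the extent end, which this plan does not state explicitly.

```python
MB = 1024 * 1024  # CoW granularity named in the boundary tests
GB = 1024 * MB


def cow_range(extent_start, extent_len, write_off, write_len=1, chunk=MB):
    """Return (start, end) of the byte range expected to be CoWed, end exclusive.

    Hypothetical helper. Assumes chunks are aligned relative to the extent
    start; the real kernel alignment rule may differ.
    """
    extent_end = extent_start + extent_len
    if write_len < 1:
        raise ValueError("write_len must be at least 1")
    if write_off < extent_start or write_off + write_len > extent_end:
        raise ValueError("write must fall inside the extent")

    # Small extents are expected to be CoWed in their entirety.
    if extent_len <= chunk:
        return extent_start, extent_end

    # Large extents: round the write outwards to chunk boundaries.
    first = (write_off - extent_start) // chunk
    last = (write_off + write_len - 1 - extent_start) // chunk
    start = extent_start + first * chunk
    end = min(extent_start + (last + 1) * chunk, extent_end)
    return start, end


if __name__ == "__main__":
    # One-byte writes at the first, a middle, and the last 1MB of a 1GB extent.
    for off in (0, 512 * MB + 5, GB - 1):
        start, end = cow_range(0, GB, off)
        print("write at %d: expect CoW of [%d, %d)" % (off, start, end))
```

A test could feed the same offsets to the real write() calls and then check the resulting extent layout against these ranges.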

2. Multi-node Tests

NOTE: To launch the multi-node tests, openmpi must be installed first, and passwordless ssh/rsh access must be configured in advance.

1. Test lock contention for a shared refcount tree among nodes:
For example, one node decrements the refcount while another increments it. Each node keeps a series of shared inodes, then the nodes do CoW/reads concurrently.
2. Destructive & Recovery Test:
The rule should be that a reflinked inode does not appear in the filesystem until it's completely constructed. So we need a couple node-kill tests. What I'd really like is two tests, but I don't think we can do them only from userspace:
1) Kill a node while reflink is busy refcounting the source inode. When the other nodes recover, the file should still be valid and in some state of refcounting.
2) Kill a node while reflink is building the target inode in the orphan dir. The other nodes should remove it from the orphan dir, as if it never existed.
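
Although the node kill itself probably cannot be driven purely from userspace, the post-recovery check on a surviving node can be. The following is a minimal sketch in Python under stated assumptions: the function names and the state labels are hypothetical, and the digest of the source file is assumed to have been recorded before the reflink was started.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-1 hex digest of a file, read in 1MB blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def check_after_recovery(source, target, source_digest_before):
    """Classify the state seen by a surviving node after recovery.

    Hypothetical helper; the returned labels are illustrative only.
    """
    # The source must still be valid whatever refcounting state it is in.
    if file_digest(source) != source_digest_before:
        return "source-corrupted"
    # The target either never appeared in the namespace ...
    if not os.path.exists(target):
        return "target-absent"
    # ... or it is completely constructed and identical to the source.
    if file_digest(target) == source_digest_before:
        return "target-complete"
    # A visible but different target violates the rule stated above.
    return "target-partial"
```

Under the rule above, only "target-absent" and "target-complete" would count as a pass; the other two results indicate a bug.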