wbtest


author

Brett Russ @ EMC Corp.

target

file system.

invalidation

The file system may have a bug in its write barrier implementation and may not handle the drive write cache safely.

symptom

The file system or device driver might not wait for a write cache flush to finish.

As a result, if the write barrier code is broken, data loss and/or corruption is possible when a heavily loaded file system, running with write barriers enabled on a drive with its write cache enabled, suddenly loses power.

detection strategy

Prepare two drives: a test drive with write cache (WC) enabled, and a safe drive with WC disabled.

wbtest then writes test data to the test drive and the corresponding checkpoint files to the safe drive. Each checkpoint is a one-line representation of large chunks of data being written to files on the test drive.
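The write-then-checkpoint step can be sketched roughly as follows. This is not wbtest's actual code: the function name write_chunk, the use of Python, and the checkpoint line layout (path, offset, length, SHA-256 digest) are all assumptions made for illustration. The one property taken from the description is the ordering: the checkpoint is recorded on the safe drive only after fsync() on the test drive has returned.

```python
import hashlib
import os


def write_chunk(test_path, checkpoint_path, offset, data):
    """Write data to the test drive, then record a checkpoint on the safe drive.

    Hypothetical helper; the checkpoint format is an assumption:
    "<path> <offset> <length> <sha256>" (paths must not contain spaces).
    """
    # 1. Write the chunk to a file on the test (write cache enabled) drive.
    fd = os.open(test_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so loop.
            written += os.pwrite(fd, data[written:], offset + written)
        # 2. Ask the kernel to flush the data all the way to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # 3. Only now that fsync() has returned, record the write as complete
    #    by appending a one-line checkpoint on the safe drive.
    digest = hashlib.sha256(data).hexdigest()
    line = f"{test_path} {offset} {len(data)} {digest}\n"
    fd = os.open(checkpoint_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode())
        os.fsync(fd)
    finally:
        os.close(fd)
    return line
```

A load generator would call this in a loop with many chunks and files, for example write_chunk("/mnt/test/f0", "/mnt/safe/checkpoints.log", 0, os.urandom(1 << 20)), where both mount points are placeholders.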

While wbtest is running, cut the system power at a random moment.

After reboot, check that the files on the test drive are consistent with the checkpoints on the safe drive. Any inconsistency indicates a problem.
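The post-reboot consistency check could look something like the sketch below. It is an illustration only, not the verification code shipped with wbtest. It assumes a hypothetical checkpoint file in which each line has the form "<path> <offset> <length> <sha256>", and it assumes each file region is written at most once, so an earlier checkpoint is never invalidated by a later overwrite.

```python
import hashlib


def verify_checkpoints(checkpoint_path):
    """Return the checkpoint lines whose data on the test drive does not match.

    Hypothetical helper; assumes lines of the form
    "<path> <offset> <length> <sha256>".
    """
    mismatches = []
    with open(checkpoint_path, "r") as checkpoints:
        for raw in checkpoints:
            parts = raw.split()
            # A malformed line (for example, one torn by the power cut) was
            # never fully recorded as complete, so it is skipped, not failed.
            if (len(parts) != 4 or len(parts[3]) != 64
                    or not parts[1].isdigit() or not parts[2].isdigit()):
                continue
            path, offset, length, digest = parts
            try:
                with open(path, "rb") as f:
                    f.seek(int(offset))
                    data = f.read(int(length))
            except OSError:
                # A missing or unreadable file counts as a mismatch.
                data = b""
            if hashlib.sha256(data).hexdigest() != digest:
                mismatches.append(raw.rstrip("\n"))
    return mismatches
```

An empty result means every write that was recorded as complete survived the power cut; any returned line points at a chunk that was acknowledged by fsync() but is missing or corrupt on the test drive.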

comment from author

Write barrier functionality in the Linux kernel allows you to enable the drive write cache safely. All writes are guaranteed to be flushed from the drive cache upon fsync(), and operation order is preserved where needed.

wbtest aims to stress this functionality by sending many write operations to the file system/disk layer, recording each as complete on safe media when the fsync() returns. At a random point in time, power is cut to the box, causing all caches to be lost. After the next system boot, all writes that were recorded as complete are verified. Any corruptions and/or mismatches are noted.

Read the README file in wbtest-1.0.tar.gz for more information.

tools

wbtest 1.0 is now available.

measurement results

Waiting for results from you.

history

version  date              author                  log
1.0      2004/09/25 (JST)  Brett Russ @ EMC Corp.  First version