Application-level crash resistance #4430
Comments
You can implement your own vfs.FS which does nothing on fsync-type methods. But there may be consistency issues after recovery if the machine does crash.
You might also be able to achieve this through configuration, like having a separate partition for the Pebble store and mounting it with special options that disable fsync. You can definitely do this on Linux; not sure about macOS.
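A minimal sketch of the wrapper idea, for illustration only. The `File` interface below is a hypothetical, trimmed-down stand-in and not Pebble's actual `vfs.File` (which has more methods, including other sync-type ones that would need the same treatment); `memFile` and `noSyncFile` are likewise made-up names.

```go
package main

import (
	"fmt"
	"io"
)

// File is a hypothetical, trimmed-down stand-in for a VFS file handle.
// Pebble's real vfs.File interface has more methods than this.
type File interface {
	io.WriteCloser
	Sync() error
}

// memFile is an in-memory File that counts how often Sync is called.
type memFile struct {
	data  []byte
	syncs int
}

func (f *memFile) Write(p []byte) (int, error) {
	f.data = append(f.data, p...)
	return len(p), nil
}

func (f *memFile) Sync() error  { f.syncs++; return nil }
func (f *memFile) Close() error { return nil }

// noSyncFile forwards every method to the wrapped File except Sync,
// which it turns into a no-op.
type noSyncFile struct {
	File
}

// Sync shadows the embedded File's Sync and never reaches the wrapped file.
func (noSyncFile) Sync() error { return nil }

func main() {
	inner := &memFile{}
	var f File = noSyncFile{File: inner}
	if _, err := f.Write([]byte("hello")); err != nil {
		panic(err)
	}
	if err := f.Sync(); err != nil {
		panic(err)
	}
	fmt.Println("bytes written:", len(inner.data), "syncs reaching the file:", inner.syncs)
}
```

A real implementation would wrap the whole filesystem so that every file it opens or creates is returned in such a wrapper.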
I happened to be experimenting in this area last week. Disabling all syncs from Pebble is problematic on Linux because doing so allows a large amount of dirty data to build up in the OS, which then gets flushed at arbitrary times (this is why we trickle out syncs for sstables while writing them). Additionally, if you disable syncs at the filesystem level then the
 func (w *LogWriter) syncWithLatency() (time.Duration, error) {
-	start := crtime.NowMono()
-	err := w.s.Sync()
-	syncLatency := start.Elapsed()
-	return syncLatency, err
+	return time.Duration(0), nil
+	// start := crtime.NowMono()
+	// err := w.s.Sync()
+	// syncLatency := start.Elapsed()
+	// return syncLatency, err
 }
When running in this mode you'd want to set
Since running without syncs isn’t safe, it would be nice to detect situations when Pebble might have lost some writes. @petermattis and I discussed this a little, and below is a sketch for the approach.
The idea is to have a “proof” that Pebble has been synced, or that the system has been running continuously between consecutive Pebble starts and couldn't have lost writes (with high probability). To detect that the system has been running continuously, we need some kind of “epoch” that changes with every system start. A good candidate for this is the system boot time, which can be queried as in this example from my machine:
$ sysctl kern.boottime
kern.boottime: { sec = 1742501182, usec = 290861 } Thu Mar 20 20:06:22 2025
When Pebble starts up, check for a
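One way to picture the epoch comparison is the sketch below. It is only an assumption about how such a check could look, not the design from the comment above: the `marker` record, its fields, and both function names are hypothetical. It parses the `sysctl kern.boottime` output format shown above and flags possible write loss when the store was not synced and the boot epoch has changed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var bootTimeRE = regexp.MustCompile(`sec = (\d+), usec = (\d+)`)

// parseBootEpoch extracts the boot time, in microseconds since the Unix
// epoch, from output in the `sysctl kern.boottime` format.
func parseBootEpoch(out string) (int64, error) {
	m := bootTimeRE.FindStringSubmatch(out)
	if m == nil {
		return 0, fmt.Errorf("unrecognized boot time output: %q", out)
	}
	sec, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, err
	}
	usec, err := strconv.ParseInt(m[2], 10, 64)
	if err != nil {
		return 0, err
	}
	return sec*1_000_000 + usec, nil
}

// marker is a hypothetical record persisted next to the store.
type marker struct {
	BootEpoch int64 // boot epoch observed when the store was last opened
	Synced    bool  // true if the store was fully synced before it closed
}

// mayHaveLostWrites reports whether unsynced writes could have been lost:
// the store was not synced and the machine has rebooted since the last open.
func mayHaveLostWrites(prev marker, currentEpoch int64) bool {
	if prev.Synced {
		return false
	}
	return prev.BootEpoch != currentEpoch
}

func main() {
	out := "kern.boottime: { sec = 1742501182, usec = 290861 } Thu Mar 20 20:06:22 2025"
	epoch, err := parseBootEpoch(out)
	if err != nil {
		panic(err)
	}
	sameBoot := marker{BootEpoch: epoch, Synced: false}
	otherBoot := marker{BootEpoch: epoch - 1, Synced: false}
	fmt.Println(mayHaveLostWrites(sameBoot, epoch), mayHaveLostWrites(otherBoot, epoch))
}
```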
@petermattis that would be a reasonable approach! Using WriteOption with sync=false can be very useful on SSDs that aren’t particularly powerful, or on platforms with very slow fsync operations (e.g., macOS). With WALBytesPerSync, the background flusher will periodically issue an fsync to the WAL file, helping to limit data loss in the case of a machine-level failure. Any plan to ship this new feature?
May I also request another feature? Would it be possible to expose an external API to explicitly fsync the WAL? This would be very useful in our use case, where we have two independent storage engines: (a) Pebble and (b) a set of raw files. We need to ensure that the write order between the two is always respected. We have this kind of API for (b) to fsync all the uncommitted files, and it would be nice to have the same for Pebble.
@rjl493456442 There are no current plans to work on this functionality.
Pebble supports sync and async write options.
Sync write: The write operation blocks until the WAL (Write-Ahead Log) is fsync’d.
Async write: The operation returns immediately once the data is cached in the
queue, without waiting for it to be written to the WAL.
Sync mode ensures durability across a machine crash, while in async mode
the most recent writes can be lost if the application crashes.
Can we provide a third option in which the write operation blocks until
the data is written into the WAL, but without an fsync? On some platforms (e.g. macOS)
we have observed that fsync is very expensive.
Application-level crash resistance is fairly important. Is this an option worth considering?
Jira issue: PEBBLE-367