8000 feat: TruncateFront & TruncateBack support truncating all entries by RangerCD · Pull Request #31 · tidwall/wal · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: TruncateFront & TruncateBack support truncating all entries #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RangerCD
Copy link
@RangerCD RangerCD commented Feb 17, 2025

Currently, it seems like there is no way to truncate entire log, which causes the duplicate transaction issue (#20).

This PR provides the ability of truncating all entries, by extending the index range 1 step further than the last(TruncateFront) or 1 step before the first(TruncateBack) entry.

For example, if the log has 3 entries[10, 11, 12]:

  • TruncateFront(13) will truncate all entries, and left firstIndex as 13 and lastIndex as 12.
  • TruncateBack(9) will truncate all entries, and left firstIndex as 10 and lastIndex as 9.

Why this feature is needed

While I was working on a project that uses this package as WAL implementation, I've met the situation that described in #20. I can't simply mark all jobs as done if the program exits unexpectedly, and next time it starts, it will duplicate the last job. So I need a way to remove this last job from wal once it's done.

With this PR, I can call TruncateFront(index + 1) when I've popped indexth job from wal, and let the program wait for incoming index + 1th job.

What's the difference between #29 and this PR

Thanks to @DStrand1 for providing a solution. However, #29 can't fulfill my requirements, by providing a Clear function.

Complexity of usage

In my use case, I'm running a long-lived goroutine as a consumer(Read), and a gin server as producers(Write).

With Clear from #29, after the consumer finishes all jobs, it will first detect whether wal is empty (by either calling Read(index + 1) with ErrNotFound returned, or checking a seperate counter), and then call Clear to remove all entries. The problem is that, these two steps are not atomic, so I have to add another lock to protect them which is complicated.

With this PR, both TruncateFront and TruncateBack hold internal lock, so they are atomic. And caller don't need to care about how many jobs left in wal, just call Read and TruncateFront in a loop until Read returns ErrNotFound, then wait for incoming jobs.

Corruption-safty

Clear function in #29 isn't corruption-safe, like #29 (comment) pointed out.

This PR modifies TruncateFront and TruncateBack, following the original corruption-safe mechanism.

Index continuity

#29 clears all entries, also resets firstIndex and lastIndex to 1 and 0. Both Write and Read have to reset its own index counter after Clear.

This PR keeps firstIndex and lastIndex continuous, Write and Read can keep using previous index counter.

@tidwall
Copy link
Owner
tidwall commented Mar 16, 2025

This looks good on the surface.
We'll need some tests to ensure correctness.

@RangerCD
Copy link
Author

This looks good on the surface. We'll need some tests to ensure correctness.

I'll add some tests, might also adjust current test cases if needed.

@RangerCD
Copy link
Author

While I was writing tests, a question raised that needs your help to answer.

With this change, the WAL may be in a state without any entries during use. As a result, the behavior of FirstIndex and LastIndex may need further clarification or modification.

In the following two comments, these two functions are expected to return 0 when the log is empty.

wal/wal.go

Lines 484 to 486 in 56126d5

// FirstIndex returns the index of the first entry in the log. Returns zero
// when log has no entries.
func (l *Log) FirstIndex() (index uint64, err error) {

wal/wal.go

Lines 502 to 504 in 56126d5

// LastIndex returns the index of the last entry in the log. Returns zero when
// log has no entries.
func (l *Log) LastIndex() (index uint64, err error) {

However, if the WAL is in a continuous read/write state, returning 0 might cause some confusion for users.


Let's temporarily ignore the special handling of empty entries in these two functions and consider the following usage scenario:

There are two Go routines—one continuously appends entries to the WAL, while the other continuously retrieves entries from the WAL. The code is as follows:

func main() {
	notify := make(chan struct{}, 1)
	l, _ := wal.Open("wal", nil)

	go func() {
		for {
			time.Sleep(1 * time.Second)
			idx, _ := l.LastIndex()
			idx++
			l.Write(idx, []byte(strconv.FormatUint(idx, 10)+"hello world"))
			select {
			case notify <- struct{}{}:
			default:
			}
		}
	}()

	go func() {
		for {
			select {
			case <-notify:
			}
			for {
				idx, _ := l.FirstIndex()
				data, err := l.Read(idx)
				if errors.Is(err, wal.ErrNotFound) {
					break
				}
				fmt.Println(string(data))
				l.TruncateFront(idx + 1)
			}
		}
	}()

	select {}
}

Relatively speaking, this is a straightforward implementation of the producer-consumer pattern, where each Go routine only needs to focus on one end of the queue.

In this case, FirstIndex can be understood as the "Next index to read," while LastIndex represents the "Last index that has been written."

Now, returning to the special handling of empty entries in these functions, the above code is actually just one step away from running perfectly:

Since FirstIndex will initially return 0, we need to replace idx, _ := l.FirstIndex() with:

idx, _ := l.FirstIndex()
if idx == 0 {
	idx = 1
}

This isn't particularly complicated, but it may confuse users until they carefully read the source code of FirstIndex to figure out why they can't read the first entry, it's because FirstIndex doesn't only just return the first index to read. And that's what I had experienced a month ago.


Currently, the implementation of these two functions only includes special handling for one specific case: when l.lastIndex == 0, FirstIndex returns 0 instead of the actual l.firstIndex.

For LastIndex, the if branch does not actually change its behavior, because when l.lastIndex == 0, returning 0 and returning l.lastIndex are effectively the same.

Disregarding historical reasons, a feasible implementation would be to return an additional boolean value to indicate whether the WAL is empty.

So the key question that needs to be clarified is:

  • whether to keep the current behavior and write test cases around it
  • or introduce a breaking change, either by removing the special handling or by returning an additional value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0