-
Notifications
You must be signed in to change notification settings - Fork 12
[Draft] Error Policy for import data with stash-and-continue for ingestion errors #2639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -537,6 +540,12 @@ func importData(importFileTasks []*ImportFileTask) { | |||
} | |||
} | |||
|
|||
exportDirDataDir := filepath.Join(exportDir, "data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: move this to a common func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In import data file, data dir is a configurable argument so its not inside export-dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but that's not a concern, right? we always create a data/
folder. in import-data-file, it will be empty, and that's okay?
limitations under the License. | ||
*/ | ||
|
||
package errorpolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably an overkill of a package. Thinking about this in a follow-up PR.
Perhaps just put it inside constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is not a heavy component that has a lot of functionality, as the actual functionality is in Import handlers, so it doesn't feel like it should be a separate package. I think some constants and functions in utils should suffice for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I was thinking of two options:
- put it in constants package. (or create a new enums package)
- put it in importdata package.
Going with importdata package as it's only going to be used in that package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't reviewed it thoroughly, some comments on overall structure and flow.
return &ImportDataAbortHandler{} | ||
} | ||
|
||
func (handler *ImportDataAbortHandler) ShouldAbort() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why will we have a check Abort in all handlers, that's the type of Handler right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed,
- The alternative would be to type assert to the concrete type before we can check.
ShouldAbort()
feels slightly cleaner and it also makes sense to be part of the handler interface. - Theoretically, you could have more than one handlers wherein we might need to abort (for example, a
SendSlackNotificationAndAbort
handler ). Therefore, a funcShouldAbort()
makes sense.
if err2 != nil { | ||
utils.ErrExit("handling error for batch: %q into %s: %s", batch.FilePath, batch.TableNameTup, err2) | ||
} | ||
if fti.errorHandler.ShouldAbort() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are having this function to check if it's a AbortHandler, should we check this first thing here and not do any handling etc..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although theoretically we can have handlers where we might need to handle and then abort, I'll simplify this flow for now.
Makes sense to handle only if we are not aborting. I will clean this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ^
limitations under the License. | ||
*/ | ||
|
||
package errorpolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is not a heavy component that has a lot of functionality, as the actual functionality is in Import handlers, so it doesn't feel like it should be a separate package. I think some constants and functions in utils should suffice for this.
for _, batch := range batches { | ||
if batch.IsDone() { | ||
doneCount++ | ||
} else if batch.IsInterrupted() { | ||
interruptedCount++ | ||
} else if batch.IsErrored() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if someone fixes the batch and it gets ingested properly, will it then change the status, etc, for that scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the workflow for dealing with stashed errors. That is out of scope for this PR (and overall task)
@@ -537,6 +540,12 @@ func importData(importFileTasks []*ImportFileTask) { | |||
} | |||
} | |||
|
|||
exportDirDataDir := filepath.Join(exportDir, "data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In import data file, data dir is a configurable argument so its not inside export-dir
} | ||
|
||
func (handler *ImportDataStashAndContinueHandler) getTaskErrorsFolderPath(batch ErroredBatch, importFileTaskId string) string { | ||
return filepath.Join(handler.dataDir, "errors", batch.GetTableName().ForMinOutput(), importFileTaskId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't use the data dir for this as datadir is user's directory in case of import data file (local option) and is S3/GCS/etc.. buckets in cloud option.So, we should use import data state which is voyager's metadir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but
- the metainfo directory is an internal directory for voyager to maintain some state and metadata. i don't think it's a good idea to expose that to the user. We should be free to change that as we please. (i know we already do expose that, but only when there is an error).
- It's cleaner if the directory that we point the user to for the errors, is dedicated and at the top-level (
data/errors
as opposed to under/metainfo/import_data_state/target_db_importer/
which seems pretty internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see any concerns if we put it under data/errors
? As i mentioned in the other comment, in case of import-data-file, that directory will be empty, but i don't see that necessarily as a problem.
Describe the changes in this pull request
https://docs.google.com/document/d/1KqQbIoQqQmlxIHql5Zd8aOyw8bP4j82T7Inw7tYMomg/edit?tab=t.0#heading=h.4p1zl0z1krwh
Overview of change:
ImportDataStashAndContinueHandler.HandleBatchIngestionError
does the followingimport-data-state
folder. A new folder underdata/errors/<table-name>/<file-name>
will contain symlinks to the errored batch files. This is to give a single view only of the errored files. In future this folder will also contain the processing errors.Overall Task list:
Describe if there are any user-facing changes
How was this pull request tested?
Does your PR have changes in callhome/yugabyted payloads? If so, is the payload version incremented?
Does your PR have changes that can cause upgrade issues?