Option for row_to_names to find the first complete row of names · Issue #429 · sfirke/janitor · GitHub

Closed
sfirke opened this issue Feb 9, 2021 · 5 comments · Fixed by #443
Labels: in progress (work is currently underway on this issue)

Comments

@sfirke
Owner
sfirke commented Feb 9, 2021

Feature request

I am working with data that looks like this:

[image: screenshot of the data, not preserved]

And I want to take the column names from row 160. It would be nice if I could tell row_to_names to use the first complete row for the names. Often the clutter above the names row is descriptive text that occupies only some columns, so it would get skipped in favor of the first row with no NA values.

I'm not sure how to work it in - it could be:

  • Default behavior if no row_number is specified, printing a message of "no row number specified, using the first complete row (row X)"
  • row_number could accept "first_complete", or 0 as a value requesting this behavior.

I prefer the second bullet, though I'm not sure if first_complete or 0 is better. Either way it would be documented.
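For illustration, here is a minimal base R sketch of what "first complete row" could mean. The helper name first_complete_row and the toy data frame are made up for this example and are not part of janitor.

```r
# Hypothetical helper (not a janitor function): index of the first row
# that contains no NA values in any column.
first_complete_row <- function(dat) {
  idx <- which(stats::complete.cases(dat))
  if (length(idx) == 0) {
    stop("No complete rows (rows without NA values) were found")
  }
  idx[1]
}

# Toy data: descriptive clutter only in column 1, header text in row 3.
dat <- data.frame(
  X1 = c("Report title", "Generated 2021", "Value", "1"),
  X2 = c(NA, NA, "Label", "a"),
  stringsAsFactors = FALSE
)

header_row <- first_complete_row(dat)
```

The resulting header_row could then be passed to row_to_names(dat, header_row), as the function already accepts a row number.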

@billdenney
Collaborator

That makes sense and seems pretty straightforward to me. That said, if we're going to extend the feature, I'd like to consider one more use case:

I often know the value to search for in the names row. From your example, I'd want to find the text "Value" in column 1. My typical method would be to do something like:

Assuming that the data has been loaded into my_data:

header_row <- which(my_data[[1]] %in% "Value")
stopifnot(length(header_row) == 1)
my_clean_data <- row_to_names(my_data, header_row)

As I think about the solution here, I think that a separate function may be the right answer. Specifically, something that finds a header row like:

find_header(data, ...)

If no ... argument is set, then it finds the first complete row, as in your example. If a character string is set, then it looks for that string in the first column (find_header(my_data, "Value") would do the same as my code above). If a single named argument is set, then it looks for the argument's name in the column given by its value (find_header(my_data, Value=1) or find_header(my_data, Label=2) would search for "Value" in column 1 or "Label" in column 2).

What do you think: Am I adding too much indirection and making row_to_names() too kludgy?
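The find_header() behavior described above could be sketched roughly as follows. This is only an illustration of the proposal in base R, not janitor's implementation. Taking the first match when several rows match is an assumption of this sketch (the snippet above instead requires exactly one match), and the toy data is invented.

```r
# Hypothetical sketch of the proposed find_header(); names and error
# messages are assumptions for illustration only.
find_header <- function(dat, ...) {
  args <- list(...)
  if (length(args) == 0) {
    # No arguments: rows with no NA values.
    ret <- which(stats::complete.cases(dat))
  } else if (length(args) == 1) {
    if (is.null(names(args))) {
      # Unnamed string, e.g. "Value": search for it in column 1.
      column <- 1
      value <- args[[1]]
    } else {
      # Named argument, e.g. Value = 1: the name is the text to find,
      # the value is the column to search.
      column <- args[[1]]
      value <- names(args)[1]
    }
    ret <- which(dat[[column]] %in% value)
  } else {
    stop("Provide at most one argument in `...`")
  }
  if (length(ret) == 0) {
    stop("No header row found")
  }
  # Assumption: use the first matching row.
  ret[1]
}

# Toy data with the header text in row 3.
my_data <- data.frame(
  X1 = c("Notes", NA, "Value", "10"),
  X2 = c(NA, NA, "Label", "x"),
  stringsAsFactors = FALSE
)

by_complete <- find_header(my_data)
by_string <- find_header(my_data, "Value")
by_named <- find_header(my_data, Label = 2)
```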

@billdenney
Collaborator

And, another thought is that you could have it all:

  • If row_number is set to a positive integer, work as it currently does.
  • If row_number is set to "first_complete", call the proposed find_header() function as suggested with no arguments.
  • Add a ... argument to row_to_names(), and if row_number is missing and one of the ... arguments is set, pass that argument to find_header(). (If row_number is not missing and a ... argument is given, raise an error.)

That is probably the best of all worlds: Simple enough to use, and it doesn't add much complexity to the row_to_names() code.

@sfirke
Owner Author
sfirke commented Feb 9, 2021

I like it! As you note, it keeps the original function simple while opening up more possibilities in a new function. The only hangup I see right now is that if row_number is not specified, the function will think that whatever is passed as ... is actually row_number. So I think the user will have to put something there, maybe "find_header".

@billdenney
Collaborator

Good catch. Let's go with your addition: row_number="find_header" pushes finding the row to the new find_header() function with the dat and ... arguments.
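A rough sketch of how that dispatch might look, assuming a helper that only resolves the row number. Both resolve_row_number() and the stand-in find_header() below are hypothetical and written for this example. The stand-in covers only the no-argument case, and this is not the code that was merged into janitor.

```r
# Stand-in for the proposed find_header(); only the "first complete row"
# case is sketched, and any `...` arguments are ignored here.
find_header <- function(dat, ...) {
  which(stats::complete.cases(dat))[1]
}

# Hypothetical helper: turn the row_number argument into a row index.
resolve_row_number <- function(dat, row_number, ...) {
  if (identical(row_number, "find_header")) {
    # Delegate to find_header() with dat and the `...` arguments.
    return(find_header(dat, ...))
  }
  if (length(list(...)) > 0) {
    stop("`...` may only be used with row_number = \"find_header\"")
  }
  if (!is.numeric(row_number) || length(row_number) != 1) {
    stop("row_number must be a single number or \"find_header\"")
  }
  row_number
}

# Toy data with the first complete row at position 3.
dat <- data.frame(
  X1 = c("Notes", NA, "Value", "10"),
  X2 = c(NA, NA, "Label", "x"),
  stringsAsFactors = FALSE
)

explicit_row <- resolve_row_number(dat, 2)
found_row <- resolve_row_number(dat, "find_header")
```

Requiring the "find_header" sentinel keeps an unnamed ... argument from being matched positionally to row_number, which is the hangup raised above.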

@billdenney
Collaborator

FYI, I'm working on this.
