-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fs/dup3: If "newfd" is open, close it gracefully to prevent panic #16326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There are two race conditions present in the current implementation: 1. File is partially open / getting closed and dup2/3 is called - This condition arises when one thread opens and a second thread duplicates a fixed file descriptor number that happens to collide with the file being opened. 2. File I/O is active but someone calls close / dup2/3 on the descriptor. - This condition occurs when one thread writes and another calls close or dup2/3 on an open descriptor. If the write operation is not finished before closing the file, it will violate the NuttX f_refs counter which protects the file I/O from getting its file struct wiped from under its feet. Fix these extremely rare race conditions by: 1. Returning an error (-EBUSY) to the user 2. Waiting for the file I/O to finish before closing and re-using the descriptor "newfd" This would not be an issue if only the userspace would panic due to totally nonsensical use of dup2/3, but in our case the kernel will crash as the file description behind newfd changes unexpectedly -> when the device tries to access the file description it cannot find its inode nor its private data. Signed-off-by: Ville Juven <ville.juven@unikie.com>
[Experimental Bot, please feedback here] The PR summary is missing some key information required by the NuttX guidelines. While it explains the why (race conditions leading to crashes) and the general how (error handling and synchronization), it lacks specifics. The impact section also lacks detail. Here's a breakdown of what's missing and how to improve it: Summary - Missing Information:
Impact - Missing Information:
Testing - Missing Information:
Revised Example (Illustrative – Adapt to your actual changes): SummaryThis PR fixes two race conditions in The fix introduces a mutex ( Fixes: #[NuttX Issue Number (if applicable)] Impact
TestingBuild Host(s): Linux (Ubuntu 20.04), x86_64, GCC 9.4.0 Target(s): RISC-V (rv-virt:ksmp64, rv-virt:knsh64), MPFS configuration. Testing logs before change:
Testing logs after change:
By providing this level of detail, you make it much easier for reviewers to understand, verify, and approve your PR. Remember to adapt the example to your specific implementation. |
filep = files_fget_by_index(list, | ||
fd2 / CONFIG_NFILE_DESCRIPTORS_PER_BLOCK, | ||
fd2 % CONFIG_NFILE_DESCRIPTORS_PER_BLOCK, | ||
&new); | ||
|
||
/* If return value is NULL, it means the file is partially open. This means | ||
* the userspace is racing against itself. To prevent the kernel from | ||
* crashing due to access to invalid file pointer, just make the user try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the spec require that open/close is atomic(https://man7.org/linux/man-pages/man2/dup.2.html):
If the file descriptor newfd was previously open, it is closed
before being reused; the close is performed silently (i.e., any
errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are
performed atomically. This is important, because trying to
implement equivalent functionality using [close(2)](https://man7.org/linux/man-pages/man2/close.2.html) and dup() would
be subject to race conditions, whereby newfd might be reused
between the two steps. Such reuse could happen because the main
program is interrupted by a signal handler that allocates a file
descriptor, or because a parallel thread allocates a file
descriptor.
and shouldn't return the error code(e.g. -EBUSY) in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the spec require that open/close is atomic(https://man7.org/linux/man-pages/man2/dup.2.html):
Yes I know this very well. open/close/dup is a minefield of race conditions and there is no known good way to handle it. My motivation was to prevent the kernel from crashing if the user app is poorly made. I did this with the minimal effort to the current implementation. This removes a kernel crash when using dup & close. I can leave this downstream if the solution is not acceptable.
and shouldn't return the error code(e.g. -EBUSY) in this case.
As per the -EBUSY return value, we could handle EBUSY in libc, so the user doesn't see this. This would be a completely POSIX compliant solution.
and shouldn't return the error code
Btw, a newer POSIX spec allows returning -1 to user, but doesn't mention EBUSY. We could return -1 and not touch errno ?
If the close operation fails to close fildes2, dup2() shall return -1 without changing the open file description to which fildes2 refers. [1]
[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/dup.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But my understand is that the fail should come from the file system self, not vfs layer. At least, we should:
- retry the duplication either inside nx_dup3_from_tcb or dup3 in this specific race condition
- or add the lock between file_close_wait and files_fget_by_index to remove the race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is easily doable, I suggested that we do the retry in libc but we can do it here as well
- You mean add a file lock to file struct ?
@@ -462,6 +462,7 @@ struct file | |||
int f_oflags; /* Open mode flags */ | |||
#ifdef CONFIG_FS_REFCOUNT | |||
atomic_t f_refs; /* Reference count */ | |||
sem_t f_closem; /* Free: file is fully closed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the spec(https://man7.org/linux/man-pages/man2/dup.2.html):
After a successful return, the old and new file descriptors may be
used interchangeably. Since the two file descriptors refer to the
same open file description, they share file offset and file status
flags; for example, if the file offset is modified by using
[lseek(2)](https://man7.org/linux/man-pages/man2/lseek.2.html) on one of the file descriptors, the offset is also
changed for the other file descriptor.
The two file descriptors do not share file descriptor flags (the
close-on-exec flag). The close-on-exec flag (FD_CLOEXEC; see
[fcntl(2)](https://man7.org/linux/man-pages/man2/fcntl.2.html)) for the duplicate descriptor is off.
we should split file into two struct:
struct file
{
#ifdef CONFIG_FS_REFCOUNT
atomic_t f_refs; /* Reference count */
#endif
off_t f_pos; /* File position */
FAR struct inode *f_inode; /* Driver or file system interface */
FAR void *f_priv; /* Per file driver private data */
#if CONFIG_FS_LOCK_BUCKET_SIZE > 0
bool locked; /* Filelock state: false - unlocked, true - locked */
#endif
};
struct fd
{
struct file *file;
int f_oflags; /* Open mode flags */
#ifdef CONFIG_FDSAN
uint64_t f_tag_fdsan; /* File owner fdsan tag, init to 0 */
#endif
#ifdef CONFIG_FDCHECK
uint8_t f_tag_fdcheck; /* File owner fdcheck tag, init to 0 */
#endif
#if CONFIG_FS_BACKTRACE > 0
FAR void *f_backtrace[CONFIG_FS_BACKTRACE]; /* Backtrace to while file opens */
#endif
};
with this separation, we can avoid the crash natively since struct fd
can be overwrite directly without any problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this would be better. However, it is not clear to me which implementation is correct. POSIX says
The dup2() function shall cause the file descriptor fildes2 to refer to the same open file description as the file descriptor fildes and to share any locks, and shall return fildes2.
NuttX dup2/3 uses two file structs for oldfd and newfd, so they point to different file structs and do not share file status or offset.
Linux does this differently, dup2/3 makes filep2 = filep1, and when accessing fd1 and fd2 the file status and position change for both descriptors (as they are the same file struct).
Which is correct ? The POSIX spec is ambiguous in this respect. I think our implementation is wrong but changing this functionality can create a MASSIVE regression for people who depend on our (wrong) implementation of dup2/3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try this separation tomorrow, it seems simple enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this would be better. However, it is not clear to me which implementation is correct. POSIX says
The dup2() function shall cause the file descriptor fildes2 to refer to the same open file description as the file descriptor fildes and to share any locks, and shall return fildes2.
NuttX dup2/3 uses two file structs for oldfd and newfd, so they point to different file structs and do not share file status or offset.
Linux does this differently, dup2/3 makes filep2 = filep1, and when accessing fd1 and fd2 the file status and position change for both descriptors (as they are the same file struct).
Which is correct ? The POSIX spec is ambiguous in this respect.
Linux implementation is correct, since POSIX spec explicitly state that:
- offset and lock state should be shared after dup
- open flag should be separated after dup
I think our implementation is wrong but changing this functionality can create a MASSIVE regression for people who depend on our (wrong) implementation of dup2/3.
It's always the top priority for NuttX to strictly compliant with POSIX standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linux implementation is correct, since POSIX spec explicitly state that:
Linux man pages state this. Of course Linux man pages would describe how the Linux implementation works. The POSIX spec at https://pubs.opengroup.org/onlinepubs/9699919799/functions/dup.html is not this explicit.
But I agree that the Linux implementation must be right, since POSIX says that fd2 should refer to the same file description as fd1, meaning they are the same thing, meaning our implementation is wrong.
I will not fix this though, I'm trying to fix a crash, not re-write the whole file descriptor subsystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but, I prefer to fix the root cause, since it not only more compliant to spec, but also fix your problem more elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have time to use the correct approach, @Donny9 could find some resource fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer a comprehensive solution as well, this fix is a bit hacky I agree.
I'm analyzing how much work this would be. My initial estimate was a bit "optimistic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoxiang781216 I'll do the proper fix. Closing this PR as obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, let's make nuttx better and better.
Summary
There are two race conditions present in the current implementation:
Fix these extremely rare race conditions by:
This would not be an issue if only the userspace would panic due to totally nonsensical use of dup2/3, but in our case the kernel will crash as the file description behind newfd changes unexpectedly -> when the device tries to access the file description it cannot find its inode nor its private data.
Impact
Attempt to fix dup2() and dup3() system calls. Currently they have a race condition that causes the kernel to crash.
No impact to other APIs, documentation, user etc.
Testing
MPFS target with BUILD_KERNEL and several processes / threads using dup2, write. Before this patch we get random crashes net_local and socket APIs due to disappearing file descriptors. With this patch we no longer crash.
rv-virt:ksmp64 ostest
rv-virt:knsh64 ostest