8000 fs/dup3: If "newfd" is open, close it gracefully to prevent panic by pussuw · Pull Request #16326 · apache/nuttx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fs/dup3: If "newfd" is open, close it gracefully to prevent panic #16326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

pussuw
Copy link
Contributor
@pussuw pussuw commented May 6, 2025

Summary

There are two race conditions present in the current implementation:

  1. File is partially open / getting closed and dup2/3 is called
  • This condition arises when one thread opens and a second thread duplicates a fixed file descriptor number that happens to collide with the file being opened.
  1. File I/O is active but someone calls close / dup2/3 on the descriptor.
  • This condition occurs when one thread writes and another calls close or dup2/3 on an open descriptor. If the write operation is not finished before closing the file, it will violate the NuttX f_refs counter which protects the file I/O from getting its file struct wiped from under its feet.

Fix these extremely rare race conditions by:

  1. Returning an error (-EBUSY) to the user
  2. Waiting for the file I/O to finish before closing and re-using the descriptor "newfd"

This would not be an issue if only the userspace would panic due to totally nonsensical use of dup2/3, but in our case the kernel will crash as the file description behind newfd changes unexpectedly -> when the device tries to access the file description it cannot find its inode nor its private data.

Impact

Attempt to fix dup2() and dup3() system calls. Currently they have a race condition that causes the kernel to crash.
No impact to other APIs, documentation, user etc.

Testing

MPFS target with BUILD_KERNEL and several processes / threads using dup2, write. Before this patch we get random crashes net_local and socket APIs due to disappearing file descriptors. With this patch we no longer crash.

rv-virt:ksmp64 ostest
rv-virt:knsh64 ostest

@github-actions github-actions bot added Area: File System File System issues Size: S The size of the change in this PR is small labels May 6, 2025
There are two race conditions present in the current implementation:

1. File is partially open / getting closed and dup2/3 is called
- This condition arises when one thread opens and a second thread
  duplicates a fixed file descriptor number that happens to collide with
  the file being opened.
2. File I/O is active but someone calls close / dup2/3 on the descriptor.
- This condition occurs when one thread writes and another calls close
  or dup2/3 on an open descriptor. If the write operation is not finished
  before closing the file, it will violate the NuttX f_refs counter which
  protects the file I/O from getting its file struct wiped from under its
  feet.

Fix these extremely rare race conditions by:
1. Returning an error (-EBUSY) to the user
2. Waiting for the file I/O to finish before closing and re-using the
   descriptor "newfd"

This would not be an issue if only the userspace would panic due to
totally nonsensical use of dup2/3, but in our case the kernel will crash
as the file description behind newfd changes unexpectedly -> when the
device tries to access the file description it cannot find its inode nor
its private data.

Signed-off-by: Ville Juven <ville.juven@unikie.com>
@pussuw pussuw force-pushed the fix_dup_syscall branch from 12556dd to c86a7f5 Compare May 6, 2025 14:38
@nuttxpr
Copy link
nuttxpr commented May 6, 2025

[Experimental Bot, please feedback here]

The PR summary is missing some key information required by the NuttX guidelines. While it explains the why (race conditions leading to crashes) and the general how (error handling and synchronization), it lacks specifics. The impact section also lacks detail. Here's a breakdown of what's missing and how to improve it:

Summary - Missing Information:

  • What functional part of the code is being changed? Be precise. Mention the specific files and functions modified (e.g., fs/vfs/fs_dup.c, dup2(), dup3()).
  • How does the change exactly work (what will change and how)? The description "Waiting for the file I/O to finish before closing" is vague. What synchronization mechanism is used (e.g., mutex, semaphore)? How is the "waiting" implemented? Provide more detail about the -EBUSY error handling. Where is it returned? What are the code changes that prevent the "disappearing file descriptors"?

Impact - Missing Information:

  • While the impact claims "no impact," returning -EBUSY is a change in behavior. The PR should clearly document this. Even if unlikely, existing applications relying on the (broken) behavior of silently succeeding in race conditions might break. Acknowledge this possibility. Consider if any error handling should be added to user-space applications that use dup2/3.
  • Impact on hardware: While the testing mentions specific architectures, the Impact section should explicitly state which architectures are affected (RISC-V). This helps reviewers quickly understand the scope.
  • Impact on security: While the fix likely improves stability, consider if it has any security implications. Could the added synchronization mechanisms introduce new vulnerabilities (e.g., deadlocks)? Even if the answer is "NO," explicitly stating it demonstrates that security was considered.
  • The Testing section mentions a "MPFS target." This should also be clearly listed in the Impact section under "hardware."

Testing - Missing Information:

  • Build Host(s): Provide details about the build host environment as requested. This helps reproduce the build and tests.
  • Testing logs before change / after change: The PR says "we no longer crash," but doesn't provide actual logs. Include snippets of logs demonstrating the crash before the fix and the successful operation after the fix. Even short snippets are better than nothing. This provides concrete evidence of the problem and the solution.

Revised Example (Illustrative – Adapt to your actual changes):

Summary

This PR fixes two race conditions in dup2() and dup3() (implemented in fs/vfs/fs_dup.c) that can lead to kernel crashes. These race conditions occur when: (1) a file descriptor is duplicated while another thread is opening the same file; and (2) close() or dup2/3() is called while another thread is performing I/O on the same descriptor.

The fix introduces a mutex (g_dup_lock in fs/vfs/fs_dup.c) to protect the critical section where file descriptor operations are performed. Before manipulating a file descriptor, dup2() and dup3() now acquire this mutex. If the target file descriptor is currently involved in an open operation or I/O, -EBUSY is returned. After the operations are complete, the mutex is released. This prevents concurrent access to the file descriptor table and ensures that file structures are not prematurely freed.

Fixes: #[NuttX Issue Number (if applicable)]

Impact

  • Is new feature added? Is existing feature changed? Existing dup2() and dup3() functionality is fixed.
  • Impact on user: YES. Applications relying on the previous (broken) behavior where dup2/3 silently succeeded during race conditions might now receive -EBUSY errors. They may need to implement error handling for this scenario.
  • Impact on build: NO.
  • Impact on hardware: YES. This fix impacts RISC-V architectures and MPFS target.
  • Impact on documentation: NO. (Consider updating documentation if the -EBUSY error is a significant change.)
  • Impact on security: NO. The added mutex is carefully managed to prevent deadlocks or other security vulnerabilities.
  • Impact on compatibility: YES. Backward compatibility is maintained, but applications relying on the erroneous behavior might experience different results.
  • Anything else to consider: None.

Testing

Build Host(s): Linux (Ubuntu 20.04), x86_64, GCC 9.4.0

Target(s): RISC-V (rv-virt:ksmp64, rv-virt:knsh64), MPFS configuration.

Testing logs before change:

[ostest] Assertion failed at file: sched/sched_addreadytorun.c line: 482 task: 1048880

Testing logs after change:

[ostest] dup2 test passed!

By providing this level of detail, you make it much easier for reviewers to understand, verify, and approve your PR. Remember to adapt the example to your specific implementation.

filep = files_fget_by_index(list,
fd2 / CONFIG_NFILE_DESCRIPTORS_PER_BLOCK,
fd2 % CONFIG_NFILE_DESCRIPTORS_PER_BLOCK,
&new);

/* If return value is NULL, it means the file is partially open. This means
* the userspace is racing against itself. To prevent the kernel from
* crashing due to access to invalid file pointer, just make the user try
Copy link
Contributor
@xiaoxiang781216 xiaoxiang781216 May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the spec require that open/close is atomic(https://man7.org/linux/man-pages/man2/dup.2.html):

       If the file descriptor newfd was previously open, it is closed
       before being reused; the close is performed silently (i.e., any
       errors during the close are not reported by dup2()).

       The steps of closing and reusing the file descriptor newfd are
       performed atomically.  This is important, because trying to
       implement equivalent functionality using [close(2)](https://man7.org/linux/man-pages/man2/close.2.html) and dup() would
       be subject to race conditions, whereby newfd might be reused
       between the two steps.  Such reuse could happen because the main
       program is interrupted by a signal handler that allocates a file
       descriptor, or because a parallel thread allocates a file
       descriptor.

and shouldn't return the error code(e.g. -EBUSY) in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the spec require that open/close is atomic(https://man7.org/linux/man-pages/man2/dup.2.html):

Yes I know this very well. open/close/dup is a minefield of race conditions and there is no known good way to handle it. My motivation was to prevent the kernel from crashing if the user app is poorly made. I did this with the minimal effort to the current implementation. This removes a kernel crash when using dup & close. I can leave this downstream if the solution is not acceptable.

and shouldn't return the error code(e.g. -EBUSY) in this case.

As per the -EBUSY return value, we could handle EBUSY in libc, so the user doesn't see this. This would be a completely POSIX compliant solution.

and shouldn't return the error code

Btw, a newer POSIX spec allows returning -1 to user, but doesn't mention EBUSY. We could return -1 and not touch errno ?
If the close operation fails to close fildes2, dup2() shall return -1 without changing the open file description to which fildes2 refers. [1]

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/dup.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But my understand is that the fail should come from the file system self, not vfs layer. At least, we should:

  1. retry the duplication either inside nx_dup3_from_tcb or dup3 in this specific race condition
  2. or add the lock between file_close_wait and files_fget_by_index to remove the race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is easily doable, I suggested that we do the retry in libc but we can do it here as well
  2. You mean add a file lock to file struct ?

@@ -462,6 +462,7 @@ struct file
int f_oflags; /* Open mode flags */
#ifdef CONFIG_FS_REFCOUNT
atomic_t f_refs; /* Reference count */
sem_t f_closem; /* Free: file is fully closed */
Copy link
Contributor
@xiaoxiang781216 xiaoxiang781216 May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the spec(https://man7.org/linux/man-pages/man2/dup.2.html):

After a successful return, the old and new file descriptors may be
       used interchangeably.  Since the two file descriptors refer to the
       same open file description, they share file offset and file status
       flags; for example, if the file offset is modified by using
       [lseek(2)](https://man7.org/linux/man-pages/man2/lseek.2.html) on one of the file descriptors, the offset is also
       changed for the other file descriptor.

       The two file descriptors do not share file descriptor flags (the
       close-on-exec flag).  The close-on-exec flag (FD_CLOEXEC; see
       [fcntl(2)](https://man7.org/linux/man-pages/man2/fcntl.2.html)) for the duplicate descriptor is off.

we should split file into two struct:

struct file
{
#ifdef CONFIG_FS_REFCOUNT
  atomic_t          f_refs;     /* Reference count */
#endif
  off_t             f_pos;      /* File position */
  FAR struct inode *f_inode;    /* Driver or file system interface */
  FAR void         *f_priv;     /* Per file driver private data */
#if CONFIG_FS_LOCK_BUCKET_SIZE > 0
  bool              locked; /* Filelock state: false - unlocked, true - locked */
#endif
};

struct fd
{
  struct file      *file;
  int               f_oflags;   /* Open mode flags */
#ifdef CONFIG_FDSAN
  uint64_t          f_tag_fdsan; /* File owner fdsan tag, init to 0 */
#endif
#ifdef CONFIG_FDCHECK
  uint8_t           f_tag_fdcheck; /* File owner fdcheck tag, init to 0 */
#endif
#if CONFIG_FS_BACKTRACE > 0
  FAR void         *f_backtrace[CONFIG_FS_BACKTRACE]; /* Backtrace to while file opens */
#endif
};

with this separation, we can avoid the crash natively since struct fd can be overwrite directly without any problem.

Copy link
Contributor Author
@pussuw pussuw May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this would be better. However, it is not clear to me which implementation is correct. POSIX says

The dup2() function shall cause the file descriptor fildes2 to refer to the same open file description as the file descriptor fildes and to share any locks, and shall return fildes2.

NuttX dup2/3 uses two file structs for oldfd and newfd, so they point to different file structs and do not share file status or offset.

Linux does this differently, dup2/3 makes filep2 = filep1, and when accessing fd1 and fd2 the file status and position change for both descriptors (as they are the same file struct).

Which is correct ? The POSIX spec is ambiguous in this respect. I think our implementation is wrong but changing this functionality can create a MASSIVE regression for people who depend on our (wrong) implementation of dup2/3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try this separation tomorrow, it seems simple enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this would be better. However, it is not clear to me which implementation is correct. POSIX says

The dup2() function shall cause the file descriptor fildes2 to refer to the same open file description as the file descriptor fildes and to share any locks, and shall return fildes2.

NuttX dup2/3 uses two file structs for oldfd and newfd, so they point to different file structs and do not share file status or offset.

Linux does this differently, dup2/3 makes filep2 = filep1, and when accessing fd1 and fd2 the file status and position change for both descriptors (as they are the same file struct).

Which is correct ? The POSIX spec is ambiguous in this respect.

Linux implementation is correct, since POSIX spec explicitly state that:

  1. offset and lock state should be shared after dup
  2. open flag should be separated after dup

I think our implementation is wrong but changing this functionality can create a MASSIVE regression for people who depend on our (wrong) implementation of dup2/3.

It's always the top priority for NuttX to strictly compliant with POSIX standard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux implementation is correct, since POSIX spec explicitly state that:

Linux man pages state this. Of course Linux man pages would describe how the Linux implementation works. The POSIX spec at https://pubs.opengroup.org/onlinepubs/9699919799/functions/dup.html is not this explicit.

But I agree that the Linux implementation must be right, since POSIX says that fd2 should refer to the same file description as fd1, meaning they are the same thing, meaning our implementation is wrong.

I will not fix this though, I'm trying to fix a crash, not re-write the whole file descriptor subsystem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but, I prefer to fix the root cause, since it not only more compliant to spec, but also fix your problem more elegant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have time to use the correct approach, @Donny9 could find some resource fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a comprehensive solution as well, this fix is a bit hacky I agree.

I'm analyzing how much work this would be. My initial estimate was a bit "optimistic".

Copy link
Contributor Author
@pussuw pussuw May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoxiang781216 I'll do the proper fix. Closing this PR as obsolete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, let's make nuttx better and better.

@pussuw pussuw closed this May 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: File System File System issues Size: S The size of the change in this PR is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0