Issues: pytorch/torchtitan
Issues list
[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS
documentation
Improvements or additions to documentation
module: fsdp
question
Further information is requested
#1147
opened Apr 27, 2025 by
ChenchaoZhao
fully_shard() for huggingface model: pytorch caches too much GPU memory
#1126
opened Apr 21, 2025 by
mingdianliu
[DeepSeek MoE] current workstream planning
enhancement
New feature or request
#1125
opened Apr 21, 2025 by
lessw2020
Llama 4 issue tracking
high priority
triage review
#1118
opened Apr 17, 2025 by
tianyu-l
1 of 12 tasks
FSDP2 root level parameter management
module: fsdp
question
Further information is requested
#1091
opened Apr 11, 2025 by
dingqingy
Torch.compile and TP during multiresolution Training
module: torch.compile
question
Further information is requested
#1081
opened Apr 9, 2025 by
nighting0le01
Is the current configuration system over-engineered?
question
Further information is requested
#1055
opened Apr 3, 2025 by
wangkuiyi
Clarify PP split point documentation.
question
Further information is requested
#1054
opened Apr 3, 2025 by
githubsgi
Overflow in F.scaled_dot_product_attention when using profiling with deterministic training
#1049
opened Apr 3, 2025 by
JungHoyoun
How are the TP, CP, and PP marked in PyTorch profiler traces?
#1044
opened Apr 2, 2025 by
githubsgi
Context parallel on Turing GPUs?
module: context parallel
question
Further information is requested
#1034
opened Mar 31, 2025 by
dingqingy
Linear layer weights are in float32?
question
Further information is requested
#1027
opened Mar 28, 2025 by
githubsgi
Is a PP+FSDP+TP config + toml available for pre-training 405B model?
#986
opened Mar 19, 2025 by
githubsgi