Some questions about input file making · Issue #73 · rwdavies/STITCH · GitHub

Open
RADIOMUMM opened this issue Nov 17, 2022 · 5 comments

@RADIOMUMM

Hi Robbie,
I'm a little confused about the pos file.
What is this file generated from? Is it derived from a reference panel, or from a VCF file after variant calling?
And if it is generated from a VCF file after variant calling, is a site such as "3 42331 A G,T" removed?

Best
jennis

@rwdavies
Owner

Hi,

The pos file contains the list of sites you want to impute. One potential source of that list is the output of initial variant calling and filtering. If you have a VCF of sites, you can make the pos file with something like the following, which drops the VCF header lines; adjust the result if needed to match the format STITCH asks for.

gunzip -c sites.vcf.gz | grep -v "^#" | cut -f1,2,4,5 > pos.txt

The site "3 42331 A G,T" will not be accepted by STITCH, as STITCH can only impute bi-allelic variants for now. You could use "3 42331 A G" or "3 42331 A T", but not both G and T.
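
As an illustration of the two points above, here is a minimal sketch that builds the pos file and drops multi-allelic sites in one pass. The function name vcf_to_pos is hypothetical and not part of STITCH. The sketch assumes the pos file should be tab-separated with no header, and it simply discards sites like "3 42331 A G,T" rather than picking one of the alternate alleles; check the STITCH documentation for the exact format it expects.

```shell
# Hypothetical helper (not part of STITCH): print chromosome, position,
# REF and ALT for bi-allelic SNPs in a gzipped VCF.
vcf_to_pos() {
  gunzip -c "$1" |
    awk -F'\t' -v OFS='\t' '
      /^#/ { next }                          # skip VCF header lines
      # keep rows with exactly one REF base and one ALT base
      # (uppercase A/C/G/T only; "G,T", indels, "." and "*" are dropped)
      $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {
        print $1, $2, $4, $5
      }'
}

# Usage:
#   vcf_to_pos sites.vcf.gz > pos.txt
```

If you would rather keep one of the alternate alleles at a multi-allelic site, that row has to be rewritten by hand or with a separate rule; this sketch does not attempt it.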

Best,
Robbie

@RADIOMUMM
Author
RADIOMUMM commented Nov 18, 2022

Hi,

Thanks for your answer, but I have another question:
If I do not call variants chromosome by chromosome, the resulting pos file will contain all bi-allelic SNPs across chromosomes 1-12, but STITCH imputes one chromosome at a time. Do you have any suggestions?

Best,
jennis

@suhuan0327

Hi,

I have the same issue. How did you solve it?

Thanks,
Su

@rwdavies
Owner

I think in this instance I would just split the file into one file per chromosome and impute each chromosome separately.

Something like

for CHR in 1 2 3
do
  # keep only records on this chromosome (this also drops the "#" header lines)
  gunzip -c sites.vcf.gz | awk -v chr="${CHR}" '$1 == chr' | cut -f1,2,4,5 > pos.${CHR}.txt
  # impute here using pos.${CHR}.txt
done
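
For a genome with many chromosomes, the same split can be done in a single pass over the VCF instead of decompressing it once per chromosome. The sketch below is one way to do that; the function name split_pos_by_chr and the output prefix argument are hypothetical, and it again assumes a tab-separated, header-free pos file restricted to bi-allelic SNPs.

```shell
# Hypothetical helper (not part of STITCH): write one pos file per chromosome,
# named <prefix>.<chromosome>.txt, from a gzipped VCF.
split_pos_by_chr() {
  # $1 = gzipped VCF of sites, $2 = output prefix, e.g. "pos"
  gunzip -c "$1" |
    awk -F'\t' -v OFS='\t' -v prefix="$2" '
      /^#/ { next }                          # skip VCF header lines
      # keep bi-allelic SNPs only (one REF base, one ALT base)
      $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {
        out = prefix "." $1 ".txt"           # e.g. pos.3.txt for chromosome 3
        print $1, $2, $4, $5 > out
      }'
}

# Usage:
#   split_pos_by_chr sites.vcf.gz pos
#   then impute each chromosome using its own pos.<chromosome>.txt
```

Each output file is truncated when awk first opens it and keeps the row order of the input, so a position-sorted VCF gives position-sorted pos files.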

@suhuan0327

Hi,

Thank you for your reply. I handled my files the same way.

Best,
Su
