Normalized contact value? #80

Open
yanchunzhang opened this issue Jul 13, 2022 · 7 comments

@yanchunzhang

Hi, do you calculate a normalized contact value in the final bed files? I checked the files but didn't see one. I mean something like the normalized values in a Hi-C contact map.

And what cutoff do you recommend for the q-value, 0.01 or 0.05?

Thanks,
Yanchun

@souryacs
Contributor

Hi @yanchunzhang
We do not explicitly compute a normalized contact count value. However, if you are using HiC-Pro to align the input FASTQ files, you can use the ICE-normalized contact matrices that HiC-Pro outputs.
I'd recommend using q-value = 0.01. You can customize the q-value in the configuration file.
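
For readers who want to try several cutoffs after a run, here is a minimal sketch of filtering a tab-separated interaction table by q-value. The column name `qvalue`, the function name, and the toy rows are all assumptions for illustration; check the header of your own output file for the actual q-value column.

```python
import csv
import io

def filter_by_qvalue(rows, qvalue_column="qvalue", cutoff=0.01):
    """Keep rows whose q-value is at or below the cutoff.

    `qvalue_column` is a hypothetical column name; adjust it to match
    the header of the file you are reading.
    """
    return [row for row in rows if float(row[qvalue_column]) <= cutoff]

# Made-up example table (not real output).
example = (
    "chr1\tstart1\tchr2\tstart2\tqvalue\n"
    "chr1\t10000\tchr1\t60000\t0.001\n"
    "chr1\t20000\tchr1\t90000\t0.03\n"
    "chr1\t30000\tchr1\t120000\t0.2\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))

strict = filter_by_qvalue(rows, cutoff=0.01)   # keeps only the first row
relaxed = filter_by_qvalue(rows, cutoff=0.05)  # keeps the first two rows
```

This only re-thresholds loops that are already reported; it does not replace setting the q-value in the configuration file.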

@yanchunzhang
Author

Thanks a lot!
I used a q-value of 0.01 on a mouse sample and got only around 10k-12k significant loops, which is far fewer than I expected. In your experience, does that mean the sample preparation was poor quality or failed?

Thanks,
Yanchun

@ay-lab
Owner
ay-lab commented Jul 19, 2022

Hi @yanchunzhang
Which model did you use to call the loops: the stringent model (P2P=1) or the loose one (P2P=0)? And which bias correction did you use (ICE or coverage)? I'd recommend using the coverage bias and testing both the loose and stringent background models. But 10k-12k significant loops is quite OK, especially if it comes from the P2P=1 model. What is the total sequencing depth of your library? Regarding QC, you can check the HiC-Pro output logs for metrics such as the number of duplicate reads, the number and fraction of cis reads, and the fraction of cis reads > 10 Kb.

@YichaoOU

Hi @ay-lab ,

I have a FitHiChIP result with only 300+ merged peak-to-peak loops (q-value <= 0.05) using the loose (P2P=0) background and the coverage bias setting. My QC is good according to https://hichip.readthedocs.io/en/latest/library_qc.html: total sequencing depth is 200M+, PCR duplicate reads are 40% (88M), and No-Dup Cis Read Pairs < 1kb is 71M.

image

My total number of peaks provided to the fithichip program is ~40K.

Do you know why I only got 300+ loops?

Thanks,
Yichao

@ay-lab
Owner
ay-lab commented Sep 13, 2022

Hi, your QC results show you are effectively left with about 11.5M valid reads that are >10kb and useful, so you are working with quite sparse data. You haven't mentioned which resolution you used, but you may want to consider 20kb or an even lower (coarser) resolution for the analysis.
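
To illustrate why a coarser resolution helps with sparse data, here is a rough sketch that sums contact counts after snapping bin starts to a larger bin size. The input layout `(chrom, start1, start2, count)` and the numbers are made up for illustration; in practice you would rerun the loop calling at the new bin size rather than rebin by hand.

```python
from collections import Counter

def rebin_contacts(contacts, new_bin_size):
    """Sum counts of contacts whose anchors fall in the same coarser bins.

    contacts: iterable of (chrom, start1, start2, count) tuples, where
    start1 and start2 are bin start coordinates at the finer resolution.
    """
    merged = Counter()
    for chrom, start1, start2, count in contacts:
        # Snap each anchor to the start of its enclosing coarse bin.
        coarse1 = (start1 // new_bin_size) * new_bin_size
        coarse2 = (start2 // new_bin_size) * new_bin_size
        merged[(chrom, coarse1, coarse2)] += count
    return dict(merged)

# Toy 5kb contacts (hypothetical counts).
contacts_5kb = [
    ("chr1", 0, 40000, 2),
    ("chr1", 5000, 45000, 1),
    ("chr1", 15000, 55000, 3),
    ("chr1", 20000, 60000, 1),
]
contacts_20kb = rebin_contacts(contacts_5kb, 20000)
```

Three sparse 5kb bin pairs collapse into one 20kb bin pair with a higher count, which is the effect that makes coarser bins more robust when few valid reads remain.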

@YichaoOU

@ay-lab Thank you!

@YichaoOU

Hi @ay-lab

I tried different bin sizes and q-value/p-value cutoffs for peak-to-all loop calling. Here is a screenshot of our ROI:

image

They look very different. Smaller bins (2.5kb, 5kb) likely have lower read counts and thus more variance and lower confidence, which could explain part of the difference. But I'm not sure how to explain the 20kb significant loops vs. the 5kb significant loops (q-value 0.05): there is no overlap at all. Even if we relax the cutoff to a q-value of 0.5 at the 5kb bin size, the two short interactions in the 20kb track do not show up in the 5kb track.

We have micro-capture-C data at this ROI, and based on it we think the loops from the 5kb bins at a q-value of 0.5 currently look better, but that q-value is far from significant.
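
One way to quantify the overlap rather than judging it by eye is to project the 5kb loop anchors onto the 20kb bin grid and intersect the two sets. This is only a sketch: the `(chrom, start1, start2)` tuple layout, the function names, and the coordinates are hypothetical and not taken from the actual output files.

```python
def to_coarse_bins(loop, coarse_bin_size):
    """Map a loop's two anchor starts onto the coarser bin grid."""
    chrom, start1, start2 = loop
    return (
        chrom,
        (start1 // coarse_bin_size) * coarse_bin_size,
        (start2 // coarse_bin_size) * coarse_bin_size,
    )

def shared_loops(fine_loops, coarse_loops, coarse_bin_size):
    """Return coarse loops that contain at least one fine-resolution loop."""
    projected = {to_coarse_bins(loop, coarse_bin_size) for loop in fine_loops}
    return projected & set(coarse_loops)

# Made-up loop anchors (bin start coordinates).
loops_5kb = [("chr1", 105000, 250000), ("chr1", 300000, 415000)]
loops_20kb = [("chr1", 100000, 240000), ("chr1", 500000, 600000)]

common = shared_loops(loops_5kb, loops_20kb, 20000)
```

An exact bin match is a strict criterion; allowing a tolerance of one neighboring bin on each anchor may be more appropriate for noisy data.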

I'm wondering if you have any thoughts.

Thanks,
Yichao
