merge the dev to master to 2.0 version release by jianhong · Pull Request #86 · nf-core/hicar · GitHub

merge the dev to master to 2.0 version release #86


Open · wants to merge 203 commits into base: master

Conversation

jianhong (Contributor):

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/hicar branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

jianhong and others added 30 commits May 3, 2022 14:51
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@edmundmiller self-requested a review July 12, 2023 20:48

@edmundmiller left a comment:

This is a great effort, and it's obvious you've put a lot of time into this!

It seems there are a couple of repetitive things that I'd like to see addressed (beyond what @mirpedrol and @SPPearce pointed out)

  • Inline scripts in Nextflow files. These should live in bin/ or in a custom module; it's just an nf-core thing that makes the pipeline easier to maintain (a minimal sketch of the bin/ pattern follows this list). I'm guessing you inlined them to be able to test the local Nextflow modules, but since you've written so many custom scripts, I think it might be better to just test them with testthat, or another CLI tester. They're really extensive, so it would be nice to have confidence in them!
  • I'd like to see the hicexplorer, homer, etc. modules live in nf-core/modules
  • Make an image for those scary java tools 🙈
  • Homer modules need to use fasta input rather than installing the genome outside of Nextflow. I'm not sure it's working as intended as is; as written, it would install the genome in the process's workdir.
  • Clean up subworkflows that only output versions, and remove empty outputs. It'll make the pipeline more maintainable.
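
For what it's worth, the bin/ pattern from the first bullet usually looks roughly like the sketch below. The script and process names here (cutsite_stats.py, CUTSITE_STATS) are made up for illustration and are not from this PR; the point is that anything placed in bin/ at the pipeline root is put on PATH inside every task, so the module needs no inline script.

process CUTSITE_STATS {
    // conda/container directives omitted for brevity

    input:
    tuple val(meta), path(bed)

    output:
    tuple val(meta), path("*.stats.txt"), emit: stats
    path "versions.yml"                 , emit: versions

    script:
    // cutsite_stats.py is a hypothetical helper script living in bin/
    """
    cutsite_stats.py --input $bed --output ${meta.id}.stats.txt

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        python: \$(python --version | sed 's/Python //')
    END_VERSIONS
    """
}

A script like this can then be unit-tested on its own (with testthat for R, or pytest for Python) without running Nextflow at all.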

So I don't think this is entirely up to nf-core standards of maintainability, but in the name of progress I think we should merge this and maybe tag it 2.0.0-rc or something if 2.0.0 is controversial. It's also up to @jianhong how they want to release it.

Before I approve this PR, I'd like to see a roadmap, tracked in milestones, of the issues from this PR that need addressing (maybe with a list of modules to move scripts out of, for example). I think those will be easier to address in separate, smaller PRs.

We should also make sure proper repo access is given to @jianhong, so they can work on branches in this repo and create smaller PRs. That way the review process doesn't take multiple months, and we can focus on reviewing the scientific ideas in the code changes.

CHANGELOG.md Outdated
Comment on lines 8 to 38
- update to nf-core-template-2.7.2
- replace the dots with '\_' in the sample names
- add support for MseI.
- add TAD, AB compartments, APA analysis (see available tools in usage documentation)
- add additional methods for interaction caller
- adjust default resource requirement
- update the CITATIONS
- re-arrange the output folder to meet the requirements of multiple analysis tools
- updated multiple documentation files
- remove the parameter for java resources requirements
- update annotations from 5k,10k bins to nearest R2 peaks
- removed local_modules tests
- add scale factor to atac reads coverage calculation
- change the circos plot style
- update reads_summary table to include unmapped/multiple mapped info
- add the possibility to subsample to balance the input reads
- add parameter to let the user input the 5' end sequence for the cutadapt step
- change the cutadapt error tolerance from 0 to 0.15
- changes in hipeak caller:
- add filter condition for lowest count number as 1.
- change the cutoff value for hipeak type assignment step from 12 to 3.
- fix multiple bugs:
- the sorting method for huge bed file;
- the post count for hipeak when there are empty interactions;
- add totalLinks parameter for prepare_circos;
- the issue if bplapply does not work in differential analysis for hipeak;
- fix the space issue for enzyme_cut.nf;
- fix the chromosome style for homer TFEA analysis and annotation by ChIPpeakAnno;
- fix the duplicated imported modules;
- fix multiple typos.
- use local SAMTOOLS_SORT module;


I was going to ask if there were any links to these. But then I remembered this was all done on a fork. Maybe link some commits?

I think we need to add something to the nf-core docs encouraging people to reach out if they're developing on a pipeline.

WT,2,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/WT_rep2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/WT_rep2_R2.fastq.gz,,
KD,1,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/KD_rep1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/KD_rep1_R2.fastq.gz,,
KD,2,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/KD_rep2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T2/KD_rep2_R2.fastq.gz,,
WT.dot,1,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T1/WT_rep1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/hicar/data/genomics/homo_sapiens/T1/WT_rep1_R2.fastq.gz,,


I wonder if this would break anything, or are the groups separated using '.'?

conf/test.config Outdated

// Output folder
outdir = 'test_results'
outdir = 'results'


I believe these were removed because they could cause people running on the cloud to never write their results to a bucket. It is just the test config, so this is only a consideration.

@@ -18,7 +20,7 @@ params {
input = "${projectDir}/assets/test_full_samplesheet.csv"

// Output folder
outdir = 'full_test_results'
outdir = 'results'


See the comment above: this one probably doesn't need to be set by default; it's just a footgun that's worse than making users specify --outdir results.


Is there a reason this isn't a script in bin/?


input:
tuple val(meta), path(hic)
path hic_tools_jar


This should be in a container. I'm guessing it's https://github.com/aidenlab/HiCTools.

Never mind, I looked at it and that looks like a nightmare. I guess I'll get to the config, and this points at the jar download?
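
As a rough sketch of the container route (the image tag, the params name, and the exact pre arguments below are assumptions for illustration, not from this PR), the jar could be baked into an image and only its path made configurable:

process JUICER_TOOLS_PRE {
    // hypothetical image that already ships the HiCTools jar
    container 'docker.io/example/hictools:3.30.00'

    input:
    tuple val(meta), path(pairs)

    output:
    tuple val(meta), path("*.hic"), emit: hic

    script:
    // assumed parameter, falling back to a path inside the container
    def jar = params.hic_tools_jar ?: '/opt/hic_tools.jar'
    """
    # 'pre' arguments shown schematically; the genome id is hard-coded purely for illustration
    java -Xmx${task.memory.toGiga()}g -jar $jar pre $pairs ${meta.id}.hic hg38
    """
}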


Comment on lines +13 to +14
val long_bedpe_postfix
val short_bed_postfix


Could be args in a config
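
If these became args in a config, the usual nf-core pattern would be roughly as follows; the module name, flag names, and inputs here are hypothetical, only the ext.args mechanism itself is the standard one:

// conf/modules.config
process {
    withName: 'MERGE_INTERACTIONS' {
        ext.args = '--long-postfix long.bedpe --short-postfix shorts.bed'
    }
}

// and in the module's script block
script:
def args = task.ext.args ?: ''
"""
merge_interactions.sh $args $bedpe
"""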

Comment on lines 19 to 24
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(echo \$(python --version) | sed 's/Python //')
END_VERSIONS

restriction_enzyme_cutsite.py $enzyme


Suggested change
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(echo \$(python --version) | sed 's/Python //')
END_VERSIONS
restriction_enzyme_cutsite.py $enzyme
restriction_enzyme_cutsite.py $enzyme
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(echo \$(python --version) | sed 's/Python //')
END_VERSIONS


Could the reason this is a local module be documented? What's the reason for not using the nf-core module with a patch? That will help keep this maintainable.

nextflow.config Outdated

// Boilerplate options
outdir = null
outdir = 'results'


Suggested change
outdir = 'results'
outdir = null

@JoseEspinosa (Member) left a comment:

Huge work @jianhong! 😄
Apart from my suggestions in the code, I agree with @emiller88's summary.

README.md Outdated
@@ -22,7 +19,7 @@ The pipeline can also handle the experiment of HiChIP, ChIA-PET, and PLAC-Seq. I

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/hicar/results).
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources.The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/hicar/results).
Member:

Suggested change
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources.The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/hicar/results).
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/hicar/results).

process {
publishDir = [
path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
enabled: false
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
Member:

Wouldn't it be better to set enabled to false by default again, so we don't need to declare some processes just to avoid publishing their files (e.g. GUNZIP)?
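
For illustration only (this is a sketch, not the PR's actual config), a default-off publishing scheme along those lines could look like this, with individual processes opting back in:

process {
    // publish nothing unless a process opts in
    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].toLowerCase()}" },
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
        enabled: false
    ]

    // opt back in only where users actually need the files
    withName: 'MULTIQC' {
        publishDir = [
            path: { "${params.outdir}/multiqc" },
            mode: params.publish_dir_mode,
            enabled: true
        ]
    }
}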


// Output folder
outdir = 'test_results'
input = "${projectDir}/assets/test_multi_samplesheet.csv"
Member:

Should this not be available in test-datasets instead?

@edmundmiller (Aug 14, 2023):

I personally like to keep the test samplesheets in the repo as examples as well. That ensures they stay up to date! 😆

@@ -17,9 +17,6 @@ params {
// Input data for full size test
input = "${projectDir}/assets/test_full_samplesheet.csv"
Member:

Same as my comment above

@@ -20,17 +20,17 @@ params {
// Input data for full size test
input = "${projectDir}/assets/test_multi_samplesheet.csv"
Member:

Same here, and any other input instance in the configs

genome).tads
ch_versions = HOMER_FINDTADSANDLOOPS_TADS.out.versions


Member:

Suggested change

HOMER_FINDMOTIFSGENOME(ch_new_bed, additional_param)
ch_versions = ch_versions.mix(HOMER_FINDMOTIFSGENOME.out.versions)


Member:

Suggested change

break
}


Member:

Suggested change



emit:
versions = ch_versions // channel: [ versions.yml ]
Member:

Here as well, only versions is emitted; is this correct?

Comment on lines +24 to +25
versions = ch_versions // channel: [ versions.yml ]
v4c = VIRTUAL4C_BY_COOLTOOLS.out.v4c
Member:

Suggested change
versions = ch_versions // channel: [ versions.yml ]
v4c = VIRTUAL4C_BY_COOLTOOLS.out.v4c
v4c = VIRTUAL4C_BY_COOLTOOLS.out.v4c
versions = ch_versions // channel: [ versions.yml ]

Member:

Also add channel content description
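
For example, channel content descriptions on this emit block might read like the sketch below; the tuple shape given for v4c is a guess for illustration and should be replaced with whatever VIRTUAL4C_BY_COOLTOOLS actually emits:

emit:
v4c      = VIRTUAL4C_BY_COOLTOOLS.out.v4c   // channel: [ val(meta), path(bedgraph) ]
versions = ch_versions                      // channel: [ path(versions.yml) ]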

@github-advanced-security:

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Comment on lines +9 to +55
    if: >
      contains(github.event.comment.html_url, '/pull/') &&
      contains(github.event.comment.body, '@nf-core-bot fix linting') &&
      github.repository == 'nf-core/hicar'
    runs-on: ubuntu-latest
    steps:
      # Use the @nf-core-bot token to check out so we can push later
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.nf_core_bot_auth_token }}

      # Action runs on the issue comment, so we don't get the PR by default
      # Use the gh cli to check out the PR
      - name: Checkout Pull Request
        run: gh pr checkout ${{ github.event.issue.number }}
        env:
          GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }}

      - uses: actions/setup-node@v3

      - name: Install Prettier
        run: npm install -g prettier @prettier/plugin-php

      # Check that we actually need to fix something
      - name: Run 'prettier --check'
        id: prettier_status
        run: |
          if prettier --check ${GITHUB_WORKSPACE}; then
            echo "result=pass" >> $GITHUB_OUTPUT
          else
            echo "result=fail" >> $GITHUB_OUTPUT
          fi

      - name: Run 'prettier --write'
        if: steps.prettier_status.outputs.result == 'fail'
        run: prettier --write ${GITHUB_WORKSPACE}

      - name: Commit & push changes
        if: steps.prettier_status.outputs.result == 'fail'
        run: |
          git config user.email "core@nf-co.re"
          git config user.name "nf-core-bot"
          git config push.default upstream
          git add .
          git status
          git commit -m "[automated] Fix linting with Prettier"
          git push

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI):

To fix the issue, we need to add a permissions block to the workflow. Since the workflow involves reading repository contents, checking out pull requests, and pushing changes, we will explicitly define the required permissions. The minimal permissions required are:

  • contents: write (to push changes to the repository).
  • pull-requests: write (to interact with pull requests).

The permissions block should be added at the root level of the workflow to apply to all jobs unless overridden.


Suggested changeset: .github/workflows/fix-linting.yml

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml
--- a/.github/workflows/fix-linting.yml
+++ b/.github/workflows/fix-linting.yml
@@ -5,2 +5,6 @@
 
+permissions:
+  contents: write
+  pull-requests: write
+
 jobs:
EOF
Comment on lines +29 to +33
      - name: Install Prettier
        run: npm install -g prettier @prettier/plugin-php

      # Check that we actually need to fix something
      - name: Run 'prettier --check'

Check failure

Code scanning / CodeQL

Checkout of untrusted code in a privileged context (Critical)

Potential execution of untrusted code on a privileged workflow (issue_comment)
Comment on lines +42 to +46
      - name: Run 'prettier --write'
        if: steps.prettier_status.outputs.result == 'fail'
        run: prettier --write ${GITHUB_WORKSPACE}

      - name: Commit & push changes

Check failure

Code scanning / CodeQL

Checkout of untrusted code in a privileged context (Critical)

Potential execution of untrusted code on a privileged workflow (issue_comment)