BIO00058M-case-study

Functional Heterogeneity in Mesenchymal Stromal Cell (MSC) Subtypes

Project Description

This project investigates the functional heterogeneity of mesenchymal stromal cell (MSC) subtypes using proteomic analysis. MSCs are a subset of non-hematopoietic adult stem cells with the potential to differentiate into multiple cell types, making them crucial for regenerative medicine. However, MSC populations exhibit heterogeneity in their differentiation and immunomodulatory capacities. This study analyzes the soluble protein fraction of five immortalized MSC lines using mass spectrometry to understand differences in protein expression. The analysis includes data preprocessing, normalization, principal component analysis (PCA), hierarchical clustering, and visualization.
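The PCA and hierarchical clustering steps named above can be sketched in base R. This is a minimal illustration on simulated data, not the analysis code from the report: the simulated matrix, the log2 transform, the scaling, and the choice of complete linkage with five clusters are all assumptions made for the example.

```r
# Simulated stand-in for the abundance table: 50 proteins x 15 samples
# (5 cell lines x 3 replicates). All values here are made up.
set.seed(1)
cell_lines <- c("Y101", "Y102", "Y201", "Y202", "Y1015")
samples <- paste(rep(cell_lines, each = 3), c("A", "B", "C"), sep = "_")
abund <- matrix(rlnorm(50 * 15, meanlog = 10, sdlog = 1),
                nrow = 50, dimnames = list(paste0("P", 1:50), samples))

# Log-transform (an assumption), then put samples in rows
log_abund <- t(log2(abund))

# PCA on centred, scaled protein abundances
pca <- prcomp(log_abund, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Hierarchical clustering of samples on Euclidean distance
hc <- hclust(dist(scale(log_abund)), method = "complete")
groups <- cutree(hc, k = 5)
```

With real data, replicates of the same cell line would be expected to sit close together in `pca$x` and in the dendrogram; the random values used here carry no such structure.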

Start Date: 02-10-2018

Last Updated: Analysis 24-11-2021, README 28-02-2025

Contact Information: Emma Rand (emma.rand@york.ac.uk)

Project Organisation

/BIO00058M-case-study
│── data-raw/          # Raw data files
│   ├── Y101_Y102_Y201_Y202_Y101-5.csv  # Normalized protein abundances
│   ├── comparison_p_and_q.csv          # Pairwise comparisons p/q values
│── data/              # Processed data
│── report/            # Final report (HTML)
│── R/                 # R scripts for analysis
│   ├── 00-pkg.R       # Loads required packages
│   ├── data_summary.R # Functions for summary statistics
│   ├── 01-labels.R    # Cell lineage and functionality labels
│   ├── 02-palette.R   # Color scheme for visualization
│   ├── 03-banner.R    # Generates a cell-type banner
│── sessioninfo.md     # R session information
│── README.md          # Project documentation
│── BIO00058M-case-study.Rproj  # RStudio project file
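A quick way to confirm that a local copy matches this layout is to test for the listed files. The sketch below is a hypothetical helper (`check_layout()` and its `root` argument are not part of the project); the paths are copied from the tree above.

```r
# Hypothetical helper: report which of the files listed in the tree exist
check_layout <- function(root = ".") {
  expected <- c(
    "data-raw/Y101_Y102_Y201_Y202_Y101-5.csv",
    "data-raw/comparison_p_and_q.csv",
    "R/00-pkg.R", "R/data_summary.R", "R/01-labels.R",
    "R/02-palette.R", "R/03-banner.R"
  )
  present <- file.exists(file.path(root, expected))
  setNames(present, expected)
}

# Run from the project root; prints the paths that are missing, if any
status <- check_layout(".")
names(status)[!status]
```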

Software Requirements

Programming Languages and Environments

R (version 4.1.1, 2021-08-10)
RStudio 2021.09 "Ghost Orchid"
Platform: Windows 10 x64 (build 18363)

R Packages Used

tidyverse (1.3.1)
janitor (2.1.0)
ggplot2 (3.3.5)
GGally (2.1.2)
heatmaply (1.3.0)
plotly (4.10.0)
VennDiagram (1.7.0)
patchwork (1.1.1)
bookdown (0.24)
sessioninfo

To install all required packages, use:

install.packages(c("tidyverse", "janitor", "ggplot2", "GGally", 
                   "heatmaply", "plotly", "VennDiagram", "patchwork", 
                   "bookdown", "sessioninfo"))
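To avoid reinstalling packages that are already present, the same list can be filtered first. This is a sketch only; `missing_packages()` is a hypothetical helper written for this example, and the install call is left commented out.

```r
required <- c("tidyverse", "janitor", "ggplot2", "GGally",
              "heatmaply", "plotly", "VennDiagram", "patchwork",
              "bookdown", "sessioninfo")

# Hypothetical helper: which required packages are not yet installed?
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install
}
```

Note that `install.packages()` fetches the current CRAN releases, which may be newer than the versions listed above.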

Data Description

The dataset consists of mass spectrometry data from five MSC subtypes, focusing on soluble protein fractions.

Data Files

Y101_Y102_Y201_Y202_Y101-5.csv (Normalized protein abundances)
- Rows: Proteins
- Columns:
  - Accession: Protein identifier (Uniprot ID)
  - Peptide count: Number of peptides used to identify the protein
  - Unique peptides: Number of unique peptides used for identification
  - Confidence score: Score representing confidence in protein identification
  - Anova (p): P-value from one-way ANOVA for the effect of cell line
  - q Value: False discovery rate (FDR)-adjusted p-value (Benjamini-Hochberg correction)
  - Max fold change: Maximum fold-change between the highest and lowest mean expression across cell lines
  - Power: Statistical power of the ANOVA test
  - Highest mean condition: Cell line with the highest mean expression
  - Lowest mean condition: Cell line with the lowest mean expression
  - Mass: Protein mass
  - Description: Protein description (includes species origin)
  - Normalized Abundance Columns:
    - Y101_A, Y101_B, Y101_C: Replicates for cell line Y101
    - Y102_A, Y102_B, Y102_C: Replicates for cell line Y102
    - Y201_A, Y201_B, Y201_C: Replicates for cell line Y201
    - Y202_A, Y202_B, Y202_C: Replicates for cell line Y202
    - Y1015_A, Y1015_B, Y1015_C: Replicates for cell line Y101.5
  - >1pep: Binary indicator for whether at least two peptides were detected for a protein.
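The relationship between the replicate columns and the Max fold change, Highest mean condition, and Lowest mean condition columns can be illustrated on a toy table. The values and accession names below are invented, and whether the source software computed these columns in exactly this way is an assumption based on the descriptions above.

```r
cell_lines <- c("Y101", "Y102", "Y201", "Y202", "Y1015")
reps <- paste(rep(cell_lines, each = 3), c("A", "B", "C"), sep = "_")

# Two invented proteins: row 1 varies by cell line, row 2 is flat
values <- rbind(rep(c(10, 20, 40, 80, 20), each = 3),
                rep(50, 15))
abund <- data.frame(Accession = c("prot_1", "prot_2"), values)
names(abund)[-1] <- reps

# Mean of the three replicates for each cell line
line_of <- sub("_[ABC]$", "", reps)
means <- sapply(cell_lines,
                function(cl) rowMeans(abund[reps[line_of == cl]]))

# Ratio of the highest to the lowest cell-line mean per protein
max_fold_change <- apply(means, 1, max) / apply(means, 1, min)
highest <- cell_lines[apply(means, 1, which.max)]
lowest <- cell_lines[apply(means, 1, which.min)]
```

For the first toy protein the highest mean is in Y202 and the lowest in Y101, giving a fold change of 8; the flat second protein has a fold change of 1.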

comparison_p_and_q.csv (Pairwise comparisons between cell lines)
- Columns:
  - Pairwise p-values and q-values for differential protein abundance tests

Instructions for Use

Running the Analysis

Set up the environment
- Open BIO00058M-case-study.Rproj in RStudio.
Generate the report
- Knit report/main.Rmd in RStudio to generate the report in HTML, PDF, or Word.
- The knit directory for report/main.Rmd is the document directory.
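
The report can also be knitted from the R console rather than the RStudio Knit button. The wrapper below is a sketch: `render_report()` is a hypothetical helper, it assumes the rmarkdown package is installed (RStudio uses it for knitting), and it uses whatever output format is declared in the document itself.

```r
# Hypothetical wrapper around rmarkdown::render()
render_report <- function(input = "report/main.Rmd") {
  if (!file.exists(input)) {
    stop("Cannot find ", input, "; run this from the project root.")
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is needed to knit the report.")
  }
  # knit_root_dir is set to the folder containing the .Rmd, so relative
  # paths inside the document resolve from there
  rmarkdown::render(input, knit_root_dir = normalizePath(dirname(input)))
}

# render_report()  # uncomment to knit from the project root
```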
