Last updated: 2022-08-25


Knit directory: rotation2/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.



The command set.seed(20220607) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.


The results in this page were generated with repository version 4d8f9d5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Unstaged changes:
    Modified:   .RData
    Modified:   .Rhistory

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/project_1.Rmd) and HTML (docs/project_1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 268c781 chenh19 2022-08-25 Build site.
html d5675d8 chenh19 2022-08-25 Build site.
html 6f9b2d0 chenh19 2022-08-25 Build site.
html eb7f6e2 chenh19 2022-08-25 Build site.
html 06d8f5b chenh19 2022-08-25 Build site.
html 481c95e chenh19 2022-08-25 Build site.
html 4c40591 chenh19 2022-08-25 update
html 210ce87 chenh19 2022-08-25 Build site.
html 7126807 chenh19 2022-08-25 Build site.
html e576eee chenh19 2022-08-25 Build site.
html 3b152ba chenh19 2022-08-25 Build site.
html abd8577 chenh19 2022-08-25 Build site.
html fb0804c chenh19 2022-08-25 Build site.
html 46c0eee chenh19 2022-08-25 Build site.
html 4bcd287 Hang Chen 2022-08-11 Build site.
html 316e143 Hang Chen 2022-08-11 Build site.
html 2ecff7a Hang Chen 2022-08-11 Build site.
html 1a7329a Hang Chen 2022-08-11 Build site.
html d98b8ed Hang Chen 2022-08-10 Build site.
html afbc7f7 chenh19 2022-08-09 Build site.
html f3ddb60 chenh19 2022-08-09 Build site.
html aeeaff4 chenh19 2022-08-09 Build site.
html faa4371 chenh19 2022-08-09 Build site.
html 40d7285 chenh19 2022-08-09 Build site.
html 51ff922 chenh19 2022-08-09 Build site.
html 60bb670 chenh19 2022-08-09 Build site.
html 5035de6 chenh19 2022-08-09 Build site.
html c710b98 chenh19 2022-08-09 Build site.
html 9c8a125 chenh19 2022-08-09 Build site.
html 759a524 chenh19 2022-08-09 Build site.
html 0137730 chenh19 2022-08-09 Build site.
html 306629d chenh19 2022-08-09 Build site.
html a2f14fc chenh19 2022-08-09 Build site.
html afe9e5e chenh19 2022-08-09 Build site.
html 7f87abe chenh19 2022-08-09 Build site.
html f7a30e6 chenh19 2022-08-08 Build site.
html ae6dc22 chenh19 2022-08-08 Build site.
html 31345e3 chenh19 2022-08-08 Build site.
Rmd 14daeb5 chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html ab573d7 chenh19 2022-08-08 Build site.
html 82d0b8a chenh19 2022-08-08 Build site.
html 63022aa chenh19 2022-08-08 Build site.
html b4ec414 chenh19 2022-08-08 Build site.
html fb4fa31 chenh19 2022-08-08 Build site.
html 03df33f chenh19 2022-08-08 Build site.
html ec5763a chenh19 2022-08-08 Build site.
Rmd d6b5331 chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html 870ec95 chenh19 2022-08-08 Build site.
Rmd 26c6105 chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html 4df0f61 chenh19 2022-08-08 Build site.
html 8bad269 chenh19 2022-08-08 Build site.
Rmd e769869 chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html 6e10041 chenh19 2022-08-08 Build site.
Rmd e57b4ef chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html b045b81 chenh19 2022-08-08 Build site.
Rmd 26a8d3b chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html e22ab6b chenh19 2022-08-08 Build site.
Rmd b57b687 chenh19 2022-08-08 wflow_publish("./analysis/*.Rmd")
html 87b3f9d chenh19 2022-08-08 Build site.
Rmd 9a06a06 chenh19 2022-08-08 update
html 78b6bd6 chenh19 2022-08-08 Build site.
html 60fabb8 Hang Chen 2022-08-08 Build site.
html cee42b8 Hang Chen 2022-08-05 Build site.
html 6927e45 Hang Chen 2022-08-04 Build site.
html 551a34f Hang Chen 2022-08-04 Build site.
html 80908a7 Hang Chen 2022-08-04 Build site.
html 2623d6b Hang Chen 2022-08-04 Build site.
html e9d9966 Hang Chen 2022-08-04 Build site.
html 57d96a8 Hang Chen 2022-08-04 update
Rmd 05b3310 Hang Chen 2022-08-04 update
html 05b3310 Hang Chen 2022-08-04 update
html 37d15c9 chenh19 2022-07-19 Build site.
html 8f6816e chenh19 2022-07-19 Build site.
html 4a94b94 chenh19 2022-07-19 Build site.
Rmd a18fc1f chenh19 2022-07-19 wflow_publish("./analysis/*.Rmd")
html 870115f chenh19 2022-07-19 Build site.
Rmd 6307f5c chenh19 2022-07-19 wflow_publish("./analysis/*.Rmd")
html 9241fe6 chenh19 2022-07-08 Build site.
Rmd abd8a0c chenh19 2022-07-08 wflow_publish("./analysis/*.Rmd")
html 0a7633b chenh19 2022-07-07 Build site.
Rmd 06d53d1 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 63535ad chenh19 2022-07-07 Build site.
Rmd fe2a82b chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
Rmd 9898acd chenh19 2022-07-07 update
html feb5923 chenh19 2022-07-07 Build site.
Rmd 9eae283 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html c849244 chenh19 2022-07-07 Build site.
Rmd 29eb161 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 7ab5d71 chenh19 2022-07-07 Build site.
Rmd 2baffc8 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 625e7ca chenh19 2022-07-07 Build site.
Rmd bb3e1aa chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 0e1d92d chenh19 2022-07-07 Build site.
Rmd 138b4fe chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 24d6bcf chenh19 2022-07-07 Build site.
Rmd 941e24b chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html 3a14766 chenh19 2022-07-07 Build site.
Rmd d425ac9 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
html ca77db2 chenh19 2022-07-07 Build site.
Rmd 608cdf8 chenh19 2022-07-07 wflow_publish("./analysis/*.Rmd")
Rmd c639389 chenh19 2022-07-06 update
html e0cb7a2 chenh19 2022-06-28 Build site.
Rmd ac33ff0 chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html bf05c98 chenh19 2022-06-28 Build site.
Rmd 1c8bb9a chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html 7cfb685 chenh19 2022-06-28 Build site.
Rmd 82fce8f chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html 43dad9c chenh19 2022-06-28 Build site.
Rmd 20718e9 chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html 7aa3172 chenh19 2022-06-28 Build site.
Rmd e0425d8 chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html 2254608 chenh19 2022-06-28 Build site.
Rmd 754150e chenh19 2022-06-28 wflow_publish("./analysis/*.Rmd")
html 968a4b6 chenh19 2022-06-23 Build site.
html 229f924 chenh19 2022-06-23 Build site.
Rmd c7847bd chenh19 2022-06-23 wflow_publish("./analysis/*.Rmd")
html 9750134 chenh19 2022-06-23 Build site.
Rmd 0064ff3 chenh19 2022-06-23 wflow_publish("./analysis/*.Rmd")
html 4b38bf6 chenh19 2022-06-23 Build site.
Rmd 4eb11c9 chenh19 2022-06-23 wflow_publish("./analysis/*.Rmd")
html 8a7980d chenh19 2022-06-23 Build site.
html 3e1478e chenh19 2022-06-22 Build site.
html 5258612 chenh19 2022-06-22 Build site.
Rmd ecfd58d chenh19 2022-06-22 wflow_publish("./analysis/*.Rmd")
html 2382868 chenh19 2022-06-22 Build site.
Rmd 694470f chenh19 2022-06-22 wflow_publish("./analysis/*.Rmd")
html 1144127 chenh19 2022-06-22 Build site.
html da2c0fe chenh19 2022-06-21 Build site.
html 8aa5960 chenh19 2022-06-21 Build site.
Rmd 3dea9b1 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 6783fa3 chenh19 2022-06-21 Build site.
Rmd 6699b8f chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html d9be701 chenh19 2022-06-21 Build site.
Rmd f93179d chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 82e6e50 chenh19 2022-06-21 Build site.
Rmd f753baa chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html d376ad0 chenh19 2022-06-21 Build site.
Rmd 4a9db6a chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 325a212 chenh19 2022-06-21 Build site.
Rmd d54cc77 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html e14c55c chenh19 2022-06-21 Build site.
Rmd 2ab041a chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html f971bbd chenh19 2022-06-21 Build site.
html dca882f chenh19 2022-06-21 Build site.
html 3a80eaf chenh19 2022-06-21 Build site.
Rmd b6fb1d2 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 33829a5 chenh19 2022-06-21 Build site.
Rmd a949aec chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 5b446cf chenh19 2022-06-21 Build site.
Rmd c4ad45d chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 28a06ee chenh19 2022-06-21 Build site.
Rmd 53f2292 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html d5b4ff0 chenh19 2022-06-21 Build site.
Rmd e982ac7 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 16400a7 chenh19 2022-06-21 Build site.
Rmd d6aa5b2 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html a324166 chenh19 2022-06-21 Build site.
Rmd ab8cbc3 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 27a4d51 chenh19 2022-06-21 Build site.
Rmd 624e791 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html f765024 chenh19 2022-06-21 Build site.
html bc55dbb chenh19 2022-06-21 Build site.
Rmd 8b66c3b chenh19 2022-06-21 update
html b3b7ed6 chenh19 2022-06-21 Build site.
Rmd 405116a chenh19 2022-06-21 update
html 405d57e chenh19 2022-06-21 Build site.
Rmd bbbaab0 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html c10a1a8 chenh19 2022-06-21 Build site.
Rmd a8f7999 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 2443e38 chenh19 2022-06-21 Build site.
Rmd 8e0c8ad chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 5d192d2 chenh19 2022-06-21 Build site.
Rmd 356550c chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 21e9501 chenh19 2022-06-21 Build site.
Rmd 4a8f7ee chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html b002776 chenh19 2022-06-20 Build site.
Rmd 6d78198 chenh19 2022-06-20 wflow_publish("./analysis/*.Rmd")
Rmd 0e1817a chenh19 2022-06-19 update
html 6211c60 chenh19 2022-06-15 Build site.
Rmd 46e8cc3 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html 0da18e2 chenh19 2022-06-15 Build site.
Rmd d999122 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html 0aff555 chenh19 2022-06-15 Build site.
Rmd eda2d56 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html a4e1e73 chenh19 2022-06-15 Build site.
Rmd 7229c17 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html f0e98f9 chenh19 2022-06-14 Build site.
Rmd e0aa022 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html eafa16b chenh19 2022-06-14 Build site.
Rmd 69b29f1 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html dfd60ce chenh19 2022-06-14 Build site.
Rmd 49f1922 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html fd7271e chenh19 2022-06-14 Build site.
html 1b4d12e chenh19 2022-06-14 Build site.
html a6c402d chenh19 2022-06-14 Build site.
html cedad99 chenh19 2022-06-14 Build site.
Rmd 6b3f021 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html e3b9788 chenh19 2022-06-14 Build site.
html aed5eed chenh19 2022-06-14 Build site.
Rmd c552123 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 45bb6ed chenh19 2022-06-14 Build site.
Rmd 81fdf42 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html c23765a chenh19 2022-06-14 Build site.
html b5fb71c chenh19 2022-06-14 Build site.
Rmd 10f6641 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 152325b chenh19 2022-06-14 Build site.
Rmd 1739dd9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 543ef4c chenh19 2022-06-14 Build site.
html 9b6cb27 chenh19 2022-06-14 Build site.
Rmd d8908c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 8674c8a chenh19 2022-06-14 Build site.
Rmd 2972ce6 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html ada4068 chenh19 2022-06-14 Build site.
Rmd 7c5402e chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 0d08121 chenh19 2022-06-14 Build site.
Rmd 6226526 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html dd2046d chenh19 2022-06-14 Build site.
Rmd 71f5d04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 2c4cab1 chenh19 2022-06-14 Build site.
Rmd 6e73c04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html d46eaab chenh19 2022-06-14 Build site.
Rmd be56d9d chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 02a0c26 chenh19 2022-06-14 Build site.
Rmd e2c15c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html abad46e chenh19 2022-06-14 Build site.
Rmd 68948e3 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 741027b chenh19 2022-06-14 Build site.
Rmd 065d7e9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html bb15812 chenh19 2022-06-14 Build site.
Rmd b6e0993 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 93a27ae chenh19 2022-06-14 Build site.
Rmd b4a7331 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 27121a9 chenh19 2022-06-14 Build site.
Rmd fce1ffd chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 44517c1 chenh19 2022-06-13 Build site.
Rmd cc6a40a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html da08f11 chenh19 2022-06-13 Build site.
Rmd 05ccc35 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 572f6ba chenh19 2022-06-13 Build site.
html b8870d3 chenh19 2022-06-13 Build site.
html 719925e chenh19 2022-06-13 Build site.
html e7541fa chenh19 2022-06-13 Build site.
html 9d9615d chenh19 2022-06-13 Build site.
Rmd 04feaa7 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html bbd8978 chenh19 2022-06-13 Build site.
Rmd 0ec2bfa chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e5a5b52 chenh19 2022-06-13 Build site.
Rmd c43ae1f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 4d8bd72 chenh19 2022-06-13 Build site.
html 3373521 chenh19 2022-06-13 Build site.
html af21ea8 chenh19 2022-06-13 Build site.
Rmd 6e56d75 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f653f7b chenh19 2022-06-13 Build site.
Rmd 2723e7f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html d69c892 chenh19 2022-06-13 Build site.
html 34d877d chenh19 2022-06-13 Build site.
html e72400b chenh19 2022-06-13 Build site.
html c411223 chenh19 2022-06-13 Build site.
html 1daccd2 chenh19 2022-06-13 Build site.
Rmd 63f46d2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 26adb45 chenh19 2022-06-13 Build site.
html a6022a8 chenh19 2022-06-13 Build site.
Rmd 1215832 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9abc4b8 chenh19 2022-06-13 Build site.
Rmd 7efcfe0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f18d385 chenh19 2022-06-13 Build site.
Rmd a7c1ce0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e991f56 chenh19 2022-06-13 Build site.
html 3c9b1d9 chenh19 2022-06-13 Build site.
Rmd ae1553a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 34e0d02 chenh19 2022-06-13 Build site.
Rmd e69aa83 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9be31af chenh19 2022-06-13 Build site.
Rmd ead84c2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 0f41de8 chenh19 2022-06-13 Build site.
html 31ad035 chenh19 2022-06-13 Build site.
html bdf3b44 chenh19 2022-06-13 Build site.
html 8d0890c chenh19 2022-06-13 Build site.
Rmd 26f455b chenh19 2022-06-13 update
html 26f455b chenh19 2022-06-13 update

1. Understand RNA-seq

a. Read about RNA-seq analysis

Yalamanchili et al. 2017: RNA-seq analysis pipeline

Some key points:

Protocol-1 (differential expression of genes):

  • demuxed raw reads (FastQC)
  • trimming reads (awk)
  • aligning reads (TopHat2)
  • counting reads (HTSeq; may filter out genes with low counts before next step)
  • detect DE using counted reads (DESeq2)
  • more QC (PCA/correlation heatmap)
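
The optional low-count filter mentioned in the counting step can be sketched in base R. This is only an illustration on a made-up count matrix: the cutoff of 10 total reads and all object names are assumptions, not values taken from Yalamanchili et al.

```r
# Toy gene-by-sample count matrix (hypothetical values for illustration only)
counts <- matrix(
  c( 0,  1,  0,  2,   # geneA: very low counts
    50, 60, 55, 70,   # geneB
     0,  0,  0,  0,   # geneC: never detected
     8, 12,  9, 15),  # geneD
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

min_total <- 10                       # assumed cutoff; choose per dataset
keep <- rowSums(counts) >= min_total  # TRUE for genes with enough reads overall
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

The filtered matrix is what would then be handed to the DE step; dropping rarely detected genes mainly reduces the multiple-testing burden.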

Protocol-2 (differential usage of isoforms):

  • Protocol-1
  • counting isoforms (Kallisto, also check Cell Ranger)
  • detect DU using counted isoforms (Sleuth)
  • more QC (also PCA/correlation heatmap)

Protocol-3 (cryptic splicing):

  • Protocol-1
  • detect differential junctions (CrypSplice)

b. Read more about RNA-seq analysis

Luecken et al. 2019: RNA-seq analysis pipeline
Supplementary code

Some key points:

c. Read the Morris paper

Morris et al. 2021: STING-Seq

Some key ideas:

  • STING-seq: Systematic Targeting and Inhibition of Noncoding GWAS loci with scRNA-seq
  • prioritizes candidate cis-regulatory elements (cCREs, 1kb<distance to TSS<1Mb) using fine-mapped GWAS
  • selected 88 variants (in 56 loci) with enhancer activity
  • dual CRISPR inhibition: dCas9 as the GPS, MeCP2 and KRAB as the repressors
  • confirming dual CRISPRi efficacy: gRNAs target TSS of MRPS23, CTSB, FSCN1
  • CRISPRi on the 88 variants: two gRNAs for each variant, both within 200bp of the variant
  • ECCITE-seq: captures gRNAs and epitopes

Some data processing steps and results:

  • QC: remove cells with low total reads or excessive mitochondrial reads, gRNA assignment UMI>5 (9,343 cells after QC)
  • Kallisto: read counting (read more on the official website)
  • Seurat: QC and reference mapping? (read more on the official website)
  • SCEPTRE: gRNA_to_gene-expression pairwise test
  • non-targeting gRNA-gene pairs: not significant (negative ctrl)
  • TSS-targeting gRNA-gene pairs: expression significantly decreased (positive ctrl)
  • 37 of the 88 variants were significant
  • Trans-regulatory elements: I'll come back later
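
The gRNA assignment rule noted above (UMI>5) can be sketched as a simple threshold on a gRNA-by-cell UMI matrix. The matrix, the names, and the choice to keep only cells with at least one assigned gRNA are assumptions for illustration, not the authors' code.

```r
# Toy gRNA-by-cell UMI matrix (hypothetical values)
grna_umi <- matrix(
  c(12, 0, 3,
     0, 7, 5,
     6, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gRNA_1", "gRNA_2", "gRNA_3"),
                  c("cell_A", "cell_B", "cell_C"))
)

umi_cutoff <- 5                   # "UMI > 5" as in the notes above
assigned <- grna_umi > umi_cutoff # logical matrix: is this gRNA called in this cell?

# gRNAs assigned to each cell (a cell can carry several at high MOI)
assignments <- lapply(colnames(grna_umi), function(cell) {
  rownames(grna_umi)[assigned[, cell]]
})
names(assignments) <- colnames(grna_umi)

# Assumption: cells without any assigned gRNA are dropped
n_assigned <- colSums(assigned)
cells_kept <- names(n_assigned)[n_assigned > 0]

print(assignments)
print(cells_kept)
```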

Note:

2. Prelim QC for raw STING-seq data

a. Download all data

Code: download.sh

b. Perform FastQC on all fastq files

Code: fastqc.sh

SRR14141135:

SRR14141136:

SRR14141137:

SRR14141138:

SRR14141139:

SRR14141140:

SRR14141141:

SRR14141142:

SRR14141143:

SRR14141144:

SRR14141145:

SRR14141146:

A brief summary:

  • length: 26bp or 57bp (trimmed?)
  • depth: 30-35x
  • overall quality: good (within ~40 bp)

d. Kallisto | bustools pipeline

Code: pip3-kb.sh
Code: anaconda_kallisto.sh

3. Analyze QC’ed STING-seq data

a. Install packages

Code: seurat.sh

b. Data overview

Code: overview.R
Note: about sparse matrix

The [Expression] matrix has: 
- 35,606 rows/genes/targets 
- 686,612 columns/barcodes/cells 
- 24,447,506,872 values in total 
- 82,507,471 values that are non-zero 
- 50,421,358 values that are 1 
- 32,086,113 values that are bigger than 1 
- 3,370,699 values that are bigger than 10 
- 259,734 values that are bigger than 100 
- 2,515 values that are bigger than 1,000 
- 0 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [gRNA] matrix has: 
- 210 rows/genes/targets 
- 137,347 columns/barcodes/cells 
- 28,842,870 values in total 
- 2,506,474 values that are non-zero 
- 1,510,919 values that are 1 
- 995,555 values that are bigger than 1 
- 121,554 values that are bigger than 10 
- 41,071 values that are bigger than 100 
- 2,232 values that are bigger than 1,000 
- 20 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [Hashtag] matrix has: 
- 4 rows/genes/targets 
- 410,228 columns/barcodes/cells 
- 1,640,912 values in total 
- 739,820 values that are non-zero 
- 409,830 values that are 1 
- 329,990 values that are bigger than 1 
- 218,280 values that are bigger than 10 
- 8,155 values that are bigger than 100 
- 282 values that are bigger than 1,000 
- 46 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
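
A summary like the one above can be computed directly from the stored entries of a sparse matrix without expanding it to dense form. The sketch below uses the Matrix package on a tiny made-up matrix; the function name `summarise_counts` and the cutoffs are hypothetical and this is not the content of overview.R.

```r
library(Matrix)  # recommended package shipped with R

# Small sparse count matrix standing in for the real data (hypothetical values)
m <- sparseMatrix(
  i = c(1, 2, 2, 3, 4),
  j = c(1, 1, 3, 2, 3),
  x = c(1, 15, 1, 250, 3),
  dims = c(4, 3)
)

summarise_counts <- function(mat, cutoffs = c(1, 10, 100, 1000)) {
  nz <- mat@x[mat@x != 0]  # stored entries, dropping any explicit zeros
  above <- sapply(cutoffs, function(k) sum(nz > k))
  names(above) <- paste0("gt_", cutoffs)
  c(
    rows    = nrow(mat),
    cols    = ncol(mat),
    total   = prod(as.numeric(dim(mat))),  # double arithmetic: rows * cols can exceed the integer range
    nonzero = length(nz),
    equal_1 = sum(nz == 1),
    above
  )
}

res <- summarise_counts(m)
print(res)
```

Computing the total as a double matters for the expression matrix: 35,606 x 686,612 is about 2.4e10, which is larger than R's maximum integer.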

d. Expression (cDNA) dataset

i) Expression barcodes

Code: Expression_barcode_dist_plot.R

Comment: The cell with highest overall detected gene expression

Comment: The cell with lowest overall detected gene expression

Comment: As Xuanyao said, this kind of bar plot is too dense to show the overall distribution; the CDF plot below is clearer

Comment: This CDF plot suggests why only ~9000 cells were left after QC with the UMI>=850 filter: fewer than 10% of cells have UMI>=850. Why the cutoff is exactly 850 is still a question for me to explore

Comment: Many zeros (consistent with the observation that the matrix is very sparse); cells with UMI>850 are invisible in this plot. As Xuanyao said, I should exclude the outliers or add a y-axis break
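
The CDF reading above (what fraction of barcodes passes a UMI cutoff) can be reproduced with base R's `ecdf`. The data here are simulated, with mixture proportions chosen only so that the example is easy to check; they are not the STING-seq counts.

```r
set.seed(1)

# Simulated per-barcode UMI totals: mostly near-empty droplets plus a minority of real cells
umi_per_barcode <- c(rpois(9000, lambda = 2), rpois(1000, lambda = 3000))

cdf <- ecdf(umi_per_barcode)   # empirical CDF as a step function
cutoff <- 850

# Fraction of barcodes with UMI >= cutoff, computed two equivalent ways
frac_at_or_above <- mean(umi_per_barcode >= cutoff)
frac_from_cdf <- 1 - cdf(cutoff - 1)  # counts are integers, so P(X >= c) = 1 - F(c - 1)

plot(cdf, main = "CDF: UMI counts in each barcode (cell)",
     xlab = "UMI counts in each barcode (cell)", ylab = "CDF")
abline(v = cutoff, lty = 2)    # mark the QC cutoff

print(frac_at_or_above)
```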

ii) Expression targets

Code: Expression_target_dist_plot.R

Comment: The highest (mean) expressed gene is WDR45-like (WDR45L) pseudogene (high UMI counts in all cells)

Comment: The lowest (mean) expressed gene is RP4-669L17.1 pseudogene (zero UMI counts in all cells)

Comment: Non-zero UMI counts for all genes (~35k, including mito genes; 686,612 cells in total)

Comment: CDF plot: ~80% genes have < ~5000 UMI counts in all cells (not all genes captured in each cell, but I guess still a lot)

Comment: PDF plot: same conclusion as above

e. gRNA dataset

i) gRNA barcodes

Code: gRNA_barcode_dist_plot.R

Comment: The cell with highest overall (mean) gRNAs, and it has 15 highly expressed gRNAs

Comment: The cell with lowest overall (mean) gRNAs (transfection/transduction failed in this cell)

Comment: Non-zero UMI counts in all cells (I’d say the transfection/transduction is relatively even across all cells)

Comment: CDF plot: ~80% cells have < ~40 UMI counts for each gRNA (note: the authors mentioned MOI ~ 10)

Comment: PDF plot: same conclusion as above

ii) gRNA targets

Code: gRNA_target_dist_plot.R

Comment: The highest (mean) gRNA in all cells (gRNA targeting PPIA-2, which is a control)

Comment: The lowest (mean) gRNA in all cells (likely it’s a low score gRNA site but the authors didn’t have better choices)

Comment: Non-zero UMI counts for all gRNAs (137,347 cells in total; I’d say the transfection/transduction efficiency varies among gRNAs. The authors designed all the gRNAs within 200bp of the targeted variants, so there must be limitations in terms of gRNA options)

Comment: CDF plot: ~80% gRNAs have < ~20,000 UMI counts in all cells (137,347 cells in total, ~15% transfection/transduction success rate, acceptable)

Comment: PDF plot: same conclusion as above

f. Hashtag dataset

i) Hashtag barcodes

Code: Hashtag_barcode_dist_plot.R

Comment: The cell with highest (mean) Hashtags (note: the authors used only 4 Hashtags, I might check which antibodies they are when performing association)

Comment: The cell with lowest (mean) Hashtags (not tagged by any of the antibodies)

Comment: This figure is not an error. All cells have 1/2/3/4 UMI counts, and because many of them have 4, it looks like a block when the plot is this dense

Comment: CDF plot: ~80% cells have < ~2 UMI counts for each Hashtag (It makes sense to me because the authors are likely trying to label different cell types)

Comment: PDF plot: same conclusion as above

ii) Hashtag targets

Code: Hashtag_target_dist_plot.R

Comment: The highest (mean) Hashtag (HTO23) in all cells (I would guess this is the relatively more common cell type; also, there was some non-specific antibody binding)

Comment: The lowest (mean) Hashtag (HTO25) in all cells (I would guess this is the relatively less common cell type; also, it doesn’t seem to overlap with HTO25, which is a good thing)

Comment: Non-zero UMI counts for the 4 Hashtags (I’d say the 4 cell types are relatively even)

Comment: CDF plot: ~80% Hashtags have < ~200,000 UMI counts in all cells (410,228 cells in total, I think the antibody binding efficiency is pretty good)

Comment: PDF plot: same conclusion as above

g. QC filtering

  • After preliminary filtering, the authors got 14,775 cells with 3,875 median genes per cell.

Previous question:

  1. why percent-mito < 20%?
  2. why UMI > 850?
  3. why no UMI upper limit?

My understanding:

  1. Previously, 5% was usually the default threshold. But a recent paper performed a systematic evaluation and proposed a default cutoff of 10% for human cells. Also, according to Luecken et al. 2019, we can use a relatively loose QC cutoff at the beginning. Since percent-mito was only the first QC filtering step and the authors further filtered by HTO and GDO, I think 20% makes sense.
  2. Again, according to Luecken et al. 2019, the dying/dead cells would appear as a small peak with low UMI counts. In the zoomed-in plot below, we can see that 850 is a reasonable cutoff to remove the entire peak.
  3. Doublets were not filtered out by UMI, but by HTO demuxing, therefore the authors didn’t set an upper limit.

Notes:

  • In the original cDNA feature txt file, there are only Ensembl gene IDs. To calculate percent-mito, I tried to convert the IDs to symbols using both this web tool and the biomaRt package in R.
  • If I directly use the gene symbols from these two methods and the same filters as Nikita, I get exactly the same results as Nikita (14,813 cells retained).
  • However, after taking a closer look at the converted gene lists, there are still many “Mitochondrially Encoded” genes starting with “MT” rather than “MT-”, so I wrote a script to convert all these genes.
  • Then I performed filtering again and got 14,675 cells with 3,917 median genes per cell.
  • I don’t think we’ll know exactly how the authors filtered the cells unless they release their code.
  • After QC filtering, there were 9,391 cells retained (9,343 cells by the authors in comparison).
  • I did a cross-comparison. There are 508 cells from the authors’ list that are not in my list.
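
The two first-pass filters discussed above (UMI > 850 and percent-mito < 20%) can be sketched in base R as follows. The toy matrix, the `^MT-` symbol pattern, and the variable names are assumptions for illustration; this is not the content of QC_filter.R, and the real analysis used Seurat.

```r
# Toy gene-by-cell UMI matrix (hypothetical values)
counts <- matrix(
  c(900, 10, 500, 2000,
    100,  5, 400,  100,
     50,  2, 300,   50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GAPDH", "MT-ND1", "MT-CO1"), paste0("cell", 1:4))
)

# Assumption: mitochondrial genes are identified by the "MT-" symbol prefix
is_mito <- grepl("^MT-", rownames(counts))

total_umi <- colSums(counts)
percent_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / total_umi

# Thresholds as discussed in the notes above
keep <- total_umi > 850 & percent_mito < 20
filtered <- counts[, keep, drop = FALSE]

print(round(percent_mito, 1))
print(colnames(filtered))
```

Because the mito fraction depends entirely on which symbols match the pattern, symbol conversion differences of the kind described above can change the number of retained cells.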

Code: QC_filter.R
Code: UMI_plot.R

Before UMI count filtering:

(figures 1 and 2)

After UMI count filtering:

(figures 3 and 4)

Before percent-mito filtering (generated by Seurat):

(figures 5, 6 and 7)

After percent-mito filtering (generated by Seurat):

(figures 8, 9 and 10)

Barcodes comparison:

Code: QC_compare.R
QC_by_author.txt
QC_by_hang.txt

Comparison result:

[1] "There are 508 cells filtered out in comparison to authors' list."
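
A barcode comparison of this kind reduces to set operations on two character vectors. The sketch below uses made-up barcodes; in practice the vectors would presumably be read from the two list files (for example with `readLines`), which is an assumption about their format.

```r
# Hypothetical barcode lists standing in for the two QC results
authors <- c("AAAC", "AAAG", "AAAT", "AACA")
mine    <- c("AAAC", "AAAT", "AACC")

missing_from_mine <- setdiff(authors, mine)  # in the authors' list but not in mine
extra_in_mine     <- setdiff(mine, authors)  # in my list but not in the authors'
shared            <- intersect(authors, mine)

print(sprintf("There are %d cells filtered out in comparison to authors' list.",
              length(missing_from_mine)))
```

Reporting both directions of the difference, not only the cells missing from one list, makes it easier to see whether the two filters disagree systematically.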

g2. PDF y axis issue

Previous question:

  1. Are the small values on the y-axis of the previous PDF plot wrong?
My understanding: probably NOT wrong.

Point 1: a dedicated package for empirical PDFs (EnvStats) gives similar results

Plotted with base R:

hist(barcode_dist, freq=F, breaks=150, main="PDF: UMI counts in each barcode (cell)", xlab="UMI counts in each barcode (cell)", ylab="PDF")

Plotted with EnvStats package:

EnvStats::epdfPlot(barcode_dist, epdf.col = "red")

Point 2: area of the plot is ~1 by eye

I didn’t integrate formally; a very rough estimate (peak height x approximate width) gives 1.5e-04 x 7000 = 1.05
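
The area check can also be done numerically: for a histogram drawn with `freq = FALSE`, the bar heights times the bin widths sum to 1, so small y-axis values are expected whenever the x-axis spans thousands of UMI counts. The data below are simulated as a stand-in for `barcode_dist`.

```r
set.seed(1)

# Simulated stand-in for the per-barcode UMI totals (not the real data)
barcode_dist <- rexp(10000, rate = 1 / 2000)

# Same binning request as the base R plot above, but without drawing
h <- hist(barcode_dist, breaks = 150, plot = FALSE)

# Area under the density histogram = sum of (height x bin width)
area <- sum(h$density * diff(h$breaks))

print(area)
print(max(h$density))  # small values, because the x-axis range is large
```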

h. zero-inflated plot & regression

Ref: Kim et al. 2020

Code: zero-flated.R

(figures 1 and 2)

regression summary

summary(glm(zero_prop ~ target_mean, family = poisson, data = whichmodel_10)) # poisson
Call:
glm(formula = zero_prop ~ target_mean, family = poisson, data = whichmodel_10)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.13432  -0.00250   0.01255   0.01294   1.08289  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.012969   0.005915  -2.193   0.0283 *  
target_mean -0.672805   0.019706 -34.141   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2256.629  on 35374  degrees of freedom
Residual deviance:   88.113  on 35373  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 5

summary(glm.nb(zero_prop ~ target_mean, data = whichmodel_10)) # negative binomial
Call:
glm.nb(formula = zero_prop ~ target_mean, data = whichmodel_10, 
    init.theta = 18539.38634, link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.86791  -0.31620   0.01294   0.06140   1.32706  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.012969   0.005915  -2.193   0.0283 *  
target_mean -0.672803   0.019707 -34.141   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(18539.39) family taken to be 1)

    Null deviance: 6991.6  on 35374  degrees of freedom
Residual deviance: 4823.2  on 35373  degrees of freedom
AIC: 66516

Number of Fisher Scoring iterations: 1


              Theta:  18539 
          Std. Err.:  7723 
Warning while fitting theta: iteration limit reached 

 2 x log-likelihood:  -66509.53 
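
As a reference point for reading these fits, the sketch below simulates Poisson counts and compares the observed zero proportion per gene with the Poisson expectation exp(-mean). It also evaluates the negative binomial zero probability (theta / (theta + mu))^theta at the fitted theta of about 18539, which is so large that the negative binomial is practically indistinguishable from a Poisson; that may explain why the two coefficient tables above are nearly identical. All data and names here are simulated or hypothetical; this is not the content of zero-flated.R.

```r
set.seed(1)

n_cells <- 2000
gene_means <- c(0.05, 0.2, 0.5, 1, 2)  # hypothetical true mean UMI per gene

# Simulate a cells-by-genes count matrix under a pure Poisson model
counts <- sapply(gene_means, function(mu) rpois(n_cells, mu))

target_mean <- colMeans(counts)       # observed mean per gene
zero_prop   <- colMeans(counts == 0)  # observed fraction of zeros per gene

# Expected zero fraction under Poisson: P(X = 0) = exp(-mu)
expected_poisson <- exp(-target_mean)

# Expected zero fraction under negative binomial with mean mu and size theta
nb_zero <- function(mu, theta) (theta / (theta + mu))^theta

print(round(cbind(target_mean, zero_prop, expected_poisson), 3))
print(nb_zero(1, 18539) - exp(-1))    # tiny difference for very large theta
```

Genes whose observed zero proportion sits well above the expected curve would be the candidates for zero inflation.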

i. re-QC

Note: this is to remove the dead cells using a more stringent UMI cutoff (850 originally, 1400 here)

Code: QC_filter_2.R
Code: UMI_plot_2.R

Before UMI count filtering:

(figures 21 and 22)

After UMI count filtering:

(figures 3 and 4)

Before percent-mito filtering (generated by Seurat):

(figures 5, 6 and 7)

After percent-mito filtering (generated by Seurat):

(figures 8, 9 and 10)

Barcodes comparison:

Code: QC_compare_2.R
QC_by_author.txt
QC_by_hang_2.txt

Comparison result:

[1] "There are 755 cells filtered out in comparison to authors' list."

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9       compiler_4.2.1   pillar_1.8.0     bslib_0.3.1     
 [5] later_1.3.0      git2r_0.30.1     jquerylib_0.1.4  tools_4.2.1     
 [9] getPass_0.2-2    digest_0.6.29    jsonlite_1.8.0   evaluate_0.15   
[13] tibble_3.1.7     lifecycle_1.0.1  pkgconfig_2.0.3  rlang_1.0.2     
[17] cli_3.3.0        rstudioapi_0.13  yaml_2.3.5       xfun_0.31       
[21] fastmap_1.1.0    httr_1.4.3       stringr_1.4.0    knitr_1.39      
[25] sass_0.4.1       fs_1.5.2         vctrs_0.4.1      rprojroot_2.0.3 
[29] glue_1.6.2       R6_2.5.1         processx_3.6.1   fansi_1.0.3     
[33] rmarkdown_2.14   callr_3.7.0      magrittr_2.0.3   whisker_0.4     
[37] ps_1.7.1         promises_1.2.0.1 htmltools_0.5.2  ellipsis_0.3.2  
[41] httpuv_1.6.5     utf8_1.2.2       stringi_1.7.6