Project-1: STING-seq

Last updated: 2022-08-25

Knit directory: rotation2/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0).

R Markdown file: up-to-date
Environment: empty
Seed: set.seed(20220607)
Session information: recorded
Cache: none
File paths: relative
Repository version: 4d8f9d5

Status of the Git repository when the results were generated:

Unstaged changes:
    Modified:   .RData
    Modified:   .Rhistory

1. Understand RNA-seq

a. Read about RNA-seq analysis

Yalamanchili et al. 2017: RNA-seq analysis pipeline

Some key points:

Protocol-1 (differential expression of genes):

demuxed raw reads (FastQC)
trimming reads (awk)
aligning reads (TopHat2)
counting reads (HTSeq; may filter out genes with low counts before next step)
detect DE using counted reads (DESeq2)
more QC (PCA/correlation heatmap)
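
Sketch of the optional low-count filter mentioned in the counting step. This is a minimal base-R illustration on a toy count matrix; the function name `filter_low_counts` and the threshold of 10 total reads are my own assumptions for illustration, not values from the paper.

```r
# Toy gene-by-sample count matrix (rows = genes, columns = samples).
counts <- matrix(
  c( 0,  1,  0,  2,
    50, 60, 55, 70,
     3,  4,  2,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Keep genes whose total count across samples reaches `min_total`.
# `min_total = 10` is a hypothetical threshold chosen for this example.
filter_low_counts <- function(mat, min_total = 10) {
  mat[rowSums(mat) >= min_total, , drop = FALSE]
}

filtered <- filter_low_counts(counts)
print(rownames(filtered))
```

The filtered matrix would then be passed on to the DE step; `drop = FALSE` keeps the result a matrix even if only one gene survives.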

Protocol-2 (differential usage of isoforms):

Protocol-1
counting isoforms (Kallisto; also check Cell Ranger)
detect DU using counted isoforms (Sleuth)
more QC (also PCA/correlation heatmap)

Protocol-3 (cryptic splicing):

Protocol-1
detect differential junctions (CrypSplice)

c. Read the Morris paper

Some key ideas:

STING-seq: Systematic Targeting and Inhibition of Noncoding GWAS loci with scRNA-seq
prioritizes candidate cis-regulatory elements (cCREs, 1 kb < distance to TSS < 1 Mb) using fine-mapped GWAS
selected 88 variants (in 56 loci) with enhancer activity
dual CRISPR inhibition: dCas9 as the GPS, MeCP2 and KRAB as the repressors
confirming dual CRISPRi efficacy: gRNAs target the TSS of MRPS23, CTSB, FSCN1
CRISPRi on the 88 variants: two gRNAs for each variant, both within 200 bp of the variant
ECCITE-seq: captures gRNAs and epitopes

Some data processing steps and results:

QC: remove cells with low total reads or excessive mitochondrial reads, gRNA assignment UMI>5 (9,343 cells after QC)
Kallisto: counting (read more on the official website)
Seurat: QC and reference mapping? (read more on the official website)
SCEPTRE: pairwise gRNA-to-gene expression test
non-targeting gRNA-gene pairs: not significant (negative ctrl)
TSS-targeting gRNA-gene pairs: expression significantly decreased (positive ctrl)
37 of the 88 variants were significant
Trans-regulatory elements: I'll come back later
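
Sketch of the "gRNA assignment UMI>5" rule from the QC step above, as I understand it: a gRNA is called present in a cell when its UMI count exceeds 5. The toy matrix and the function name `assign_grnas` are made up for illustration; the authors' actual implementation may differ.

```r
# Toy gRNA-by-cell UMI matrix (rows = gRNAs, columns = cell barcodes).
grna_umi <- matrix(
  c(12, 0, 9,
     0, 7, 6,
     1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gRNA_1", "gRNA_2", "gRNA_3"),
                  c("cell_a", "cell_b", "cell_c"))
)

# A gRNA is assigned to a cell when its UMI count is strictly above `min_umi`.
assign_grnas <- function(mat, min_umi = 5) {
  mat > min_umi
}

assigned <- assign_grnas(grna_umi)
grnas_per_cell <- colSums(assigned)  # number of gRNAs assigned to each cell
print(grnas_per_cell)
```

The per-cell count of assigned gRNAs is the quantity to compare against the reported MOI.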

Note:

Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq)
Expanded CRISPR-compatible CITE-seq (ECCITE-Seq)
cDNA, HTO (Hashtag oligos), GDO (gRNAs)

2. Prelim QC for raw STING-seq data

a. Download all data

Code: download.sh

b. Perform FastQC on all fastq files

Code: fastqc.sh

FastQC reports were generated for runs SRR14141135, SRR14141136, SRR14141137, SRR14141138, SRR14141139, SRR14141140, SRR14141141, SRR14141142, SRR14141143, SRR14141144, SRR14141145 and SRR14141146.

A brief summary:

length: 26bp or 57bp (trimmed?)
depth: 30-35x
overall quality: good (within ~40 bp)

d. Kallisto | bustools pipeline

Code: pip3-kb.sh
Code: anaconda_kallisto.sh

3. Analyze QC’ed STING-seq data

a. Install packages

Code: seurat.sh

b. Data overview

Code: overview.R
Note: about sparse matrix

The [Expression] matrix has: 
- 35,606 rows/genes/targets 
- 686,612 columns/barcodes/cells 
- 24,447,506,872 values in total 
- 82,507,471 values that are non-zero 
- 50,421,358 values that are 1 
- 32,086,113 values that are bigger than 1 
- 3,370,699 values that are bigger than 10 
- 259,734 values that are bigger than 100 
- 2,515 values that are bigger than 1,000 
- 0 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [gRNA] matrix has: 
- 210 rows/genes/targets 
- 137,347 columns/barcodes/cells 
- 28,842,870 values in total 
- 2,506,474 values that are non-zero 
- 1,510,919 values that are 1 
- 995,555 values that are bigger than 1 
- 121,554 values that are bigger than 10 
- 41,071 values that are bigger than 100 
- 2,232 values that are bigger than 1,000 
- 20 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [Hashtag] matrix has: 
- 4 rows/genes/targets 
- 410,228 columns/barcodes/cells 
- 1,640,912 values in total 
- 739,820 values that are non-zero 
- 409,830 values that are 1 
- 329,990 values that are bigger than 1 
- 218,280 values that are bigger than 10 
- 8,155 values that are bigger than 100 
- 282 values that are bigger than 1,000 
- 46 values that are bigger than 10,000 
- 0 values that are bigger than 100,000
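
Sketch of how an overview like the one above could be computed from a sparse matrix. This is not the content of overview.R; the function name `summarise_counts` and the toy matrix are assumptions. The total number of values is computed with `prod(dim(mat))`, which returns a double, because a value like 24,447,506,872 exceeds R's integer range.

```r
library(Matrix)  # recommended package shipped with R

# Summarise a sparse count matrix: dimensions, total entries, and how many
# stored values are non-zero, equal to 1, or bigger than each cutoff.
summarise_counts <- function(mat, cutoffs = c(1, 10, 100, 1000)) {
  vals <- mat@x[mat@x != 0]  # stored non-zero entries
  out <- c(
    rows    = nrow(mat),
    cols    = ncol(mat),
    total   = prod(dim(mat)),  # double precision, avoids integer overflow
    nonzero = length(vals),
    equal_1 = sum(vals == 1)
  )
  bigger <- vapply(cutoffs, function(k) sum(vals > k), numeric(1))
  names(bigger) <- paste0("bigger_", cutoffs)
  c(out, bigger)
}

# Toy 3 x 4 sparse matrix with five non-zero entries.
toy <- sparseMatrix(
  i = c(1, 2, 3, 1, 3),
  j = c(1, 1, 2, 3, 4),
  x = c(1, 5, 20, 1, 150),
  dims = c(3, 4)
)
print(summarise_counts(toy))
```

A quick consistency check on the real numbers: "values that are 1" plus "values that are bigger than 1" should equal the non-zero count, which holds for all three matrices above.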

c. Calculate means and non-zeros

Code: Expression_barcode_stats.R
Code: Expression_target_stats.R

Code: gRNA_barcode_stats.R
Code: gRNA_target_stats.R

Code: Hashtag_barcode_stats.R
Code: Hashtag_target_stats.R

Output:

Expression_matrix_barcodes_summary.csv
Expression_matrix_targets_summary.csv

gRNA_matrix_barcodes_summary.csv
gRNA_matrix_targets_summary.csv

Hashtag_matrix_barcodes_summary.csv
Hashtag_matrix_targets_summary.csv
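
Sketch of the per-barcode and per-target summaries (means and non-zero counts). This uses a small dense toy matrix in base R; the column names `mean_umi` and `nonzero` and the commented-out output file name are my assumptions, not necessarily what the *_stats.R scripts write.

```r
# Toy target-by-barcode UMI matrix (rows = targets, columns = barcodes).
umi <- matrix(
  c(1,  0, 1,   0,
    5,  0, 0,   0,
    0, 20, 0, 150),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("target", 1:3), paste0("barcode", 1:4))
)

# One row per barcode (cell): mean UMI over targets and number of non-zero targets.
barcode_stats <- data.frame(
  barcode  = colnames(umi),
  mean_umi = colMeans(umi),
  nonzero  = colSums(umi != 0),
  row.names = NULL
)

# One row per target: mean UMI over barcodes and number of non-zero barcodes.
target_stats <- data.frame(
  target   = rownames(umi),
  mean_umi = rowMeans(umi),
  nonzero  = rowSums(umi != 0),
  row.names = NULL
)

print(barcode_stats)
print(target_stats)
# write.csv(barcode_stats, "toy_barcodes_summary.csv", row.names = FALSE)  # hypothetical file name
```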

d. Expression (cDNA) dataset

i) Expression barcodes

Code: Expression_barcode_dist_plot.R

Comment: The cell with highest overall detected gene expression

Comment: The cell with lowest overall detected gene expression

Comment: As Xuanyao said, this kind of bar plot is too dense to show the overall distribution; the CDF plot below is clearer

Comment: From this CDF figure I can see why only ~9,000 cells were used after QC with the UMI>=850 filter: fewer than 10% of cells have UMI>=850. Still, why exactly 850 was chosen is a question for me to explore

Comment: Many zeros (consistent with the observation that the matrix is very sparse); cells with UMI>850 are invisible in this plot. As Xuanyao said, I should exclude the outliers or add a y-axis break
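
Sketch of reading the retained fraction off an empirical CDF, as in the CDF comment above. The UMI totals here are toy values, not the real data, and 850 is the cutoff discussed in the text. Note that `ecdf()` returns P(X <= x), so for integer counts the fraction with UMI >= cutoff is 1 minus the CDF at cutoff - 1.

```r
# Toy per-cell UMI totals: mostly empty or low-count barcodes, a few real cells.
umi_per_cell <- c(rep(0, 50), rep(100, 40), rep(3000, 10))

cdf <- ecdf(umi_per_cell)
cutoff <- 850

# P(UMI >= cutoff) = 1 - P(UMI <= cutoff - 1) for integer counts.
frac_retained <- 1 - cdf(cutoff - 1)
print(frac_retained)

# Same quantity computed directly, as a cross-check.
print(mean(umi_per_cell >= cutoff))
```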

ii) Expression targets

Code: Expression_target_dist_plot.R

Comment: The highest (mean) expressed gene is WDR45-like (WDR45L) pseudogene (high UMI counts in all cells)

Comment: The lowest (mean) expressed gene is RP4-669L17.1 pseudogene (zero UMI counts in all cells)

Comment: Non-zero UMI counts for all genes (~35k, including mito genes; 686,612 cells in total)

Comment: CDF plot: ~80% genes have < ~5000 UMI counts in all cells (not all genes captured in each cell, but I guess still a lot)

Comment: PDF plot: same conclusion as above

e. gRNA dataset

i) gRNA barcodes

Code: gRNA_barcode_dist_plot.R

Comment: The cell with highest overall (mean) gRNAs, and it has 15 highly expressed gRNAs

Comment: The cell with lowest overall (mean) gRNAs (transfection/transduction failed in this cell)

Comment: Non-zero UMI counts in all cells (I’d say the transfection/transduction is relatively even across all cells)

Comment: CDF plot: ~80% cells have < ~40 UMI counts for each gRNA (note: the authors mentioned MOI ~ 10)

Comment: PDF plot: same conclusion as above

ii) gRNA targets

Code: gRNA_target_dist_plot.R

Comment: The highest (mean) gRNA in all cells (gRNA targeting PPIA-2, which is a control)

Comment: The lowest (mean) gRNA in all cells (likely it’s a low score gRNA site but the authors didn’t have better choices)

Comment: Non-zero UMI counts for all gRNAs (137,347 cells in total; I’d say the transfection/transduction efficiency varies among gRNAs. The authors designed all the gRNAs within 200 bp of the targeted variants, so there must be limitations in terms of gRNA options)

Comment: CDF plot: ~80% gRNAs have < ~20,000 UMI counts in all cells (137,347 cells in total, ~15% transfection/transduction success rate, acceptable)

Comment: PDF plot: same conclusion as above

f. Hashtag dataset

i) Hashtag barcodes

Code: Hashtag_barcode_dist_plot.R

Comment: The cell with highest (mean) Hashtags (note: the authors used only 4 Hashtags, I might check which antibodies they are when performing association)

Comment: The cell with lowest (mean) Hashtags (not tagged by any of the antibodies)

Comment: This figure is not an error. All cells have 1/2/3/4 UMI counts, and because many of them have 4, it looks like a block when the plot is this dense

Comment: CDF plot: ~80% cells have < ~2 UMI counts for each Hashtag (It makes sense to me because the authors are likely trying to label different cell types)

Comment: PDF plot: same conclusion as above

ii) Hashtag targets

Code: Hashtag_target_dist_plot.R

Comment: The highest (mean) Hashtag (HTO23) in all cells (I would guess this is the relatively more common cell type; also, there was some non-specific antibody binding)

Comment: The lowest (mean) Hashtag (HTO25) in all cells (I would guess this is the relatively less common cell type; also, it doesn’t seem to overlap with HTO23, which is a good thing)

Comment: Non-zero UMI counts for the 4 Hashtags (I’d say the 4 cell types are relatively even)

Comment: CDF plot: ~80% Hashtags have < ~200,000 UMI counts in all cells (410,228 cells in total, I think the antibody binding efficiency is pretty good)

Comment: PDF plot: same conclusion as above

g. QC filtering

After preliminary filtering, the authors got 14,775 cells with 3,875 median genes per cell.

Previous question:

why percent-mito < 20%?
why UMI > 850?
why no UMI upper limit?

My understanding:

Previously, 5% was the usual default threshold, but a recent paper performed a systematic evaluation and proposed a default cutoff of 10% for human cells. Also, according to Luecken et al. 2019, a relatively loose QC cutoff can be used at the beginning. Since percent-mito was only the first QC filtering step and the authors further filtered by HTO and GDO, I think 20% makes sense.
Again, according to Luecken et al. 2019, dying/dead cells would form a small peak with low UMI counts. In the zoomed-in plot below, 850 looks like a reasonable cutoff to remove the entire peak.
Doublets were not filtered out by UMI count but by HTO demultiplexing, so the authors didn’t set an upper limit.
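
Sketch of the two first-pass filters discussed above (UMI > 850 and percent-mito < 20%), in base R on a toy matrix. This is not QC_filter.R; the function name `qc_filter` is hypothetical, and it assumes mitochondrially encoded genes carry the "MT-" symbol prefix.

```r
# Toy gene-by-cell UMI matrix; "MT-" genes are mitochondrially encoded.
umi <- matrix(
  c(400, 900, 100, 1500,
    500, 800,  50,  900,
    100, 100, 600,  800),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ACTB", "GAPDH", "MT-ND1"), paste0("cell", 1:4))
)

# Flag cells that pass both thresholds (defaults taken from the text above).
qc_filter <- function(mat, min_umi = 850, max_pct_mito = 20) {
  total_umi <- colSums(mat)
  is_mito   <- grepl("^MT-", rownames(mat))
  pct_mito  <- 100 * colSums(mat[is_mito, , drop = FALSE]) / total_umi
  keep      <- total_umi > min_umi & pct_mito < max_pct_mito
  data.frame(cell = colnames(mat), total_umi = total_umi,
             pct_mito = pct_mito, keep = keep, row.names = NULL)
}

qc <- qc_filter(umi)
print(qc)
```

In the toy data, cell3 fails on both criteria and cell4 fails only on percent-mito, which shows why the "MT" versus "MT-" naming matters: genes missed by the prefix pattern lower pct_mito and let extra cells through.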

Notes:

In the original cDNA feature txt file, there are only Ensembl gene IDs. To calculate percent-mito, I tried to convert the IDs to symbols using both this web tool and the biomaRt package in R.
If I directly use the gene symbols from these two methods and the same filters as Nikita, I get exactly the same results as Nikita (14,813 cells retained).
However, on a closer look at the converted gene lists, there are still many “Mitochondrially Encoded” genes starting with “MT” rather than “MT-”, so I wrote a script to convert all these genes.
Then I performed the filtering again and got 14,675 cells with 3,917 median genes per cell.
I don’t think we’ll know exactly how the authors filtered the cells unless they release their code.
After QC filtering, 9,391 cells were retained (compared with 9,343 cells retained by the authors).
I did a cross-comparison: 508 cells from the authors’ list are not in my list.

Code: QC_filter.R
Code: UMI_plot.R

Before UMI count filtering:

After UMI count filtering:

Before percent-mito filtering (generated by Seurat):

After percent-mito filtering (generated by Seurat):

Barcodes comparison:

Code: QC_compare.R
QC_by_author.txt
QC_by_hang.txt

Comparison result:

[1] "There are 508 cells filtered out in comparison to authors' list."
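
Sketch of the barcode cross-comparison using set operations. Toy barcodes are used here; in practice the vectors would be read from the two lists (assuming one barcode per line, which is my assumption about the file format). If both lists are duplicate-free and the counts are as reported (9,343 by the authors, 9,391 by me, 508 only in the authors' list), then 8,835 barcodes are shared and 556 are only in my list.

```r
# In practice, something like readLines("QC_by_author.txt") and
# readLines("QC_by_hang.txt") (one barcode per line is an assumption).
by_author <- c("AAAC", "AAAG", "AACT", "AAGT")
by_me     <- c("AAAC", "AACT", "ACGT")

only_author <- setdiff(by_author, by_me)    # retained by the authors but not by me
only_me     <- setdiff(by_me, by_author)    # retained by me but not by the authors
shared      <- intersect(by_author, by_me)  # retained by both

print(paste("There are", length(only_author),
            "cells filtered out in comparison to authors' list."))
print(c(shared = length(shared), only_me = length(only_me)))
```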

g2. PDF y axis issue

Previous question:

Are the small values on the y-axis of the previous PDF plot wrong?

My understanding: probably NOT wrong.

Point 1: a package for empirical PDFs (epdf) gives similar results

Plotted with base R:

hist(barcode_dist, freq = FALSE, breaks = 150, main = "PDF: UMI counts in each barcode (cell)", xlab = "UMI counts in each barcode (cell)", ylab = "PDF")

Plotted with EnvStats package:

EnvStats::epdfPlot(barcode_dist, epdf.col = "red")

Point 2: area of the plot is ~1 by eye

I didn’t integrate exactly; a very rough estimate gives 1.5e-04 x 7000 = 1.05
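
Sketch of checking the area numerically instead of by eye. For a density histogram, the bar heights times the bar widths sum to 1, so small y-axis values are expected whenever the x-axis spans thousands of UMI counts. The data below are simulated (an exponential stand-in, which is my assumption), not the real `barcode_dist`.

```r
set.seed(20220607)                  # seed reused from the report; data are simulated
x <- rexp(10000, rate = 1 / 1000)   # toy stand-in for per-barcode UMI counts

h <- hist(x, breaks = 150, plot = FALSE)

# Area under the density histogram: sum of height * width over all bars.
area <- sum(h$density * diff(h$breaks))
print(area)

# The tallest bar is still a small number because the x-range is in the thousands.
print(max(h$density))
```

The same two lines (`hist(..., plot = FALSE)` and the `sum(...)`) can be applied to `barcode_dist` directly.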

h. zero-inflated plot & regression

Ref: Kim et al. 2020

Code: zero-flated.R

regression summary

summary(glm(zero_prop ~ target_mean, family = poisson, data = whichmodel_10)) # poisson

Call:
glm(formula = zero_prop ~ target_mean, family = poisson, data = whichmodel_10)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.13432  -0.00250   0.01255   0.01294   1.08289  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.012969   0.005915  -2.193   0.0283 *  
target_mean -0.672805   0.019706 -34.141   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2256.629  on 35374  degrees of freedom
Residual deviance:   88.113  on 35373  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 5

library(MASS) # provides glm.nb
summary(glm.nb(zero_prop ~ target_mean, data = whichmodel_10)) # negative binomial

Call:
glm.nb(formula = zero_prop ~ target_mean, data = whichmodel_10, 
    init.theta = 18539.38634, link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.86791  -0.31620   0.01294   0.06140   1.32706  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.012969   0.005915  -2.193   0.0283 *  
target_mean -0.672803   0.019707 -34.141   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(18539.39) family taken to be 1)

    Null deviance: 6991.6  on 35374  degrees of freedom
Residual deviance: 4823.2  on 35373  degrees of freedom
AIC: 66516

Number of Fisher Scoring iterations: 1


              Theta:  18539 
          Std. Err.:  7723 
Warning while fitting theta: iteration limit reached 

 2 x log-likelihood:  -66509.53
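
Sketch of the theoretical zero proportions that the zero-proportion versus mean plot is usually compared against. Under a Poisson model the expected fraction of zeros is exp(-mu), so log(zero proportion) has slope -1 in the mean; under a negative binomial with size theta it is (theta / (theta + mu))^theta. The means and the small theta below are hypothetical values for illustration. The large theta is the estimate from the glm.nb fit above (fitted with an iteration-limit warning), which may explain why the two coefficient tables are nearly identical.

```r
mu <- c(0.01, 0.1, 0.5, 1, 2)   # hypothetical per-gene mean UMI counts

# Expected zero proportion under Poisson: exp(-mu).
p0_pois <- dpois(0, lambda = mu)

# Expected zero proportion under negative binomial with size theta:
# (theta / (theta + mu))^theta. theta = 0.5 is a hypothetical overdispersed case.
theta <- 0.5
p0_nb <- dnbinom(0, size = theta, mu = mu)

print(data.frame(mu, p0_pois, p0_nb))

# With a very large theta (about 18539 in the fit above) NB approaches Poisson.
p0_nb_large <- dnbinom(0, size = 18539, mu = mu)
print(max(abs(p0_nb_large - p0_pois)))
```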

i. re-QC

Note: this is to remove the dead cells using a more stringent UMI cutoff (850 originally, 1400 here)

Code: QC_filter_2.R
Code: UMI_plot_2.R

Before UMI count filtering:

After UMI count filtering:

Before percent-mito filtering (generated by Seurat):

After percent-mito filtering (generated by Seurat):

Barcodes comparison:

Code: QC_compare_2.R
QC_by_author.txt
QC_by_hang_2.txt

Comparison result:

[1] "There are 755 cells filtered out in comparison to authors' list."

sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9       compiler_4.2.1   pillar_1.8.0     bslib_0.3.1     
 [5] later_1.3.0      git2r_0.30.1     jquerylib_0.1.4  tools_4.2.1     
 [9] getPass_0.2-2    digest_0.6.29    jsonlite_1.8.0   evaluate_0.15   
[13] tibble_3.1.7     lifecycle_1.0.1  pkgconfig_2.0.3  rlang_1.0.2     
[17] cli_3.3.0        rstudioapi_0.13  yaml_2.3.5       xfun_0.31       
[21] fastmap_1.1.0    httr_1.4.3       stringr_1.4.0    knitr_1.39      
[25] sass_0.4.1       fs_1.5.2         vctrs_0.4.1      rprojroot_2.0.3 
[29] glue_1.6.2       R6_2.5.1         processx_3.6.1   fansi_1.0.3     
[33] rmarkdown_2.14   callr_3.7.0      magrittr_2.0.3   whisker_0.4     
[37] ps_1.7.1         promises_1.2.0.1 htmltools_0.5.2  ellipsis_0.3.2  
[41] httpuv_1.6.5     utf8_1.2.2       stringi_1.7.6
