Amplicon pileup analysis pipeline
[1/4] Setup environment
Pipeline failures are often due to an improperly configured environment. To ensure a robust and consistent setup for Ampile, I’ve created a dedicated configuration script. To execute the setup:
- Install
curl
if it is not already installed on your system (e.g.,sudo apt install curl
on Ubuntu). - Connect to internet and execute the below command in terminal:
bash -c "$(curl -fsSL https://raw.githubusercontent.com/chenh19/Ampile/refs/heads/main/setup.sh)"
Note:
- This pipeline is dependent on:
R
,bwa
,fastqc
,fastp
,samtools
,bamtools
,parallel
,r-tidyverse
,r-expss
,r-filesstrings
,r-foreach
,r-doParallel
. It can be run in Linux, FreeBSD, and MacOS environments. - The ampile.sh script will verify that all required packages are installed before proceeding with the analysis.
- The setup.sh script requires no directory changes, and does not need administrative privileges on Linux.
- The pipeline has been tested on Debian 12, Ubuntu 24.04, Kubuntu 24.04, KDE Neon 20250616, Linux Mint 22.1, Zorin OS 17.3, Pop!_OS 22.04, Elementary OS 8, Fedora 42, Rocky Linux 10, AlmaLinux 9, RHEL 8 (UChicago Midway3), CentOS 7 (UChicago Midway2), FreeBSD 14.3, and MacOS Sequoia. If you’re using an unsupported OS or prefer an alternative setup method, please ensure that all required dependencies are installed.
[2/4] Prepare input files
- Prepare reference sequences (
.fa
files) and sequencing reads (.fastq
or.fastq.gz
files) in a master folder (you may name the folder as desired):
- You may also organize the files into the two designated subfolders,
./1.ref/
and./2.fastq/
:
Note:
- Example files are provided in the /examples/ folder.
- The pipeline will automatically organize input files if they are not already in the two designated subfolders.
- The pipeline will also automatically compress sequencing reads to
.fastq.gz
if they are provided in.fastq
format.
[3/4] Running the pipeline
- Change current directory to the folder containing the input files (e.g.,
cd ~/Desktop/Ampile/
). - Connect to internet and execute the below command in terminal:
bash -c "$(curl -fsSL https://raw.githubusercontent.com/chenh19/Ampile/refs/heads/main/ampile.sh)"
- Alternatively, you may download the GitHub repository and place all scripts in the /src/ folder along with the input files to run them manually:
Note:
- All scripts assume the master folder as the working directory.
- If you are running the scripts manually on Linux, please don’t forget to load conda environment first:
source ~/miniconda3/etc/profile.d/conda.sh && conda activate ampile
[4/4] Done
- You may further analyze the parsed mutation rates and perform comparative analyses between groups. The corresponding spreadsheets are located at
./3.analysis/8.spreadsheets/2.mutation_rates/
.
Note:
- The directories
./3.analysis/1.refseq/
,./3.analysis/2.trim/
,./3.analysis/3.bam/
, and./3.analysis/4.mpileup/
contain large intermediate files. You may choose to delete them unless you need them for troubleshooting.