Amplicon pileup analysis pipeline

[1/4] Set up the environment

Pipeline failures are often caused by an improperly configured environment. To keep the Ampile setup robust and consistent, I’ve created a dedicated configuration script. To run the setup:

  • Install curl if it is not already installed on your system (e.g., sudo apt install curl on Ubuntu).
  • Connect to the internet and run the following command in a terminal:
 bash -c "$(curl -fsSL https://raw.githubusercontent.com/chenh19/Ampile/refs/heads/main/setup.sh)" 
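
As a minimal sketch of the curl prerequisite above, the check below confirms that a command is available before the setup one-liner is run. The function name require_cmd is hypothetical and is not part of the Ampile scripts.

```shell
# Hypothetical pre-flight helper (an assumption for illustration, not part of Ampile).
# Returns 0 if the named command is on PATH, otherwise prints a hint and returns 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found: $(command -v "$1")"
  else
    echo "$1 not found; please install it first (e.g., sudo apt install $1 on Ubuntu)" >&2
    return 1
  fi
}

# Example: only start the setup if curl is available.
# require_cmd curl && bash -c "$(curl -fsSL https://raw.githubusercontent.com/chenh19/Ampile/refs/heads/main/setup.sh)"
```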


[2/4] Prepare input files

  • Prepare reference sequences (.fa files) and sequencing reads (.fastq or .fastq.gz files) in a master folder (you may name the folder as desired).

  • You may also organize the files into the two designated subfolders, ./1.ref/ and ./2.fastq/.

Note:
  • Example files are provided in the /examples/ folder.
  • The pipeline will automatically organize input files if they are not already in the two designated subfolders.
  • The pipeline will also automatically compress sequencing reads to .fastq.gz if they are provided in .fastq format.
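
If you prefer to arrange the inputs yourself, the layout described above can be sketched roughly as follows. This is an illustration only and not the pipeline's own code: the function name organize_inputs is hypothetical, and it assumes all input files sit directly in the master folder.

```shell
# Hypothetical helper (an assumption for illustration, not part of Ampile).
# Moves .fa files to ./1.ref/, reads to ./2.fastq/, and compresses plain .fastq files.
organize_inputs() {
  master_dir="${1:-.}"
  mkdir -p "$master_dir/1.ref" "$master_dir/2.fastq"

  # Reference sequences go to 1.ref/
  for f in "$master_dir"/*.fa; do
    if [ -e "$f" ]; then mv "$f" "$master_dir/1.ref/"; fi
  done

  # Sequencing reads (compressed or not) go to 2.fastq/
  for f in "$master_dir"/*.fastq "$master_dir"/*.fastq.gz; do
    if [ -e "$f" ]; then mv "$f" "$master_dir/2.fastq/"; fi
  done

  # Compress any remaining plain .fastq files to .fastq.gz
  for f in "$master_dir"/2.fastq/*.fastq; do
    if [ -e "$f" ]; then gzip "$f"; fi
  done
  return 0
}
```

For example, organize_inputs ~/Desktop/Ampile would arrange the example folder used in the next step.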

[3/4] Run the pipeline

  • Change current directory to the folder containing the input files (e.g., cd ~/Desktop/Ampile/).
  • Connect to the internet and run the following command in a terminal:
 bash -c "$(curl -fsSL https://raw.githubusercontent.com/chenh19/Ampile/refs/heads/main/ampile.sh)" 

  • Alternatively, you may download the GitHub repository and place all scripts in the /src/ folder along with the input files to run them manually.

Note:
  • All scripts assume the master folder as the working directory.
  • If you are running the scripts manually on Linux, remember to activate the conda environment first: source ~/miniconda3/etc/profile.d/conda.sh && conda activate ampile
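
When running the scripts manually, a small guard such as the one below can confirm that the ampile environment is active. It is a sketch that relies on the CONDA_DEFAULT_ENV variable set by conda activate. The function name ensure_ampile_env is hypothetical and is not part of the Ampile scripts.

```shell
# Hypothetical guard (an assumption for illustration, not part of Ampile).
# conda activate sets CONDA_DEFAULT_ENV to the name of the active environment.
ensure_ampile_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "ampile" ]; then
    echo "conda environment 'ampile' is active"
  else
    echo "conda environment 'ampile' is not active; run:" >&2
    echo "  source ~/miniconda3/etc/profile.d/conda.sh && conda activate ampile" >&2
    return 1
  fi
}
```

Calling ensure_ampile_env before the first script lets you stop early instead of failing partway through a run.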

[4/4] Done

  • You may further analyze the parsed mutation rates and perform comparative analyses between groups. The corresponding spreadsheets are located at ./3.analysis/8.spreadsheets/2.mutation_rates/.

Note:
  • The directories ./3.analysis/1.refseq/, ./3.analysis/2.trim/, ./3.analysis/3.bam/, and ./3.analysis/4.mpileup/ contain large intermediate files. You may choose to delete them unless you need them for troubleshooting.
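
If you decide to remove the intermediate files, the cleanup could look roughly like the sketch below. The function name clean_intermediates is hypothetical and is not part of the Ampile scripts. It prints the size of each of the four directories listed above before deleting it and leaves everything else under ./3.analysis/ untouched.

```shell
# Hypothetical cleanup helper (an assumption for illustration, not part of Ampile).
# Deletes only the four intermediate directories named in the note above.
clean_intermediates() {
  master_dir="${1:-.}"
  for d in 1.refseq 2.trim 3.bam 4.mpileup; do
    target="$master_dir/3.analysis/$d"
    if [ -d "$target" ]; then
      du -sh "$target"      # report how much space will be freed
      rm -rf -- "$target"   # irreversible: make sure troubleshooting is finished
    fi
  done
  return 0
}
```

Run it from the master folder as clean_intermediates, or pass the folder explicitly, e.g. clean_intermediates ~/Desktop/Ampile.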