Assembling a Genome with NextDenovo

7 months ago · From the column 萌哥与生信

Master of Agriculture, Zhejiang A&F University

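Background

NextDenovo is a genome assembler for long-read (third-generation) sequencing data, developed by Nextomics in Wuhan (which these days probably goes by GrandOmics). Back when I was doing my master's, I spent several months at Nextomics for a collaborative project.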

Resources

GitHub: https://github.com/Nextomics/NextDenovo
Official documentation: https://nextdenovo.readthedocs.io/en/latest/
Notes by Zhougeng, a senior of mine: https://xuzhougeng.top/archives/Assembly-nanopore-with-NextDenovo

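Installation

Installation is painless: the software itself does not need to be installed, because precompiled binaries can be downloaded and used directly. The only thing you do have to install is one Python dependency, Paralleltask.

# Download the software
wget https://github.com/Nextomics/NextDenovo/releases/download/v2.5.0/NextDenovo.tgz
# Install the dependency
python -m pip install Paralleltask
# Unpack the software
tar -zxvf NextDenovo.tgz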
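Since nothing gets installed, the unpacked NextDenovo directory is not on your PATH. If you want to call nextDenovo by its bare name from a project directory, one option is to add that directory to PATH for the current shell session. Here is a minimal sketch, assuming a POSIX-like shell; the helper name add_nextdenovo_to_path is made up for illustration and is not part of NextDenovo.

```shell
# Hypothetical helper: put an unpacked NextDenovo directory on PATH
# for the current shell session only.
add_nextdenovo_to_path() {
    local nd_dir="$1"
    # Resolve to an absolute path so the PATH entry survives a later cd
    nd_dir="$(cd "$nd_dir" && pwd -P)" || return 1
    # Stop early if the launcher script is missing or not executable
    [ -x "$nd_dir/nextDenovo" ] || {
        echo "no executable nextDenovo in $nd_dir" >&2
        return 1
    }
    export PATH="$nd_dir:$PATH"
}

# Usage (assumes the tarball was unpacked in the current directory):
#   add_nextdenovo_to_path ./NextDenovo && command -v nextDenovo
```

To make this permanent you would put the equivalent export line in your shell startup file; calling the launcher by its full path works just as well.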

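Testing the software

After unpacking the archive you will find a test_data folder containing an example config file, test_data/run.cfg. You can run it directly to check whether the software works on your server. This step is optional, of course.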

cd NextDenovo
# Use ./ because the unpacked directory is not on PATH by default
./nextDenovo test_data/run.cfg

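Running your own project

Generating the input file

Write the absolute paths of your sequencing data files into a file named input.fofn, one path per line.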

ls /path/to/01RawData/PacBio/*hifi_reads.fastq.gz > input.fofn
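The ls one-liner produces absolute paths only because the glob itself starts from an absolute path, and it happily writes an empty file if nothing matches. Below is a slightly more defensive sketch of the same step. The function names make_fofn and check_fofn are hypothetical helpers of my own, not NextDenovo tools, and the sketch assumes all read files sit directly in one directory.

```shell
# Hypothetical helper: write the absolute paths of matching read files
# in one directory to a fofn, one path per line.
make_fofn() {
    local data_dir="$1" pattern="$2" out="$3"
    local abs_dir
    # Resolve the directory first so every listed path is absolute
    abs_dir="$(cd "$data_dir" && pwd -P)" || return 1
    find "$abs_dir" -maxdepth 1 -type f -name "$pattern" | sort > "$out"
    # An empty fofn almost always means a wrong directory or pattern
    [ -s "$out" ] || {
        echo "no files matched $pattern in $abs_dir" >&2
        return 1
    }
}

# Hypothetical helper: confirm every path in the fofn exists and is non-empty.
check_fofn() {
    local fofn="$1" f
    while IFS= read -r f; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done < "$fofn"
}

# Usage (the path is the same placeholder as in the ls command):
#   make_fofn /path/to/01RawData/PacBio '*hifi_reads.fastq.gz' input.fofn
#   check_fofn input.fofn
```

Running the check before a long assembly job is cheap, and it catches a mistyped path before the pipeline does.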

Writing the config file

Copy the template config file that ships with the software (doc/run.cfg) into your project directory.

cp ../NextDenovo/doc/run.cfg .

Then adjust the parameters to match your own project. My config, saved as test.run.cfg, looks like this:

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = test_nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 24 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = hifi # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir
[correct_option]
read_cutoff = 1k
genome_size = x.xg # estimated genome size
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8