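I. Immune repertoire

Immune repertoire analysis focuses mainly on the CDR3 sequence of the variable region.
For further background, see:
"Must-knows for 10× Genomics single-cell immune repertoire VDJ analysis" (10× Genomics单细胞免疫组库VDJ分析必知必会)
"Single-cell immune repertoire: principles of TCR gene rearrangement and TCR sequencing library preparation" (单细胞免疫组库：TCR基因重排原理和TCR测序建库方法)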
We need a minimum of 30x coverage in order to confidently identify unique VDJ sequences as truly unique. In addition, we need to ensure that this coverage amount can be met for all samples, even though we have large variation in the concentration of T-cells within these samples. We assume that a highly diverse sample will have a correspondingly lower average read depth, but need to determine the correlation between initial T-cell concentration and read depth.
ref: T-Cell Concentration and Coverage Depth #10
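As a rough worked example of the 30x requirement, the sketch below treats coverage as total reads divided by the number of input TCR templates. That definition, the function names and the sample numbers are illustrative assumptions, not values taken from the referenced discussion.

```python
import math

TARGET_COVERAGE = 30.0  # minimum coverage quoted above


def mean_coverage(total_reads: int, n_templates: int) -> float:
    """Average reads per input TCR template (assumed definition of coverage)."""
    if n_templates <= 0:
        raise ValueError("n_templates must be positive")
    return total_reads / n_templates


def reads_needed(n_templates: int, coverage: float = TARGET_COVERAGE) -> int:
    """Reads required for n_templates templates to reach the target coverage."""
    return math.ceil(n_templates * coverage)


# Hypothetical samples: (total reads, estimated T-cell templates).
samples = {"low_tcell": (3_000_000, 20_000), "high_tcell": (3_000_000, 200_000)}
for name, (reads, templates) in samples.items():
    cov = mean_coverage(reads, templates)
    print(f"{name}: {cov:.0f}x, meets target: {cov >= TARGET_COVERAGE}, "
          f"reads needed: {reads_needed(templates):,}")
```

With the same sequencing yield, the T-cell-rich sample in this toy example ends up below 30x, which is the situation the quote is concerned about.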
II. Immune repertoire data processing

1. Assembling CDR3 sequences

1.1 scTCR/BCR

1.1.1 10X Genomics platform

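# Assemble V(D)J contigs and call clonotypes for one sample.
# --localcores caps the CPU cores used; --localmem caps memory in GB.
cellranger vdj --id=sample_name \
               --reference=/opt/refdata-cellranger-vdj-GRCh38-alts-ensembl-3.1.0 \
               --fastqs=/fastq_path \
               --sample=sample_name \
               --localcores=8 \
               --localmem=64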
The output includes a summary report, which is used for quality control.
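Besides the HTML report, `cellranger vdj` also writes a one-row `metrics_summary.csv` under `outs/`, which is convenient when checking many samples. A minimal sketch of reading it follows; the header names in the toy example are illustrative and should be checked against your own file, and `parse_metric` and `read_metrics` are hypothetical helper names.

```python
import csv
import io


def parse_metric(value: str):
    """Convert '1,234' to 1234.0 and '85.3%' to 0.853; return other text unchanged."""
    text = value.strip().replace(",", "")
    try:
        if text.endswith("%"):
            return float(text[:-1]) / 100
        return float(text)
    except ValueError:
        return value


def read_metrics(handle):
    """Read the single data row of a metrics CSV into a dict of parsed values."""
    row = next(csv.DictReader(handle))
    return {name: parse_metric(value) for name, value in row.items()}


# Toy stand-in for outs/metrics_summary.csv (header names are illustrative).
example = io.StringIO(
    "Estimated Number of Cells,Mean Read Pairs per Cell,"
    "Cells With Productive V-J Spanning Pair\n"
    '"1,234","25,000",85.3%\n'
)
metrics = read_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

For a real run, replace the `io.StringIO` object with `open("sample_name/outs/metrics_summary.csv", newline="")`.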
  
1.1.2 Other single-cell sequencing platforms
 
For platform details, see: "Single-cell TCR-Seq: more efficient TCR α/β chain pairing analysis" (单细胞TCR-Seq技术——更高效的TCR a/b 链配对分析)
Processing software:
 TraCeR – reconstruction of T cell receptor sequences from single-cell RNA-seq data.
 scTCR-seq – a Python pipeline for recovering TCR data from single-cell RNA-seq
 TRAPeS – TCR Reconstruction Algorithm for Paired-End Single-cell
 VDJPuzzle2 – TCR and BCR reconstruction from scRNA-seq data
 MiXCR – a universal software for fast and accurate analysis of raw T- or B- cell receptor repertoire sequencing data
 Immunarch – Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R
 
  
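1.2 Bulk TCR/BCR

1.2.1 Processing workflow

Raw-data processing for bulk TCR/BCR follows the same steps as for RNA-seq, so it is not described in detail here.
 1. QC
 2. Trim with Trimmomatic, then run FastQC again
 3. Then obtain the CDR3 sequences with tools such as MiXCR (Immunarch can be used to explore the resulting clonotype tables)
 For details, see: "Basic methods for processing bulk VDJ sequencing data" (Bulk VDJ测序数据处理基本方法)
 "The first immune repertoire analysis tutorial on the web [MiXCR+VDJtools+Python+R]" (全网第一篇免疫组库分析教程[MiXCR+VDJtools+Python+R])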
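1.2.2 Notes

FastQC results for TCR/BCR data look somewhat different from those for RNA-seq; the differences are recorded here.
 1. In "Per base sequence content", the proportions of paired bases (A vs. T, G vs. C) are unequal, and "Overrepresented sequences" lists many entries.
 This is expected: an immune repertoire library is produced by multiplex PCR (mPCR) and is not uniform in composition, so the base proportions are uneven.

 2. "Per base N content" is too high.
 Bases that could not be called are reported as N. Remove them with Trimmomatic so that the reads meet the FastQC criteria before continuing with downstream analysis.
 Note: trimming and filtering leave many short reads.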
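To make the two FastQC observations concrete, here is a minimal sketch that computes the per-position base fractions (including N) for a few toy reads, then strips N from read ends and drops reads that become too short. The `trim_n` function is a simplified stand-in for illustration only; it is not Trimmomatic's algorithm, and the reads and the `min_len` value are made up.

```python
from collections import Counter


def per_base_content(reads):
    """Fraction of A, C, G, T and N at each read position."""
    table = []
    for pos in range(max(len(r) for r in reads)):
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts.values())
        table.append({base: counts[base] / total for base in "ACGTN"})
    return table


def trim_n(read, min_len=4):
    """Strip N from both ends; return None if the remainder is too short."""
    trimmed = read.strip("N")
    return trimmed if len(trimmed) >= min_len else None


reads = ["NNACGT", "ACGTNN", "ACGTAC", "NNNNAC"]  # toy reads
before = per_base_content(reads)
kept = [t for t in map(trim_n, reads) if t is not None]
after = per_base_content(kept)
print("N fraction at first position before trimming:", before[0]["N"])
print("reads kept after trimming:", kept)
```

The kept reads are shorter than the originals, and one read is lost entirely, which mirrors the note about short sequences after trimming and filtering.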
  
1.2.3 MiXCR code
 
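# MiXCR 3.x "analyze amplicon" pipeline on paired-end reads.
# The last argument ("analysis") is the prefix for the output files.
mixcr analyze amplicon --species hs \
        --adapters no-adapters \
        --starting-material dna \
        --5-end v-primers \
        --3-end j-primers \
        --receptor-type tcr \
        --only-productive \
        /home/zy/TCR/example_A_1_val_1.fq.gz /home/zy/TCR/example_A_2_val_2.fq.gz analysis
# Some protocols use j-primers and others c-primers for --3-end. I was unsure which to choose, but when I tried both, the results were the same.
# A senior labmate told me the --5-end and --3-end parameters can also simply be left out.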
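One way to check whether two runs (for example `--3-end j-primers` versus `--3-end c-primers`) really give the same result is to compare their clonotype tables. The sketch below assumes tab-separated tables with `cloneCount` and `nSeqCDR3` columns, as in MiXCR 3.x clonotype exports; the toy tables and the helper names `load_clones` and `compare_runs` are hypothetical.

```python
import csv
import io


def load_clones(handle, key="nSeqCDR3", count="cloneCount"):
    """Map each CDR3 nucleotide sequence to its summed clone count."""
    clones = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        clones[row[key]] = clones.get(row[key], 0.0) + float(row[count])
    return clones


def compare_runs(a, b):
    """List CDR3s unique to each run and shared CDR3s whose counts differ."""
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - shared),
        "only_in_b": sorted(set(b) - shared),
        "count_differs": sorted(s for s in shared if a[s] != b[s]),
    }


# Toy tables standing in for the j-primers and c-primers outputs.
header = "cloneCount\tnSeqCDR3\taaSeqCDR3\n"
rows = "120\tTGTGCCAGCAGTTTC\tCASSF\n" + "30\tTGTGCCAGCTTC\tCASF\n"
run_j = io.StringIO(header + rows)
run_c = io.StringIO(header + rows)
diff = compare_runs(load_clones(run_j), load_clones(run_c))
print(diff)
```

Three empty lists mean the two runs agree on both the CDR3 sequences and their counts.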
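2. Filtering for functional CDR3 sequences

Commonly used filter criteria:
 1. The sequence is productive
 2. The nucleotide sequence length is a multiple of 3 and the amino-acid sequence is longer than 4 residues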
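The two criteria above can be sketched as a small filter. This is a minimal illustration that assumes the amino-acid string marks stop codons with `*` and frameshifts with `_` (the convention MiXCR uses in `aaSeqCDR3`); the function name and the example clones are made up.

```python
def is_functional_cdr3(nt_seq: str, aa_seq: str, min_aa_len: int = 5) -> bool:
    """Apply the filters: productive, in frame, and more than 4 residues."""
    # Nucleotide length must be a non-zero multiple of 3.
    in_frame = len(nt_seq) > 0 and len(nt_seq) % 3 == 0
    # Assumed markers: '*' for a stop codon, '_' for an incomplete codon.
    productive = "*" not in aa_seq and "_" not in aa_seq
    return in_frame and productive and len(aa_seq) >= min_aa_len


clones = [
    ("TGTGCCAGCAGTTTAGCGGGAGAGACCCAGTACTTC", "CASSLAGETQYF"),  # passes
    ("TGTGCCAGCAGTTTAGCGGGAGAGACCCAGTACTT", "CASSLAGE_TQYF"),  # out of frame
    ("TGTGCCAGCTGATTC", "CAS*F"),                              # stop codon
    ("TGTGCCAGCTTC", "CASF"),                                  # only 4 residues
]
functional = [aa for nt, aa in clones if is_functional_cdr3(nt, aa)]
print(functional)
```

Only the first toy clone satisfies all three checks.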
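Other filter criteria
 For example, expression abundance (TPM):
 I have seen several papers filter out assemblies with alpha chain TPM < 10 or beta chain TPM < 15, but I am not sure whether this is a widely accepted standard or an empirical setting chosen by the authors.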
 Results section: In total, we detected full TCR sequences for 94% (3,792/4,032) T cells, with at least one paired productive TCR α-β chain for subsequent analyses (Table S5). While most cells expressed unique TCR α and β alleles, nonunique α and/or β could be detected in a fraction of T cells. After eliminating non-productive alleles (e.g., out-of-frame transcripts) or low-abundance TCRs (Figure S6A), we found that 84% (3,174/3,792) contained unique and productive α chains and 94% (3,559/3,792) unique and productive β chains (Figure S6B), in agreement with previous reports
 
 Methods section:
 TCR analysis
 The TCR sequences for each single T cell were assembled by the TraCeR method from single cell RNA-Seq data, leading to the identification of the CDR3 sequence, the rearranged TCR genes, and their expression abundance (transcripts per million, TPM). First, we discard those cells with no obvious TCR forms. Then we arrange TCR alpha and beta chain respectively with the following steps. The first TCR alpha (beta) chain was defined as follows: 1) keep all single T cells in which only one productive TCR alpha and beta chain was present. 2) if more than one TCR alpha or beta chain were identified in one T cell, we kept only the cells in which a dominant form of alpha and beta was detected. Often, one alpha/beta chain was productive and the other chain was non-productive, or the expression level of one was far higher than the alternative allele, and the productive or dominant form was identified. Next, we filtered out the second TCR alpha chains with TPM less than 10 and beta chains with TPM less than 15 to eliminate the biological and bioinformatics error based on the histogram analysis for the expression distribution (Figure S6A). From a total 4032 cells with successfully assembled TCR sequences, we identified the TCR alpha/beta pairs for 3792 cells.
 
 cite: Zheng C, Zheng L, Yoo JK, et al. Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing. Cell. 2017;169(7):1342-1356.e16. doi:10.1016/j.cell.2017.05.035 
 TCR analysis
 To reduce false positive assembly, we filtered out TCR assemblies with alpha chain TPM < 10 or beta chain TPM < 15…Only productive (that is, in frame) TCR alpha–beta pairs were considered to define the dominant TCR of a single cell.
 
 cite: Guo, Xinyi et al. “Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing.” Nature medicine vol. 24,7 (2018): 978-985. doi:10.1038/s41591-018-0045-3 
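A minimal sketch of the TPM-based filtering described in the two quoted methods: drop non-productive chains and chains below the thresholds (alpha TPM < 10, beta TPM < 15), then keep the highest-TPM alpha and beta chain per cell. This is my simplified reading of those descriptions, not the authors' code; the `Chain` record, the function name and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    cell: str
    locus: str        # "A" for alpha, "B" for beta
    cdr3: str
    tpm: float
    productive: bool


MIN_TPM = {"A": 10.0, "B": 15.0}  # thresholds quoted from the papers


def dominant_pairs(chains):
    """Return {cell: (alpha, beta)} for cells with a passing chain of each type."""
    best = {}
    for chain in chains:
        if not chain.productive or chain.tpm < MIN_TPM[chain.locus]:
            continue  # non-productive or low-abundance assembly
        key = (chain.cell, chain.locus)
        if key not in best or chain.tpm > best[key].tpm:
            best[key] = chain  # keep the dominant (highest-TPM) chain
    cells = sorted({cell for cell, _ in best})
    return {c: (best[(c, "A")], best[(c, "B")])
            for c in cells if (c, "A") in best and (c, "B") in best}


chains = [
    Chain("cell1", "A", "CAVRDSNYQLIW", 120.0, True),
    Chain("cell1", "A", "CAASGGYQKVTF", 8.0, True),     # below alpha threshold
    Chain("cell1", "B", "CASSLAGETQYF", 300.0, True),
    Chain("cell2", "A", "CAVNTGNQFYF", 5.0, True),      # below alpha threshold
    Chain("cell2", "B", "CASSPGQGYEQYF", 50.0, True),
    Chain("cell3", "A", "CAMREGSNYKLTF", 500.0, False),  # non-productive
    Chain("cell3", "A", "CALSEAGNNRKLIW", 40.0, True),
    Chain("cell3", "B", "CASSQDRGNTEAFF", 14.0, True),  # below beta threshold
]
pairs = dominant_pairs(chains)
for cell, (alpha, beta) in pairs.items():
    print(cell, alpha.cdr3, beta.cdr3)
```

Only `cell1` keeps a paired alpha-beta TCR in this toy data set; the other two cells each lose one chain to the thresholds.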
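 This post is only a brief record of what I learned while processing immune repertoire data for the first time; more will be added later.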