CNV

Build the reference from samples in the same batch rather than reusing one shared reference. May be better?

Does CNV calling require a panel of normals?

Try GATK and cnvkit

GATK4.0 study

docker run -v ~/gatk_bundle:/gatk/my_data -it broadinstitute/gatk:4.0.2.0

gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/motherHC.vcf \
-L 20:10,000,000-10,200,000
gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/motherHCdebug.vcf \
-bamout sandbox/motherHCdebug.bam \
-L 20:10,002,371-10,002,546 -ip 100

gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/mother.g.vcf \
-ERC GVCF \
-L 20:10,000,000-10,200,000
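
The GenomicsDBImport step takes one GVCF per family member, while the command above only produces the mother's. A minimal sketch of repeating the GVCF call per sample, assuming `bams/father.bam` and `bams/son.bam` sit next to `bams/mother.bam` (those file names are guessed from the GVCF names, not confirmed by these notes). It only prints the commands; drop the `echo` to run them.

```bash
# Dry run: print one HaplotypeCaller GVCF command per trio member.
# The sample names are assumptions inferred from gvcfs/*.g.vcf.
interval="20:10,000,000-10,200,000"

hc_gvcf_cmd() {
  # $1 = sample name; prints the command instead of running it
  echo gatk HaplotypeCaller \
    -R ref/ref.fasta \
    -I "bams/$1.bam" \
    -O "sandbox/$1.g.vcf" \
    -ERC GVCF \
    -L "$interval"
}

for s in mother father son; do
  hc_gvcf_cmd "$s"
done
```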

gatk GenomicsDBImport \
-V gvcfs/mother.g.vcf \
-V gvcfs/father.g.vcf \
-V gvcfs/son.g.vcf \
--genomicsdb-workspace-path sandbox/trio \
--intervals 20:10,000,000-10,200,000

gatk GenotypeGVCFs \
-R ref/ref.fasta \
-V gendb://sandbox/trio \
-O sandbox/trioGGVCF.vcf \
-L 20:10,000,000-10,200,000

#_________________
gatk SelectVariants \
-R ref/ref.fasta \
-V input_vcfs/trio.vcf.gz \
-sn NA12878 \
-select-type SNP \
--exclude-non-variants \
-O sandbox/motherSNP.vcf.gz

somatic CNV

| Step | Latest GATK tool | Old tool | Description |
|------|------------------|----------|-------------|
| 1 | PreprocessIntervals | PadTargets | Pad or bin intervals for coverage collection |
| 2 | CollectFragmentCounts | CalculateTargetCoverage | Collect fragment counts at specified intervals |
| 3 | CreateReadCountPanelOfNormals | CreatePanelOfNormals | Create the PoN from fragment counts |
| 4 | DenoiseReadCounts | NormalizeSomaticReadCounts | Denoise case sample counts against the PoN |
| 5 | ModelSegments | PerformSegmentation, AllelicCNV | Group and model contiguous copy ratios and allele fractions |
| 6 | CallCopyRatioSegments | CallSegments | Call copy-neutral (0), loss (-), and gain (+) segments |
| 7 | PlotDenoisedCopyRatios & PlotModeledSegments | PlotSegmentedCopyRatio, PlotACNVResults | Plot copy ratios and allele fractions to visualize denoising and segmentation |

gatk PreprocessIntervals \
-L intervals/targets_C.interval_list \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--reference ref/Homo_sapiens_assembly38.fasta \
--padding 250 \
--bin-length 0 \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/targets_C.preprocessed.interval_list

 

Example Tumor
gatk --java-options "-Xmx6g" CollectFragmentCounts \
-I bams/tumor.bam \
-L sandbox/targets_C.preprocessed.interval_list \
--reference ref/Homo_sapiens_assembly38.fasta \
--format TSV \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/tumor_clean.counts.tsv

Example Normal
gatk --java-options "-Xmx6g" CollectFragmentCounts \
-I bams/normal.bam \
-L sandbox/targets_C.preprocessed.interval_list \
--reference ref/Homo_sapiens_assembly38.fasta \
--format TSV \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/normal_clean.counts.tsv

gatk --java-options "-Xmx6500m" CreateReadCountPanelOfNormals \
--input file1_clean.counts.tsv \
... \
--input file40_clean.counts.tsv \
--minimum-interval-median-percentile 5.0 \
--output cnvponM.pon.hdf5
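
Typing one `--input` per normal by hand gets tedious with 40 samples. A hedged sketch that builds the argument list from whatever count files are in a directory; the directory name `pon_counts` and the `*_clean.counts.tsv` naming are assumptions for illustration, not something these notes specify. It prints the command rather than running it (so the quotes around `-Xmx6500m` do not show in the output).

```bash
pon_cmd() {
  # $1 = directory holding the per-normal count files (hypothetical layout)
  dir=$1
  set --                                  # reuse the positional parameters as an argument list
  for f in "$dir"/*_clean.counts.tsv; do
    [ -e "$f" ] || continue               # skip the literal pattern when nothing matches
    set -- "$@" --input "$f"
  done
  # Dry run: remove the echo to actually build the panel of normals.
  echo gatk --java-options "-Xmx6500m" CreateReadCountPanelOfNormals \
    "$@" \
    --minimum-interval-median-percentile 5.0 \
    --output cnvponM.pon.hdf5
}

pon_cmd pon_counts
```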

gatk --java-options "-Xmx7g" DenoiseReadCounts \
-I cnv_inputs/hcc1143_T_clean.counts.hdf5 \
--count-panel-of-normals cnv_inputs/cnvponC.pon.hdf5 \
--standardized-copy-ratios sandbox/hcc1143_T_clean.standardizedCR.tsv \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv

gatk --java-options "-Xmx7500m" ModelSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--output sandbox \
--output-prefix hcc1143_T_clean

gatk --java-options "-Xmx6000m" CallCopyRatioSegments \
-I sandbox/hcc1143_T_clean.cr.seg \
-O sandbox/hcc1143_T_clean.called.seg
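
As a rough aid for reading the numbers: as far as I know, GATK reports denoised copy ratios on a log2 scale, so in a pure diploid sample a segment with n copies would sit near log2(n/2). The sketch below tabulates that with awk. It ignores tumor purity and ploidy, so observed values will usually be pulled toward 0; the helper name is made up for this note.

```bash
# Expected log2 copy ratio for n copies in a pure diploid sample: log2(n/2).
expected_log2cr() {
  awk -v n="$1" 'BEGIN { printf "%.3f\n", log(n / 2) / log(2) }'
}

for n in 1 2 3 4; do
  printf '%s copies: %s\n' "$n" "$(expected_log2cr "$n")"
done
```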

gatk --java-options "-Xmx6000m" PlotDenoisedCopyRatios \
--standardized-copy-ratios sandbox/hcc1143_T_clean.standardizedCR.tsv \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_T_clean

gatk --java-options "-Xmx6000m" PlotModeledSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--segments sandbox/hcc1143_T_clean.modelFinal.seg \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_T_clean

1. Run CollectAllelicCounts to collect reference and alternate allele counts for the tumor and normal.
2. Provide the outputs from step 1 as inputs to ModelSegments, along with the denoised copy ratios from the tumor.
3. Make plots with PlotModeledSegments. We skip PlotDenoisedCopyRatios because it would have the same inputs and outputs as in section 5.

gatk --java-options "-Xmx7500m" CollectAllelicCounts \
-L cnv_inputs/theta_biallelicsnps_agilentintervals.interval_list \
-I bams/normal.bam \
--reference ref/Homo_sapiens_assembly38.fasta \
--output sandbox/hcc1143_N_clean.allelicCounts.tsv

gatk --java-options "-Xmx7500m" ModelSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--allelic-counts cnv_inputs/hcc1143_T_clean.allelicCounts.tsv \
--normal-allelic-counts cnv_inputs/hcc1143_N_clean.allelicCounts.tsv \
--output sandbox \
--output-prefix hcc1143_TN_clean

gatk --java-options "-Xmx6000m" PlotModeledSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--allelic-counts sandbox/hcc1143_TN_clean.hets.tsv \
--segments sandbox/hcc1143_TN_clean.modelFinal.seg \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_TN_clean

 

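Linux shell arrays

Numeric arrays: a pair of parentheses denotes an array, and the elements are separated by spaces.

For example:

arr_number=(1 2 3 4 5);

String arrays: likewise, a pair of parentheses denotes the array; each element is wrapped in double or single quotes, and the elements are again separated by spaces.

arr_string=("abc" "edf" "sss"); or arr_string=('abc' 'edf' 'sss');

Getting the array length

arr_length=${#arr_number[*]} or ${#arr_number[@]} both work; that is, ${#array_name[@]} or ${#array_name[*]} gives the number of elements in the array.

Reading the value at an index

arr_index2=${arr_number[2]}, that is, the form ${array_name[index]}. Indices start at 0.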
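
A short sketch of the operations so far, using the same two arrays. Array syntax is a bash feature, so this assumes bash rather than plain POSIX sh.

```bash
arr_number=(1 2 3 4 5)
arr_string=("abc" "edf" "sss")

echo "${#arr_number[@]}"   # number of elements
echo "${arr_number[2]}"    # element at index 2 (indices start at 0)
echo "${#arr_string[1]}"   # length of the string stored at index 1
```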

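Assigning to an index

Two questions come up here.

First: what happens if an element already exists at that index?

Answer: its value is replaced by the new one.

For example: arr_number[2]=100 changes the array to (1 2 100 4 5).

Second: what happens if the index lies beyond the current size of the array? For example, arr_number above has 5 elements and the index is 10, 11, or any other value past the last index.

Answer: the new value is stored at exactly that index. Bash arrays are sparse, so the indices in between stay unset; when the array is expanded, the new value simply shows up after the existing ones.

For example: after arr_number[13]=13, ${arr_number[@]} expands to 1 2 100 4 5 13, but the value lives at index 13, not index 5.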
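
A minimal bash sketch to check what is actually stored after the out-of-range assignment. `${!arr_number[@]}` lists the indices in use, which makes the gap visible.

```bash
arr_number=(1 2 100 4 5)
arr_number[13]=13

echo "${arr_number[@]}"    # values in index order
echo "${!arr_number[@]}"   # the indices that are actually set
echo "${#arr_number[@]}"   # count of set elements, not highest index + 1

arr_number+=(99)           # += appends after the highest index in use
echo "${!arr_number[@]}"
```

To append without creating a gap, `arr_number+=(value)` on a contiguous array is usually the safer choice.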

Deleting

Remove one element: unset arr_number[1] removes the element at index 1; the remaining elements keep their indices.

Remove the whole array: unset arr_number;
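
A small bash sketch of both forms of `unset`. Removing an element leaves a hole in the index sequence, and re-assigning the array from its own expansion packs it again.

```bash
arr_number=(1 2 3 4 5)
unset 'arr_number[1]'             # quoted so the shell cannot glob-expand the brackets

echo "${arr_number[@]}"           # remaining values
echo "${!arr_number[@]}"          # index 1 is gone; the others keep their positions

arr_number=("${arr_number[@]}")   # re-pack to get contiguous indices again
echo "${!arr_number[@]}"

unset arr_number                  # remove the whole array
echo "${#arr_number[@]}"
```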

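Slicing

The slice form is ${array_name[@ or *]:start_index:count}. Note that the third field is the number of elements to take, not an end index.

For example: ${arr_number[@]:1:4} starts at index 1 and takes 4 elements.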
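
A quick bash sketch to confirm how the two numbers in a slice are interpreted: the first is the starting index and the second is how many elements to take.

```bash
arr_number=(1 2 3 4 5)

echo "${arr_number[@]:1:4}"   # start at index 1, take 4 elements
echo "${arr_number[@]:1:2}"   # start at index 1, take 2 elements
echo "${arr_number[@]:3}"     # from index 3 to the end
```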

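Pattern substitution

The form is ${array_name[@ or *]/pattern/new_value}. The substitution is applied to each element in turn and replaces the first match in that element; the array itself is not modified.

For example: ${arr_number[@]/2/98}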
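
A bash sketch of the substitution on an array with a few extra multi-digit elements (added here only to show the per-element behaviour). A single `/` replaces the first match in each element, while `//` replaces every match.

```bash
arr_number=(1 2 3 4 5 12 22)

echo "${arr_number[@]/2/98}"    # first match in each element
echo "${arr_number[@]//2/98}"   # every match in each element
echo "${arr_number[@]}"         # the original array is unchanged
```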

Iterating over an array

We use a for statement to demonstrate array traversal:

for v in "${arr_number[@]}"; do
    echo "$v"
done
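
Whether the expansion is quoted matters once elements contain spaces. A bash sketch with one such element (made up for the demonstration) shows the difference, plus iteration by index:

```bash
arr_string=("abc" "e d f" "sss")

# Unquoted: elements are split again on whitespace.
for v in ${arr_string[@]}; do echo "[$v]"; done

# Quoted: one iteration per element, spaces preserved.
for v in "${arr_string[@]}"; do echo "[$v]"; done

# Iterate by index when the position is needed.
for i in "${!arr_string[@]}"; do echo "$i=${arr_string[$i]}"; done
```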

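[Forward] Docker directory configuration on CentOS 7

For people who have just moved Docker to CentOS 7, some of the defaults may feel unfamiliar compared with CentOS 6.x. In particular, on many cloud hosts the system disk is typically only 20 GB, so with many Docker images stored in the default image directory the disk can fill up quickly.

On CentOS 6.x this is usually handled by changing the `other_args` parameter in `/etc/sysconfig/docker`. For example, to move Docker's files from `/var/lib/docker` to `/data/docker`:

    other_args="-g /data/docker -p /var/run/docker.pid"

CentOS 7 is a little different: because of `systemctl` (systemd), the startup parameters are defined in `docker.service` by default; see the official documentation for details. It can be changed like this (_using my own test machine as an example; the path to docker.service may differ depending on your setup_):

    sudo vim /usr/lib/systemd/system/docker.service

Edit `ExecStart` in the `[Service]` section, following this example:

    [Service]
    ExecStart=/usr/bin/docker daemon -H fd:// --graph /data/docker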
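
On later Docker releases the `docker daemon` subcommand became `dockerd`, and `--graph` was deprecated in favour of `data-root` (from 17.05), which can also be set in `/etc/docker/daemon.json`. A hedged sketch of writing that file, assuming no daemon.json exists yet. `DAEMON_JSON` is a hypothetical variable so the snippet can be tried on a scratch path first.

```bash
# Write a minimal daemon.json that moves Docker's storage to /data/docker.
# DAEMON_JSON is a hypothetical override; the real file is /etc/docker/daemon.json.
DAEMON_JSON="${DAEMON_JSON:-./daemon.json}"

cat > "$DAEMON_JSON" <<'EOF'
{
  "data-root": "/data/docker"
}
EOF

cat "$DAEMON_JSON"
# After copying it into place: sudo systemctl restart docker
```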

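Noted here in the hope that it helps someone who needs it.


Author: 灵魂秩序者
Link: https://www.jianshu.com/p/8266e6567c8b
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, credit the source.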

Install docker on centos7

Uninstall old versions

Older versions of Docker were called docker or docker-engine. If these are installed, uninstall them, along with associated dependencies.

$ sudo yum remove docker \
                  docker-common \
                  docker-selinux \
                  docker-engine
$ sudo yum-config-manager \
    --add-repo \
https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo yum install docker-ce
sudo systemctl start docker

sudo docker run hello-world
This step is quite likely to fail. Why? Because of the Great Firewall. How to get around it?

1: Use a proxy

https://docs.docker.com/engine/admin/systemd/#start-automatically-at-system-boot

Create a systemd drop-in directory for the docker service:

  1. $ sudo mkdir -p /etc/systemd/system/docker.service.d
    
  2. Create a file called /etc/systemd/system/docker.service.d/http-proxy.conf that adds the HTTP_PROXY environment variable:
    [Service]
    Environment="HTTP_PROXY=http://proxy.example.com:80/"
    

    Or, if you are behind an HTTPS proxy server, create a file called /etc/systemd/system/docker.service.d/https-proxy.conf that adds the HTTPS_PROXY environment variable:

    [Service]
    Environment="HTTPS_PROXY=https://proxy.example.com:443/"
    
  3. If you have internal Docker registries that you need to contact without proxying you can specify them via the NO_PROXY environment variable:
    Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"
    

    Or, if you are behind an HTTPS proxy server:

    Environment="HTTPS_PROXY=https://proxy.example.com:443/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"
    
  4. Flush changes:
    $ sudo systemctl daemon-reload
    
  5. Restart Docker:
    $ sudo systemctl restart docker
    
  6. Verify that the configuration has been loaded:
    $ systemctl show --property=Environment docker
    Environment=HTTP_PROXY=http://proxy.example.com:80/
    

    Or, if you are behind an HTTPS proxy server:

    $ systemctl show --property=Environment docker
    Environment=HTTPS_PROXY=https://proxy.example.com:443/
    

2: Use a Docker registry mirror inside China

https://docs.docker.com/registry/recipes/mirror/#configure-the-docker-daemon

Use case: the China registry mirror

The URL of the registry mirror for China is registry.docker-cn.com. You can pull images from this mirror just like you do for other registries by specifying the full path, including the registry, in your docker pull command, for example:

$ docker pull registry.docker-cn.com/library/ubuntu

You can add "https://registry.docker-cn.com" to the registry-mirrors array in /etc/docker/daemon.json to pull from the China registry mirror by default.

{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}

Save the file and reload Docker for the change to take effect.

3: Run your own local registry mirror, if you are up to it

https://docs.docker.com/registry/deploying/

CentOS s-s and Privoxy (s-s stands for Shadowsocks)

 

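yum install python-pip
pip install shadowsocks
For convenience, create a .json file holding the connection settings, for example /etc/ss.json with the following content:

{
"server": "your_server_ip",
"server_port": your_server_port,
"local_address": "127.0.0.1",
"local_port": 1080,
"password": "your_server_passwd",
"timeout": 300,
"method": "rc4-md5",
"fast_open": false,
"workers": 1
}

Fields (JSON does not allow comments, so keep these notes out of the file):
server: IP of the ss server
server_port: server port
local_address: local IP
local_port: local port
password: password for the ss connection
timeout: wait timeout
method: encryption method
fast_open: true or false. If the server's Linux kernel is 3.7 or newer, fast_open can be enabled to reduce latency: run echo 3 > /proc/sys/net/ipv4/tcp_fastopen, then set fast_open to true.
workers: number of workers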

The server uses the same configuration as the client, but the server should start at boot: vi /etc/rc.local

and add: /usr/local/bin/ssserver -c /root/config.json -d start > /root/tmp.txt &

Then start the client:
nohup sslocal -c /etc/ss.json > /dev/null 2>&1 &

## Start automatically when CentOS boots

$ vi /etc/init.d/ss.sh
#!/bin/sh
#chkconfig:2345 80 90
#description:ss
/bin/sslocal -c /etc/ss.json &

chkconfig --add ss.sh

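Install Privoxy
yum install privoxy

Edit the configuration

vi /usr/local/etc/privoxy/config
:783: go to line 783 and remove the leading comment character; the port can be changed freely
listen-address 127.0.0.1:8118
:1336: go to line 1336 and remove the leading comment character; the trailing port 1080 must match the local_port in the ss configuration
forward-socks5t / 127.0.0.1:1080 .

That is, uncomment the forward-socks5t / 127.0.0.1:1080 . line (keep the trailing dot). Port 8118 speaks HTTP and port 1080 speaks SOCKS5; Privoxy forwards the HTTP requests it receives to the SOCKS5 port, so ss indirectly supports HTTP as well.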

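Route the terminal through the proxy

Using the HTTP proxy requires setting the corresponding environment variables.
For a global proxy, add the following to one of these files:

vi /etc/profile
vi ~/.bash_profile
vi ~/.bashrc

export https_proxy=http://127.0.0.1:8118
export http_proxy=http://127.0.0.1:8118
export ftp_proxy=http://127.0.0.1:8118

Verify the result
service privoxy start
curl www.google.com
That's OK! enjoy it~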
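
If the proxy should not be on for every command, the same variables can be toggled per shell instead of being exported from a profile file. A minimal sketch; the function names `proxy_on` and `proxy_off` are made up for this note, and the address assumes Privoxy is listening on 127.0.0.1:8118.

```bash
# Toggle the proxy for the current shell only.
PRIVOXY_URL="http://127.0.0.1:8118"

proxy_on() {
  export http_proxy="$PRIVOXY_URL" https_proxy="$PRIVOXY_URL" ftp_proxy="$PRIVOXY_URL"
}

proxy_off() {
  unset http_proxy https_proxy ftp_proxy
}

proxy_on
echo "http_proxy=$http_proxy"
proxy_off
echo "http_proxy=${http_proxy:-<unset>}"
```

These two functions could live in ~/.bashrc so that `proxy_on` is only called when a command needs it.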

##

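Set up python3 and pip3 alongside python2 and pip2

$ sudo mkdir /usr/local/python3 # create the install directory
sudo chmod -R 777 /usr/local/python3

# Download the Python source
 wget --no-check-certificate https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tar.xz
# Note: --no-check-certificate skips TLS certificate verification; add it only if wget cannot verify the https certificate

$ tar -xJvf Python-3.6.3.tar.xz # unpack the archive (-J is for xz; -z is for gzip)

$ cd Python-3.6.3 # enter the unpacked directory
$  ./configure --prefix=/usr/local/python3 # install into the directory created above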

$ make

$ sudo make install

Create symlinks for python3:

sudo ln -s /usr/local/python3/bin/python3 /usr/bin/python3

sudo ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3

sudo ln -s /usr/local/python3/bin/pyenv /usr/bin/pyenv3

Use the python command for Python 2 and python3 for Python 3.