Thursday, January 26, 2012

edgeR tumor and matched normal

Software: edgeR (Bioconductor R package)
Description: for paired significance tests, edgeR uses a generalized linear model and a likelihood ratio test to calculate a p-value for each gene/transcript.
Error: when the count matrix contains any zero values, it fails with:
  Error in beta[k, ] <- betaj[decr, ] :
  NAs are not allowed in subscripted assignments
Solution: first filter out the all-zero rows of the matrix, then add 1 to every entry (matrix + 1) so that no zeros remain.
Question: since the library sizes are large, can the effect of the added 1s on the library sizes be neglected?
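
To make the two steps in the Solution concrete, here is a minimal sketch in plain Python rather than R. It is only an illustration of the filter-then-add-1 idea, not edgeR code; the function names and the toy matrix are made up for this example.

```python
def filter_and_offset(counts, pseudocount=1):
    """Drop all-zero rows, then add a pseudocount to every remaining entry."""
    kept = [row for row in counts if any(value != 0 for value in row)]
    return [[value + pseudocount for value in row] for row in kept]


def library_sizes(counts):
    """Column sums: the total count per sample."""
    return [sum(column) for column in zip(*counts)]


# Hypothetical toy matrix: rows are genes, columns are samples.
toy_counts = [
    [0, 0, 0, 0],       # all-zero row, removed by the filter
    [5, 0, 12, 3],
    [100, 250, 0, 80],
]

adjusted = filter_and_offset(toy_counts)
print(adjusted)                   # [[6, 1, 13, 4], [101, 251, 1, 81]]
print(library_sizes(toy_counts))  # [105, 250, 12, 83]
print(library_sizes(adjusted))    # [107, 252, 14, 85]
```

Each library grows by exactly the number of rows that survive the filter, so the relative change is (retained rows) / (original library size). Whether that ratio is small enough to ignore depends on the real data.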

Wednesday, January 25, 2012

FASTX-Toolkit with Illumina FASTQ files in standard Sanger format

Software: FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
Input: FASTQ file from Illumina, standard Sanger format (Phred+33)
Function: fastx_quality_stats
Description: generates a quality statistics report for a FASTQ file. The report is then used as input for drawing quality plots for the FASTQ files.
Error message: fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.
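Solution: add -Q 33 so the quality characters are decoded with the Phred+33 offset. The -29 in the error is ord 35 minus 64, i.e. the tool was assuming an offset of 64.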
Solution source: http://seqanswers.com/forums/showthread.php?p=49667
Final command: zcat sample.fastq.gz | fastx_quality_stats -N -Q 33 -o quality.txt
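
The arithmetic behind the error message can be checked with a short Python sketch. The helper names are made up for this example, and the offset guess is only a rough heuristic (characters below ';', ord 59, do not occur in the +64 encodings), not something fastx_quality_stats itself does.

```python
def decode_quality(quality_string, offset):
    """Convert FASTQ quality characters to scores for a given ASCII offset."""
    return [ord(char) - offset for char in quality_string]


def guess_offset(quality_strings):
    """Heuristic: a character below ';' (ord 59) can only be Phred+33.

    If no such character is seen, the answer is ambiguous; 64 is returned
    as a guess and should be confirmed against more reads.
    """
    lowest = min(ord(char) for line in quality_strings for char in line)
    return 33 if lowest < 59 else 64


print(decode_quality("#", 64))  # [-29], the value reported in the error
print(decode_quality("#", 33))  # [2], a valid score under Phred+33
print(guess_offset(["#II5"]))   # 33
```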

Thursday, April 28, 2011

MapSplice to find fusion genes from paired-end RNA-seq data

Why? I am now trying to use MapSplice to find fusions. I chose it because it is conservative: every fusion must be supported by multiple reads with different starting points.
How? It accepts parameters either on the command line or in a configuration file. For paired-end data, the two ends of each pair must be stored in separate files, and in the configuration file they must be listed in a format like: s_1_1_seq.txt,s_1_2_seq.txt,s_2_1_seq.txt,s_2_2_seq.txt with no spaces between the file names, just commas.
Remember! If no file paths are given in the configuration file, MapSplice has to be run from the directory that contains the sequence data (I was too careless).
What's going on? MapSplice is still checking the format of my reads.
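
A small Python sketch for building the comma-separated file list described under "How?", so that stray spaces cannot slip in. The function name is hypothetical and this only produces the list string; it does not write a MapSplice configuration file or assume any of its option names.

```python
def build_reads_list(pairs):
    """Join paired-end file names as end1,end2,end1,end2,... with no spaces."""
    names = []
    for first_end, second_end in pairs:
        for name in (first_end, second_end):
            # Spaces or commas inside a name would break the list format.
            if " " in name or "," in name:
                raise ValueError(f"file name not usable in the list: {name!r}")
            names.append(name)
    return ",".join(names)


reads = build_reads_list([
    ("s_1_1_seq.txt", "s_1_2_seq.txt"),
    ("s_2_1_seq.txt", "s_2_2_seq.txt"),
])
print(reads)  # s_1_1_seq.txt,s_1_2_seq.txt,s_2_1_seq.txt,s_2_2_seq.txt
```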