Thursday, January 26, 2012

edgeR tumor and matched normal

Software: edgeR (Bioconductor R package)
Description: For paired significance testing (tumor vs. matched normal), edgeR uses a generalized linear model and a likelihood ratio test to calculate a p-value for each gene/transcript.
Error: when the count matrix contains any zero values, edgeR reports:
  Error in beta[k, ] <- betaj[decr, ] :
  NAs are not allowed in subscripted assignments
Solution: first filter out the all-zero rows in the matrix, then add 1 to every entry (matrix + 1) so that no zero counts remain.
Question: since the libraries are large, can the effect of adding 1 to every count on the library sizes be neglected?
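
The Solution and the Question above can be sketched in plain Python. This is an illustration only, not edgeR code: the toy `counts` matrix and the helper names `drop_all_zero_rows`, `add_pseudocount` and `library_sizes` are hypothetical, and a real matrix would have thousands of rows and much larger counts. It shows that adding 1 to every entry raises each library size by exactly the number of retained rows.

```python
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts = [
    [10, 0, 25, 3],
    [0, 0, 0, 0],        # all-zero row, removed by the filter
    [500, 420, 0, 610],
    [7, 9, 12, 0],
]


def drop_all_zero_rows(matrix):
    """Keep only rows that have at least one non-zero count."""
    return [row for row in matrix if any(value != 0 for value in row)]


def add_pseudocount(matrix, pseudo=1):
    """Add a constant to every entry so that no zero remains."""
    return [[value + pseudo for value in row] for row in matrix]


def library_sizes(matrix):
    """Column sums: the total count per sample."""
    return [sum(column) for column in zip(*matrix)]


filtered = drop_all_zero_rows(counts)
shifted = add_pseudocount(filtered)

before = library_sizes(filtered)
after = library_sizes(shifted)

# Each library grows by exactly the number of retained rows.
for old, new in zip(before, after):
    print(old, new, (new - old) / old)
```

With illustrative figures of 20,000 retained genes and a library of 20 million reads, the shift would be 20,000 / 20,000,000 = 0.1% of the library size. The relative change to an individual low-count gene is much larger, so this only addresses the library-size part of the question.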

Wednesday, January 25, 2012

FASTX-Toolkit with Illumina FASTQ files in standard Sanger format

Software: FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
Input: FASTQ file from Illumina, standard Sanger format (Phred+33)
Function: fastx_quality_stats
Description: Generates a quality statistics report for a FASTQ file. The report is then used as input to draw quality plots for the FASTQ file.
Error message: fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.
Solution: add -Q 33 to the command so that quality scores are read with the Phred+33 offset
Solution source: http://seqanswers.com/forums/showthread.php?p=49667
Final command: zcat sample.fastq.gz | fastx_quality_stats -N -Q 33 -o quality.txt
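
The numbers in the error message can be reproduced by hand. The sketch below assumes that, without -Q 33, the quality character is decoded with an offset of 64 (the older Illumina encoding), which is what the reported quality value of -29 implies. `decode_quality` is a hypothetical helper written for illustration, not part of the FASTX-Toolkit.

```python
def decode_quality(quality_string, offset):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(char) - offset for char in quality_string]


# '#' has ASCII code 35, as the error message reports ("ord 35").
print(decode_quality("#", 64))  # [-29], the invalid value from the error
print(decode_quality("#", 33))  # [2], a valid low-quality Sanger score
```

Under the Phred+33 offset the same character decodes to a small but valid score, which is consistent with the error disappearing once -Q 33 is given.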