The simulation of RNA-Seq in Saccharomyces cerevisiae combines a model of reverse transcription primed with poly-dT primers with subsequent fragmentation by DNase I. Sequence biases that have been reported for DNase I fragmentation (Hansen et al. 2010) are captured in the simulation by a position weight matrix (DNAseI.pwm).
Expression | ||
NB_MOLECULES | 5,000,000 | Number of RNA molecules initially in the experiment |
TSS_MEAN | 25 | Average deviation from the annotated transcription start site (TSS) |
POLYA_SCALE | 80 | Scale of the Weibull distribution, shifts the average length of poly-A tail sizes |
POLYA_SHAPE | 2 | Shape of the Weibull distribution describing poly-A tail sizes |
Reverse Transcription | ||
RTRANSCRIPTION | YES | Switches on reverse transcription |
RT_PRIMER | PDT | Poly-dT primers are used for first-strand synthesis |
RT_LOSSLESS | YES | Flag to force every molecule to be reverse transcribed |
RT_MIN | 500 | Minimum length observed after reverse transcription of full-length transcripts |
RT_MAX | 2,500 | Maximum length observed after reverse transcription of full-length transcripts |
Fragmentation | ||
FRAG_SUBSTRATE | DNA | Specifies DNA as the substrate of fragmentation |
FRAG_METHOD | EZ | Enzymatic digestion as fragmentation method |
FRAG_EZ_MOTIF | DNAseI.pwm | Position weight matrix describing the sequence motif of the enzymatic digestion |
Amplification and Size Segregation | ||
PCR_DISTRIBUTION | default | Default PCR distribution with 15 rounds and 20 bins |
GC_MEAN | 0.5 | Mean of the Gaussian distribution that models the GC-dependent amplification probability |
GC_SD | 0.1 | Standard deviation of the Gaussian distribution that models the GC-dependent amplification probability |
FILTERING | YES | Enables size filtering of fragments |
SIZE_SAMPLING | MH | The Metropolis-Hastings algorithm is used for filtering |
Sequencing | ||
READ_NUMBER | 1,000,000 | Produce 1 million reads |
READ_LENGTH | 36 | Each read sequence is 36nt long |
PAIRED_END | NO | Single reads are simulated, one per fragment |
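To get a feeling for what POLYA_SCALE and POLYA_SHAPE imply, the following minimal sketch computes the mean of a Weibull distribution with the values from the table and compares it with the mean of randomly drawn tail lengths. It assumes the standard two-parameter Weibull parameterization; whether the simulator uses exactly this form internally is an assumption, and the helper names `weibull_mean` and `sample_polya_lengths` are hypothetical.

```python
import math
import random

POLYA_SCALE = 80.0  # value from the parameter table
POLYA_SHAPE = 2.0   # value from the parameter table


def weibull_mean(scale: float, shape: float) -> float:
    """Mean of a two-parameter Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def sample_polya_lengths(n: int, scale: float, shape: float, seed: int = 0) -> list[int]:
    """Draw n poly-A tail lengths (rounded to whole nucleotides).

    Illustration only: this is not the simulator's own sampling code.
    """
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    return [round(rng.weibullvariate(scale, shape)) for _ in range(n)]


expected_mean = weibull_mean(POLYA_SCALE, POLYA_SHAPE)
lengths = sample_polya_lengths(100_000, POLYA_SCALE, POLYA_SHAPE)
sample_mean = sum(lengths) / len(lengths)

print(f"theoretical mean tail length: {expected_mean:.1f} nt")
print(f"mean of sampled tail lengths: {sample_mean:.1f} nt")
```

Under this assumption, a scale of 80 and a shape of 2 correspond to an average tail of roughly 71 nt; increasing POLYA_SCALE moves that average up proportionally.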
[INFO] I am collecting information on the run.
initializing profiler **********
[INFO] Checking GTF file
*[WARN] Unsorted in line 5 - cannot perform gene clustering: chrI + YAL069W @ 335 after YAL012W @ 130799
********* OK (00:00:02)
[GTF FILE] The GTF reference file given is not sorted, but we found a sorted version.
[GTF FILE] The Simulator will use /Users/micha/Desktop/sacCer3_SGDGenes_fromUCSC120515_sorted.gtf
[GTF FILE] You might want to update your parameters file
[PROFILING] I am assigning the expression profile ********** OK (00:00:02)
Reading reference annotation
*[WARN] merging exon (31229,35248) with exon (29935,31227) in transcript YBL100W-B because intervening intron has 4 or less nt.
[WARN] merging exon (222636,226598) with exon (221330,222634) in transcript YBL005W-B because intervening intron has 4 or less nt.
*********[WARN] merging exon (-854953,-856257) with exon (-850989,-854951) in transcript YPR158C-D because intervening intron has 4 or less nt.
OK (00:00:01)
found 6664 transcripts
[PROFILING] Parameters
NB_MOLECULES 5000000
EXPRESSION_K -0.6
EXPRESSION_X0 5.0E7
EXPRESSION_X1 9500.0
PRO_FILE_NAME /Users/micha/Desktop/sacCer3_enzyme.pro
profiling ********** OK (00:00:00)
Updating .pro file ********** OK (00:00:00)
molecules 4999971
[LIBRARY] creating the cDNA libary
Initializing Fragmentation File ********** OK (00:00:04)
4999971 mol initialized
[LIBRARY] Reverse Transcription
[LIBRARY] Configuration
Mode: PDT
PWM: No
RT MIN: 500
RT MAX: 2500
Processing Fragments ********** OK (00:00:15)
4999971 mol: in 4999971, new 0, out 4999971
avg Len 969.7831, maxLen 2500
preparing transcript sequences
*[WARN] merging exon (31229,35248) with exon (29935,31227) in transcript YBL100W-B because intervening intron has 4 or less nt.
*********[WARN] merging exon (-854953,-856257) with exon (-850989,-854951) in transcript YPR158C-D because intervening intron has 4 or less nt.
OK (00:00:02)
[LIBRARY] Enzymatic Digestion
[LIBRARY] Configuration
Left Flank : 100
Right Flank : 300
Motif: DNAseI.pwm
Processing Fragments ********** OK (00:02:38)
60604099 mol: in 4999971, new 55604128, out 60604099
avg Len 80.00923, maxLen 2500
initializing Selected Size distribution
[LIBRARY] Segregating cDNA (MCMC Filter)
Processing Fragments ********** OK (00:01:47)
60604099 mol: in 60604099, new 0, out 25719279
avg Len 47.310493, maxLen 276
start amplification
[INFO] Loading default PCR distribution
[LIBRARY] Amplification
[LIBRARY] Configuration
Rounds: 15
Mean: 0.5
Standard Deviation: 0.1
Processing Fragments ********** OK (00:01:05)
Amplification done.
In: 25719279 Out: 693695450
25719279 mol: in 25719279, new 0, out 693695450
avg Len 47.319595, maxLen 266
Copied results to /Users/micha/Desktop/sacCer3_enzyme.lib
Updating .pro file ********** OK (00:00:00)
[SEQUENCING] getting the reads
Initializing Fragment Index
Indexing ********** OK (00:00:14)
13804020 lines indexed (693695450 fragments, 6534 entries)
sequencing
*[WARN] merging exon (31229,35248) with exon (29935,31227) in transcript YBL100W-B because intervening intron has 4 or less nt.
*********[WARN] merging exon (-854953,-856257) with exon (-850989,-854951) in transcript YPR158C-D because intervening intron has 4 or less nt.
OK (00:14:03)
693695450 fragments found (13804020 without PCR duplicates)
998612 reads sequenced
226528 reads fall in poly-A tail
407504 truncated reads
Moving temporary BED file
Updating .pro file ********** OK (00:00:00)
Updating .pro file ********** OK (00:00:00)
Updating .pro file ********** OK (00:00:00)
Updating .pro file ********** OK (00:00:00)
[END] I finished, took me 1305 sec.
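The molecule and read counts reported in the log can be condensed into a few per-stage ratios, which makes it easier to see what each step did to the library. The sketch below only does arithmetic on numbers copied from the log; the dictionary keys and the `ratio` helper are hypothetical names chosen for illustration and are not part of the simulator.

```python
# Counts copied verbatim from the log output above.
counts = {
    "after_rt": 4_999_971,               # molecules after reverse transcription
    "after_fragmentation": 60_604_099,   # fragments after enzymatic digestion
    "after_size_filter": 25_719_279,     # fragments kept by the size filter
    "after_amplification": 693_695_450,  # fragments after PCR
    "reads_sequenced": 998_612,
    "reads_in_polya": 226_528,
    "reads_truncated": 407_504,
}


def ratio(numerator: int, denominator: int) -> float:
    """Plain ratio with a guard against division by zero."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


fragments_per_molecule = ratio(counts["after_fragmentation"], counts["after_rt"])
fraction_retained = ratio(counts["after_size_filter"], counts["after_fragmentation"])
amplification_factor = ratio(counts["after_amplification"], counts["after_size_filter"])
polya_fraction = ratio(counts["reads_in_polya"], counts["reads_sequenced"])
truncated_fraction = ratio(counts["reads_truncated"], counts["reads_sequenced"])

print(f"fragments per cDNA molecule:      {fragments_per_molecule:.2f}")
print(f"fraction kept by size filter:     {fraction_retained:.1%}")
print(f"average amplification factor:     {amplification_factor:.1f}x")
print(f"reads falling in the poly-A tail: {polya_fraction:.1%}")
print(f"truncated reads:                  {truncated_fraction:.1%}")
```

For this run, digestion yields about 12 fragments per cDNA molecule, the size filter keeps roughly 42% of them, and amplification multiplies the remaining fragments about 27-fold on average.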