I've been running the Flux Simulator to generate 20M paired-end reads of 100 bp. The total running time is rather long:
[END] I finished, took me 37099 sec.
That would be OK with me, but I've noticed that most of the running time (~90%) is spent on the sequencing step, and I was wondering whether this is expected.
Ho ho ho,
the sequencing step (especially when reproducing the actual read sequences rather than only their genomic locations) is dominated by I/O overhead: many bytes of the genomic sequence have to be read, and many bytes of the (possibly mutated) read sequences have to be written. To get a picture of this, please have a look at the profiling output provided for the transition to the sequencing step:
The efficiency of the simulator in the sequencing step therefore depends directly on the underlying hardware and system configuration. Having said that, approx. 10 h for producing 2 gigabases does seem like rather a lot of overhead; please consider whether some of the following points can improve your situation:
You may want to watch ticket BARNA-117 above to get notified of any updates on the topic.
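To put the numbers from this thread in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes that "20M reads" counts individual 100 bp reads, and it takes the reported "~90%" share of the sequencing step as exact. The variable names are made up for illustration and are not part of the simulator.

```python
# Rough throughput estimate from the figures quoted in this thread.
# Assumptions (not confirmed by the simulator): "20M reads" counts
# individual reads, and the ~90% sequencing share is taken as exact.

total_seconds = 37099        # from the "[END]" log line
reads = 20_000_000           # 20M reads
read_length = 100            # bp per read
sequencing_fraction = 0.90   # "~90%" as reported by the user

total_bases = reads * read_length                        # bases written as reads
total_hours = total_seconds / 3600                       # whole run, in hours
sequencing_seconds = total_seconds * sequencing_fraction
bases_per_second = total_bases / sequencing_seconds      # average output rate

print(f"total run time:     {total_hours:.1f} h")
print(f"bases produced:     {total_bases:,}")
print(f"sequencing step:    {sequencing_seconds:,.0f} s")
print(f"average throughput: {bases_per_second:,.0f} bases/s")
```

Under these assumptions the sequencing step averages roughly 60 kilobases of read sequence per second, which can serve as a baseline when comparing runs on different disks or system configurations.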
Currently we are still busy with deliveries on the Eastern routes, but who knows, maybe we will have a super-fast new hard disk for you tonight? Just in case, please remember to hang a 3.5" sock on your chimney. Merry Christmas!