## Performance
There is a small utility to try it out on your own systems and files (options are available with the flag `--help`).
The two modes are "speed" and "compare": the former benchmarks the speed of different parsers, and the latter compares the output of different parsers (being fast is not so good if the output is not correct).
### Speed
```bash
python -m fastqandfurious.demo.benchmark speed <FASTQ or FASTQ.gz or FASTQ.bz2 or FASTQ.lzma file>
```
Note that the third-party libraries parsing FASTQ files are required in order to run the full benchmark.
With a gzip-compressed FASTQ file of 146 MB (compressed size) containing 1,562,120 entries, the benchmark gives the following (throughput is for the DNA sequences in the file; headers and quality strings are not counted):
| parser | throughput (MB/s) | notes |
|---|---|---|
| screed | 11.0 | |
| biopython | 6.8 | |
| biopython FastqGeneralIterator | 34.5 | Bio.SeqIO.QualityIO.FastqGeneralIterator |
| pyfastx | 51.7 | |
| fastqandfurious | 32.0 | pure Python |
| fastqandfurious w/ c-ext | 48.7 | using the C extension in the package |
| fastqandfurious w/ c-ext + index | 37.7 | like above, with an index of entry positions |
fastqandfurious with the C extension is 43% faster than Biopython's FastqGeneralIterator. The relatively recent pyfastx is only 6% faster than fastqandfurious, at the cost of essentially all of the Python-level flexibility fastqandfurious offers. For example, fastqandfurious can handle input from any Python I/O stream, which allows compression algorithms other than gzip (e.g., LZO), or data not stored in files (e.g., network streams, pipes, or interprocess communication).
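As a sketch of that flexibility, the snippet below feeds a bz2-compressed file to the parser through a regular Python binary stream. The function name `readfastq_iter` and the (header, sequence, quality) entry layout follow the package's usage example and are assumptions to check against the current API; the filename is hypothetical.

```python
# Minimal sketch (not part of the benchmark): reading FASTQ entries from an
# arbitrary Python binary stream with fastqandfurious. The name
# `readfastq_iter` and the (header, sequence, quality) entry layout follow
# the package's usage example; check the package README for the exact
# signature (e.g., whether an `entryfunc` argument must be passed).
import bz2

from fastqandfurious import fastqandfurious

bufsize = 20000  # parsing buffer size, in bytes

# Any binary stream works: bz2/lzma handles, pipes, sockets wrapped in a
# file object, ... Here, a bz2-compressed FASTQ file.
with bz2.open("a_fastq_file.fq.bz2", "rb") as fh:
    n_bases = 0
    for header, sequence, quality in fastqandfurious.readfastq_iter(fh, bufsize):
        n_bases += len(sequence)  # count DNA bases, as the benchmark does
    print(n_bases)
```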
### Compare
To compare the output of two parsers, for example biopython and our parser:
```bash
python -m fastqandfurious.demo.benchmark compare biopython fastqandfurious \
    <FASTQ | FASTQ.gz | FASTQ.bz2 | FASTQ.lzma>
```
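For a sense of what such a comparison involves, here is a minimal hand-rolled check in the same spirit, pairing Biopython's FastqGeneralIterator with fastqandfurious entry by entry. The fastqandfurious call and entry layout are the same assumptions as in the sketch above, and the filename is hypothetical; this is not the code run by the `compare` mode.

```python
# Minimal sketch of an output comparison between two parsers, in the spirit
# of the "compare" mode. Biopython's FastqGeneralIterator yields
# (title, sequence, quality) strings; the fastqandfurious entry layout is
# assumed as above and may need adjusting to the actual API.
from itertools import zip_longest

from Bio.SeqIO.QualityIO import FastqGeneralIterator
from fastqandfurious import fastqandfurious

FILENAME = "a_fastq_file.fq"  # hypothetical input file
BUFSIZE = 20000

with open(FILENAME) as fh_bp, open(FILENAME, "rb") as fh_faf:
    pairs = zip_longest(FastqGeneralIterator(fh_bp),
                        fastqandfurious.readfastq_iter(fh_faf, BUFSIZE))
    for i, (bp_entry, faf_entry) in enumerate(pairs):
        assert bp_entry is not None and faf_entry is not None, "entry counts differ"
        _title, seq, _qual = bp_entry
        _header, sequence, _quality = faf_entry  # assumed entry layout
        # fastqandfurious works on bytes; compare the sequences after decoding.
        assert seq == sequence.decode("ascii"), f"sequence mismatch at entry {i}"
print("sequences match")
```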