Performance
-----------

There is a little utility to try it out on your own systems and files
(options are listed with the flag `--help`).

The two modes are "speed" and "compare". The former benchmarks the speed of
different parsers; the latter compares the output of different parsers (speed
is of little use if the output is incorrect).

Speed
^^^^^

.. code-block:: bash

   python -m fastqandfurious.demo.benchmark speed

Note that the third-party libraries that parse FASTQ files are required to run
the full benchmark.

With a gzip-compressed FASTQ file of 146MB (compressed size) with 1,562,120
entries, the benchmark results are as follows (the throughput counts only the
DNA sequences in the file; headers and quality strings are excluded):

+----------------------------------+-------------------+--------------------------------------------+
| parser                           | throughput (MB/s) | notes                                      |
+==================================+===================+============================================+
| screed                           | 11.0              |                                            |
+----------------------------------+-------------------+--------------------------------------------+
| biopython                        | 6.8               |                                            |
+----------------------------------+-------------------+--------------------------------------------+
| biopython FastqGeneralIterator   | 34.5              | `Bio.SeqIO.QualityIO.FastqGeneralIterator` |
+----------------------------------+-------------------+--------------------------------------------+
| pyfastx                          | 51.7              |                                            |
+----------------------------------+-------------------+--------------------------------------------+
| fastqandfurious                  | 32.0              | pure python                                |
+----------------------------------+-------------------+--------------------------------------------+
| fastqandfurious w/ c-ext         | 48.7              | using C extension in the package           |
+----------------------------------+-------------------+--------------------------------------------+
| fastqandfurious w/ c-ext + index | 37.7              | Like above and w/ index of entry positions |
+----------------------------------+-------------------+--------------------------------------------+

From the table, `fastqandfurious` with the C extension is about 41% faster
than Biopython's FastqGeneralIterator (48.7 vs. 34.5 MB/s). The relatively
recent `pyfastx` is only 6% faster than `fastqandfurious` with the C extension
(51.7 vs. 48.7 MB/s), at the cost of pretty much all of the Python-level
flexibility that `fastqandfurious` offers. For example, `fastqandfurious` can
handle input from any Python `io` stream. This allows the use of compression
algorithms other than gzip (e.g., LZO compression), or reading data that are
not in files (e.g., network streams, pipes, or interprocess communications).

Compare
^^^^^^^

To compare the output of two parsers, for example `biopython` and our parser:

.. code-block:: bash

   python -m fastqandfurious.demo.benchmark compare biopython fastqandfurious \