Efficiency guide¶
If you’re using Conveyor, you’re using it because you want to squeeze every possible drop of performance out of your application. This page serves as a short guide to common performance-breaking scenarios and how to avoid them.
Run pipelines using with statements¶
Pipelines can be run in two different ways, as demonstrated with the code snippets below:
```python
pl = Pipeline()
pl.add(job)
pl.run([data, data2, data3])
pl.run([data4, data5, data6])
```
or…
```python
with Pipeline() as pl:
    pl.add(job)
    pl.run([data, data2, data3])
    pl.run([data4, data5, data6])
```
When running a pipeline multiple times, Conveyor encourages the second option shown above. In the first case, heavyweight processes are created and killed (called ‘opening’ and ‘closing’ the pipeline, respectively) at the start and end of each invocation of `.run()`. This is disadvantageous because creating and killing processes takes a significant amount of time. It is much better to create the pipeline’s processes once, use them on the first invocation of `.run()`, keep them running, and reuse them on the second invocation. This is exactly what the second case does: the pipeline is implicitly opened at the start of the `with` statement and closed at its end.
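The open/reuse/close behaviour described above can be sketched with a plain Python context manager. This is a hypothetical illustration of the pattern, not Conveyor’s actual internals: `SketchPipeline`, its counters, and its open/close methods are all invented names that stand in for spawning and killing real worker processes.

```python
# Hypothetical sketch of the open-once, reuse, close-once pattern.
# Counters stand in for the expensive work of spawning/killing processes.
class SketchPipeline:
    def __init__(self):
        self.opened = 0          # times the (expensive) worker pool was created
        self.closed = 0          # times it was torn down
        self._open = False

    def _ensure_open(self):
        if not self._open:
            self.opened += 1     # stands in for spawning worker processes
            self._open = True

    def _close(self):
        if self._open:
            self.closed += 1     # stands in for killing worker processes
            self._open = False

    def __enter__(self):
        self._ensure_open()      # open once, at the start of the with block
        return self

    def __exit__(self, *exc):
        self._close()            # close once, at the end of the with block
        return False

    def run(self, batch):
        standalone = not self._open
        self._ensure_open()
        result = list(batch)     # placeholder for the pipeline's real work
        if standalone:
            self._close()        # a bare run() opens and closes every call
        return result

# Bare calls: the pool is built and torn down on every invocation.
bare = SketchPipeline()
bare.run([1, 2])
bare.run([3, 4])

# with statement: one open, one close, any number of runs in between.
with SketchPipeline() as pl:
    pl.run([1, 2])
    pl.run([3, 4])
```

After the bare calls, `bare.opened` and `bare.closed` are both 2; after the `with` block, `pl.opened` and `pl.closed` are both 1, despite the same number of `run()` calls. That difference is the cost the `with` form saves you.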