mangoes.utils.multiproc module

class mangoes.utils.multiproc.DataParallel(task, reducer, nb_workers=1, batch=None)

Bases: object

Utility class to parallelize a function to apply on iterable data

Parameters
task: callable

function that takes an iterable as first argument and returns a value that can be reduced with the ‘reducer’ parameter. Its signature should be : task_function(jobs_iterator, *args, **kwargs)

reducer: callable

function that takes two objects of the same type (same type as the output of the ‘task’ function), merge them and returns an object of the same type. Its signature should be : reduce_function(current_output, new_output)

nb_workers: int

number of subprocess to use (default=1)

batch: int, default None

if provided, yields the result of task and call the reducer after batch iterations

Methods

run(data, *args, **kwargs)

Run the task function on data with nb_workers processes.

run(data, *args, **kwargs)

Run the task function on data with nb_workers processes.

Parameters
data: iterator

iterator over data to process

args

arguments for task function

kwargs

keywords arguments for task function

Returns
the result of the merging of all the values returned by task