Distribution transparent Machine Learning experiments on Apache Spark

logicalclocks, updated 🕥 2022-06-03 12:25:48



Maggy is a framework for distribution-transparent machine learning experiments on Apache Spark. It introduces a unified approach for writing core ML training logic as oblivious training functions: you reuse the same training code whether you are training small models on your laptop or scaling out hyperparameter tuning or distributed deep learning on a cluster. Maggy replaces the current waterfall development process for distributed ML applications, in which code is rewritten at every stage to account for a different distribution context.

Maggy uses the same distribution transparent training function in all steps of the machine learning development process.
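To illustrate the idea, here is a minimal plain-Python sketch, independent of Maggy itself: the training logic is written once, and only the launcher changes between a single local run and a (here simulated) parallel sweep. All names below (`train_fn`, `run_local`, `run_sweep`) are illustrative, not part of the Maggy API.

```python
# Sketch of an "oblivious training function": the training code does not
# know, or care, in which distribution context it is executed.
from concurrent.futures import ThreadPoolExecutor

def train_fn(lr=0.1):
    # Stand-in for a real training loop; returns the metric to optimize.
    accuracy = 1.0 - abs(lr - 0.1)  # toy objective, best at lr=0.1
    return accuracy

def run_local(fn):
    # Single-run "launcher": call the function directly.
    return fn()

def run_sweep(fn, lrs):
    # Tuning "launcher": evaluate the same, unchanged function per config.
    with ThreadPoolExecutor() as pool:
        return dict(zip(lrs, pool.map(fn, lrs)))

print(run_local(train_fn))                    # single run
print(run_sweep(train_fn, [0.01, 0.1, 0.5]))  # same code, many configs
```

In Maggy, the launchers are the `experiment.lagom` entry point together with the Spark executors; the training function itself stays the same.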

Quick Start

Maggy uses PySpark as an engine to distribute the training processes. To get started, install Maggy in the Python environment used by your Spark cluster, or install Maggy in your local Python environment with the 'spark' extra to run on Spark in local mode:

```bash
pip install maggy
# or, with the 'spark' extra for Spark local mode:
pip install "maggy[spark]"
```

The programming model consists of wrapping the code containing the model training inside a function. Inside that wrapper function, provide all the imports and components that make up your experiment.

Single run experiment:

```python
def train_fn():
    # This is your training iteration loop
    for i in range(number_iterations):
        ...
        # Add the Maggy reporter to report the metric to be optimized
        reporter.broadcast(metric=accuracy)
        ...
    # Return the metric to be optimized, or any metric to be logged
    return accuracy

from maggy import experiment

result = experiment.lagom(train_fn=train_fn, name='MNIST')
```

`lagom` is a Swedish word meaning "just the right amount" — which is how Maggy uses your resources.
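Beyond single runs, Maggy also supports hyperparameter optimization over a search space (an issue below shows a `randomsearch` optimizer being used). As a concept-only sketch, here is what random search over a search space does; this is plain Python, not the Maggy `Searchspace`/optimizer API, and all names are hypothetical.

```python
# Conceptual sketch of random search: sample parameter combinations from a
# search space, evaluate the training function on each, keep the best trial.
import random

def train_fn(lr, batch_size):
    # Toy stand-in for a training run returning a metric to maximize.
    return 1.0 / (1.0 + abs(lr - 0.01)) - 0.001 * batch_size

def random_search(fn, space, num_trials, direction="max", seed=42):
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        # Sample one value per hyperparameter for this trial.
        params = {name: rng.choice(values) for name, values in space.items()}
        trials.append((fn(**params), params))
    pick = max if direction == "max" else min
    return pick(trials, key=lambda t: t[0])

space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
best_metric, best_params = random_search(train_fn, space, num_trials=5)
print(best_metric, best_params)
```

With Maggy, the sampling and trial scheduling happen on the Spark driver and executors, while the training function itself stays distribution transparent.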


Full documentation is available at maggy.ai


There are various ways to contribute, and any contribution is welcome. Please follow the CONTRIBUTING guide to get started.


Issues can be reported on the official GitHub repo of Maggy.


Please see our publications on maggy.ai to find out how to cite our work.


lagom() got an unexpected keyword argument 'searchspace'

opened on 2022-07-15 05:17:03 by sreenathelloti

```
TypeError                                 Traceback (most recent call last)
in
----> 1 result = experiment.lagom(embeddings_computer,
      2                           searchspace=sp,
      3                           optimizer='randomsearch',
      4                           direction='max',
      5                           num_trials=2,
```

`TypeError: lagom() got an unexpected keyword argument 'searchspace'`

Maggy requirements too strict

opened on 2022-06-03 08:44:04 by robzor92

AttributeError: module 'maggy.experiment' has no attribute 'lagom'

opened on 2022-05-12 18:53:40 by CindyLu0406

I pip-installed the latest version of Maggy (version 1.1.0), and running a simple Maggy example does not work:

```python
import maggy
from maggy import experiment
...
result = experiment.lagom(train_fn=training_fn, name='MNIST')
```

This returns `AttributeError: module 'maggy.experiment' has no attribute 'lagom'`.


ModuleNotFoundError: No module named 'maggy.experiment_config'

opened on 2021-06-06 15:29:17 by crakama

I can install Maggy on my PySpark cluster from pip, but whenever I issue the command `from maggy.experiment_config import OptimizationConfig`, I get the error `ModuleNotFoundError: No module named 'maggy.experiment_config'`. Any idea what could be happening? I am using JupyterLab with a Python 3 kernel.

`from maggy import experiment` says there is no module named `hops`.

The imports that I have seen working are `from maggy.ablation import AblationStudy` and `from maggy import Searchspace`.

Edit: I noticed that I cannot use the pre-release version (if it has a fix for this, that is). I get this error when I try to install it:

```
ERROR: Could not find a version that satisfies the requirement maggy==1.0.0rc0 (from versions: 0.0.1, 0.1, 0.1.1, 0.2, 0.2.1, 0.2.2, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.4.0, 0.4.1, 0.4.2, 0.5.0, 0.5.1, 0.5.2, 0.5.3)
ERROR: No matching distribution found for maggy==1.0.0rc0
```

Run on Python kernel

opened on 2021-04-29 15:24:37 by RiccardoGrigoletto

[Ablation] Develop a dataset generator function for databricks

opened on 2021-02-08 15:08:41 by RiccardoGrigoletto

As per https://github.com/logicalclocks/maggy/blob/master/maggy/ablation/ablationstudy.py, we need to write a custom dataset generator function to make AblationStudy work on Databricks.


Maggy 1.0.0rc0 2021-05-25 13:32:00

This is the first release candidate for the first major Maggy release 1.0.0.


This release contains many new features, which will be documented on maggy.ai.

These include:

  • Distribution transparency for distributed training, hyperparameter optimization, and ablation studies
  • Distributed training support for PyTorch, including DeepSpeed ZeRO
  • Distributed training support for TensorFlow, using MultiWorkerMirroredStrategy

Release 0.4.2 2020-05-20 14:14:36

This release changes the LICENSE to Apache V2.


  • apply Black code formatting
  • allow access to optimization direction in optimizer
  • Add IDLE message to allow for idle executors (in preparation for Bayesian Opt)
  • [ablation] Support for Keras custom models
  • Makes Searchspace a sorted iterable
  • Adapt to tensorboard 1.15


  • unpin numpy version dependency
  • remove ExperimentDriver from public API
  • [ablation] Fixes TF pickling issue

Release 0.4.0 2019-12-19 16:08:37


  • Adapts Maggy to the new Experiments V2 service in Hopsworks 1.1.0
  • Adds the TensorBoard HParams plugin
  • Jupyter Notebook gets versioned automatically
  • versioned resources are removed
  • Returns separate log files per trial
  • Improves Exception handling


  • Fixes bugs related to internal exceptions and handles exceptions more gracefully by returning a stacktrace

Release 0.3.3 2019-11-28 20:36:46



  • Fixes a bug when using custom dataset generators in the ablation api

Release 0.3.2 2019-11-08 11:01:21



  • Makes defaults coherent

Release 0.3.1 2019-11-08 10:21:08



  • Fixes a bug when running single trials, where defaults were not set properly
  • Fixes the way exceptions are thrown

Developers of Hopsworks

GitHub Repository Homepage

hyperparameter-optimization hyperparameter-search automl ablation spark hyperparameter-tuning blackbox-optimization ablation-studies ablation-study