Sean Shookman
Joao Moreira
Sherry Wang
Cody Hutchins
Kazi Tanzim Islam
Samuel Gaist
SanthoshBala18
Skelebot is a command-line tool for developing machine learning projects and executing them in Docker. Its purpose is to make a Data Scientist's life easier by automating the legwork of mundane tasks through a unified, consistent interface.
```
[/code/my-iris-model] > skelebot -h
usage: skelebot [-h] [-e ENV] [-s] [-n]
                {loadData,train,score,push,pull,jupyter,plugin,bump,prime,exec}
                ...

Iris Example
Example Skelebot Project

Version: 1.1.0
Environment: None
Skelebot Version: 1.8.5

positional arguments:
  {loadData,train,score,push,pull,jupyter,plugin,bump,prime,exec}
    loadData          Load the Iris Dataset and save it into the data folder for the train job to access (src/loadData.py)
    train             Use the data loaded in the loadData job to train the iris model (src/train.py)
    score             Use the model that was built in the train job to score new data against the iris model (src/score.py)
    push              Push an artifact to artifactory
    pull              Pull an artifact from artifactory
    jupyter           Spin up Jupyter in a Docker Container (port=8888, folder=.)
    plugin            Install a plugin for skelebot from a local zip file
    bump              Bump the skelebot.yaml project version
    prime             Generate Dockerfile and .dockerignore and build the docker image
    exec              Exec into the running Docker container

optional arguments:
  -h, --help          show this help message and exit
  -v, --version       Display the version number of Skelebot
  -e ENV, --env ENV   Specify the runtime environment configurations
  -s, --skip-build    Skip the build process and attempt to use previous docker build
  -n, --native        Run natively instead of through Docker
  -c, --contact       Display the contact email of the Skelebot project
```
Install Skelebot with Pip:
pip install skelebot
To get started using Skelebot you can follow the Documentation.
Anyone is welcome to make contributions to the project. If you would like to make a contribution, please read our Contributor Guide.
This project adheres to Semantic Versioning. Please refer to the Changelog for information regarding the differences between versions of the project.
Resolves #212
Allows the version value to be specified in the config yaml, where it overrides the VERSION file value. This makes it possible to keep multiple versions in the same project by using different env configs.
An issue that arose from a previous discussion has found its way into one of my projects, and I would like to use Skelebot to solve it. The issue revolves around versioning different parts of a Data Science project separately.
First, I built a model with a particular version and published it to S3. This code was then pushed to a branch and merged to master with that version as the overall project version. Later, I needed to make some updates to the scoring process, so I did that and bumped the patch version at the same time. Unfortunately, my Python package now looks for the model artifact under the new version number instead of the old one.
I don't need to re-build the model, since that code is unchanged, but I also want to increment the version since the scoring code did get updated. Skelebot already provides a way of pulling artifacts with the Latest Compatible Version (LCV) that could help in this situation. It gets more complicated, though, once you consider versioning the Python package, the training code, and the results of training (models, encoders, etc.) separately, since each can be updated independently.
One step toward a solution for this issue would be to allow Skelebot users to override the VERSION file value with a value placed directly in the skelebot.yaml file. This would at least allow for multiple versions of the same project if used in different skelebot env files (skelebot-env.yaml).
While this doesn't completely solve the problem, specifically for Python packages that contain training and scoring code, this does help to alleviate it quite a bit for non-package DS projects.
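A minimal sketch of the proposed precedence rule, written in plain Python over already-parsed config dictionaries. The helper names (`resolve_version`, `merge_env`), the shallow merge of env config over base config, and the example values are all assumptions made for illustration. They are not Skelebot's actual implementation.

```python
from typing import Optional


def merge_env(base: dict, env: Optional[dict]) -> dict:
    """Shallow-merge an env config over the base config (assumed behaviour)."""
    merged = dict(base)
    merged.update(env or {})
    return merged


def resolve_version(config: dict, version_file_text: Optional[str]) -> Optional[str]:
    """Prefer a non-empty `version` in the config, else fall back to the VERSION file."""
    override = config.get("version")
    if override is not None and str(override).strip():
        return str(override).strip()
    if version_file_text is not None and version_file_text.strip():
        return version_file_text.strip()
    return None


# Hypothetical parsed contents of skelebot.yaml and one env file.
base_config = {"name": "my-iris-model"}
train_env = {"version": "1.1.0"}  # pins the model version for this env
version_file = "1.1.1\n"          # raw text of the VERSION file

# Without an env override, the VERSION file wins.
default_version = resolve_version(merge_env(base_config, None), version_file)
# With the env override, the yaml value takes precedence.
train_version = resolve_version(merge_env(base_config, train_env), version_file)
print(default_version, train_version)
```

Under these assumptions the scoring code can move to a new patch version through the VERSION file while the training env keeps pointing at the version of the already-published model.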
`version` in the Skelebot yaml file should take precedence over the VERSION file in the project.

Currently there is not an easy way to see what files will be included/excluded from the docker image.
Add command to preview build context prior to building final image.
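One way such a preview could work is sketched below: walk the project folder and split files into included and excluded lists based on ignore patterns. The function names are hypothetical, and the matching is a deliberate simplification that uses Python's `fnmatch` rather than Docker's real `.dockerignore` rules (no `!` exceptions, and `*` also matches across `/` here), so treat it as an illustration only.

```python
import fnmatch
import os
from typing import List, Tuple


def _is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """True if the path, or any of its parent folders, matches a pattern."""
    parts = rel_path.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(
        fnmatch.fnmatchcase(prefix, pattern.rstrip("/"))
        for pattern in patterns
        for prefix in prefixes
    )


def preview_build_context(root: str, patterns: List[str]) -> Tuple[List[str], List[str]]:
    """Return (included, excluded) file paths relative to root, sorted."""
    included, excluded = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            (excluded if _is_ignored(rel, patterns) else included).append(rel)
    return sorted(included), sorted(excluded)


if __name__ == "__main__":
    # Preview the current folder with two example ignore patterns.
    kept, dropped = preview_build_context(".", ["data", "*.md"])
    print("included:", kept)
    print("excluded:", dropped)
```

A real command would more likely read the patterns from the generated `.dockerignore` and follow Docker's own matching rules.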
While working on #177 I noticed that skelebot uses `setup.y [...] test`, which has been deprecated.
Not only that, while reading through the latest setuptools documentation I learned that it's no longer recommended to even have a `setup.py`: https://setuptools.readthedocs.io/en/latest/userguide/index.html
Deprecated features should be removed from the build process, and testing could also be separate from installation imo (this would have the added benefit of removing 2 dependencies for end-users: `coverage` and `pytest`).
Currently, we must deploy any new skelebot images manually. We should automate this step.
This GitHub action provides a potential way to implement this: https://github.com/marketplace/actions/build-and-push-docker-images
pip install skelebot==1.33.4
This Release Includes:
pip install skelebot==1.33.1
This Release Includes:
pip install skelebot==1.31.0
This Release Includes:
`pyproject.toml` file
pip install skelebot==1.30.0
This Release Includes:
pip install skelebot==1.29.0
This Release Includes:
pip install skelebot==1.28.0
This Release Includes:
- In Memory Pull | Allows S3Repo class to pull artifacts and return them directly in Python
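As a rough illustration of what an in-memory pull means, the sketch below reads an object through an S3-style client and deserializes it without writing to disk. The function `pull_in_memory`, the stub client, the bucket and key names, and the use of `pickle` are all assumptions for this example. This is not the actual `S3Repo` interface. The only real API relied on is the shape of boto3's `get_object(Bucket=..., Key=...)` call, whose response carries a readable `Body`.

```python
import io
import pickle
from typing import Any, Dict, Tuple


def pull_in_memory(s3_client: Any, bucket: str, key: str) -> Any:
    """Fetch an artifact and return the deserialized object, never touching disk."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response["Body"].read())


class FakeS3Client:
    """Stand-in for a boto3 S3 client so the sketch runs without AWS access."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]) -> None:
        self._objects = objects

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# Hypothetical artifact stored under a versioned key.
model = {"coef": [0.1, 0.2], "intercept": 0.5}
client = FakeS3Client({("my-bucket", "iris-model_v1.1.0.pkl"): pickle.dumps(model)})

loaded = pull_in_memory(client, "my-bucket", "iris-model_v1.1.0.pkl")
print(loaded == model)
```

Unpickling runs arbitrary code, so a pattern like this is only appropriate for artifacts from a trusted bucket.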
python machine-learning data-science build-tool ai project-development cli hacktoberfest