Machine Learning Project Development Tool

carsdotcom, updated 2022-11-09 19:51:06

License: MIT


Sean Shookman

Joao Moreira

Sherry Wang

Cody Hutchins

Kazi Tanzim Islam

Samuel Gaist

SanthoshBala18

About

Skelebot is a command-line tool for developing machine learning projects and executing them in Docker. Its purpose is to make a Data Scientist's life easier by automating much of the legwork for mundane tasks behind a unified, consistent interface.

```
[/code/my-iris-model] > skelebot -h
usage: skelebot [-h] [-e ENV] [-s] [-n]
                {loadData,train,score,push,pull,jupyter,plugin,bump,prime,exec}
                ...

Iris Example
Example Skelebot Project

Version: 1.1.0
Environment: None
Skelebot Version: 1.8.5

positional arguments:
  {loadData,train,score,push,pull,jupyter,plugin,bump,prime,exec}
    loadData            Load the Iris Dataset and save it into the data folder
                        for the train job to access (src/loadData.py)
    train               Use the data loaded in the loadData job to train the
                        iris model (src/train.py)
    score               Use the model that was built in the train job to score
                        new data against the iris model (src/score.py)
    push                Push an artifact to artifactory
    pull                Pull an artifact from artifactory
    jupyter             Spin up Jupyter in a Docker Container (port=8888,
                        folder=.)
    plugin              Install a plugin for skelebot from a local zip file
    bump                Bump the skelebot.yaml project version
    prime               Generate Dockerfile and .dockerignore and build the
                        docker image
    exec                Exec into the running Docker container

optional arguments:
  -h, --help         show this help message and exit
  -v, --version      Display the version number of Skelebot
  -e ENV, --env ENV  Specify the runtime environment configurations
  -s, --skip-build   Skip the build process and attempt to use previous docker
                     build
  -n, --native       Run natively instead of through Docker
  -c, --contact      Display the contact email of the Skelebot project
```

Install

Install Skelebot with Pip:

pip install skelebot

Getting Started

To get started using Skelebot you can follow the Documentation.

Contributing

Anyone is welcome to make contributions to the project. If you would like to make a contribution, please read our Contributor Guide.

Versioning

This project adheres to Semantic Versioning. Please refer to the Changelog for information regarding the differences between versions of the project.
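
As a rough illustration of what a Semantic Versioning bump involves, the sketch below increments one part of a `MAJOR.MINOR.PATCH` string and resets the lower parts to zero. The `bump` function is a hypothetical helper written for this example; it is not Skelebot's own `bump` implementation.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with one part incremented (hypothetical helper)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets both minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch only.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("1.33.1", "patch"))  # 1.33.2
print(bump("1.33.4", "minor"))  # 1.34.0
```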

Issues

Version Override

opened on 2022-01-21 21:47:08 by sshookman

Resolves #212

Solution

Allows for the version value to be specified in the config yaml and override the VERSION file value, allowing for multiple versions on the same project using different env configs.

Version Override in Config

opened on 2022-01-21 20:02:29 by sshookman

Context

An issue that arose from a previous discussion has actually found its way into one of my projects, and I would like to use Skelebot to solve it. The issue revolves around versioning different parts of a Data Science project separately.

First, I built a model with a particular version and published it to S3. This code was then pushed to a branch and merged to master with that version as the overall project version. Later, I needed to update the scoring process, so I did that and bumped the patch version at the same time. Unfortunately, my Python package now looks for the model artifact under the new version number instead of the old one.

I don't need to rebuild the model, since that code is unchanged, but I also want to increment the version because the scoring code did get updated. Skelebot itself already provides a way of pulling artifacts with the Latest Compatible Version (LCV) that could help in this situation, but this gets more complicated once you consider versioning the Python package, the training code, and the results of training (models, encoders, etc.) separately, since each can be updated independently.
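
To make the scenario concrete, here is a minimal sketch of a "latest compatible version" lookup. It assumes that "compatible" means the same major version and not newer than the project version; that rule is an assumption made for this example, not a statement of how Skelebot's LCV works. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints for comparison."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    return (major, minor, patch)


def latest_compatible(project_version: str, available: Iterable[str]) -> Optional[str]:
    """Pick the highest artifact version under the ASSUMED compatibility rule:
    same major version as the project and not newer than the project."""
    target = parse(project_version)
    candidates = [
        version for version in available
        if parse(version)[0] == target[0] and parse(version) <= target
    ]
    return max(candidates, key=parse, default=None)


# The model was published as 1.2.0; the project was later bumped to 1.2.1.
print(latest_compatible("1.2.1", ["1.1.0", "1.2.0", "1.3.0", "2.0.0"]))  # 1.2.0
```

Under this assumed rule the scoring code at 1.2.1 still resolves the 1.2.0 model, which is the behaviour the paragraph above describes wanting.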

Feature

One step toward a solution for this issue would be to allow Skelebot users to override the VERSION file value with a value placed directly in the skelebot.yaml file. This would at least allow for multiple versions of the same project if used in different skelebot env files (skelebot-env.yaml).

While this doesn't completely solve the problem, specifically for Python packages that contain training and scoring code, this does help to alleviate it quite a bit for non-package DS projects.
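
The precedence being proposed can be sketched as follows. This is only an illustration of the idea: `resolve_version` is a hypothetical function, the plain dicts stand in for the parsed `skelebot.yaml` and `skelebot-env.yaml` files, and letting the env file win over the base file is an assumption.

```python
from typing import Optional


def resolve_version(version_file_text: str,
                    base_config: dict,
                    env_config: Optional[dict] = None) -> str:
    """Pick the project version (hypothetical sketch).

    Assumed order: env config, then base config, then the VERSION file.
    """
    for config in (env_config or {}, base_config):
        version = config.get("version")
        if version:
            return str(version)
    # No override found, so fall back to the VERSION file contents.
    return version_file_text.strip()


print(resolve_version("1.2.1\n", {}))                          # 1.2.1
print(resolve_version("1.2.1\n", {"version": "1.2.0"}))        # 1.2.0
print(resolve_version("1.2.1\n", {}, {"version": "1.1.0"}))    # 1.1.0
```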


Acceptance Criteria

  • [ ] The value for version in the Skelebot yaml file should take precedence over the VERSION file in the project
  • [ ] This should work for Skelebot env yaml files as well
  • [ ] Create Unit Tests
  • [ ] Update Documentation
  • [ ] Bump Minor Version

Ability to preview build context

opened on 2021-03-29 16:09:56 by salernoa

Context

Currently there is no easy way to see which files will be included in or excluded from the docker image.

Feature

Add a command to preview the build context prior to building the final image.


Acceptance Criteria

  • [ ] Output included files from build context
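
One possible shape for such a preview is sketched below. It walks a project folder and lists the files that match none of a set of ignore patterns. This is a simplification and an assumption on our part: it uses Python's `fnmatch`, which does not reproduce the full `.dockerignore` matching rules (no `!` negation, no `**` handling), and `preview_context` is a hypothetical name, not an existing Skelebot command.

```python
import fnmatch
import os
import tempfile
from typing import List


def preview_context(root: str, ignore_patterns: List[str]) -> List[str]:
    """List files under `root` whose relative path matches no ignore pattern."""
    included = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            rel = os.path.relpath(full_path, root).replace(os.sep, "/")
            if not any(fnmatch.fnmatch(rel, pattern) for pattern in ignore_patterns):
                included.append(rel)
    return sorted(included)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, "data"))
    for rel in ("src/train.py", "data/iris.csv", "notes.md"):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("")
    print(preview_context(root, ["data/*", "*.md"]))  # ['src/train.py']
```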

Update build process

opened on 2020-10-17 22:13:22 by jagmoreira

Context

While working on #177 I noticed that skelebot uses setup.y [...] test which has been deprecated.

Not only that, while reading through the latest documentation of setuptools I learned that it's not recommended to even have a setup.py anymore: https://setuptools.readthedocs.io/en/latest/userguide/index.html

Feature

Deprecated features should be removed from the build process and testing could also be separate from installation imo (this would have the added benefit of removing 2 dependencies for end-users: coverage and pytest).


Acceptance Criteria

  • [ ] No deprecated commands or keywords on build process
  • [ ] Separate testing and installation dependencies

Automate DockerHub deployment

opened on 2020-04-07 20:30:14 by jagmoreira

Context

Currently, we must deploy any new skelebot images manually. We should automate this step.

Feature

This GitHub Action provides a potential way to implement this: https://github.com/marketplace/actions/build-and-push-docker-images


Acceptance Criteria

  • [ ] Script/action that pushes all skelebot images to DockerHub with proper tagging.

Releases

Version 1.33.4 2022-09-23 14:35:49

pip install skelebot==1.33.4


This Release Includes:

  • Scaffolding | Fixed a bug in scaffolding where templates were not having components loaded in the config section

Version 1.33.1 2022-09-19 14:02:01

pip install skelebot==1.33.1


This Release Includes:

  • S3Repo | Fixed a bug in S3Repo where the artifact version was not being parsed properly in some cases
  • Git Templates | Added the ability to load templates from Git repos in scaffolding
  • Push Prefix | Added the ability to push artifacts to S3 or Artifactory with prefix text

Version 1.31.0 2022-05-24 19:01:24

pip install skelebot==1.31.0


This Release Includes:

  • Dependencies | Added the ability for python dependencies to be specified and installed via a pyproject.toml file
  • CodeArtifact Dependencies | Create libs folder if it doesn't already exist

Version 1.30.0 2022-05-04 19:21:09

pip install skelebot==1.30.0


This Release Includes:

  • Docker Publish | Docker Publish allows for omitting version and LATEST tags

Version 1.29.0 2022-05-02 18:38:48

pip install skelebot==1.29.0


This Release Includes:

  • CodeArtifact Dependencies | Adds an option to pull CodeArtifact Python packages into a libs folder for install during docker build

Version 1.28.0 2022-04-15 19:22:41

pip install skelebot==1.28.0


This Release Includes:

  • In Memory Pull | Allows S3Repo class to pull artifacts and return them directly in python
