Sharing Updatable Models (SUM) on Blockchain

microsoft, updated 2023-02-03 14:27:33

(formerly Decentralized & Collaborative AI on Blockchain)

Animated logo for the project. A neural network appears on a block. The nodes change color until finally converging. The block slides away on a chain and the process restarts on the next blank block.

| Demo | Simulation | Security |
|:-:|:-:|:-:|
| Demo: Test | Simulation: Test | Build Status |

Sharing Updatable Models (SUM) on Blockchain is a framework to host and train publicly available machine learning models. Ideally, using a model to get a prediction is free. Adding data involves the three steps described below.

Picture of someone sending data to the addData method in CollaborativeTrainer, which sends the data to the 3 main components described next.

  1. The IncentiveMechanism validates the request to add data; for instance, in some cases a "stake" or deposit is required. In some cases, the incentive mechanism can also be triggered later to provide users with payments or virtual "karma" points.
  2. The DataHandler stores data and meta-data on the blockchain. This ensures that it is accessible for all future uses, not limited to this smart contract.
  3. The machine learning model is updated according to predefined training algorithms. In addition to adding data, anyone can query the model for predictions for free.
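The three steps above can be sketched in Python as follows. This is only an illustration of the control flow, not the project's Solidity contracts: the class names mirror the components named above, but every method name, the fixed `required_deposit`, and the toy `CountingModel` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IncentiveMechanism:
    """Hypothetical stake-based check: reject requests with too small a deposit."""
    required_deposit: int = 10

    def handle_add_data(self, deposit: int) -> None:
        if deposit < self.required_deposit:
            raise ValueError("deposit too small")


@dataclass
class DataHandler:
    """Keeps every accepted sample (a stand-in for on-chain storage)."""
    samples: List[Tuple[List[float], int]] = field(default_factory=list)

    def handle_add_data(self, features: List[float], label: int) -> None:
        self.samples.append((list(features), label))


@dataclass
class CountingModel:
    """Toy model (an assumption): predicts the most frequently submitted label."""
    counts: Dict[int, int] = field(default_factory=dict)

    def update(self, features: List[float], label: int) -> None:
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features: List[float]) -> int:
        return max(self.counts, key=self.counts.get)


class CollaborativeTrainer:
    def __init__(self, im: IncentiveMechanism, data_handler: DataHandler, model: CountingModel):
        self.im = im
        self.data_handler = data_handler
        self.model = model

    def add_data(self, features: List[float], label: int, deposit: int) -> None:
        self.im.handle_add_data(deposit)                    # 1. validate the request
        self.data_handler.handle_add_data(features, label)  # 2. store the data
        self.model.update(features, label)                  # 3. update the model

    def predict(self, features: List[float]) -> int:
        # Querying the model involves no deposit.
        return self.model.predict(features)
```

Because validation runs first, a rejected request never reaches storage or the model.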

The basics of the framework can be found in our blog post. A demo of one incentive mechanism can be found here. More details can be found in the initial paper describing the framework, accepted to Blockchain-2019, The IEEE International Conference on Blockchain.

This repository contains:

  • Demos showcasing some proof of concept systems using the Ethereum blockchain. There is a locally deployable test blockchain and demo dashboard to interact with smart contracts written in Solidity.
  • Simulation tools written in Python to quickly see how models and incentive mechanisms would work when deployed.

Picture of a QR code with aka.ms/0xDeCA10B written in the middle.

FAQ/Concerns

Aren't smart contracts just for simple code?

There are many options. We can restrict the framework to simple models: Perceptron, Naive Bayes, Nearest Centroid, etc. We can also combine off-chain computation with on-chain computation in a few ways, such as:

  • encoding off-chain to a higher dimensional representation and having only the final layers of the model fine-tuned on-chain,
  • using secure multiparty computation, or
  • using external APIs, or as they are called in the blockchain space, oracles, to train and run the model.

We can also use algorithms that do not require all model parameters to be updated (e.g. Perceptron). We hope to inspire more research in efficient ways to update more complex models.
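To illustrate why a Perceptron suits this setting, here is a minimal sketch of a sparse Perceptron update in Python. It assumes binary features given as a list of active indices and integer weights; the function names and the dictionary representation are choices made for this example, not the repository's implementation. A correctly classified sample changes nothing, and a misclassified one changes only the weights of its active features.

```python
from typing import Dict, List, Tuple


def predict(weights: Dict[int, int], bias: int, active: List[int]) -> int:
    """Classify a sample described by the indices of its non-zero (binary) features."""
    score = bias + sum(weights.get(i, 0) for i in active)
    return 1 if score > 0 else 0


def update(weights: Dict[int, int], bias: int, active: List[int],
           label: int, learning_rate: int = 1) -> Tuple[int, int]:
    """Apply one Perceptron step in place.

    Returns the new bias and the number of weights that were written.
    """
    if predict(weights, bias, active) == label:
        return bias, 0  # correct prediction: no parameter is touched
    step = learning_rate if label == 1 else -learning_rate
    for i in active:
        weights[i] = weights.get(i, 0) + step
    return bias + step, len(active)


# Usage: only the two active features are written, however large the model is.
weights: Dict[int, int] = {}
bias, written = update(weights, 0, [3, 7], label=1)
```

The number of storage writes is bounded by the number of active features in the sample, which is what keeps each transaction small.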

Some of those proposals are not in the true spirit of this system which is to share models completely publicly but for some applications they may be suitable. At least the data would be shared so others can still use it to train their own models.

Will transaction fees be too high?

Fees in Ethereum are low enough for simple models: a few cents as of July 2019. Simple machine learning models are good for many applications. As described in the previous answer, there are ways to keep transactions simple. Fees are decreasing: Ethereum is switching to proof of stake. Other blockchains may have lower or possibly no fees.

What about storing models off-chain?

Storing the model parameters off-chain, e.g. using IPFS, is an option, but many of the popular solutions do not have robust mirroring to ensure that the model will still be available if a node goes down. One of the major goals of this project is to share models and improve their availability. The easiest way to do that now is to have the model stored and trained in a smart contract.

We're happy to make improvements! If you do know of a solution that would be cheaper and more robust than storing models on a blockchain like Ethereum then let us know by filing an issue!

What if I just spam bad data?

This depends on the incentive mechanism (IM) chosen, but essentially you will lose a lot of money. Others will notice that the model is performing badly or does not work as expected and will stop contributing to it. With some IMs, such as Deposit, Refund, and Take: Self-Assessment, others who have already submitted "good" data will gladly take your deposits without submitting any more data.
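A rough Python sketch of how a deposit-based IM can make spam costly is shown below. The rules are simplified assumptions for illustration only: a fixed waiting period, a refund only when the model agrees with the submitted label, and a take of the entire deposit by anyone who has already been refunded at least once. The class, its method names, and the fixed stand-in model are hypothetical; the papers listed under Learn More describe the actual mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Submission:
    sender: str
    features: tuple
    label: int
    deposit: int
    time: int


class SelfAssessmentSketch:
    def __init__(self, model: Callable[[tuple], int], wait: int = 10):
        self.model = model                  # stand-in for the current shared model
        self.wait = wait                    # assumed waiting period before claims
        self.submissions: List[Submission] = []
        self.verified: Dict[str, int] = {}  # sender -> number of refunded samples
        self.balances: Dict[str, int] = {}  # payouts made so far

    def deposit(self, sender: str, features: tuple, label: int, amount: int, now: int) -> int:
        self.submissions.append(Submission(sender, features, label, amount, now))
        return len(self.submissions) - 1

    def _claimable(self, index: int, now: int) -> Submission:
        s = self.submissions[index]
        if now - s.time < self.wait:
            raise ValueError("waiting period has not passed")
        if s.deposit == 0:
            raise ValueError("deposit already claimed")
        return s

    def _pay(self, who: str, s: Submission) -> None:
        self.balances[who] = self.balances.get(who, 0) + s.deposit
        s.deposit = 0

    def refund(self, index: int, now: int) -> None:
        s = self._claimable(index, now)
        if self.model(s.features) != s.label:
            raise ValueError("model disagrees with the submitted label")
        self._pay(s.sender, s)
        self.verified[s.sender] = self.verified.get(s.sender, 0) + 1

    def take(self, taker: str, index: int, now: int) -> None:
        if self.verified.get(taker, 0) == 0:
            raise PermissionError("taker has no verified data")
        s = self._claimable(index, now)
        if self.model(s.features) == s.label:
            raise ValueError("model agrees with the label; nothing to take")
        self._pay(taker, s)
```

In this sketch a spammer's deposit ends up with a contributor whose data the model agrees with, so each bad sample is a direct loss.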

Furthermore, people can automatically correct your data using techniques from unsupervised learning, such as clustering. They can then use the data offline for their own private model or even deploy a new collection system using that model.
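As a small illustration of correcting labels with clustering, the sketch below groups one-dimensional samples into two clusters and relabels each sample with the majority label of its cluster. It is a toy under stated assumptions (scalar features, exactly two clusters, a non-empty input, centers initialised at the minimum and maximum); the function names are invented for this example.

```python
from collections import Counter
from typing import List


def kmeans_1d(values: List[float], iterations: int = 10) -> List[int]:
    """Two-cluster k-means on scalars, with centers initialised at min and max."""
    centers = [min(values), max(values)]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assign each value to the nearer center.
        assignment = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignment


def correct_labels(values: List[float], labels: List[int]) -> List[int]:
    """Replace every label with the majority label of the sample's cluster."""
    assignment = kmeans_1d(values)
    majority = {}
    for c in set(assignment):
        in_cluster = [l for l, a in zip(labels, assignment) if a == c]
        majority[c] = Counter(in_cluster).most_common(1)[0][0]
    return [majority[a] for a in assignment]


# Usage: the third sample sits in the low cluster but carries a spammed label.
cleaned = correct_labels([0.1, 0.2, 0.15, 0.9, 1.0, 0.95], [0, 0, 1, 1, 1, 1])
```

This only works while bad labels remain a minority within each cluster.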

What if no one gives bad data, then no one can profit?

That’s great! This system will work as a source for quality data and models. People will contribute data to help improve the machine learning models they use in their daily life.

Profit depends on the incentive mechanism (IM). Yes, in Deposit, Refund, and Take: Self-Assessment, the contributors will not profit and should be able to claim back their own deposits. In the Prediction Market based mechanism, contributors can still get rewarded by the original provider of the bounty and test set.

Learn More

Papers

More details can be found in our initial paper, Decentralized & Collaborative AI on Blockchain, which describes the framework, accepted to Blockchain-2019, The IEEE International Conference on Blockchain.

An analysis of several machine learning models with the self-assessment incentive mechanism can be found in our second paper, Analysis of Models for Decentralized and Collaborative AI on Blockchain, which was accepted to The 2020 International Conference on Blockchain.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Issues

build(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /demo

opened on 2023-02-03 14:27:32 by dependabot[bot]

Bumps http-cache-semantics from 4.1.0 to 4.1.1.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/0xDeCA10B/network/alerts).

build(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /demo/client

opened on 2023-02-03 05:11:42 by dependabot[bot]

Bumps http-cache-semantics from 4.1.0 to 4.1.1.

http://localhost:5006/simulate_imdb_perceptron is a blank page

opened on 2022-08-09 11:37:51 by Hongchenglong

When I visit http://localhost:5006/simulate_imdb_perceptron, the page is blank. Is this normal? If not, how can I fix it?

[demo] Bump truffle and other versions

opened on 2022-06-22 18:56:06 by juharris

Also bump some dependencies and add more troubleshooting steps.

[simulation] Use normalization like in the demo.

opened on 2021-11-30 20:20:15 by juharris

In the demo, normalization is done before updating the dense perceptron and dense nearest centroid classifiers, but the simulation doesn't use normalization.

Normalization is used to avoid submissions with large values that would corrupt the model. Using normalized vectors might be bad for these models but it seems like a necessary guard to have on-chain.
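A minimal Python sketch of the kind of guard described here: scale each submitted vector to unit length before applying a dense Perceptron update, so that a single submission cannot move the weights by more than the learning rate. The use of the L2 norm and the function names are assumptions for this example; the demo's contracts may normalize differently.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def perceptron_update(weights: List[float], features: List[float],
                      label: int, learning_rate: float = 1.0) -> List[float]:
    """Return the weights after one dense Perceptron step on normalized features."""
    x = l2_normalize(features)
    score = sum(w * xi for w, xi in zip(weights, x))
    predicted = 1 if score > 0 else 0
    if predicted == label:
        return list(weights)
    sign = 1.0 if label == 1 else -1.0
    return [w + sign * learning_rate * xi for w, xi in zip(weights, x)]


# Usage: a submission with huge values changes each weight by at most 1.0.
new_weights = perceptron_update([0.0, 0.0], [3e9, 4e9], label=1)
```

Without the call to `l2_normalize`, the same submission would push the weights to the order of 1e9 in a single step.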

[demo] Use and store more model meta-data in local storage

opened on 2021-03-04 00:30:05 by juharris

So that the values don't have to be hidden when online safety is enabled.

  • [ ] classifications
  • [ ] encoder

Releases

Demo 1.3.0 2020-10-27 04:39:15

Demo

  • Improve accessibility
  • Add clarifications and disclaimers
  • Add meta-data storage options
  • Add online safety options to hide external text (text in smart contracts)
  • Add pagination for models and data (#65)
  • Store model name, description, and encoder type in the smart contract
  • Use notification toasts more (#54)
  • Add point based incentive mechanism (no deposit required)
  • Optimize Sparse Nearest Centroid Classifier (#77)
  • Add page to store the meta-data for a model that has already been deployed so that you can bookmark it within the dashboard and store your notes about it
  • Allow interacting with models not listed
  • UI improvements
  • Add encoders: Word hashing (MurmurHash3), none, and decimal mapping (#88, #91, #104, #106)
  • Add and improve model deployment code robustness

Demo 1.2.0 2019-11-28 18:58:43

  • Much nicer UI
  • Ability to add models in the UI
  • Add option to use local storage instead of the back end service with a database

Demo 1.1.0 2019-07-31 21:50:53

Add a Naive Bayes classifier.

Microsoft

Open source projects and samples from Microsoft

blockchain ml ai economics machine-learning artificial-intelligence ethereum truffle prediction-mar prediction-market python node react smart-contracts