`chip-n-scale-queue-arranger` helps you run machine learning models over satellite imagery at scale. It is a collection of AWS CloudFormation templates deployed by `kes`, lambda functions, and utility scripts for monitoring and managing the project.
Currently this is only deployed internally to Development Seed and we are refactoring a bit for easier reuse, modification, and deployment. Please excuse the dust and feel free to open an issue if you have any questions. The current build process for the lambda looks like:
```sh
cd lambda
make build
```

which produces a `package.zip` file. This can eventually be built into another script.
- `python 3.7.x`
- `node`
- `yarn` (or `npm`)
- `yarn model` tool
- `GeoJSON`, you can use `geodex`, `mercantile`, or `tile-cover`
- `config/cloudformation.template.yml`
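If none of the tile-cover tools listed above is at hand, a tile list for a simple bounding box can be sketched with the standard Web Mercator (slippy map) tile formula. The snippet below is a minimal illustration, not part of this project: the function names and the example extent are hypothetical. It emits one tile per line as `x-y-z`, the format the push script described later in this README reads.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom
    # Web Mercator is only defined up to about +/-85.0511 degrees latitude.
    lat = max(min(lat, 85.0511), -85.0511)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or the southern edge stays inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List every tile touching a bounding box as 'x-y-z' strings."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        f"{x}-{y}-{zoom}"
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


if __name__ == "__main__":
    # Hypothetical extent; replace with your own prediction area.
    for line in tiles_for_bbox(-0.2, 51.4, 0.1, 51.6, 12):
        print(line)
```

Redirecting the output to a file (for example `python tiles.py > tiles.txt`, with `tiles.py` being whatever you name the script) gives a line-delimited tile list. For irregular polygons, the dedicated tools are a better fit than a bounding box.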
To create your own project, first install the `node` dependencies:

```sh
yarn install
```
Then add values to `config/.env` and to `config/config.yml` to configure your project. Samples for each are provided and you can find more information on the `kes` documentation page.
Once these values are filled in, you can deploy the project to AWS (takes ~10 minutes):
```sh
yarn deploy
...
CF operation is in state of CREATE_COMPLETE

The stack test-stack is deployed or updated.
- The database is available at: postgres://your-db-string
- The queue is available at https://your-queue-url

Is this the first time setting up this stack? Run the following command to set up the database:

$ yarn setup postgres://your-db-string

✨  Done in 424.62s.
```
This will return a database string to run a migration:
```sh
yarn setup [DB_STRING]
```
If `yarn deploy` fails on the first attempt, you'll need to run `yarn delete` to remove the stack and start again. Otherwise the project will fail on later updates, indicating that it is in the state `ROLLBACK_COMPLETE`. If the first deploy succeeds, you can make future updates by rerunning `yarn deploy`.
By default, the CloudWatch logs are not tagged for resource tracking. To add `Project` tags to the CloudWatch logs, run the following:

```sh
yarn tag-logs
```
If you'd like to confirm that everything is deployed correctly (recommended), run:

```sh
yarn verify
```
This will test a few portions of the deployed stack to ensure that it will function correctly. Once you're ready, begin pushing tile messages to the SQS queue.
Once the stack is deployed, you can kick off the prediction by adding messages to the SQS queue. Each individual message will look like:
```json
{ "x": 1, "y": 2, "z": 3}
```

where `x`, `y`, and `z` specify an individual map tile. Because pushing these messages into the queue quickly is important to running the prediction at scale, we've included a utility script to assist this process:
```sh
yarn sqs-push [tiles.txt] [https://your-queue-url]
```

The first argument, `tiles.txt`, is a line-delimited file containing your tile indices in the format `x-y-z`, and the second argument is the URL of your SQS Queue. If you have a lot of tiles to push to the queue, it's best to run this script in the background or on a separate computer. The maximum number of simultaneous inflight SQS requests can be set with the `PROMISE_THRESHOLD` environment variable.
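To make the relationship between the `x-y-z` lines and the queue messages concrete, here is a minimal sketch of the conversion. It is an illustration only and does not reproduce the bundled push script: the function names are hypothetical, and the grouping into batches of 10 simply reflects the SQS limit on batch sends. No AWS calls are made.

```python
import json


def tile_line_to_message(line):
    """Convert one 'x-y-z' line into the JSON message body for that tile."""
    x, y, z = (int(part) for part in line.strip().split("-"))
    return json.dumps({"x": x, "y": y, "z": z})


def batched(items, size=10):
    """Yield lists of at most `size` items (SQS batch sends accept up to 10)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batches(lines):
    """Skip blank lines and group message bodies into SQS-sized batches."""
    bodies = [tile_line_to_message(line) for line in lines if line.strip()]
    return list(batched(bodies))


if __name__ == "__main__":
    # Hypothetical input; in practice these lines would come from tiles.txt.
    for batch in build_batches(["1-2-3", "4-5-6", ""]):
        print(batch)
```

Each batch could then be handed to whichever SQS client you prefer. For large tile lists, the included utility script remains the intended route.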
Once the processing is complete, you can pull down the stored results as a simple CSV file.
```sh
DATABASE_URL='postgres://myusername:mypassword@your-db-host:5432/ResultsDB' yarn download my_csv_filename.csv
```
You can then convert that CSV file to a geojson while thresholding on per-class ML confidence. For example, if you have a binary prediction and only want to keep tiles where confidence in class index 1 was 95% or greater, use something like:
```sh
yarn convert-geojson my_csv_filename.csv my_thresholded_features.geojson --thresh_ind 1 --thresh 0.95
```
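The thresholding step can be pictured roughly as follows. This is a hedged sketch rather than the project's converter: the row layout `(x, y, z, [confidence per class])` and the function names are assumptions, since the CSV schema is not described here. Tile outlines use the standard Web Mercator tile-bounds formula.

```python
import json
import math


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for an XYZ tile."""
    n = 2 ** z

    def lat(row):
        # Latitude of the horizontal grid line above tile row `row`.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat(y + 1), east, lat(y)


def threshold_to_geojson(rows, thresh_ind, thresh):
    """Keep rows whose confidence at `thresh_ind` is >= `thresh`.

    `rows` uses an assumed layout: (x, y, z, [confidence per class]).
    """
    features = []
    for x, y, z, confidences in rows:
        if confidences[thresh_ind] < thresh:
            continue
        w, s, e, n = tile_bounds(x, y, z)
        features.append({
            "type": "Feature",
            "properties": {"x": x, "y": y, "z": z,
                           "confidence": confidences[thresh_ind]},
            "geometry": {
                "type": "Polygon",
                # Counterclockwise exterior ring, closed on the first point.
                "coordinates": [[[w, s], [e, s], [e, n], [w, n], [w, s]]],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical predictions for a binary model.
    rows = [(0, 0, 0, [0.02, 0.98]), (1, 1, 1, [0.5, 0.5])]
    print(json.dumps(threshold_to_geojson(rows, thresh_ind=1, thresh=0.95)))
```

With the example rows, only the first tile passes the 0.95 cut on class index 1, mirroring the `--thresh_ind 1 --thresh 0.95` invocation.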
After the prediction is complete, you should download the data from the AWS RDS database. Then it's okay to delete the stack:
```sh
yarn delete
```
The primary costs of running this stack come from Lambda Functions and GPU instances. The Lambdas parallelize the image downloading and database writing; the GPU instances provide the prediction capacity. To run the inference optimally, from a speed and cost perspective, these two resources need to be scaled in tandem. Roughly four scenarios can occur:
- Lambda concurrency is much higher than GPU prediction capacity. When too many Lambdas call the prediction endpoint at once, many of them will time out and fail. The GPU instances will be fully utilized (good), but Lambda costs will be very high because the functions run longer and are invoked more times than necessary. This will also hit the satellite imagery tile endpoint more often than needed. If this is happening, Lambda errors will be high, Lambda run time will be high, GPU utilization will be high, and SQS messages will show up in the dead letter queue. To fix it, lower the maximum Lambda concurrency or increase GPU capacity.
- Lambda concurrency is slightly higher than GPU prediction capacity. Similar to the above case, if the Lambda concurrency is slightly too high compared to GPU prediction throughput, the Lambdas will run for longer than necessary but not time out. If this is happening, Lambda errors will be low, Lambda run time will be high, and GPU utilization will be high. To fix it, lower the maximum Lambda concurrency or increase GPU capacity.
- Lambda concurrency is lower than GPU prediction capacity. In this case, the Lambda monitoring metrics will look normal (low errors and low run time) but the GPU prediction instances have the capacity to predict many more images. To see this, run `yarn gpu-util [ssh key]`, which will show the GPU utilization of each instance/GPU in the cluster:
```bash
$ yarn gpu-util ~/.ssh/my-key.pem
yarn run v1.3.2
$ node scripts/gpu-util.js ~/.ssh/my-key.pem
┌────────────────────────┬────────────────────────┬────────────────────────┐
│ IP Address             │ Instance Type          │ GPU Utilization        │
├────────────────────────┼────────────────────────┼────────────────────────┤
│ 3.89.130.180           │ p3.2xlarge             │ 5 %                    │
├────────────────────────┼────────────────────────┼────────────────────────┤
│ 23.20.130.19           │ p3.2xlarge             │ 2 %                    │
├────────────────────────┼────────────────────────┼────────────────────────┤
│ 54.224.113.60          │ p3.2xlarge             │ 3 %                    │
├────────────────────────┼────────────────────────┼────────────────────────┤
│ 34.204.40.177          │ p3.2xlarge             │ 12 %                   │
└────────────────────────┴────────────────────────┴────────────────────────┘
✨  Done in 3.30s.
```
To fix this, increase the number of concurrent Lambdas or decrease the GPU capacity. (Note that by default, the security group on the instances won't accept SSH connections. To use `gpu-util`, add a new rule to your EC2 security group.)
- Lambda concurrency is well matched to GPU prediction capacity. In this case you will see high GPU utilization, low Lambda errors, and low Lambda run time. :ship:
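The scenarios above can be summarized as a small decision table. The sketch below is only an illustration of that reasoning: the function name is hypothetical, and what counts as "high" for each metric is left to you, since no numeric thresholds are given here.

```python
def diagnose(lambda_errors_high, lambda_runtime_high, gpu_utilization_high):
    """Map three coarse monitoring observations to one of the four scenarios."""
    if gpu_utilization_high and lambda_runtime_high and lambda_errors_high:
        return "Lambda concurrency much too high: lower it or add GPU capacity"
    if gpu_utilization_high and lambda_runtime_high:
        return "Lambda concurrency slightly too high: lower it or add GPU capacity"
    if not (gpu_utilization_high or lambda_errors_high or lambda_runtime_high):
        return "Lambda concurrency too low: raise it or reduce GPU capacity"
    if gpu_utilization_high and not lambda_errors_high and not lambda_runtime_high:
        return "Balanced: no change needed"
    # Any other combination is not covered by the four scenarios.
    return "Unclear: check the metrics individually"


if __name__ == "__main__":
    # Example: low errors, low run time, idle GPUs.
    print(diagnose(False, False, False))
```

Combinations that fall outside the four described cases (for example, high errors with idle GPUs) are reported as unclear instead of being forced into one of them.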
Running machine learning inference at scale can be challenging. One bottleneck is that it's often hard to ingest/download images fast enough to keep a GPU fully utilized. This project addresses that bottleneck by parallelizing imagery acquisition on AWS Lambda functions and running it separately from the machine learning predictions.
nvidia-docker
+ ECS enabled).