JupyterLab extension for Dask

dask, updated 🕥 2023-03-15 02:29:38

Dask JupyterLab Extension


This package provides a JupyterLab extension to manage Dask clusters, as well as embed Dask's dashboard plots directly into JupyterLab panes.


Explanatory Video (5 minutes)

Dask + JupyterLab Screencast

Requirements

  • JupyterLab >= 1.0
  • distributed >= 1.24.1

Installation

To install the Dask JupyterLab extension you will need to have JupyterLab installed. For JupyterLab < 3.0, you will also need Node.js version >= 12. These are available through a variety of sources. One source common to Python users is the conda package manager.

```bash
conda install jupyterlab
conda install -c conda-forge nodejs
```

JupyterLab 3.0 or greater

You should be able to install this extension with pip or conda, and start using it immediately, e.g.

```bash
pip install dask-labextension
```

JupyterLab 2.x

This extension includes both client-side and server-side components. Prior to JupyterLab 3.0 these needed to be installed separately, with node available on the machine.

The server-side component can be installed via pip or conda-forge:

```bash
pip install dask_labextension
```

```bash
conda install -c conda-forge dask-labextension
```

You then build the client-side extension into JupyterLab with:

```bash
jupyter labextension install dask-labextension
```

If you are running Notebook 5.2 or earlier, enable the server extension by running

```bash
jupyter serverextension enable --py --sys-prefix dask_labextension
```

Configuration of Dask cluster management

This extension has the ability to launch and manage several kinds of Dask clusters, including local clusters and kubernetes clusters. Options for how to launch these clusters are set via the dask configuration system, typically a .yml file on disk.

By default the extension launches a LocalCluster, for which the configuration is:

```yaml
labextension:
  factory:
    module: 'dask.distributed'
    class: 'LocalCluster'
    args: []
    kwargs: {}
  default:
    workers: null
    adapt: null
      # minimum: 0
      # maximum: 10
  initial: []
    # - name: "My Big Cluster"
    #   workers: 100
    # - name: "Adaptive Cluster"
    #   adapt:
    #     minimum: 0
    #     maximum: 50
```

In this configuration, factory gives the module, class name, and arguments needed to create the cluster. The default key describes the initial number of workers for the cluster, as well as whether it is adaptive. The initial key gives a list of initial clusters to start upon launch of the notebook server.
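For example, a configuration that makes every new cluster adaptive by default might look like the following sketch (the bounds are illustrative, not recommendations):

```yaml
labextension:
  factory:
    module: 'dask.distributed'
    class: 'LocalCluster'
    args: []
    kwargs: {}
  default:
    workers: null
    adapt:
      minimum: 0
      maximum: 10
```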

In addition to LocalCluster, this extension has been used to launch several other Dask cluster objects, a few examples of which are:

  • A SLURM cluster, using

```yaml
labextension:
  factory:
    module: 'dask_jobqueue'
    class: 'SLURMCluster'
    args: []
    kwargs: {}
```

  • A PBS cluster, using

```yaml
labextension:
  factory:
    module: 'dask_jobqueue'
    class: 'PBSCluster'
    args: []
    kwargs: {}
```

  • A Kubernetes cluster, using

```yaml
labextension:
  factory:
    module: dask_kubernetes
    class: KubeCluster
    args: []
    kwargs: {}
```

Configuring a default layout

This extension can store a default layout for the Dask dashboard panes, which is useful if you find yourself reaching for the same dashboard charts over and over. You can launch the default layout via the command palette, or by going to the File menu and choosing "Launch Dask Dashboard Layout".

Default layouts can be configured via the JupyterLab config system (either using the JSON editor or the user interface). Specify a layout by writing a JSON object keyed by the individual charts you would like to open. Each chart is opened with a mode, and a ref. mode refers to how the chart is to be added to the workspace. For example, if you want to split a panel and add the new one to the right, choose split-right. Other options are split-top, split-bottom, split-left, tab-after, and tab-before. ref refers to the panel to which mode is applied, and might be the names of other dashboard panels. If ref is null, the panel in question is added at the top of the layout hierarchy.

A concrete example of a default layout is

```json
{
  "individual-task-stream": {
    "mode": "split-right",
    "ref": null
  },
  "individual-workers-memory": {
    "mode": "split-bottom",
    "ref": "individual-task-stream"
  },
  "individual-progress": {
    "mode": "split-right",
    "ref": "individual-workers-memory"
  }
}
```

which adds the task stream to the right of the workspace, then adds the worker memory chart below the task stream, then adds the progress chart to the right of the worker memory chart.
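As an illustration of those placement rules, here is a small Python sketch (a hypothetical helper, not part of the extension) that checks a layout object of this shape against the modes listed above:

```python
import json

# Placement modes accepted by the layout system, as listed above.
VALID_MODES = {
    "split-right", "split-top", "split-bottom",
    "split-left", "tab-after", "tab-before",
}

def validate_layout(layout: dict) -> list:
    """Return a list of problems found in a dashboard layout object."""
    problems = []
    for chart, spec in layout.items():
        if spec.get("mode") not in VALID_MODES:
            problems.append(f"{chart}: unknown mode {spec.get('mode')!r}")
        ref = spec.get("ref")
        # ref must be null (top of the layout hierarchy) or a panel name.
        if ref is not None and not isinstance(ref, str):
            problems.append(f"{chart}: ref must be null or a panel name")
    return problems

layout = json.loads("""
{
  "individual-task-stream": {"mode": "split-right", "ref": null},
  "individual-progress": {"mode": "split-bottom", "ref": "individual-task-stream"}
}
""")
print(validate_layout(layout))  # [] -- the layout is well-formed
```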

Development install

As described in the JupyterLab documentation for a development install of a labextension, you can run the following in this directory:

```bash
jlpm                                        # Install npm package dependencies
jlpm build                                  # Compile the TypeScript sources to JavaScript
jupyter labextension develop . --overwrite  # Install the current directory as an extension
```

To rebuild the extension:

```bash
jlpm build
```

You should then be able to refresh the JupyterLab page and it will pick up the changes to the extension.

To run an editable install of the server extension, run

```bash
pip install -e .
jupyter serverextension enable --sys-prefix dask_labextension
```

Publishing

This application is distributed as two subpackages.

The JupyterLab frontend part is published to npm, and the server-side part to PyPI.

Releases for both packages are done with the jlpm tool, git and Travis CI.

Note: package version tags are not prefixed with the letter v, so you will need to disable jlpm's default tag prefix:

```console
$ jlpm config set version-tag-prefix ""
```

Making a release

```console
$ jlpm version [--major|--minor|--patch]  # updates package.json and creates git commit and tag
$ git push upstream main && git push upstream main --tags  # pushes tags to GitHub, which triggers Travis CI to build and deploy
```

Issues

Bump webpack from 5.73.0 to 5.76.1

opened on 2023-03-15 02:29:38 by dependabot[bot]

Bumps webpack from 5.73.0 to 5.76.1.

Release notes

Sourced from webpack's releases.

v5.76.1

Fixed

  • Added assert/strict built-in to NodeTargetPlugin

v5.76.0

Full Changelog: https://github.com/webpack/webpack/compare/v5.75.0...v5.76.0

v5.75.0

Bugfixes

  • experiments.* normalize to false when opt-out
  • avoid NaN%
  • show the correct error when using a conflicting chunk name in code
  • HMR code tests existence of window before trying to access it
  • fix eval-nosources-* actually exclude sources
  • fix race condition where no module is returned from processing module
  • fix position of standalone semicolon in runtime code

Features

  • add support for @import to external CSS when using experimental CSS in node

... (truncated)

Commits
  • 21be52b Merge pull request #16804 from webpack/chore-patch-release
  • 1cce945 chore(release): 5.76.1
  • e76ad9e Merge pull request #16803 from ryanwilsonperkin/revert-16759-real-content-has...
  • 52b1b0e Revert "Improve performance of hashRegExp lookup"
  • c989143 Merge pull request #16766 from piranna/patch-1
  • 710eaf4 Merge pull request #16789 from dmichon-msft/contenthash-hashsalt
  • 5d64468 Merge pull request #16792 from webpack/update-version
  • 67af5ec chore(release): 5.76.0
  • 97b1718 Merge pull request #16781 from askoufis/loader-context-target-type
  • b84efe6 Merge pull request #16759 from ryanwilsonperkin/real-content-hash-regex-perf
  • Additional commits viewable in compare view
Maintainer changes

This version was pushed to npm by evilebottnawi, a new releaser for webpack since your current version.



Using +NEW freezes Jupyterlab completely

opened on 2023-01-19 10:57:00 by michaelaye

Describe the issue: After pressing +NEW button to start a cluster:

  • The launching never finishes, seemingly stalls.
  • I cannot CTRL-C the Jupyter lab server in the terminal any more.
  • I cannot open existing notebooks anymore.
  • I do not get any log related to having tried to launch a dask cluster in the terminal.
  • I can still detach from tmux (the detach hotkey still works), so I can kill the session that way.

As usual, this "WORKED BEFORE" (TM) on this machine, but I hadn't used it for several weeks. :(

Minimal Complete Verifiable Example:

  • create a new env with python 3.11
  • mamba install jupyterlab dask_labextension
  • run jupyter lab . in a folder with work projects (sub folders with notebooks in them)

Anything else we need to know?:

I saw that a new py311 env automatically gets bokeh > 3, so I mamba-installed bokeh<3 to make dashboards work.

Environment:

  • Dask version: 2023.1.0 (labextension 6.0.0)
  • Python version: 3.11
  • Operating System: Linux CentOS 7.5.1804 (core)
  • Install method (conda, pip, source): conda

Help connecting to existing KubeCluster using the build-in Discovery Mechanism

opened on 2023-01-10 15:00:01 by jerrygb

Describe the issue:

I am able to create clusters, connect using dask clients and perform Dask operations without issues using KubeCluster Operator on a Notebook. I am also able to connect to the status dashboard using port-forwarding to the scheduler.

However, I am not able to connect to these clusters when using the lab extension. When I switch to an active notebook and click search in the Dask lab extension, it does pick up a remote cluster address. The dashboard URLs picked up by the extension code look like:

http://internal-scheduler.namespace:8787/

But, I think the extension is not able to connect to it. I do not see any logs pertaining to this action.

Do these dashboards need to be external (meaning are these connections made from browser or backend service)? Since I was not sure about this, I tried setting up AWS NLB. I tried connecting to the NLB address using the Client as seen in the second snippet below.

Minimal Complete Verifiable Example:

All of the following code snippets work fine from the notebook.

```python
# Create a cluster
from dask_kubernetes.operator import make_cluster_spec, make_worker_spec
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client
import dask.dataframe as dd
import os

profile_name = namespace_name

custom_spec = make_cluster_spec(
    name=profile_name,
    image='ghcr.io/dask/dask:latest',
    resources={"requests": {"memory": "512Mi"}, "limits": {"cpu": "4", "memory": "8Gi"}},
)

custom_spec['spec']['scheduler']['spec']['serviceAccount'] = 'default-editor'
custom_spec['spec']['worker']['spec']['serviceAccount'] = 'default-editor'

custom_worker_spec = make_worker_spec(
    image='ghcr.io/dask/dask:latest',
    n_workers=6,
    resources={"requests": {"memory": "512Mi"}, "limits": {"memory": "12Gi"}},
)
custom_worker_spec['spec']['serviceAccount'] = 'default-editor'
custom_worker_spec

cluster = KubeCluster(custom_cluster_spec=custom_spec, n_workers=0)
cluster.add_worker_group(name='highmem', custom_spec=custom_worker_spec)
```

As mentioned, let's assume that I have AWS NLB type LoadBalancer/Ingress Service. Then the Dask Client is able to successfully interact against 8787 and 8786 ports on the scheduler in order to manage the workers and jobs, externally.

```python
# Connecting to the external endpoint works fine
import dask
from dask.distributed import Client

dask.config.set({'scheduler-address': 'tcp://nlb-address.region.elb.amazonaws.com:8786'})
client = Client()
```

Anything else we need to know?:

Another thing noticed was that the dask-extension relies on testDaskDashboard function to pick up the URL info (defined in https://github.com/dask/dask-labextension/blob/main/src/dashboard.tsx#L588),

In the console, I can see,

Found the dashboard link at 'http://internal-scheduler.namespace:8787/status'

However, the subsequent dashboard-check request to the backend is missing one of the slashes from the protocol.

See the GET request below,

GET https://website/notebook/internal/test-dask-1/dask/dashboard-check/http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491

To be a bit more verbose,

http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491 translates to http:/internal-scheduler.namespace:8787/?1673363416491

I am not sure if this is expected or a bug.
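The decoding above can be reproduced with Python's standard library; this sketch only illustrates the percent-decoding of the reporter's example URLs:

```python
from urllib.parse import quote, unquote

url = "http://internal-scheduler.namespace:8787/"

# A correct round-trip keeps both slashes after the scheme.
encoded = quote(url, safe="")
print(encoded)           # http%3A%2F%2Finternal-scheduler.namespace%3A8787%2F
print(unquote(encoded))  # http://internal-scheduler.namespace:8787/

# The request observed in the report has only one %2F after the scheme,
# so decoding yields a malformed scheme separator:
observed = "http%3A%2Finternal-scheduler.namespace%3A8787%2F"
print(unquote(observed))  # http:/internal-scheduler.namespace:8787/
```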

Environment:

  • Dask version: 2022.12.1
  • Dask Kubernetes: 2022.12.0
  • @dask-labextension: v6.0.0
  • @jupyterlab/server-proxy: v3.2.2
  • Python version: 3.8.10
  • Platform: Kubeflow
  • Install method (conda, pip, source): pip

Need to press the new "Launch in JupyterLab" button twice

opened on 2022-11-17 16:40:28 by jrbourbeau

Recently we added a new button for launching a default set of dashboard plots from the Client HTML repr :tada: (xref https://github.com/dask/dask-labextension/pull/248). When using the button today I noticed I needed to press the button twice to have the dashboard plots launched

cc @ian-r-rose

Can't connect to local cluster using local IP and port

opened on 2022-10-12 15:53:52 by samanake

I started a new cluster hosted on http://127.0.0.1:33459 using the +NEW button, and it appropriately shows the many options for managing/monitoring the cluster. If I try to search for http://127.0.0.1:33459 using the search bar at the top, nothing comes up. It instead connects to dask/dashboard/511c94dc-3b49-4f8b-95be-f30f959a41aa. Is there some network proxy I should be aware of?

Also, as a side-note question, can dask_cuda.LocalCUDACluster be used to integrate GPUs in the cluster when the module is added to the .yml file for cluster customization?

Dashboard shows cluster of different user

opened on 2022-09-05 14:20:46 by SofianeB

What happened:

We are running a JupyterHub for HPC users. I noticed when opening JupyterLab that the Dask dashboard is activated and shows a cluster running on port 8787, even though I didn't start a cluster. After checking who is using that port, it is actually a different user. How is that possible? That should not be possible, right (from a security perspective)?

What you expected to happen:

Anything else we need to know?:

Environment:

  • Dask version: 2022.02.0
  • Dask-Labextension version: 5.2.0
  • Python version: 3.9
  • Operating System: CentOS 8
  • Install method (conda, pip, source): conda

Releases

2023-02-19 00:28:03

This is a minor release which contains a fix for dashboard URL construction logic in the presence of query parameters. Now query parameters reported by a Cluster implementation in the dashboard URL are correctly propagated to individual dashboard panes. This is relevant for cases where things like authentication tokens are included in a URL. Thanks @ntabris for the contribution in #258.

6.0.0 2022-11-02 00:22:40

This is a major release that includes:

  1. A new system for configuring default dashboard layouts (#247). This is accessible from the File menu as well as the command palette, and allows you to launch a set of dashboard panels for a cluster with a single command. For more information see this explanation.
  2. A bugfix that prevented Dask Dashboard charts from rendering properly when using JupyterLab's dark themes (#243)

5.3.0 2022-06-21 23:17:56

This minor release for dask-labextension includes a few new features and bugfixes.

  1. The extension now uses the new Dask logo and branding. For more discussion, see this issue.
  2. There is a new configuration option browserDashboardCheck which can force the extension to check for a Dask dashboard using the frontend browser session (rather than a request made on the server side). This can be useful in cases where a browser cookie is needed to authenticate to the dashboard. Thanks to new contributor @viniciusdc for implementing the feature!
  3. There is an error message which suggests the user might have a misconfigured/not-installed serverextension if it is not found. This was a little overeager, resulting in some false positives. In this release it should be a little slower to show the message. (cf https://github.com/dask/dask-labextension/pull/237)

5.2.0 2022-01-08 00:53:48

This release for dask-labextension contains a few bugfixes and minor enhancements:

  1. Fixes a bug where polling the dashboard wouldn't back off if the extension loses its connection (#220)
  2. Adds an option to hide the cluster manager user interface (#219) for deployments where it does not make sense. Note: this does not remove the underlying REST API, it only hides the frontend UI elements.
  3. Fixes a layout restoration bug where, if an active cluster is not found, a dashboard panel is closed. This prevents pre-created layouts from being distributed in the absence of a running cluster (#217).

5.0.2 2021-06-02 23:43:52

Contains a fix for an issue in which continued polling of the dask dashboard could result in unbounded memory increases on the scheduler.

2021-01-13 00:56:33

This is the first release of dask-labextension that supports JupyterLab 3.0. There are no changes in functionality, but installation should be significantly easier. No more nodejs, no more rebuilding the application. You can install with just pip:

```bash
pip install jupyterlab
pip install dask-labextension
```