Jupyter as a Service on FlashBlade

Category: IT Technology · Published: 5 years ago


Jupyter notebooks are a popular tool for data scientists to explore datasets and experiment with model development. They enable developers to easily supplement code with analysis and visualizations.

Historically, users have managed their own notebook servers. With JupyterHub, an organization can instead offer a centralized notebook platform. JupyterHub also enables infrastructure teams to give each user access to centralized storage for shared datasets, scratch space, and a persistent IDE.

This blog post presents an example of deploying Jupyter-as-a-Service on Pure Storage FlashBlade. Users can create new notebook servers on the fly within a Kubernetes cluster with zero-touch provisioning, and IT teams can manage efficient use of compute and storage resources across users.

JupyterHub

JupyterHub is used to manage and proxy multiple instances of the “single-user” Jupyter notebook server. It provides a public HTTP proxy on your network so users can log in to a central landing page from their browser. Once a user logs in, JupyterHub spins up a server (pod) for that user and reconnects it to that user’s persistent storage. Users therefore get stateful dev environments, while compute nodes are used only as needed.

We’ll deploy JupyterHub as a Kubernetes service so it’s easily manageable as part of a cluster.

FlashBlade in a Kubernetes environment

FlashBlade is an excellent storage backend for JupyterHub for a few reasons.

First, it enables access to training datasets in-place, eliminating the need to copy datasets between nodes. Data scientists can perform training and testing of models using shared datasets with minimal data management.

Second, FlashBlade supports the Pure Service Orchestrator (PSO), which fully automates creation and management of PersistentVolumes (PVs) for applications in a Kubernetes cluster. PSO brings self-service to a JupyterHub deployment by eliminating manual storage administration for new users whose environments need persistent storage.
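To make the self-service idea concrete, here is a minimal sketch of a claim that a dynamic provisioner such as PSO could satisfy without an administrator creating the volume by hand. The storage class name pure-file, the claim name, and the 10Gi size are assumptions for illustration; check which storage classes exist in your cluster with kubectl get storageclass.

```shell
# Write an illustrative PersistentVolumeClaim manifest to a scratch file.
# Assumptions (not from the original post): storage class "pure-file",
# claim name "claim-example-user", requested size 10Gi.
cat > example-user-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-example-user
  namespace: jhub
spec:
  storageClassName: pure-file
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF

# To submit it to a cluster (not run here):
#   kubectl create -f example-user-pvc.yaml
```

JupyterHub typically creates a claim of this kind for each user on first login, so a hand-written file like this is mainly useful for seeing what the provisioner receives.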

In fact, JupyterHub is just one of the many applications that together form a complete AI platform for data scientists. All of these applications should be backed by the same centralized storage for management simplicity and efficient data management.

Remove storage silos.

Installation

Prep Steps

helm repo add pure https://purestorage.github.io/helm-charts
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/

Each node in the cluster needs access to the datasets on FlashBlade. Mount the datasets folder directly on each cluster node at /datasets.
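One way to make that mount persistent is an /etc/fstab entry on each node. The sketch below only builds and prints the entry. The data VIP and filesystem name reuse the example values from later in this post, and the plain "defaults" mount options are an assumption you may want to tune.

```shell
# Example values: replace with your FlashBlade data VIP and filesystem name.
DATA_VIP="10.61.169.100"
FILESYSTEM="/datasets"
MOUNT_POINT="/datasets"

# Print an /etc/fstab line for a persistent NFS mount of the datasets filesystem.
fstab_entry() {
  printf '%s:%s %s nfs defaults 0 0\n' "$DATA_VIP" "$FILESYSTEM" "$MOUNT_POINT"
}

fstab_entry

# To apply on a node (requires root, not run here):
#   sudo mkdir -p "$MOUNT_POINT"
#   fstab_entry | sudo tee -a /etc/fstab
#   sudo mount "$MOUNT_POINT"
```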

Deploy PSO

Customize:

You’ll need a psovalues.yaml file that describes your FlashBlade array. The easiest thing to do is copy our default ./psovalues.yaml and adjust the “arrays” section.

Example customization:

arrays:
  FlashBlades:
    - MgmtEndPoint: "10.61.169.20" # CHANGE
      APIToken: "T-c4925090-c9bf-4033-8537-d24ee5669135" # CHANGE
      NFSEndPoint: "10.61.169.30" # CHANGE

Install:

helm install pure-storage-driver pure/pure-csi --namespace jhub -f ./psovalues.yaml

Deploy a PV for shared datasets

Customize:

The ./datasetpv.yaml file is used to create a Persistent Volume Claim named “shared-ai-datasets”. Adjust datasetpv.yaml to use your FlashBlade Data VIP and filesystem name.

nfs:
 server: 10.61.169.100 # CHANGE to your data vip
 path: /datasets # CHANGE to your filesystem name
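For context, the nfs fragment above belongs inside a PersistentVolume spec. The following is a sketch of what a complete file of this kind might look like, written to a separate example file so it does not overwrite the real datasetpv.yaml. The capacity, access mode, reclaim policy, and the static-binding fields are assumptions; the actual file in the repository may differ.

```shell
# Example values: replace with your FlashBlade data VIP and filesystem name.
DATA_VIP="10.61.169.100"
FILESYSTEM="/datasets"

# Write an illustrative PV + PVC pair. The empty storageClassName and the
# volumeName field bind the claim to this specific pre-created volume.
# Assumptions (not from the original post): 1Ti capacity, ReadWriteMany, Retain.
cat > datasetpv.example.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-ai-datasets
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: ${DATA_VIP}
    path: ${FILESYSTEM}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-ai-datasets
  namespace: jhub
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: shared-ai-datasets
  resources:
    requests:
      storage: 1Ti
EOF
```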

Install:

kubectl create -f datasetpv.yaml

Deploy JupyterHub

Customize:

The only change required for the jupvalues.yaml file is to add a security token. Generate a random hex string:

openssl rand -hex 32

Copy the output and, in your jupvalues.yaml file, replace the phrase SECRET_TOKEN with your generated string:

proxy:
 secretToken: 'SECRET_TOKEN' # CHANGE to the 64-character hex string generated above
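The generate-and-replace step can also be scripted. This sketch works on a stand-in file named jupvalues.example.yaml (a hypothetical name) so the real jupvalues.yaml is left untouched. The fallback to /dev/urandom is an addition for machines without openssl.

```shell
# Create a stand-in values file containing the placeholder.
cat > jupvalues.example.yaml <<'EOF'
proxy:
  secretToken: 'SECRET_TOKEN'
EOF

# 32 random bytes, hex-encoded, give a 64-character token.
# Falls back to /dev/urandom if openssl is not installed.
TOKEN="$(openssl rand -hex 32 2>/dev/null || od -An -tx1 -N32 /dev/urandom | tr -d ' \n')"

# Replace the placeholder in place; -i.bak works with both GNU and BSD sed
# and leaves a backup copy next to the file.
sed -i.bak "s/SECRET_TOKEN/${TOKEN}/" jupvalues.example.yaml
```

Because the token is a secret, avoid committing the resulting file to version control.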

Install:

helm install jhub jupyterhub/jupyterhub --namespace jhub --version 0.8.2 -f jupyterhub/values.yaml

Use Jupyter notebooks!

JupyterHub is now ready for use.

Installing JupyterHub creates a proxy service that serves traffic for end users. The public address (proxy-public) can be found via:

> kubectl --namespace=jhub get svc proxy-public
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP
proxy-public   LoadBalancer   10.43.197.255   10.61.169.60
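If you want the address in a script rather than reading it off the table, a small filter can pull out the EXTERNAL-IP column. The helper name external_ip is hypothetical, and the sample input is taken from the output shown above.

```shell
# Read `kubectl get svc` tabular output on stdin and print the fourth
# column (EXTERNAL-IP) of the proxy-public row.
external_ip() {
  awk '$1 == "proxy-public" { print $4 }'
}

# Sample input taken from the output shown above.
SAMPLE='NAME TYPE CLUSTER-IP EXTERNAL-IP
proxy-public LoadBalancer 10.43.197.255 10.61.169.60'

printf '%s\n' "$SAMPLE" | external_ip

# Against a live cluster (not run here):
#   kubectl --namespace=jhub get svc proxy-public | external_ip
# or, without text parsing:
#   kubectl --namespace=jhub get svc proxy-public \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```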

When a user navigates to proxy-public’s external IP address, they’ll get the JupyterHub login screen.

When Victor logs in, he has access to shared datasets (like cifar10 and openimages) as well as his home directory of personal notebooks, plots, and files.
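The shared datasets appear inside each user’s pod because the Helm values mount the “shared-ai-datasets” claim alongside the per-user home directory. The post does not show that part of the values file, so the following is only a sketch of how it might be expressed with the chart’s singleuser.storage.extraVolumes and extraVolumeMounts options. The output file name, the mount path, and the read-only flag are assumptions.

```shell
# Write an illustrative values fragment that mounts the shared datasets claim
# into every single-user pod.
# Assumptions (not from the original post): mount path /home/jovyan/datasets,
# read-only access.
cat > shared-datasets-values.example.yaml <<'EOF'
singleuser:
  storage:
    extraVolumes:
      - name: shared-ai-datasets
        persistentVolumeClaim:
          claimName: shared-ai-datasets
    extraVolumeMounts:
      - name: shared-ai-datasets
        mountPath: /home/jovyan/datasets
        readOnly: true
EOF
```

Mounting the datasets read-only keeps one user from accidentally modifying data that others depend on. Drop that flag if users need to write to the shared filesystem.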

Conclusion

JupyterHub running as a service within a Kubernetes cluster is easy to deploy and manage. Data scientists not only have persistent storage backing their personal environments, but also have access to all shared datasets without time-consuming data copying or complex data management.

Grab our code and try out these quick installation steps, and let us know how it goes! #PureStorage
