Deploy SDK Part2!
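I originally meant to cover this in the previous post, but once the sample code and its rendering results were in place, that post got really long, so I decided to write up the installation steps in a new post instead.
That said, this post dragged on for ages. My output cannot keep up with the rate at which new to-write topics appear, so the drafts keep piling up Orz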
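Installation (image source: Meet創業小聚)
According to the documentation the installation steps do not look hard, but past experience says that whether it works depends on luck and fate XDDD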
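System requirements
The documentation opens with a long list of system requirements. Besides some basic hardware requirements, it pins specific versions of K8S, Helm, and Docker: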
- Kubernetes 1.15.4
- Docker 19.03.1
- NVIDIA Docker 2.2.0
- Helm 2.15.2
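If these are not installed on the system, there is no need to install them beforehand; the installer will install them for you. If a different version is already installed, however, you may need to remove it first, because otherwise the installer skips that component.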
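As a small sketch of that rule, the helper below compares an installed version string against the pinned versions listed above. The function names and the message wording are made up for illustration; how you obtain the installed version string is left to you.

```shell
# Pinned versions, copied from the requirements list above.
required_versions() {
  cat <<'EOF'
kubernetes 1.15.4
docker 19.03.1
nvidia-docker 2.2.0
helm 2.15.2
EOF
}

# check_version NAME INSTALLED   (hypothetical helper)
# An empty INSTALLED means the component is absent, which is fine because
# the installer takes care of it. A different version is flagged, since the
# installer would otherwise skip that component.
check_version() {
  name=$1
  installed=$2
  want=$(required_versions | awk -v n="$name" '$1 == n { print $2 }')
  if [ -z "$want" ]; then
    echo "unknown component"
  elif [ -z "$installed" ]; then
    echo "absent (installer will provide $want)"
  elif [ "$installed" = "$want" ]; then
    echo "ok"
  else
    echo "mismatch (need $want, consider removing $installed first)"
  fi
}
```

For example, `check_version helm 3.4.1` reports a mismatch, which is exactly the situation described in the failed attempt later in this post.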
Installation steps
Here is a diagram of the Deploy SDK installation flow:
Installation flow (image source: SDK 0.7.1 documentation)
- Download and install bootstrap
First log in to NGC, find Clara Deploy Bootstrap, then download and unzip it:
    $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_bootstrap/versions/0.7.1-2008.1/zip -O bootstrap.zip
    $ unzip bootstrap.zip -d bootstrap
Once the download is done, enter the folder and run the script. It installs Docker, K8S, and the other required software:
    $ cd bootstrap
    $ sudo ./bootstrap.sh
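By the way, you can skip logging in to NGC if you prefer; the download works either way. I still suggest logging in, though, because otherwise NGC is very likely to interrupt you during the following steps. It is really noisy…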
- Download and install the CLI
Next, go back to NGC, find Clara CLI, then download and unzip it:
    $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_cli/versions/0.7.1-2008.1/zip -O clara_cli.zip
    $ sudo unzip clara_cli.zip -d /usr/bin/ && sudo chmod 755 /usr/bin/clara*
    Archive: cli.zip
    inflating: /usr/bin/clara
    inflating: /usr/bin/clara-dicom
    inflating: /usr/bin/clara-monitor
    inflating: /usr/bin/clara-pull
    inflating: /usr/bin/clara-render
With the files placed under /usr/bin/, try calling the clara command to verify that the installation succeeded:
    $ clara version
    Clara CLI version: 0.7.1-12788.ae65aea0
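Beyond `clara version`, a quick way to confirm that all the unpacked binaries are reachable is to check each name against PATH. This is only a sketch; `missing_commands` is a hypothetical helper, and the binary names are the ones shown in the unzip output above.

```shell
# missing_commands NAME...   (hypothetical helper)
# Prints every NAME that cannot be found on PATH, one per line.
# Prints nothing when all of them are available.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Usage, with the binaries listed in the unzip output:
#   missing_commands clara clara-dicom clara-monitor clara-pull clara-render
```

An empty result means every binary was found.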
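- Configure NGC credentials
After installing Clara CLI you need to configure NGC credentials, so that Clara CLI can later pull the relevant Helm charts from NGC for deployment. For this you need an NGC_API_KEY. This time you really do have to log in to NGC. After logging in, open the avatar menu in the top-right corner, click Setup, and choose Generate API Key. On that page there is a Generate API Key button in the upper right; clicking it produces the NGC_API_KEY. Then go back to the terminal and enter the following command. You can simply leave orgteam at its default value:
    $ clara config --key NGC_API_KEY [--orgteam nvidia/clara] -y
    ✔ Yes
    Configuration "ngc-clara"successfully created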
Note that "successfully" only means the credentials were configured. Whether they actually work is something you find out only when you use them, so try the pull command:
    $ clara pull platform
    ✔ Yes
    Clara Platform 0.7.1-2008.1
    Chart saved at: /home/.clara/charts/clara
    Hint: use "clara platform start" or "clara platform restart" to deploy pulled Clara Platform.
If it fails, you may see a message like this:
    $ clara pull platform
    ✔ Yes
    Error: unable to fetch latest version information
    401 Unauthorized
or this:
    $ clara pull platform
    ✔ Yes
    Error: unable to fetch latest version information
    403 Forbidden
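The two failures differ only in the HTTP status, so a small helper can turn the captured output into a hint. The readings below follow the general meaning of HTTP 401 and 403, not anything stated in the Clara documentation, so treat them as assumptions; `explain_pull_error` is a hypothetical name.

```shell
# explain_pull_error   (hypothetical helper)
# Reads CLI output on stdin and prints a one-line hint.
# 401 generally means the credentials were not accepted at all;
# 403 generally means they were accepted but lack permission.
explain_pull_error() {
  output=$(cat)
  case $output in
    *"401 Unauthorized"*) echo "authentication failed: re-check the API key" ;;
    *"403 Forbidden"*)    echo "access denied: check the org/team the key can access" ;;
    *"Error:"*)           echo "unrecognized error" ;;
    *)                    echo "no error found" ;;
  esac
}

# Usage:
#   clara pull platform 2>&1 | explain_pull_error
```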
- Start the Helm charts
As mentioned in the previous post's discussion of Helm charts, every chart other than Triton Inference Server can be started through this step.
NVIDIA Clara Deploy Architecture (image source: SDK 0.7.1 documentation)
The platform chart was already downloaded a moment ago while testing the clara command, so it can be started right away.
    $ clara platform start
    Starting clara...
    NAME: clara
    Note: If there is a running instance of Clara Console, Clara Dicom Adapter or Clara Renderer, they should be restarted.
Next, download the Helm charts for the Clara Deploy Services:
    $ clara pull dicom
    ✔ Yes
    Clara Dicom Adapter 0.7.1-2008.1
    Chart saved at: /home/.clara/charts/dicom-adapter
    Hint: use "clara dicom start" or "clara dicom restart" to deploy pulled Clara Dicom Adapter.

    $ clara pull render
    ✔ Yes
    Clara Renderer 0.7.1-2008.1
    Chart saved at: /home/.clara/charts/clara-renderer
    Hint: use "clara render start" or "clara render restart" to deploy pulled Clara Renderer.

    $ clara pull monitor
    ✔ Yes
    Clara Monitor Server 0.7.1-2008.1
    Chart saved at: /home/.clara/charts/clara-monitor-server
    Hint: use "clara monitor start" or "clara monitor restart" to deploy pulled Clara Monitor Server.

    $ clara pull console
    ✔ Yes
    Clara Management Console 0.7.1-2008.1
    Chart saved at: /home/.clara/charts/clara-console
    Hint: use "clara console start" or "clara console restart" to deploy pulled Clara Management Console.
After that, you can try starting them:
    $ clara dicom start
    Starting DICOM Adapter...
    NAME: clara-dicom-adapter

    $ clara render start
    NAME: clara-render-server

    $ clara monitor start
    NAME: clara-monitor-server

    $ clara console start
    NAME: clara-console
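Verifying the installation
If everything went well, the installation is complete once the steps above have run. You can run helm ls to see which charts are currently deployed, or run kubectl get pods, which should show the following Pods: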
- clara-clara-platformapiserver-
- clara-dicom-adapter-
- clara-monitor-server-fluentd-elasticsearch-
- clara-monitor-server-grafana-
- clara-monitor-server-monitor-server-
- clara-render-server-clara-renderer-
- clara-resultsservice-
- clara-ui-
- clara-console-
- clara-console-mongodb-
- clara-workflow-controller-
- elasticsearch-master-0
- elasticsearch-master-1
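Checking that list by eye gets tedious, so here is a minimal sketch that reads `kubectl get pods` output and reports which of the expected name prefixes have no matching Pod. The helper names are hypothetical, and the prefixes are copied from the list above.

```shell
# Expected Pod name prefixes, copied from the list above.
expected_pod_prefixes() {
  cat <<'EOF'
clara-clara-platformapiserver-
clara-dicom-adapter-
clara-monitor-server-fluentd-elasticsearch-
clara-monitor-server-grafana-
clara-monitor-server-monitor-server-
clara-render-server-clara-renderer-
clara-resultsservice-
clara-ui-
clara-console-
clara-console-mongodb-
clara-workflow-controller-
elasticsearch-master-0
elasticsearch-master-1
EOF
}

# missing_pods   (hypothetical helper)
# Reads `kubectl get pods` output on stdin (header line first) and prints
# each expected prefix that no Pod name starts with.
# Limitation: "clara-console-" also matches the mongodb Pod, so a missing
# console Pod goes unnoticed while the mongodb Pod is present.
missing_pods() {
  pods=$(awk 'NR > 1 { print $1 }')
  expected_pod_prefixes | while IFS= read -r prefix; do
    found=no
    for pod in $pods; do
      case $pod in
        "$prefix"*) found=yes; break ;;
      esac
    done
    [ "$found" = yes ] || echo "$prefix"
  done
}

# Usage:
#   kubectl get pods | missing_pods
```

This only checks that a Pod with the right name exists; it says nothing about whether that Pod is healthy.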
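Observing how the Pods change
Speaking of running Pods, I was curious which Pods each chart brings up. So I uninstalled Clara entirely, installed it again, and watched how the Pods changed along the way.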
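One way to make this comparison less manual is to save the `kubectl get pods` output before and after each `start` command and print only the names that are new. This is a sketch under that assumption; `new_pods` and the snapshot file names are hypothetical.

```shell
# new_pods BEFORE_FILE AFTER_FILE   (hypothetical helper)
# Both files hold saved `kubectl get pods` output. Prints the Pod names
# that appear in AFTER_FILE but not in BEFORE_FILE.
new_pods() {
  awk '
    FNR == 1 { next }                            # skip each file header line
    FILENAME == ARGV[1] { seen[$1] = 1; next }   # remember names from BEFORE
    !($1 in seen) { print $1 }                   # report names only in AFTER
  ' "$1" "$2"
}

# Usage:
#   kubectl get pods > before.txt
#   clara dicom start
#   kubectl get pods > after.txt
#   new_pods before.txt after.txt
```

Skipping the first line of each file also covers the "No resources found." message that kubectl prints when there are no Pods yet.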
- Before installation
    $ kubectl get all
    NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   4d23h

    $ kubectl get pods
    No resources found.
- clara platform start
    $ kubectl get all
    NAME                                                 READY   STATUS    RESTARTS   AGE
    pod/clara-clara-platformapiserver-54c5c44bbd-9b97b   1/1     Running   0          95s
    pod/clara-resultsservice-664477898f-zl8cr            1/1     Running   0          95s
    pod/clara-ui-6f89b97df8-fn2zm                        1/1     Running   0          95s
    pod/clara-workflow-controller-69cbb55fc8-t67ns       1/1     Running   0          95s
    pod/fluentd-7n2b8                                    1/1     Running   0          95s
    pod/fluentd-ccnzw                                    1/1     Running   0          95s

    NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
    service/clara                  NodePort    10.103.37.52    <none>        50051:31536/TCP   95s
    service/clara-resultsservice   ClusterIP   10.108.91.220   <none>        8088/TCP          95s
    service/clara-ui               ClusterIP   10.97.148.11    <none>        80/TCP            95s
    service/kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP           9m52s

    NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/fluentd   2         2         2       2            2           <none>          95s

    NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clara-clara-platformapiserver   1/1     1            1           95s
    deployment.apps/clara-resultsservice            1/1     1            1           95s
    deployment.apps/clara-ui                        1/1     1            1           95s
    deployment.apps/clara-workflow-controller       1/1     1            1           95s

    NAME                                                       DESIRED   CURRENT   READY   AGE
    replicaset.apps/clara-clara-platformapiserver-54c5c44bbd   1         1         1       95s
    replicaset.apps/clara-resultsservice-664477898f            1         1         1       95s
    replicaset.apps/clara-ui-6f89b97df8                        1         1         1       95s
    replicaset.apps/clara-workflow-controller-69cbb55fc8       1         1         1       95s
- clara dicom start
    $ kubectl get all
    NAME                                                 READY   STATUS    RESTARTS   AGE
    pod/clara-clara-platformapiserver-54c5c44bbd-9b97b   1/1     Running   0          2m44s
    pod/clara-dicom-adapter-7948fcd445-rtbqr             1/1     Running   0          33s
    pod/clara-resultsservice-664477898f-zl8cr            1/1     Running   0          2m44s
    pod/clara-ui-6f89b97df8-fn2zm                        1/1     Running   0          2m44s
    pod/clara-workflow-controller-69cbb55fc8-t67ns       1/1     Running   0          2m44s
    pod/fluentd-7n2b8                                    1/1     Running   0          2m44s
    pod/fluentd-ccnzw                                    1/1     Running   0          2m44s

    NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                        AGE
    service/clara                  NodePort    10.103.37.52    <none>        50051:31536/TCP                2m44s
    service/clara-dicom-adapter    NodePort    10.105.101.54   <none>        104:31289/TCP,5000:31880/TCP   33s
    service/clara-resultsservice   ClusterIP   10.108.91.220   <none>        8088/TCP                       2m44s
    service/clara-ui               ClusterIP   10.97.148.11    <none>        80/TCP                         2m44s
    service/kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP                        11m

    NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/fluentd   2         2         2       2            2           <none>          2m44s

    NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clara-clara-platformapiserver   1/1     1            1           2m44s
    deployment.apps/clara-dicom-adapter             1/1     1            1           33s
    deployment.apps/clara-resultsservice            1/1     1            1           2m44s
    deployment.apps/clara-ui                        1/1     1            1           2m44s
    deployment.apps/clara-workflow-controller       1/1     1            1           2m44s

    NAME                                                       DESIRED   CURRENT   READY   AGE
    replicaset.apps/clara-clara-platformapiserver-54c5c44bbd   1         1         1       2m44s
    replicaset.apps/clara-dicom-adapter-7948fcd445             1         1         1       33s
    replicaset.apps/clara-resultsservice-664477898f            1         1         1       2m44s
    replicaset.apps/clara-ui-6f89b97df8                        1         1         1       2m44s
    replicaset.apps/clara-workflow-controller-69cbb55fc8       1         1         1       2m44s
- clara render start
    $ kubectl get all
    NAME                                                    READY   STATUS             RESTARTS   AGE
    pod/clara-clara-platformapiserver-54c5c44bbd-gfwng      1/1     Running            0          24m
    pod/clara-dicom-adapter-7948fcd445-mv248                1/1     Running            0          20m
    pod/clara-render-server-clara-renderer-d79dd4779-f5hgd  2/3     CrashLoopBackOff   7          11m
    pod/clara-resultsservice-664477898f-2vsw9               1/1     Running            0          24m
    pod/clara-ui-6f89b97df8-c5p2f                           1/1     Running            0          24m
    pod/clara-workflow-controller-69cbb55fc8-mc682          1/1     Running            0          24m
    pod/fluentd-ntl6q                                       1/1     Running            0          24m
    pod/fluentd-tvnrl                                       1/1     Running            0          24m

    NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
    service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 24m
    service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    20m
    service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   11m
    service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        24m
    service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          24m
    service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         27m

    NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/fluentd   2         2         2       2            2           <none>          24m

    NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clara-clara-platformapiserver        1/1     1            1           24m
    deployment.apps/clara-dicom-adapter                  1/1     1            1           20m
    deployment.apps/clara-render-server-clara-renderer   0/1     1            0           11m
    deployment.apps/clara-resultsservice                 1/1     1            1           24m
    deployment.apps/clara-ui                             1/1     1            1           24m
    deployment.apps/clara-workflow-controller            1/1     1            1           24m

    NAME                                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/clara-clara-platformapiserver-54c5c44bbd       1         1         1       24m
    replicaset.apps/clara-dicom-adapter-7948fcd445                 1         1         1       20m
    replicaset.apps/clara-render-server-clara-renderer-d79dd4779   1         1         0       11m
    replicaset.apps/clara-resultsservice-664477898f                1         1         1       24m
    replicaset.apps/clara-ui-6f89b97df8                            1         1         1       24m
    replicaset.apps/clara-workflow-controller-69cbb55fc8           1         1         1       24m
- clara monitor start
    $ kubectl get all
    NAME                                                       READY   STATUS             RESTARTS   AGE
    pod/clara-clara-platformapiserver-54c5c44bbd-gfwng         1/1     Running            0          40m
    pod/clara-dicom-adapter-7948fcd445-mv248                   1/1     Running            0          36m
    pod/clara-monitor-server-fluentd-elasticsearch-dl7bj       1/1     Running            0          14m
    pod/clara-monitor-server-fluentd-elasticsearch-jxdk6       1/1     Running            0          14m
    pod/clara-monitor-server-grafana-5f874b974d-qvxgn          1/1     Running            0          14m
    pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7   0/1     CrashLoopBackOff   7          14m
    pod/clara-render-server-clara-renderer-d79dd4779-f5hgd     2/3     CrashLoopBackOff   10         27m
    pod/clara-resultsservice-664477898f-2vsw9                  1/1     Running            0          40m
    pod/clara-ui-6f89b97df8-c5p2f                              1/1     Running            0          40m
    pod/clara-workflow-controller-69cbb55fc8-mc682             1/1     Running            0          40m
    pod/elasticsearch-master-0                                 1/1     Running            0          14m
    pod/elasticsearch-master-1                                 1/1     Running            0          14m
    pod/fluentd-ntl6q                                          1/1     Running            0          40m
    pod/fluentd-tvnrl                                          1/1     Running            0          40m

    NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
    service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 40m
    service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    36m
    service/clara-monitor-server                 ClusterIP   10.111.167.160   <none>        50051/TCP                       14m
    service/clara-monitor-server-grafana         NodePort    10.100.148.116   <none>        80:32000/TCP                    14m
    service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   27m
    service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        40m
    service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          40m
    service/elasticsearch-master                 ClusterIP   10.108.240.18    <none>        9200/TCP,9300/TCP               14m
    service/elasticsearch-master-headless        ClusterIP   None             <none>        9200/TCP,9300/TCP               14m
    service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         43m

    NAME                                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/clara-monitor-server-fluentd-elasticsearch   2         2         2       2            2           <none>          14m
    daemonset.apps/fluentd                                      2         2         2       2            2           <none>          40m

    NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clara-clara-platformapiserver         1/1     1            1           40m
    deployment.apps/clara-dicom-adapter                   1/1     1            1           36m
    deployment.apps/clara-monitor-server-grafana          1/1     1            1           14m
    deployment.apps/clara-monitor-server-monitor-server   0/1     1            0           14m
    deployment.apps/clara-render-server-clara-renderer    0/1     1            0           27m
    deployment.apps/clara-resultsservice                  1/1     1            1           40m
    deployment.apps/clara-ui                              1/1     1            1           40m
    deployment.apps/clara-workflow-controller             1/1     1            1           40m

    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/clara-clara-platformapiserver-54c5c44bbd         1         1         1       40m
    replicaset.apps/clara-dicom-adapter-7948fcd445                   1         1         1       36m
    replicaset.apps/clara-monitor-server-grafana-5f874b974d          1         1         1       14m
    replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7   1         1         0       14m
    replicaset.apps/clara-render-server-clara-renderer-d79dd4779     1         1         0       27m
    replicaset.apps/clara-resultsservice-664477898f                  1         1         1       40m
    replicaset.apps/clara-ui-6f89b97df8                              1         1         1       40m
    replicaset.apps/clara-workflow-controller-69cbb55fc8             1         1         1       40m

    NAME                                    READY   AGE
    statefulset.apps/elasticsearch-master   2/2     14m
- clara console start
    $ kubectl get all
    NAME                                                       READY   STATUS             RESTARTS   AGE
    pod/clara-clara-platformapiserver-54c5c44bbd-gfwng         1/1     Running            0          61m
    pod/clara-console-8565b4d565-77jhc                         2/2     Running            0          19m
    pod/clara-console-mongodb-85f8bd5f95-8nwqx                 1/1     Running            0          19m
    pod/clara-dicom-adapter-7948fcd445-mv248                   1/1     Running            0          58m
    pod/clara-monitor-server-fluentd-elasticsearch-dl7bj       1/1     Running            0          36m
    pod/clara-monitor-server-fluentd-elasticsearch-jxdk6       1/1     Running            0          36m
    pod/clara-monitor-server-grafana-5f874b974d-qvxgn          1/1     Running            0          36m
    pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7   0/1     CrashLoopBackOff   11         36m
    pod/clara-render-server-clara-renderer-d79dd4779-f5hgd     2/3     CrashLoopBackOff   14         48m
    pod/clara-resultsservice-664477898f-2vsw9                  1/1     Running            0          61m
    pod/clara-ui-6f89b97df8-c5p2f                              1/1     Running            0          61m
    pod/clara-workflow-controller-69cbb55fc8-mc682             1/1     Running            0          61m
    pod/elasticsearch-master-0                                 1/1     Running            0          36m
    pod/elasticsearch-master-1                                 1/1     Running            0          36m
    pod/fluentd-ntl6q                                          1/1     Running            0          61m
    pod/fluentd-tvnrl                                          1/1     Running            0          61m

    NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
    service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 61m
    service/clara-console                        NodePort    10.99.119.217    <none>        8080:32002/TCP,5000:32003/TCP   19m
    service/clara-console-mongodb                ClusterIP   10.102.177.195   <none>        27017/TCP                       19m
    service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    58m
    service/clara-monitor-server                 ClusterIP   10.111.167.160   <none>        50051/TCP                       36m
    service/clara-monitor-server-grafana         NodePort    10.100.148.116   <none>        80:32000/TCP                    36m
    service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   48m
    service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        61m
    service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          61m
    service/elasticsearch-master                 ClusterIP   10.108.240.18    <none>        9200/TCP,9300/TCP               36m
    service/elasticsearch-master-headless        ClusterIP   None             <none>        9200/TCP,9300/TCP               36m
    service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         65m

    NAME                                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/clara-monitor-server-fluentd-elasticsearch   2         2         2       2            2           <none>          36m
    daemonset.apps/fluentd                                      2         2         2       2            2           <none>          61m

    NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clara-clara-platformapiserver         1/1     1            1           61m
    deployment.apps/clara-console                         1/1     1            1           19m
    deployment.apps/clara-console-mongodb                 1/1     1            1           19m
    deployment.apps/clara-dicom-adapter                   1/1     1            1           58m
    deployment.apps/clara-monitor-server-grafana          1/1     1            1           36m
    deployment.apps/clara-monitor-server-monitor-server   0/1     1            0           36m
    deployment.apps/clara-render-server-clara-renderer    0/1     1            0           48m
    deployment.apps/clara-resultsservice                  1/1     1            1           61m
    deployment.apps/clara-ui                              1/1     1            1           61m
    deployment.apps/clara-workflow-controller             1/1     1            1           61m

    NAME                                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/clara-clara-platformapiserver-54c5c44bbd         1         1         1       61m
    replicaset.apps/clara-console-8565b4d565                         1         1         1       19m
    replicaset.apps/clara-console-mongodb-85f8bd5f95                 1         1         1       19m
    replicaset.apps/clara-dicom-adapter-7948fcd445                   1         1         1       58m
    replicaset.apps/clara-monitor-server-grafana-5f874b974d          1         1         1       36m
    replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7   1         1         0       36m
    replicaset.apps/clara-render-server-clara-renderer-d79dd4779     1         1         0       48m
    replicaset.apps/clara-resultsservice-664477898f                  1         1         1       61m
    replicaset.apps/clara-ui-6f89b97df8                              1         1         1       61m
    replicaset.apps/clara-workflow-controller-69cbb55fc8             1         1         1       61m

    NAME                                    READY   AGE
    statefulset.apps/elasticsearch-master   2/2     36m
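In the listings above, the renderer and monitor-server Pods sit in CrashLoopBackOff while everything else is Running. With this many Pods, a small filter helps spot them. The sketch below assumes the five-column `kubectl get pods` layout shown above; `unhealthy_pods` is a hypothetical name.

```shell
# unhealthy_pods   (hypothetical helper)
# Reads `kubectl get pods` output on stdin (header line first) and prints
# "NAME STATUS RESTARTS" for every Pod whose STATUS is neither Running
# nor Completed.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3, $4 }'
}

# Usage:
#   kubectl get pods | unhealthy_pods
```

From there, `kubectl describe pod NAME` and `kubectl logs NAME` are the usual next steps for finding out why a Pod keeps restarting.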
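Of course, for a clueless installer like me, things did not go that smoothly…
Failed attempt: deploying Clara Platform and starting the Helm charts
Among the environment requirements, the troublesome part for me was the Kubernetes and Helm versions. My server environment is shared with teammates, so at first I decided to keep the environment my colleagues needed and brute-force the installation to see whether it could work. If it really could not, I would then try downgrading.
So if all you want is to finish installing the Deploy SDK, you can skip this section. It is only a record of the errors I ran into on a whim. If you would like to watch me flounder, keep scrolling.
Well… let me give away the ending first: I downgraded in the end. I did keep some of the errors from along the way, in case I get a chance to revisit them later. This is definitely not just to pad the word count XDDD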
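Picking up from the "Configure NGC credentials" step: after pulling the platform chart I tried to start the platform and got my first error message, with Helm complaining that it could not find Tiller.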
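On Stack Overflow I found a question with a similar error message, and it looked as if re-initializing Helm with helm init would be enough. It turned out that there was no helm init!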
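I looked up what this Tiller that Helm could not find actually is. According to smalltown, in Helm 2 Tiller is the K8S component used to install and manage other application services. Put simply, Tiller is a service that talks to the K8S API server. Because of problems with permission setup and management, however, it became history once Helm 3 was released.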
Unfortunately, my Helm was a v3 release.
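Since the whole problem hinges on the Helm major version, a check like the one below can catch it before trying to start anything. This is only a sketch: the helper names are hypothetical, and the sample version strings in the comments are illustrative rather than taken from this machine.

```shell
# helm_major VERSION_TEXT   (hypothetical helper)
# Extracts the major version number from text that contains something like
# "v3.4.1+gc4e7485" or "Client: v2.15.2+g8dce272" (illustrative strings).
# Prints nothing when no vX.Y.Z pattern is found.
helm_major() {
  printf '%s\n' "$1" |
    sed -n 's/.*v\([0-9][0-9]*\)\.[0-9][0-9]*\.[0-9][0-9]*.*/\1/p' |
    head -n 1
}

# needs_tiller VERSION_TEXT
# Succeeds only for Helm 2, the generation that relies on Tiller.
needs_tiller() {
  [ "$(helm_major "$1")" = "2" ]
}

# Usage:
#   needs_tiller "$(helm version --short 2>/dev/null)" || echo "not Helm 2"
```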
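The change from Helm 2 to Helm 3 is an architectural one, which really is not easy to work around. After some investigation and asking on the forum, I had no choice but to downgrade Helm. I installed it from the Binary Releases and brought the version back down to v2.15.2.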
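For reference, a binary-release install usually comes down to downloading a tarball and moving the `helm` binary into place. The sketch below only builds the download URL. It assumes the `get.helm.sh` naming pattern used by Helm's release downloads and a linux/amd64 host; `helm_release_url` is a hypothetical helper, and the commented usage lines are the generic steps, not necessarily the exact ones used here.

```shell
# helm_release_url VERSION [OS] [ARCH]   (hypothetical helper)
# Prints the tarball URL for a Helm binary release, assuming the
# get.helm.sh naming pattern. OS and ARCH default to linux and amd64.
helm_release_url() {
  version=$1
  os=${2:-linux}
  arch=${3:-amd64}
  echo "https://get.helm.sh/helm-v${version}-${os}-${arch}.tar.gz"
}

# Usage (review before running; paths are assumptions):
#   curl -fsSLO "$(helm_release_url 2.15.2)"
#   tar -zxvf helm-v2.15.2-linux-amd64.tar.gz
#   sudo mv linux-amd64/helm /usr/local/bin/helm
```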
After the downgrade I checked the Helm version again. The version was now correct, but the error message was still there.
Still, with the version downgraded, the helm init command should now be available, so I ran it.
Checking the Helm version once more showed that a Server entry had appeared, and listing the running Pods with kubectl showed Tiller hard at work.
Good. With the Tiller error out of the way, I pulled the platform chart again and ran start once more to see whether it would succeed.
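Ugh, another rpc error. Looking at the message, I had a couple of guesses:
- A problem with the server installation or configuration: I considered this because I had brought Tiller up myself after the downgrade.
- A permissions problem: this seemed more likely, since almost everything I found while searching described this situation.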
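Never mind. I first tried running Tiller locally, as described in the Helm 2 documentation.
However, the second step in the docs, connecting to the new local Tiller host, looked odd to me, so in the end I did not go through with it.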
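So I abandoned that first path and tried tackling the permissions problem instead. Following the Role-based Access Control section of the Helm 2 documentation and the discussion by the experts on GitHub, I reconfigured the connection and started the platform.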
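As a rough idea of what a Tiller RBAC setup under Helm 2 can look like, here is a sketch that prints the commands instead of running them. It is a common pattern, not necessarily the exact configuration used here: it gives Tiller its own service account in `kube-system` and binds it to `cluster-admin`, which is very broad. The binding name `tiller-cluster-rule` and the helper name are placeholders.

```shell
# tiller_rbac_commands   (hypothetical helper)
# Prints, without executing, one common way to give Helm 2's Tiller its own
# service account. The cluster-admin binding is convenient but very broad.
tiller_rbac_commands() {
  cat <<'EOF'
kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade
EOF
}

# Usage: review the output first, then run it if it fits your cluster:
#   tiller_rbac_commands
#   tiller_rbac_commands | sh
```

Printing first is deliberate: on a server shared with teammates, cluster-wide role bindings deserve a second look before they are applied.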
Yay! With the platform up, I pulled the Helm charts for the Clara Deploy Services the same way as before.
Then came the nerve-racking final step: starting the services.
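I knew it would not be that easy. I did find an Issue discussing this problem, but I have to admit the discussion was beyond what a beginner like me understands about K8S. I had only started learning it a few days earlier (breakdown).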
I asked on the forum, and the answer was still to downgrade K8S, so in the end all I could do was sheepishly start downgrading.
After that, I restarted the monitor chart that had just failed.
References
- Clara Deploy Platform. Retrieved from NVIDIA NGC (2021-02-02).
- NVIDIA Taiwan (2020-11-02). NVIDIA Clara Deploy. Retrieved from YouTube (2021-02-02).
- 2. Installation. Retrieved from Clara Deploy SDK 0.7.3 documentation (2021-02-02).
- Community (2019-05-18). openshift - Helm: could not find tiller. Retrieved from Stack Overflow (2021-02-02).
- smalltown (2020-05-17). Helm 3 踹踹看. Retrieved from Starbugs Weekly 星巴哥技術專欄|Medium (2021-02-02).
- godleon (2021-01-24). [Kubernetes] Package Manager - Helm 簡介. Retrieved from 小信豬的原始部落 (2021-02-02).
- postak (2018-04-10). forbidden: User “system:serviceaccount:kube-system:default” cannot get namespaces in the namespace “default. Retrieved from fnproject/fn-helm|GitHub (2021-02-02).
- noprom (2017-11-12). User “system:serviceaccount:kube-system:default” cannot get namespaces in the namespace “default”. Retrieved from helm/helm|GitHub (2021-02-02).
- Helm v2. Retrieved from the Helm official site (2021-02-02).
- Nick (2019-10-12). [Day27] k8s應用篇(一):Helm部署apps、HPA和CA. Retrieved from iT 邦幫忙 (2021-02-02).
- Terrones-Oscar (2020-08-13). helm fails Error: validation failed: [unable to recognize “”: no matches for kind “PodSecurityPolicy”]. Retrieved from helm/charts|GitHub (2021-02-02).
- jckasper (2019-09-06). Helm init fails on Kubernetes 1.16.0. Retrieved from helm/helm|GitHub (2021-02-02).
- Zz Chen (2018-07-03). Helm 部署在 GKE 上的權限問題. Retrieved from smalltowntechblog|Medium (2021-02-02).
- MengYun (2019-10-27). Not responding when running “clara render start”. Retrieved from NVIDIA Developer Forums (2021-02-02).
Update log
Last updated: 2021-03-07
- 2021-03-07 Published
- 2021-02-03 Draft completed
- 2020-11-09 Draft started