[Survey] NVIDIA Clara Deploy SDK: Installation

Deploy SDK Part2!

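I originally planned to cover this in the previous post, but once the sample code was rendered with syntax highlighting, that post became really long, so I decided to write up the installation steps in a separate post.

That said, this one took me forever. I write more slowly than new topics pile up, so the drafts keep accumulating. Orz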

Installation (image source: Meet創業小聚)



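Going by the documentation, the installation doesn't look hard, but past experience says that whether it actually succeeds comes down to karma and luck XDDD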

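System Requirements

The documentation opens with a long list of system requirements. Besides the basic hardware requirements, it pins specific versions of K8S, Helm, and Docker: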

  • Kubernetes 1.15.4
  • Docker 19.03.1
  • NVIDIA Docker 2.2.0
  • Helm 2.15.2

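If these are not installed yet, you don't need to install them beforehand; the installer will install them for you. If a mismatched version is already installed, however, you may need to remove it first, otherwise the installer skips that component.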
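Since a mismatched version is silently skipped, it can help to compare what is installed against the list before running anything. Below is a minimal sketch of such a check. The function name `check_version` and its messages are my own invention, and how you read the installed version of each tool is left to you.

```shell
# Required versions, copied from the list above.
REQUIRED_K8S="1.15.4"
REQUIRED_DOCKER="19.03.1"
REQUIRED_HELM="2.15.2"

# check_version NAME REQUIRED ACTUAL (hypothetical helper)
# Prints one status line. Returns 0 when the version matches or nothing is
# installed yet, and 1 when a different version is already present.
check_version() {
  name=$1
  required=$2
  actual=$3
  if [ -z "$actual" ]; then
    echo "$name: not installed (the installer should add $required)"
    return 0
  elif [ "$actual" = "$required" ]; then
    echo "$name: $actual OK"
    return 0
  else
    echo "$name: found $actual, expected $required (consider removing it first)"
    return 1
  fi
}

# Example with literal values: a machine that already has Helm 3.1.2 and no Docker.
check_version Helm "$REQUIRED_HELM" "3.1.2" || true
check_version Docker "$REQUIRED_DOCKER" ""
```

An exact string comparison is used on purpose, because the documentation pins exact versions rather than minimum versions.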



Installation Steps

Here is a diagram of the Deploy SDK installation flow:

Installation flow (image source: SDK 0.7.1 documentation)


  1. Download and install bootstrap
    First, log in to NGC, find Clara Deploy Bootstrap, then download and unzip it:
     $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_bootstrap/versions/0.7.1-2008.1/zip -O bootstrap.zip
     $ unzip bootstrap.zip -d bootstrap
    

    Once the download finishes, enter the folder and run the script. It installs the required software, such as Docker and K8S:

     $ cd bootstrap
     $ sudo ./bootstrap.sh
    

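    By the way, you can skip logging in to NGC; the download works either way. I still recommend logging in, though, because otherwise it is likely to interrupt you in the following steps. It is really noisy…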


  2. Download and install the CLI
    Next, go back to NGC, find Clara CLI, then download and unzip it:
    $ wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_cli/versions/0.7.1-2008.1/zip -O clara_cli.zip
        
    $ sudo unzip clara_cli.zip -d /usr/bin/ && sudo chmod 755 /usr/bin/clara*
    Archive:  cli.zip
    inflating: /usr/bin/clara
    inflating: /usr/bin/clara-dicom
    inflating: /usr/bin/clara-monitor
    inflating: /usr/bin/clara-pull
    inflating: /usr/bin/clara-render
    

    With the files in /usr/bin/, try calling the clara command to verify the installation:

    $ clara version
    Clara CLI version: 0.7.1-12788.ae65aea0
    


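  3. Configure NGC credentials
    After installing Clara CLI, you need to configure NGC credentials so that Clara CLI can later pull the relevant Helm Charts from NGC for deployment.

    You need an NGC_API_KEY here. This time you really do have to log in to NGC. After logging in, click Setup in the avatar menu at the top right and choose Generate API Key.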

    Generate API Key


    On that page there is a Generate API Key button at the top right; click it to generate the NGC_API_KEY.

    Generate API Key-2


    Then go back to the terminal and enter the following command. For orgteam, the default value is fine:

     $ clara config --key NGC_API_KEY [--orgteam nvidia/clara] -y    
     ✔ Yes
     Configuration "ngc-clara"successfully created    
    


    Note that "successfully" only means the credentials were saved. Whether they actually work is only known once you use them, so try a pull command:

     $ clara pull platform     
     ✔ Yes
     Clara Platform 0.7.1-2008.1
     Chart saved at: /home/.clara/charts/clara
     Hint: use "clara platform start" or "clara platform restart" to deploy pulled Clara Platform.
    

    If it fails, you may see a message like this:

     $ clara pull platform
     ✔ Yes
     Error: unable to fetch latest version information
     401 Unauthorized
    

    or

     $ clara pull platform
     ✔ Yes
     Error: unable to fetch latest version information
     403 Forbidden
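
    The two failures call for different fixes, so it can be handy to turn the raw output into a hint. The sketch below is only an illustration: `explain_pull_error` is a made-up helper, and the hints follow the general meaning of the HTTP status codes rather than anything the Clara CLI documentation states.

```shell
# explain_pull_error (hypothetical helper): reads the output of a failed
# `clara pull ...` on stdin and prints a hint based on the HTTP status in it.
# The interpretations are assumptions drawn from ordinary HTTP semantics.
explain_pull_error() {
  output=$(cat)
  case "$output" in
    *"401 Unauthorized"*)
      echo "401: the API key was not accepted; generate a new key and rerun clara config" ;;
    *"403 Forbidden"*)
      echo "403: the key was accepted but lacks access; check the --orgteam value" ;;
    *)
      echo "no known error pattern found" ;;
  esac
}

# Example with the first failure message shown above.
printf 'Error: unable to fetch latest version information\n401 Unauthorized\n' | explain_pull_error
```

    In practice the input would come from something like `clara pull platform 2>&1 | explain_pull_error`.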
    


  4. Start the Helm Charts
    As mentioned in the previous post's discussion of Helm Charts, every chart except Triton Inference Server can be started in this step.

    NVIDIA Clara Deploy Architecture (image source: SDK 0.7.1 documentation)


    The platform chart was already downloaded when we tested the clara command above, so we can start it right away.

     $ clara platform start
     Starting clara...
     NAME:   clara
     Note: If there is a running instance of Clara Console, Clara Dicom Adapter or Clara Renderer, they should be restarted.
    


    Next, download the Helm Charts for the Clara Deploy Services:

     $ clara pull dicom
     ✔ Yes
     Clara Dicom Adapter 0.7.1-2008.1
     Chart saved at: /home/.clara/charts/dicom-adapter
     Hint: use "clara dicom start" or "clara dicom restart" to deploy pulled Clara Dicom Adapter.
    
     $ clara pull render
     ✔ Yes
     Clara Renderer 0.7.1-2008.1
     Chart saved at: /home/.clara/charts/clara-renderer
     Hint: use "clara render start" or "clara render restart" to deploy pulled Clara Renderer.
    
     $ clara pull monitor
     ✔ Yes
     Clara Monitor Server 0.7.1-2008.1
     Chart saved at: /home/.clara/charts/clara-monitor-server
     Hint: use "clara monitor start" or "clara monitor restart" to deploy pulled Clara Monitor Server.
    
    
     $ clara pull console
     ✔ Yes
     Clara Management Console 0.7.1-2008.1
     Chart saved at: /home/.clara/charts/clara-console
     Hint: use "clara console start" or "clara console restart" to deploy pulled Clara Management Console.   
    

    Then try starting them:

     $ clara dicom start
     Starting DICOM Adapter...
     NAME: clara-dicom-adapter
    
     $ clara render start
     NAME: clara-render-server
    
    
     $ clara monitor start
     NAME: clara-monitor-server
    
     $ clara console start
     NAME: clara-console
    



Verifying the Installation

If all went well, the installation is complete after the steps above. You can run helm ls to see which charts are currently deployed:

$ helm ls
NAME                    CHART
clara                   clara-0.7.1-2008.1
clara-console           clara-console-0.7.1-2008.1
clara-dicom-adapter     dicom-adapter-0.7.1-2008.1
clara-monitor-server    clara-monitor-server-0.7.1-2008.1
clara-render-server     clara-renderer-0.7.1-2008.1


Alternatively, run kubectl get pods; you should see the following Pods:

  • clara-clara-platformapiserver-
  • clara-dicom-adapter-
  • clara-monitor-server-fluentd-elasticsearch-
  • clara-monitor-server-grafana-
  • clara-monitor-server-monitor-server-
  • clara-render-server-clara-renderer-
  • clara-resultsservice-
  • clara-ui-
  • clara-console-
  • clara-console-mongodb-
  • clara-workflow-controller-
  • elasticsearch-master-0
  • elasticsearch-master-1
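
Checking that list by eye gets tedious, so here is a rough sketch that reads `kubectl get pods` output and reports which expected prefixes have no matching Pod. `missing_pods` and `EXPECTED_PODS` are names I made up for this example. Note that matching is by prefix only, so `clara-console-` is also satisfied by a `clara-console-mongodb-` Pod.

```shell
# Pod name prefixes copied from the list above; override as needed.
EXPECTED_PODS="
clara-clara-platformapiserver-
clara-dicom-adapter-
clara-monitor-server-fluentd-elasticsearch-
clara-monitor-server-grafana-
clara-monitor-server-monitor-server-
clara-render-server-clara-renderer-
clara-resultsservice-
clara-ui-
clara-console-
clara-console-mongodb-
clara-workflow-controller-
elasticsearch-master-0
elasticsearch-master-1
"

# missing_pods (hypothetical helper): reads `kubectl get pods` output on stdin
# and prints every expected prefix for which no pod name starts with it.
missing_pods() {
  # First column of every row except the header.
  names=$(awk '$1 != "NAME" { print $1 }')
  for prefix in $EXPECTED_PODS; do
    printf '%s\n' "$names" | grep -q "^${prefix}" || echo "$prefix"
  done
}
```

Typical use would be `kubectl get pods | missing_pods`; no output means every prefix was found.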



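Watching the Pods Change

Speaking of Pods, I was curious which Pods each chart brings up when it starts. So I uninstalled Clara entirely, reinstalled it, and watched how the Pods changed.

  1. Before installation
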
     $ kubectl get all
     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
     service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   4d23h
    
     $ kubectl get pods
     No resources found.
    


  2. clara platform start

     $ kubectl get all    
     NAME                                                 READY   STATUS    RESTARTS   AGE
     pod/clara-clara-platformapiserver-54c5c44bbd-9b97b   1/1     Running   0          95s
     pod/clara-resultsservice-664477898f-zl8cr            1/1     Running   0          95s
     pod/clara-ui-6f89b97df8-fn2zm                        1/1     Running   0          95s
     pod/clara-workflow-controller-69cbb55fc8-t67ns       1/1     Running   0          95s
     pod/fluentd-7n2b8                                    1/1     Running   0          95s
     pod/fluentd-ccnzw                                    1/1     Running   0          95s
    
    
     NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
     service/clara                  NodePort    10.103.37.52    <none>        50051:31536/TCP   95s
     service/clara-resultsservice   ClusterIP   10.108.91.220   <none>        8088/TCP          95s
     service/clara-ui               ClusterIP   10.97.148.11    <none>        80/TCP            95s
     service/kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP           9m52s
    
     NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
     daemonset.apps/fluentd   2         2         2       2            2           <none>          95s
    
     NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
     deployment.apps/clara-clara-platformapiserver   1/1     1            1           95s
     deployment.apps/clara-resultsservice            1/1     1            1           95s
     deployment.apps/clara-ui                        1/1     1            1           95s
     deployment.apps/clara-workflow-controller       1/1     1            1           95s
    
     NAME                                                       DESIRED   CURRENT   READY   AGE
     replicaset.apps/clara-clara-platformapiserver-54c5c44bbd   1         1         1       95s
     replicaset.apps/clara-resultsservice-664477898f            1         1         1       95s
     replicaset.apps/clara-ui-6f89b97df8                        1         1         1       95s
     replicaset.apps/clara-workflow-controller-69cbb55fc8       1         1         1       95s
    


  3. clara dicom start

     $ kubectl get all       
     NAME                                                 READY   STATUS    RESTARTS   AGE
     pod/clara-clara-platformapiserver-54c5c44bbd-9b97b   1/1     Running   0          2m44s
     pod/clara-dicom-adapter-7948fcd445-rtbqr             1/1     Running   0          33s
     pod/clara-resultsservice-664477898f-zl8cr            1/1     Running   0          2m44s
     pod/clara-ui-6f89b97df8-fn2zm                        1/1     Running   0          2m44s
     pod/clara-workflow-controller-69cbb55fc8-t67ns       1/1     Running   0          2m44s
     pod/fluentd-7n2b8                                    1/1     Running   0          2m44s
     pod/fluentd-ccnzw                                    1/1     Running   0          2m44s
    
    
     NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                        AGE
     service/clara                  NodePort    10.103.37.52    <none>        50051:31536/TCP                2m44s
     service/clara-dicom-adapter    NodePort    10.105.101.54   <none>        104:31289/TCP,5000:31880/TCP   33s
     service/clara-resultsservice   ClusterIP   10.108.91.220   <none>        8088/TCP                       2m44s
     service/clara-ui               ClusterIP   10.97.148.11    <none>        80/TCP                         2m44s
     service/kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP                        11m
    
     NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
     daemonset.apps/fluentd   2         2         2       2            2           <none>          2m44s
    
     NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
     deployment.apps/clara-clara-platformapiserver   1/1     1            1           2m44s
     deployment.apps/clara-dicom-adapter             1/1     1            1           33s
     deployment.apps/clara-resultsservice            1/1     1            1           2m44s
     deployment.apps/clara-ui                        1/1     1            1           2m44s
     deployment.apps/clara-workflow-controller       1/1     1            1           2m44s
    
     NAME                                                       DESIRED   CURRENT   READY   AGE
     replicaset.apps/clara-clara-platformapiserver-54c5c44bbd   1         1         1       2m44s
     replicaset.apps/clara-dicom-adapter-7948fcd445             1         1         1       33s
     replicaset.apps/clara-resultsservice-664477898f            1         1         1       2m44s
     replicaset.apps/clara-ui-6f89b97df8                        1         1         1       2m44s
     replicaset.apps/clara-workflow-controller-69cbb55fc8       1         1         1       2m44s
    


  4. clara render start

     $ kubectl get all    
     NAME                                                     READY   STATUS             RESTARTS   AGE
     pod/clara-clara-platformapiserver-54c5c44bbd-gfwng       1/1     Running            0          24m
     pod/clara-dicom-adapter-7948fcd445-mv248                 1/1     Running            0          20m
     pod/clara-render-server-clara-renderer-d79dd4779-f5hgd   2/3     CrashLoopBackOff   7          11m
     pod/clara-resultsservice-664477898f-2vsw9                1/1     Running            0          24m
     pod/clara-ui-6f89b97df8-c5p2f                            1/1     Running            0          24m
     pod/clara-workflow-controller-69cbb55fc8-mc682           1/1     Running            0          24m
     pod/fluentd-ntl6q                                        1/1     Running            0          24m
     pod/fluentd-tvnrl                                        1/1     Running            0          24m
    
    
     NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
     service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 24m
     service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    20m
     service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   11m
     service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        24m
     service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          24m
     service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         27m
    
     NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
     daemonset.apps/fluentd   2         2         2       2            2           <none>          24m
    
     NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
     deployment.apps/clara-clara-platformapiserver        1/1     1            1           24m
     deployment.apps/clara-dicom-adapter                  1/1     1            1           20m
     deployment.apps/clara-render-server-clara-renderer   0/1     1            0           11m
     deployment.apps/clara-resultsservice                 1/1     1            1           24m
     deployment.apps/clara-ui                             1/1     1            1           24m
     deployment.apps/clara-workflow-controller            1/1     1            1           24m
    
     NAME                                                           DESIRED   CURRENT   READY   AGE
     replicaset.apps/clara-clara-platformapiserver-54c5c44bbd       1         1         1       24m
     replicaset.apps/clara-dicom-adapter-7948fcd445                 1         1         1       20m
     replicaset.apps/clara-render-server-clara-renderer-d79dd4779   1         1         0       11m
     replicaset.apps/clara-resultsservice-664477898f                1         1         1       24m
     replicaset.apps/clara-ui-6f89b97df8                            1         1         1       24m
     replicaset.apps/clara-workflow-controller-69cbb55fc8           1         1         1       24m
    


  5. clara monitor start

     $ kubectl get all          
     NAME                                                       READY   STATUS             RESTARTS   AGE
     pod/clara-clara-platformapiserver-54c5c44bbd-gfwng         1/1     Running            0          40m
     pod/clara-dicom-adapter-7948fcd445-mv248                   1/1     Running            0          36m
     pod/clara-monitor-server-fluentd-elasticsearch-dl7bj       1/1     Running            0          14m
     pod/clara-monitor-server-fluentd-elasticsearch-jxdk6       1/1     Running            0          14m
     pod/clara-monitor-server-grafana-5f874b974d-qvxgn          1/1     Running            0          14m
     pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7   0/1     CrashLoopBackOff   7          14m
     pod/clara-render-server-clara-renderer-d79dd4779-f5hgd     2/3     CrashLoopBackOff   10         27m
     pod/clara-resultsservice-664477898f-2vsw9                  1/1     Running            0          40m
     pod/clara-ui-6f89b97df8-c5p2f                              1/1     Running            0          40m
     pod/clara-workflow-controller-69cbb55fc8-mc682             1/1     Running            0          40m
     pod/elasticsearch-master-0                                 1/1     Running            0          14m
     pod/elasticsearch-master-1                                 1/1     Running            0          14m
     pod/fluentd-ntl6q                                          1/1     Running            0          40m
     pod/fluentd-tvnrl                                          1/1     Running            0          40m
    
    
     NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
     service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 40m
     service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    36m
     service/clara-monitor-server                 ClusterIP   10.111.167.160   <none>        50051/TCP                       14m
     service/clara-monitor-server-grafana         NodePort    10.100.148.116   <none>        80:32000/TCP                    14m
     service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   27m
     service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        40m
     service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          40m
     service/elasticsearch-master                 ClusterIP   10.108.240.18    <none>        9200/TCP,9300/TCP               14m
     service/elasticsearch-master-headless        ClusterIP   None             <none>        9200/TCP,9300/TCP               14m
     service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         43m
    
     NAME                                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
     daemonset.apps/clara-monitor-server-fluentd-elasticsearch   2         2         2       2            2           <none>          14m
     daemonset.apps/fluentd                                      2         2         2       2            2           <none>          40m
    
     NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
     deployment.apps/clara-clara-platformapiserver         1/1     1            1           40m
     deployment.apps/clara-dicom-adapter                   1/1     1            1           36m
     deployment.apps/clara-monitor-server-grafana          1/1     1            1           14m
     deployment.apps/clara-monitor-server-monitor-server   0/1     1            0           14m
     deployment.apps/clara-render-server-clara-renderer    0/1     1            0           27m
     deployment.apps/clara-resultsservice                  1/1     1            1           40m
     deployment.apps/clara-ui                              1/1     1            1           40m
     deployment.apps/clara-workflow-controller             1/1     1            1           40m
    
     NAME                                                             DESIRED   CURRENT   READY   AGE
     replicaset.apps/clara-clara-platformapiserver-54c5c44bbd         1         1         1       40m
     replicaset.apps/clara-dicom-adapter-7948fcd445                   1         1         1       36m
     replicaset.apps/clara-monitor-server-grafana-5f874b974d          1         1         1       14m
     replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7   1         1         0       14m
     replicaset.apps/clara-render-server-clara-renderer-d79dd4779     1         1         0       27m
     replicaset.apps/clara-resultsservice-664477898f                  1         1         1       40m
     replicaset.apps/clara-ui-6f89b97df8                              1         1         1       40m
     replicaset.apps/clara-workflow-controller-69cbb55fc8             1         1         1       40m
    
     NAME                                    READY   AGE
     statefulset.apps/elasticsearch-master   2/2     14m
    


  6. clara console start

     $ kubectl get all                                                             
    
     NAME                                                       READY   STATUS             RESTARTS   AGE
     pod/clara-clara-platformapiserver-54c5c44bbd-gfwng         1/1     Running            0          61m
     pod/clara-console-8565b4d565-77jhc                         2/2     Running            0          19m
     pod/clara-console-mongodb-85f8bd5f95-8nwqx                 1/1     Running            0          19m
     pod/clara-dicom-adapter-7948fcd445-mv248                   1/1     Running            0          58m
     pod/clara-monitor-server-fluentd-elasticsearch-dl7bj       1/1     Running            0          36m
     pod/clara-monitor-server-fluentd-elasticsearch-jxdk6       1/1     Running            0          36m
     pod/clara-monitor-server-grafana-5f874b974d-qvxgn          1/1     Running            0          36m
     pod/clara-monitor-server-monitor-server-59c8bf68f7-5rcg7   0/1     CrashLoopBackOff   11         36m
     pod/clara-render-server-clara-renderer-d79dd4779-f5hgd     2/3     CrashLoopBackOff   14         48m
     pod/clara-resultsservice-664477898f-2vsw9                  1/1     Running            0          61m
     pod/clara-ui-6f89b97df8-c5p2f                              1/1     Running            0          61m
     pod/clara-workflow-controller-69cbb55fc8-mc682             1/1     Running            0          61m
     pod/elasticsearch-master-0                                 1/1     Running            0          36m
     pod/elasticsearch-master-1                                 1/1     Running            0          36m
     pod/fluentd-ntl6q                                          1/1     Running            0          61m
     pod/fluentd-tvnrl                                          1/1     Running            0          61m
    
     NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
     service/clara                                NodePort    10.101.135.71    <none>        50051:32455/TCP                 61m
     service/clara-console                        NodePort    10.99.119.217    <none>        8080:32002/TCP,5000:32003/TCP   19m
     service/clara-console-mongodb                ClusterIP   10.102.177.195   <none>        27017/TCP                       19m
     service/clara-dicom-adapter                  NodePort    10.100.25.126    <none>        104:31985/TCP,5000:30647/TCP    58m
     service/clara-monitor-server                 ClusterIP   10.111.167.160   <none>        50051/TCP                       36m
     service/clara-monitor-server-grafana         NodePort    10.100.148.116   <none>        80:32000/TCP                    36m
     service/clara-renderer-clara-render-server   NodePort    10.108.60.232    <none>        8070:30105/TCP,8060:32006/TCP   48m
     service/clara-resultsservice                 ClusterIP   10.109.206.204   <none>        8088/TCP                        61m
     service/clara-ui                             ClusterIP   10.101.195.91    <none>        80/TCP                          61m
     service/elasticsearch-master                 ClusterIP   10.108.240.18    <none>        9200/TCP,9300/TCP               36m
     service/elasticsearch-master-headless        ClusterIP   None             <none>        9200/TCP,9300/TCP               36m
     service/kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                         65m
    
     NAME                                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
     daemonset.apps/clara-monitor-server-fluentd-elasticsearch   2         2         2       2            2           <none>          36m
     daemonset.apps/fluentd                                      2         2         2       2            2           <none>          61m
    
     NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
     deployment.apps/clara-clara-platformapiserver         1/1     1            1           61m
     deployment.apps/clara-console                         1/1     1            1           19m
     deployment.apps/clara-console-mongodb                 1/1     1            1           19m
     deployment.apps/clara-dicom-adapter                   1/1     1            1           58m
     deployment.apps/clara-monitor-server-grafana          1/1     1            1           36m
     deployment.apps/clara-monitor-server-monitor-server   0/1     1            0           36m
     deployment.apps/clara-render-server-clara-renderer    0/1     1            0           48m
     deployment.apps/clara-resultsservice                  1/1     1            1           61m
     deployment.apps/clara-ui                              1/1     1            1           61m
     deployment.apps/clara-workflow-controller             1/1     1            1           61m
    
     NAME                                                             DESIRED   CURRENT   READY   AGE
     replicaset.apps/clara-clara-platformapiserver-54c5c44bbd         1         1         1       61m
     replicaset.apps/clara-console-8565b4d565                         1         1         1       19m
     replicaset.apps/clara-console-mongodb-85f8bd5f95                 1         1         1       19m
     replicaset.apps/clara-dicom-adapter-7948fcd445                   1         1         1       58m
     replicaset.apps/clara-monitor-server-grafana-5f874b974d          1         1         1       36m
     replicaset.apps/clara-monitor-server-monitor-server-59c8bf68f7   1         1         0       36m
     replicaset.apps/clara-render-server-clara-renderer-d79dd4779     1         1         0       48m
     replicaset.apps/clara-resultsservice-664477898f                  1         1         1       61m
     replicaset.apps/clara-ui-6f89b97df8                              1         1         1       61m
     replicaset.apps/clara-workflow-controller-69cbb55fc8             1         1         1       61m
    
     NAME                                    READY   AGE
     statefulset.apps/elasticsearch-master   2/2     36m
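
    Comparing these listings by eye is error-prone, so here is a small sketch of how the Pods added by each chart could be extracted from two saved `kubectl get all` snapshots. The helper names `pod_names` and `new_pods`, and the idea of saving each snapshot to a file, are assumptions made for the example.

```shell
# pod_names (hypothetical helper): reads `kubectl get all` output on stdin and
# prints the sorted pod names with the "pod/" prefix removed.
pod_names() {
  awk '$1 ~ /^pod\// { sub(/^pod\//, "", $1); print $1 }' | sort
}

# new_pods BEFORE_FILE AFTER_FILE (hypothetical helper): prints the pods that
# appear in the second snapshot but not in the first.
new_pods() {
  before=$(mktemp)
  after=$(mktemp)
  pod_names < "$1" > "$before"
  pod_names < "$2" > "$after"
  # comm -13 keeps only the lines unique to the second (sorted) file.
  comm -13 "$before" "$after"
  rm -f "$before" "$after"
}
```

    For example, after `kubectl get all > step2.txt`, `clara dicom start`, and `kubectl get all > step3.txt`, running `new_pods step2.txt step3.txt` lists only what the DICOM Adapter chart added.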
    
    



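Of course, for someone as clueless as me, the installation did not go that smoothly…

Failed Attempts: Deploying Clara Platform and Starting the Helm Charts

Among the environment requirements, the Kubernetes and Helm versions were the troublesome ones for me. My server is shared with teammates, so at first I decided to keep the environment my colleagues need and brute-force the installation to see whether it would work, falling back to a downgrade only if it really didn't.

If you only want to get the Deploy SDK installed, you can skip this section; it is just a record of the errors my whim produced. If you want to watch me flounder, keep scrolling.

Well… let me spoil the ending first: I downgraded in the end. I did keep some of the errors from along the way in case I come back to them later. This is definitely not just padding the word count XDDD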


Picking up from installation step 3 (Configure NGC credentials): after downloading the platform chart, I tried to start the platform and got the first error message:

$ clara platform start                  
Error: could not find tiller
Usage:
  platform start [flags]

Flags:
  -h, --help   help for start

Global Flags:
      --config string   config file (default is $HOME/.clara/config.yaml)
      --verbose         verbose output

Error: could not find tiller


I found a Stack Overflow question with a similar error message; apparently re-initializing Helm is enough:

$ helm init
Error: unknown command "init" for "helm"

Did you mean this?
        lint

Run 'helm --help' for usage.


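It turns out there is no helm init.

I looked up what this Tiller that Helm can't find actually is. According to smalltown, in Helm 2 Tiller is the in-cluster component that installs and manages other application services. Put simply, Tiller is a service that talks to the K8S API Server. Because of problems with permission setup and management, it was retired when Helm 3 came out.

Unfortunately, my Helm is v3:
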
$ helm version
version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}
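
The two Helm generations print their version in different formats (`Version:"v3..."` for Helm 3, `SemVer:"v2..."` for Helm 2), which makes it easy to detect up front whether Tiller is even part of the picture. A minimal sketch follows; `helm_client_version` and `helm_needs_tiller` are names I made up, and the patterns are based only on the two output formats seen in this post.

```shell
# helm_client_version (hypothetical helper): reads `helm version` output on
# stdin and prints the first version found, without the leading "v".
helm_client_version() {
  grep -oE '(SemVer|Version):"v[0-9]+\.[0-9]+\.[0-9]+"' \
    | head -n 1 \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
}

# helm_needs_tiller VERSION (hypothetical helper): succeeds for Helm 2.x,
# the only major version that relies on Tiller.
helm_needs_tiller() {
  case "$1" in
    2.*) return 0 ;;
    *)   return 1 ;;
  esac
}
```

Something like `helm version 2>&1 | helm_client_version` would feed it real output; on the machine above that prints 3.1.2, for which `helm_needs_tiller` fails.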


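The change from Helm 2 to Helm 3 is architectural, so it is not something I can easily work around. After some research and asking on the forum, I had no choice but to downgrade Helm. I installed from the Binary Releases and went back to v2.15.2.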
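
For reference, a binary-release install usually boils down to three commands. The sketch below only prints them instead of running anything. It assumes a linux-amd64 machine, the get.helm.sh download host, and ~/bin as the destination; `print_helm_install_steps` is a made-up helper name.

```shell
# print_helm_install_steps VERSION [DEST] (hypothetical helper)
# Prints the commands for installing a Helm 2 binary release, assuming a
# linux-amd64 tarball from get.helm.sh. Nothing is downloaded or moved.
print_helm_install_steps() {
  version=$1
  dest=${2:-$HOME/bin}
  tarball="helm-v${version}-linux-amd64.tar.gz"
  echo "wget https://get.helm.sh/${tarball}"
  echo "tar -zxvf ${tarball}"
  # Helm 2 tarballs ship the tiller binary next to helm.
  echo "mv linux-amd64/helm linux-amd64/tiller ${dest}/"
}

print_helm_install_steps 2.15.2
```

Printing rather than executing keeps the sketch safe to run on a shared server; review the commands and run them yourself if they fit your setup.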

After the downgrade I checked the Helm version again. The version is now correct, but the error message is still there:

$ helm version
Client: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeState:"clean"}
Error: could not find tiller


Now that the version is downgraded, though, helm init should be available:

$ helm init
$HELM_HOME has been configured at /home/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation


Checking the Helm version again now shows a Server entry as well, and listing the running Pods with kubectl shows Tiller hard at work:

$ helm version
Client: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeState:"clean"}

$ kubectl get pods --namespace kube-system
NAME                                               READY   STATUS    RESTARTS   AGE
coredns-6955765f44-bl6jd                           1/1     Running   0          18h
coredns-6955765f44-mtlxv                           1/1     Running   0          18h
etcd-esccluster-control-plane                      1/1     Running   0          18h
kindnet-dxlgj                                      1/1     Running   0          18h
kindnet-hpvkw                                      1/1     Running   0          18h
kindnet-qb5lm                                      1/1     Running   0          18h
kube-apiserver-esccluster-control-plane            1/1     Running   0          18h
kube-controller-manager-esccluster-control-plane   1/1     Running   0          18h
kube-proxy-cvpmt                                   1/1     Running   0          18h
kube-proxy-nspv9                                   1/1     Running   0          18h
kube-proxy-trkh5                                   1/1     Running   0          18h
kube-scheduler-esccluster-control-plane            1/1     Running   0          18h
tiller-deploy-58f57c5787-bsfkh                     1/1     Running   0          14m
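
Rather than reading the Pod table by eye, the Tiller row can be pulled out of it. A minimal sketch, assuming the Pod name keeps the `tiller-deploy-` prefix seen in the listing above; `tiller_status` is a hypothetical helper name, not part of Helm or kubectl:

```shell
#!/usr/bin/env bash
# tiller_status: read `kubectl get pods` output on stdin and print the
# READY and STATUS columns of every Pod whose name starts with "tiller-deploy-".
# (Hypothetical helper; the prefix is taken from the listing above.)
tiller_status() {
  awk '$1 ~ /^tiller-deploy-/ { print $2, $3 }'
}

# Sample rows copied from the listing above, so the sketch runs without a cluster.
sample='NAME                                               READY   STATUS    RESTARTS   AGE
kube-scheduler-esccluster-control-plane            1/1     Running   0          18h
tiller-deploy-58f57c5787-bsfkh                     1/1     Running   0          14m'

printf '%s\n' "$sample" | tiller_status
```

Against a live cluster the same helper would be fed with `kubectl get pods --namespace kube-system | tiller_status`; an empty result means no Tiller Pod was found.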


OK, with the Tiller error out of the way, pull the platform Chart again and run start once more to see whether it succeeds.

$ clara pull platform
✔ Yes
Clara Platform 0.7.1-2008.1
Chart saved at: /home/.clara/charts/clara
Hint: use "clara platform start" or "clara platform restart" to deploy pulled Clara Platform.


$ clara platform start
Starting clara...
RPC error: code = Unknown desc = namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "default"


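Ugh, another RPC error. Judging from the message, I had two guesses:

  1. A problem with how the server side was installed or configured: I considered this because I had restarted Tiller myself after the downgrade.
  2. A permissions problem: this seemed more likely, since almost everything I found while searching described this situation.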
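
The second guess can be checked before changing anything. The sketch below pulls the service account out of the error text and defines the `kubectl auth can-i` query one could run with it. `extract_service_account` and `can_get_namespaces` are hypothetical helper names; the query needs a reachable cluster, so it is only defined here and not called:

```shell
#!/usr/bin/env bash
# Extract the service account named in a Tiller "forbidden" error message.
extract_service_account() {
  printf '%s\n' "$1" | sed -n 's/.*User "\(system:serviceaccount:[^"]*\)".*/\1/p'
}

# Ask the API server whether that account may read namespaces.
# Prints "yes" or "no"; requires a reachable cluster.
can_get_namespaces() {
  kubectl auth can-i get namespaces --as "$1"
}

# Error text copied from the failed `clara platform start` above.
err='RPC error: code = Unknown desc = namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "default"'

account="$(extract_service_account "$err")"
echo "Tiller is acting as: $account"
```

If `can_get_namespaces "$account"` answers `no`, the account Tiller runs under lacks the permission, which points at RBAC rather than a broken install.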


Never mind, let's first try running Tiller locally, as described in the Helm 2 documentation:

$ ~/bin/tiller 
[main] 2020/10/27 10:53:03 Starting Tiller v2.15.2 (tls=false)
[main] 2020/10/27 10:53:03 GRPC listening on :44134
[main] 2020/10/27 10:53:03 Probes listening on :44135
[main] 2020/10/27 10:53:03 Storage driver is ConfigMap
[main] 2020/10/27 10:53:03 Max history per release is 0

But the second step in the documentation, pointing Helm at the new local Tiller host, looked odd to me, so in the end I did not do it.


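So I dropped the first path and tried dealing with the permissions problem instead. Following the Role-based Access Control section of the Helm 2 documentation and the discussions by the experts on GitHub, I reconfigured the connection and started the platform: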

$ clara platform start
Starting clara...
NAME:   clara
Note: If there is a running instance of Clara Console, Clara Dicom Adapter or Clara Renderer, they should be restarted.
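
The commands used for the permission fix are not shown above. For reference, the fix usually cited in the Helm 2 RBAC documentation and in GitHub threads on this error looks roughly like the following; this is an assumption about what was done, not a record of it. The account name `tiller`, the binding name suffix `-cluster-rule`, and the `cluster-admin` role are conventional choices (and `cluster-admin` is very broad, so it suits a test machine only). `tiller_rbac_commands` is a hypothetical helper that only prints the commands so they can be reviewed first:

```shell
#!/usr/bin/env bash
# Print (not run) the commands that give Tiller its own service account.
# Arguments are optional: account name and namespace (assumed defaults below).
tiller_rbac_commands() {
  local account="${1:-tiller}" namespace="${2:-kube-system}"
  cat <<EOF
kubectl create serviceaccount ${account} --namespace ${namespace}
kubectl create clusterrolebinding ${account}-cluster-rule --clusterrole=cluster-admin --serviceaccount=${namespace}:${account}
helm init --service-account ${account} --upgrade
EOF
}

tiller_rbac_commands
```

Once the printed commands look right, they can be executed with `tiller_rbac_commands | sh`. The last line redeploys Tiller under the new account instead of `kube-system:default`, the account named in the error.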



Oh yeah! With the platform up, download the Helm Charts for the Clara Deploy Services, same as before:

$ clara pull dicom
✔ Yes
Clara Dicom Adapter 0.7.1-2008.1
Chart saved at: /home/.clara/charts/dicom-adapter
Hint: use "clara dicom start" or "clara dicom restart" to deploy pulled Clara Dicom Adapter.

$ clara pull render
✔ Yes
Clara Renderer 0.7.1-2008.1
Chart saved at: /home/.clara/charts/clara-renderer
Hint: use "clara render start" or "clara render restart" to deploy pulled Clara Renderer.

$ clara pull monitor
✔ Yes
Clara Monitor Server 0.7.1-2008.1
Chart saved at: /home/.clara/charts/clara-monitor-server
Hint: use "clara monitor start" or "clara monitor restart" to deploy pulled Clara Monitor Server.


$ clara pull console
✔ Yes
Clara Management Console 0.7.1-2008.1
Chart saved at: /home/.clara/charts/clara-console
Hint: use "clara console start" or "clara console restart" to deploy pulled Clara Management Console.

And now the nerve-racking final step:

$ clara dicom start
Starting DICOM Adapter...
NAME: clara-dicom-adapter

$ clara render start
NAME: clara-render-server

$ clara monitor start
Error: rpc error: code = Unknown desc = validation failed: [unable to recognize "": no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1", unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta2", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1"]

$ clara console start
NAME: clara-console
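
Since one failing service does not stop the others from starting (monitor failed above while console came up), the four starts can be looped and the failures collected. A minimal sketch; `start_services` is a hypothetical wrapper, and it assumes the `clara` CLI returns a non-zero exit code on failure, which the output above does not confirm:

```shell
#!/usr/bin/env bash
# start_services: run `clara <service> start` for each argument and
# report which ones failed. Assumes `clara` exits non-zero on error.
start_services() {
  local svc
  local -a failed=()
  for svc in "$@"; do
    if ! clara "$svc" start; then
      failed+=("$svc")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed: ${failed[*]}"
    return 1
  fi
  echo "all services started"
}
```

It would be called as `start_services dicom render monitor console`, in the same order as the manual commands above.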


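I knew it would not go that smoothly. I did find an Issue discussing this problem, but I have to admit the discussion goes beyond what a beginner like me can follow about K8S; I only started a few days ago (breaking down).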
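
The error names three kind/version pairs that the API server does not recognize. Kubernetes 1.16 stopped serving `extensions/v1beta1` PodSecurityPolicy and the `apps/v1beta1` and `apps/v1beta2` workload APIs, which is consistent with a chart written for 1.15 failing on a newer cluster. A sketch for listing what the error asks for and comparing it with what a cluster serves; `missing_apis` and `report_served` are hypothetical helpers, and `served.txt` is an assumed file holding the output of `kubectl api-versions`:

```shell
#!/usr/bin/env bash
# missing_apis: print "group/version kind" for every pair named in a
# Helm "no matches for kind" validation error.
missing_apis() {
  printf '%s\n' "$1" |
    grep -o 'no matches for kind "[^"]*" in version "[^"]*"' |
    sed 's/no matches for kind "\([^"]*\)" in version "\([^"]*\)"/\2 \1/'
}

# report_served: read "group/version kind" lines on stdin and check each
# group/version against a file with one served API version per line.
report_served() {
  local served_file="$1" version kind
  while read -r version kind; do
    if grep -qxF "$version" "$served_file"; then
      echo "served   $version ($kind)"
    else
      echo "missing  $version ($kind)"
    fi
  done
}

# Error text copied from the failed `clara monitor start` above.
err='Error: rpc error: code = Unknown desc = validation failed: [unable to recognize "": no matches for kind "PodSecurityPolicy" in version "extensions/v1beta1", unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta2", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1"]'

missing_apis "$err"
```

With a cluster at hand, `kubectl api-versions > served.txt` followed by `missing_apis "$err" | report_served served.txt` shows which of the three the server still offers.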

I asked on the forum, and the answer was still to downgrade K8S, so in the end I could only grit my teeth and start downgrading:

$ sudo apt remove kubectl kubeadm kubelet kubernetes-cni
$ rm -rf $HOME/.kube/config
$ sudo apt-get install -y kubelet=1.15.4-00 kubectl=1.15.4-00 kubeadm=1.15.4-00
$ kubectl version --short
Client Version: v1.15.4
Server Version: v1.15.6
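
The output above shows the client at v1.15.4 while the server reports v1.15.6, so it can be worth comparing both against the 1.15.4 listed in the system requirements. A minimal sketch; `version_of` is a hypothetical helper that parses `kubectl version --short` output from stdin:

```shell
#!/usr/bin/env bash
# version_of: print the X.Y.Z that follows "<label> Version: v" on stdin,
# where <label> is "Client" or "Server".
version_of() {
  sed -n "s/^$1 Version: v\([0-9][0-9.]*\).*/\1/p"
}

required="1.15.4"   # Kubernetes version from the system requirements

# Sample copied from the output above, so the sketch runs without a cluster.
sample='Client Version: v1.15.4
Server Version: v1.15.6'

for side in Client Server; do
  found="$(printf '%s\n' "$sample" | version_of "$side")"
  if [ "$found" = "$required" ]; then
    echo "$side $found matches $required"
  else
    echo "$side $found differs from $required"
  fi
done
```

On a real machine the sample would be replaced by `kubectl version --short | version_of Client`. Whether the patch-level difference on the server side matters is not established here; the monitor Chart in the next step started regardless.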

Start the monitor Chart that failed earlier once more:

$ clara monitor start
NAME: clara-monitor-server



References

  1. Clara Deploy Platform. Retrieved from NVIDIA NGC (2021-02-02).
  2. NVIDIA Taiwan (2020-11-02). NVIDIA Clara Deploy. Retrieved from YouTube (2021-02-02).
  3. 2. Installation. Retrieved from Clara Deploy SDK 0.7.3 documentation (2021-02-02).
  4. Community (2019-05-18). openshift - Helm: could not find tiller. Retrieved from Stack Overflow (2021-02-02).
  5. smalltown (2020-05-17). Helm 3 踹踹看. Retrieved from Starbugs Weekly 星巴哥技術專欄|Medium (2021-02-02).
  6. godleon (2021-01-24). [Kubernetes] Package Manager - Helm 簡介. Retrieved from 小信豬的原始部落 (2021-02-02).
  7. postak (2018-04-10). forbidden: User “system:serviceaccount:kube-system:default” cannot get namespaces in the namespace “default”. Retrieved from fnproject/fn-helm|GitHub (2021-02-02).
  8. noprom (2017-11-12). User “system:serviceaccount:kube-system:default” cannot get namespaces in the namespace “default”. Retrieved from helm/helm|GitHub (2021-02-02).
  9. Helm v2. Retrieved from the Helm website (2021-02-02).
  10. Nick (2019-10-12). [Day27] k8s應用篇(一):Helm部署apps、HPA和CA. Retrieved from iT 邦幫忙 (2021-02-02).
  11. Terrones-Oscar (2020-08-13). helm fails Error: validation failed: [unable to recognize “”: no matches for kind “PodSecurityPolicy”]. Retrieved from helm/charts|GitHub (2021-02-02).
  12. jckasper (2019-09-06). Helm init fails on Kubernetes 1.16.0. Retrieved from helm/helm|GitHub (2021-02-02).
  13. Zz Chen (2018-07-03). Helm 部署在 GKE 上的權限問題. Retrieved from smalltowntechblog|Medium (2021-02-02).
  14. MengYun (2019-10-27). Not responding when running “clara render start”. Retrieved from NVIDIA Developer Forums (2021-02-02).



Changelog

Last updated: 2021-03-07

  • 2021-03-07 Published
  • 2021-02-03 Draft completed
  • 2020-11-09 Draft started