
Building Kubeflow On-Premises

by study4me 2024. 12. 3.

The Kubernetes cluster itself was set up as described in the following post:

Kubernetes Setup

 

This post installs Kubeflow 1.9.1.
The environment has internet access, and the necessary firewall ports between servers are assumed to be open.
No separate private repository is used.
The installation follows the official documentation.

All commands are run as the root account.


 

Prerequisites for installing Kubeflow

Kubeflow needs a NAS; the NFS share will back the dynamic PersistentVolume provisioning that Kubeflow's components rely on.

If you do not have a NAS server, you can stand up a simple one as follows.

NAS server setup

# Install NFS
yum update -y
yum install -y nfs-utils

# Start and enable the NFS service
systemctl start nfs-server.service
systemctl enable nfs-server.service
systemctl status nfs-server.service --no-pager

# Check the service's ports and protocols
rpcinfo -p | grep nfs

# Create the shared directory and set its permissions and owner
mkdir -p /nfs/kubeflow
# chmod -R 777 /nfs/kubeflow
chown -R 999:999 /nfs/kubeflow

systemctl restart nfs-server.service

NET_CIDR="10.0.0.0/16" ## change to the network CIDR your cluster nodes use
cat << EOF | tee /etc/exports
/nfs/kubeflow      $NET_CIDR(rw,sync,no_subtree_check,no_root_squash)
EOF

# Apply the exports
exportfs -arv
exportfs -s
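
This post assumes the firewall between servers is already open, but if firewalld happens to be running on the NAS server, these standard firewalld services would need to be allowed (a sketch, only for that case):

# Only needed when firewalld is active on the NAS server
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload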

Installing Kubeflow

Run the following commands on the master node and on each worker node.

On the master node, perform steps 0 through 6.

On the worker nodes, perform steps 0 through 3.

0. Set the versions to use

# Master, Worker Node
export nfs_provisioner_ver_micro='4.0.18'
export kubeflow_ver_micro='1.9.1'
export helm_ver_micro='3.16.3'
export kustomize_ver_micro='5.5.0'
export NFS_IP='10.0.1.142' ## change to your NFS server's IP
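
Note that these exports only live in the current shell session; if you reconnect to a server, re-run this block (or append it to ~/.bashrc) before continuing.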

1. Test mounting the NAS from each server

# Master, Worker Node
# Install NFS utilities
# no need to start or enable the service here
yum install nfs-utils -y
# Master, Worker Node
# NAS mount test
mkdir -p /nfs/kubeflow
mount -t nfs ${NFS_IP}:/nfs/kubeflow /nfs/kubeflow
df -h | grep kubeflow
showmount -e $NFS_IP

# Remove after the test is complete
umount /nfs/kubeflow
rm -rf /nfs

2. Load the kernel modules required by Istio (bundled with Kubeflow)

https://istio.io/latest/docs/ops/deployment/platform-requirements/

cat << EOF | sudo tee /etc/modules-load.d/istio-iptables.conf
br_netfilter
nf_nat
xt_REDIRECT
xt_owner
iptable_nat
iptable_mangle
iptable_filter
EOF

modprobe br_netfilter
modprobe nf_nat
modprobe xt_REDIRECT
modprobe xt_owner
modprobe iptable_nat
modprobe iptable_mangle
modprobe iptable_filter
lsmod | grep -E 'br_netfilter|nf_nat|xt_REDIRECT|xt_owner|iptable_nat|iptable_mangle|iptable_filter'
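
These are the iptables-related modules Istio's sidecar traffic redirection depends on; the modprobe commands load them immediately, while the /etc/modules-load.d/istio-iptables.conf file makes systemd reload them on every boot.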

3. Linux kernel subsystem changes

https://github.com/kubeflow/manifests/tree/v1.9.1-branch?tab=readme-ov-file#prerequisites-1
Per the manifests README, raise the inotify limits so nodes can support many pods:
sudo sysctl fs.inotify.max_user_instances=2280
sudo sysctl fs.inotify.max_user_watches=1255360
cat << EOF | tee -a /etc/sysctl.d/k8s.conf
fs.inotify.max_user_instances       = 2280
fs.inotify.max_user_watches         = 1255360
EOF

sysctl --system
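
To confirm the new limits are active, you can read the values back:

# Both values should match what was written to /etc/sysctl.d/k8s.conf
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches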

4. Install Helm and Kustomize

mkdir ~/kubeflow_workspace
cd ~/kubeflow_workspace

# Install helm
curl -LO https://get.helm.sh/helm-v${helm_ver_micro}-linux-amd64.tar.gz
tar -zxvf helm-v${helm_ver_micro}-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm version

# Install kustomize
curl -LO https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv${kustomize_ver_micro}/kustomize_v${kustomize_ver_micro}_linux_amd64.tar.gz
tar -zxvf kustomize_v${kustomize_ver_micro}_linux_amd64.tar.gz
mv kustomize /usr/local/bin/kustomize
kustomize version

 

5. Test NFS integration from Kubernetes (deploy NFS-SUBDIR-EXTERNAL-PROVISIONER)

# Deploy NFS-SUBDIR-EXTERNAL-PROVISIONER
cd ~/kubeflow_workspace
yum install -y git
git clone --branch nfs-subdir-external-provisioner-${nfs_provisioner_ver_micro} https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.git
cd nfs-subdir-external-provisioner/charts

helm install kf-nfs nfs-subdir-external-provisioner \
    --set nfs.server=$NFS_IP \
    --set nfs.path=/nfs/kubeflow

kubectl get sc
kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get sc
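
The patch marks nfs-client as the cluster's default StorageClass, so PVCs that omit storageClassName (as several Kubeflow components do) will be served by the NFS provisioner; the second kubectl get sc should now show nfs-client flagged as (default).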


# Run a test
cat << EOF > ~/test-claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF

kubectl apply -f ~/test-claim.yaml
kubectl get pvc 
kubectl get pv
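
Optionally, before cleaning up, you can also verify that a pod can actually write to the provisioned volume. A minimal sketch that reuses the test-claim above (the pod name and test file are hypothetical):

cat << EOF > ~/test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo ok > /mnt/SUCCESS && sleep 3600"]
    volumeMounts:
    - name: nfs-vol
      mountPath: /mnt
  volumes:
  - name: nfs-vol
    persistentVolumeClaim:
      claimName: test-claim
EOF

kubectl apply -f ~/test-pod.yaml
kubectl wait --for=condition=ready pod/test-pod --timeout=60s
kubectl exec test-pod -- cat /mnt/SUCCESS    # should print "ok"
kubectl delete -f ~/test-pod.yaml
rm -f ~/test-pod.yaml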

# Delete after the test is complete
kubectl delete -f ~/test-claim.yaml
rm -f ~/test-claim.yaml

6. Deploy Kubeflow

Clone the code from the Kubeflow manifests repository.

# Deploy Kubeflow
cd ~/kubeflow_workspace
git clone --branch v${kubeflow_ver_micro}-branch https://github.com/kubeflow/manifests.git
mv manifests manifests-${kubeflow_ver_micro}
cd ~/kubeflow_workspace/manifests-${kubeflow_ver_micro}/

 

Deploy using the kustomize build commands from the official documentation.

Pasting and running all of these commands in one batch sometimes fails on the first pass (typically because CRDs created by an earlier command are not yet registered), so re-run any failed commands until everything applies cleanly.
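
If you prefer not to re-run by hand, one option is to wrap each apply in a retry loop; a minimal sketch, where the helper name apply_retry is made up for illustration:

# Hypothetical helper: retry "kustomize build | kubectl apply" until it succeeds
apply_retry() {
  local path=$1
  until kustomize build "$path" | kubectl apply -f -; do
    echo "Retrying $path ..."
    sleep 10
  done
}
# Example: apply_retry common/cert-manager/cert-manager/base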

# Install cert-manager
kustomize build common/cert-manager/cert-manager/base | kubectl apply -f -
kustomize build common/cert-manager/kubeflow-issuer/base | kubectl apply -f -
echo "Waiting for cert-manager to be ready ..."
kubectl wait --for=condition=ready pod -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager
kubectl wait --for=jsonpath='{.subsets[0].addresses[0].targetRef.kind}'=Pod endpoints -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager

# Install Istio
echo "Installing Istio configured with external authorization..."
kustomize build common/istio-1-22/istio-crds/base | kubectl apply -f -
kustomize build common/istio-1-22/istio-namespace/base | kubectl apply -f -
kustomize build common/istio-1-22/istio-install/overlays/oauth2-proxy | kubectl apply -f -

echo "Waiting for all Istio Pods to become ready..."
kubectl wait --for=condition=Ready pods --all -n istio-system --timeout 300s

# Oauth2-proxy
echo "Installing oauth2-proxy..."
# Only uncomment ONE of the following overlays, they are mutually exclusive,
# see `common/oauth2-proxy/overlays/` for more options.
# OPTION 1: works on most clusters, does NOT allow K8s service account 
#           tokens to be used from outside the cluster via the Istio ingress-gateway.
#
kustomize build common/oauth2-proxy/overlays/m2m-dex-only/ | kubectl apply -f -
kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
# Option 2: works on Kind/K3D and other clusters with the proper configuration, and allows K8s service account tokens to be used
#           from outside the cluster via the Istio ingress-gateway. For example for automation with github actions.
# 
#kustomize build common/oauth2-proxy/overlays/m2m-dex-and-kind/ | kubectl apply -f -
#kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
#kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=cluster-jwks-proxy' --timeout=180s -n istio-system

# Dex
echo "Installing Dex..."
kustomize build common/dex/overlays/oauth2-proxy | kubectl apply -f -
kubectl wait --for=condition=ready pods --all --timeout=180s -n auth

# Knative
kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
kustomize build common/istio-1-22/cluster-local-gateway/base | kubectl apply -f -
kustomize build common/knative/knative-eventing/base | kubectl apply -f -

# Kubeflow Namespace
kustomize build common/kubeflow-namespace/base | kubectl apply -f -

# Network Policies
kustomize build common/networkpolicies/base | kubectl apply -f -

# Kubeflow Roles
kustomize build common/kubeflow-roles/base | kubectl apply -f -

# Kubeflow Istio Resources
kustomize build common/istio-1-22/kubeflow-istio-resources/base | kubectl apply -f -

# Kubeflow Pipelines
kustomize build apps/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -

# KServe
kustomize build contrib/kserve/kserve | kubectl apply -f -
kustomize build contrib/kserve/models-web-app/overlays/kubeflow | kubectl apply -f -

# Katib
kustomize build apps/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -

# Central Dashboard
kustomize build apps/centraldashboard/overlays/oauth2-proxy | kubectl apply -f -

# Admission Webhook
kustomize build apps/admission-webhook/upstream/overlays/cert-manager | kubectl apply -f -

# Notebooks 1.0
kustomize build apps/jupyter/notebook-controller/upstream/overlays/kubeflow | kubectl apply -f -
kustomize build apps/jupyter/jupyter-web-app/upstream/overlays/istio | kubectl apply -f -

# PVC Viewer Controller
kustomize build apps/pvcviewer-controller/upstream/default | kubectl apply -f -

# Profiles + KFAM
kustomize build apps/profiles/upstream/overlays/kubeflow | kubectl apply -f -

# Volumes Web Application
kustomize build apps/volumes-web-app/upstream/overlays/istio | kubectl apply -f -

# Tensorboard
kustomize build apps/tensorboard/tensorboards-web-app/upstream/overlays/istio | kubectl apply -f -
kustomize build apps/tensorboard/tensorboard-controller/upstream/overlays/kubeflow | kubectl apply -f -

# Training Operator
kustomize build apps/training-operator/upstream/overlays/kubeflow | kubectl apply -f -

# User Namespaces
kustomize build common/user-namespace/base | kubectl apply -f -

 

Check the status of the deployed pods

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
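
With the port-forward running, the central dashboard should be reachable at http://localhost:8080. Per the kubeflow/manifests README, the default Dex login is user@example.com with password 12341234.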
TODO: revise the part about disabling the certificate.

Tasks after installing Kubeflow

1. Additional work is required for HTTP access

To be added.
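
In the meantime, one known requirement from the kubeflow/manifests documentation: when Kubeflow is served over plain HTTP instead of HTTPS, the web apps that set secure cookies will not work until APP_SECURE_COOKIES is set to false in their deployments. A rough sketch for one of them (verify the deployment names on your cluster; jupyter-web-app-deployment is the upstream name):

# Assumption: serving over HTTP; repeat for the other web-app deployments as needed
kubectl set env deployment/jupyter-web-app-deployment -n kubeflow APP_SECURE_COOKIES=false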

2. Exposing Kubeflow externally through Nginx & the Nginx Ingress Controller in front

To be added.
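
As a placeholder until that section is written, a minimal sketch of such an exposure, assuming an Nginx Ingress Controller is already installed and TLS is handled in front (the hostname is hypothetical):

cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
spec:
  ingressClassName: nginx
  rules:
  - host: kubeflow.example.com    # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: istio-ingressgateway
            port:
              number: 80
EOF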
