The Kubernetes cluster itself was set up in advance (see the earlier kubeadm post).
This post walks through installing Kubeflow version 1.9.1, under the following assumptions:
The servers have internet access, and the necessary firewall ports between them are open.
No separate private repository is used.
The installation follows the official documentation.
All commands are run as the root account.
Preparation before installing Kubeflow
Kubeflow requires NAS (NFS) storage. If you do not have a NAS server, you can set up a simple one as follows.
Setting up a NAS server
# Install NFS
yum update -y
yum install -y nfs-utils
# Start and enable the NFS service
systemctl start nfs-server.service
systemctl enable nfs-server.service
systemctl status nfs-server.service --no-pager
# Check the service's ports and protocols
rpcinfo -p | grep nfs
# Create the shared folder and change its permissions/owner
mkdir -p /nfs/kubeflow
# chmod -R 777 /nfs/kubeflow
chown -R 999:999 /nfs/kubeflow
systemctl restart nfs-server.service
NET_CIDR="10.0.0.0/16"
cat << EOF | tee /etc/exports
/nfs/kubeflow $NET_CIDR(rw,sync,no_subtree_check,no_root_squash)
EOF
# Apply the exports
exportfs -arv
exportfs -s
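This post assumes the necessary firewall ports are already open; if firewalld happens to be active on the NAS host, a minimal sketch to open the NFS-related services:
# Only needed if firewalld is running on the NAS host
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload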
Installing Kubeflow
Run the following commands on the Master and Worker nodes:
On the Master node, run steps 0 through 6.
On the Worker nodes, run steps 0 through 3.
0. Set the versions to use
# Master, Worker Node
export nfs_provisioner_ver_micro='4.0.18'
export kubeflow_ver_micro='1.9.1'
export helm_ver_micro='3.16.3'
export kustomize_ver_micro='5.5.0'
export NFS_IP='10.0.1.142' ## change to your NFS server IP
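These exports live only in the current shell; if later steps will run in a new session, one way to persist them (assuming bash) is:
# Persist the variables for new shells (optional); keep NFS_IP in sync with your server
cat << EOF >> ~/.bashrc
export nfs_provisioner_ver_micro='4.0.18'
export kubeflow_ver_micro='1.9.1'
export helm_ver_micro='3.16.3'
export kustomize_ver_micro='5.5.0'
export NFS_IP='10.0.1.142'
EOF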
1. Test mounting the NAS from each server
# Master, Worker Node
# Install the NFS client utilities
# No need to start/enable the service here (client use only)
yum install nfs-utils -y
# Master, Worker Node
# Test mounting the NAS
mkdir -p /nfs/kubeflow
mount -t nfs ${NFS_IP}:/nfs/kubeflow /nfs/kubeflow
df -h | grep kubeflow
showmount -e $NFS_IP
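Before unmounting, it is also worth checking that the mount is writable, not just visible; a quick sketch (the test file name is arbitrary):
# Verify write access through the mount
touch /nfs/kubeflow/write-test-$(hostname)
ls -l /nfs/kubeflow/
rm -f /nfs/kubeflow/write-test-$(hostname)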
# Clean up after the test
umount /nfs/kubeflow
rm -rf /nfs
2. Load the kernel modules required by Istio (bundled with Kubeflow)
https://istio.io/latest/docs/ops/deployment/platform-requirements/
cat << EOF | sudo tee /etc/modules-load.d/istio-iptables.conf
br_netfilter
nf_nat
xt_REDIRECT
xt_owner
iptable_nat
iptable_mangle
iptable_filter
EOF
modprobe br_netfilter
modprobe nf_nat
modprobe xt_REDIRECT
modprobe xt_owner
modprobe iptable_nat
modprobe iptable_mangle
modprobe iptable_filter
lsmod | grep -E 'br_netfilter|nf_nat|xt_REDIRECT|xt_owner|iptable_nat|iptable_mangle|iptable_filter'
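The modules-load.d file is only read at boot by systemd-modules-load; since the modules were also loaded manually above, a reboot is not required, but you can validate that the config file parses cleanly:
# Validate /etc/modules-load.d/istio-iptables.conf without rebooting
systemctl restart systemd-modules-load.service
systemctl status systemd-modules-load.service --no-pager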
3. Change Linux kernel subsystem settings
https://github.com/kubeflow/manifests/tree/v1.9.1-branch?tab=readme-ov-file#prerequisites-1
The manifests README requires these Linux kernel subsystem changes to support many pods:
sudo sysctl fs.inotify.max_user_instances=2280
sudo sysctl fs.inotify.max_user_watches=1255360
cat << EOF | tee -a /etc/sysctl.d/k8s.conf
fs.inotify.max_user_instances = 2280
fs.inotify.max_user_watches = 1255360
EOF
sysctl --system
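To confirm the new limits are in effect:
# Verify the applied inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches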
4. Install Helm and Kustomize
mkdir -p ~/kubeflow_workspace
cd ~/kubeflow_workspace
# Install helm
curl -LO https://get.helm.sh/helm-v${helm_ver_micro}-linux-amd64.tar.gz
tar -zxvf helm-v${helm_ver_micro}-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm version
# Install kustomize
curl -LO https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv${kustomize_ver_micro}/kustomize_v${kustomize_ver_micro}_linux_amd64.tar.gz
tar -zxvf kustomize_v${kustomize_ver_micro}_linux_amd64.tar.gz
mv kustomize /usr/local/bin/kustomize
kustomize version
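Optionally, clean up the downloaded archives and extraction leftovers:
# Optional cleanup of installer downloads
rm -f helm-v${helm_ver_micro}-linux-amd64.tar.gz kustomize_v${kustomize_ver_micro}_linux_amd64.tar.gz
rm -rf linux-amd64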
5. Test NFS integration from Kubernetes (install NFS-SUBDIR-EXTERNAL-PROVISIONER)
# Install NFS-SUBDIR-EXTERNAL-PROVISIONER
cd ~/kubeflow_workspace
yum install -y git
git clone --branch nfs-subdir-external-provisioner-${nfs_provisioner_ver_micro} https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.git
cd nfs-subdir-external-provisioner/charts
helm install kf-nfs nfs-subdir-external-provisioner \
--set nfs.server=$NFS_IP \
--set nfs.path=/nfs/kubeflow
kubectl get sc
kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get sc
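Before running the PVC test, it helps to confirm the provisioner pod itself is Running; a quick check (the label follows the chart's default naming and may differ if you customized the release):
# Check the provisioner pod deployed by the helm chart
kubectl get pods -l app=nfs-subdir-external-provisioner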
# Run the test
cat << EOF > ~/test-claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
kubectl apply -f ~/test-claim.yaml
kubectl get pvc
kubectl get pv
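On the NAS server itself you should also see a subdirectory created for the claim (by default the provisioner names it <namespace>-<pvc-name>-<pv-name>):
# Run on the NAS server: the provisioned subdirectory should appear here
ls -l /nfs/kubeflow/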
# Delete after the test is complete
kubectl delete -f ~/test-claim.yaml
rm -f ~/test-claim.yaml
6. Deploy Kubeflow
Clone the code from the Kubeflow manifests repository.
# Deploy Kubeflow
cd ~/kubeflow_workspace
git clone --branch v${kubeflow_ver_micro}-branch https://github.com/kubeflow/manifests.git
mv manifests manifests-${kubeflow_ver_micro}
cd ~/kubeflow_workspace/manifests-${kubeflow_ver_micro}/
Deploy using the kustomize build commands from the official documentation.
Pasting all of these commands at once sometimes fails on the first pass (often because CRDs are not yet registered when dependent resources are applied), so re-run any failed commands until they apply cleanly.
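Alternatively, the manifests README also documents a single-command install that simply retries until every resource applies; a sketch along those lines (the example directory ships in the repo root):
# One-shot install with retries, per the style of the manifests README
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 20
done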
# Install cert-manager
kustomize build common/cert-manager/cert-manager/base | kubectl apply -f -
kustomize build common/cert-manager/kubeflow-issuer/base | kubectl apply -f -
echo "Waiting for cert-manager to be ready ..."
kubectl wait --for=condition=ready pod -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager
kubectl wait --for=jsonpath='{.subsets[0].addresses[0].targetRef.kind}'=Pod endpoints -l 'app in (cert-manager,webhook)' --timeout=180s -n cert-manager
# Install Istio
echo "Installing Istio configured with external authorization..."
kustomize build common/istio-1-22/istio-crds/base | kubectl apply -f -
kustomize build common/istio-1-22/istio-namespace/base | kubectl apply -f -
kustomize build common/istio-1-22/istio-install/overlays/oauth2-proxy | kubectl apply -f -
echo "Waiting for all Istio Pods to become ready..."
kubectl wait --for=condition=Ready pods --all -n istio-system --timeout 300s
# Oauth2-proxy
echo "Installing oauth2-proxy..."
# Only uncomment ONE of the following overlays, they are mutually exclusive,
# see `common/oauth2-proxy/overlays/` for more options.
# OPTION 1: works on most clusters, does NOT allow K8s service account
# tokens to be used from outside the cluster via the Istio ingress-gateway.
#
kustomize build common/oauth2-proxy/overlays/m2m-dex-only/ | kubectl apply -f -
kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
# Option 2: works on Kind/K3D and other clusters with the proper configuration, and allows K8s service account tokens to be used
# from outside the cluster via the Istio ingress-gateway. For example for automation with github actions.
#
#kustomize build common/oauth2-proxy/overlays/m2m-dex-and-kind/ | kubectl apply -f -
#kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=oauth2-proxy' --timeout=180s -n oauth2-proxy
#kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/name=cluster-jwks-proxy' --timeout=180s -n istio-system
# Dex
echo "Installing Dex..."
kustomize build common/dex/overlays/oauth2-proxy | kubectl apply -f -
kubectl wait --for=condition=ready pods --all --timeout=180s -n auth
# Knative
kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
kustomize build common/istio-1-22/cluster-local-gateway/base | kubectl apply -f -
kustomize build common/knative/knative-eventing/base | kubectl apply -f -
# Kubeflow Namespace
kustomize build common/kubeflow-namespace/base | kubectl apply -f -
# Network Policies
kustomize build common/networkpolicies/base | kubectl apply -f -
# Kubeflow Roles
kustomize build common/kubeflow-roles/base | kubectl apply -f -
# Kubeflow Istio Resources
kustomize build common/istio-1-22/kubeflow-istio-resources/base | kubectl apply -f -
# Kubeflow Pipelines
kustomize build apps/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -
# KServe
kustomize build contrib/kserve/kserve | kubectl apply -f -
kustomize build contrib/kserve/models-web-app/overlays/kubeflow | kubectl apply -f -
# Katib
kustomize build apps/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -
# Central Dashboard
kustomize build apps/centraldashboard/overlays/oauth2-proxy | kubectl apply -f -
# Admission Webhook
kustomize build apps/admission-webhook/upstream/overlays/cert-manager | kubectl apply -f -
# Notebooks 1.0
kustomize build apps/jupyter/notebook-controller/upstream/overlays/kubeflow | kubectl apply -f -
kustomize build apps/jupyter/jupyter-web-app/upstream/overlays/istio | kubectl apply -f -
# PVC Viewer Controller
kustomize build apps/pvcviewer-controller/upstream/default | kubectl apply -f -
# Profiles + KFAM
kustomize build apps/profiles/upstream/overlays/kubeflow | kubectl apply -f -
# Volumes Web Application
kustomize build apps/volumes-web-app/upstream/overlays/istio | kubectl apply -f -
# Tensorboard
kustomize build apps/tensorboard/tensorboards-web-app/upstream/overlays/istio | kubectl apply -f -
kustomize build apps/tensorboard/tensorboard-controller/upstream/overlays/kubeflow | kubectl apply -f -
# Training Operator
kustomize build apps/training-operator/upstream/overlays/kubeflow | kubectl apply -f -
# User Namespaces
kustomize build common/user-namespace/base | kubectl apply -f -
Check the status of the deployed Pods, then port-forward the Istio ingress gateway to reach the dashboard:
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
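Then open http://localhost:8080 in a browser. With the default Dex configuration the login is user@example.com / 12341234 (the defaults documented in the manifests README; change them for anything beyond testing).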
TODO: revise the part about disabling the certificate.
Tasks after installing Kubeflow
1. Additional steps required for HTTP access
To be added.
2. Exposing Kubeflow externally through Nginx & an Nginx Ingress Controller in front
To be added.