
Preface

In the previous two posts we covered the overall plan for this project and the deployment of Ceph. With Ceph in place, we can now move on to the main topic of this series: deploying Kubernetes.

Lab environment

Before we start, here is the machine configuration again:

Node name (Host)      Internal IP         Public access            Hardware
ap-zj-storage-0       192.168.100.12      NAT                      6C/12G
ap-zj-storage-1       192.168.100.11      NAT                      6C/12G
ap-zj-worker-0        192.168.100.7       NAT                      20C/72G
ap-zj-worker-1        192.168.100.6       NAT                      20C/72G
ap-zj-master-0        192.168.100.3       Direct / NAT (default)   4C/8G
ap-zj-master-1        192.168.100.4       NAT                      4C/8G
ap-zj-master-2        192.168.100.5       NAT                      4C/8G

Hands-on

Initializing the software environment

Kubernetes requires socat and conntrack; install both on every machine in the cluster:

sudo apt install socat conntrack
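
To confirm that both tools actually landed on a node, a quick check:

# Each command should print the path of the installed binary
command -v socat
command -v conntrack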

Installing Docker

Given the complicated network environment in mainland China, we need to install and configure Docker ahead of time:

sudo apt update
sudo apt install docker.io
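
On Ubuntu the docker.io package normally starts and enables the service on its own; if you want to make sure explicitly, the following is harmless:

# Start Docker now and keep it enabled across reboots
sudo systemctl enable --now docker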

After installation, run docker version to confirm the Docker versions look right. My output:

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.17.3
 Git commit:        20.10.12-0ubuntu4
 Built:             Mon Mar  7 17:10:06 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.3
  Git commit:       20.10.12-0ubuntu4
  Built:            Mon Mar  7 15:57:50 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu3.1
  GitCommit:        
 runc:
  Version:          1.1.0-0ubuntu1.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        

Then edit /etc/docker/daemon.json and add the following configuration:

{
  "registry-mirrors": ["https://<DOCKER-REGISTRY-MIRROR>"]
}

I am not publishing the registry mirror here; 星河 members can contact me for the internal mirror address.
Once daemon.json is in place, restart Docker so it picks up the new configuration; after that we are ready to move on to cluster deployment.
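
A minimal restart-and-check, where the mirror shown is whatever you put in the file:

# Apply the new daemon configuration
sudo systemctl restart docker
# The configured mirror should be listed under "Registry Mirrors"
docker info | grep -A 1 "Registry Mirrors"
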
[tip type="yellow"]Details may change over time; please read this section alongside the KubeSphere documentation, as what is written here is time-sensitive.[/tip]

Installing KubeKey

You can download and install it from the GitHub Releases page:

# Create the working directory and make it writable for the current user
sudo mkdir -p /opt/kube-data
sudo chown "$(whoami)" /opt/kube-data
cd /opt/kube-data
# Download KubeKey v3.0.6 through a GitHub proxy and unpack it
wget https://ghproxy.com/github.com/kubesphere/kubekey/releases/download/v3.0.6/kubekey-v3.0.6-linux-amd64.tar.gz
# KKZONE=cn tells kk to pull artifacts from mirrors inside China
export KKZONE=cn
tar -xzvf kubekey-v3.0.6-linux-amd64.tar.gz
chmod +x kk

Alternatively, install it with the official script:

# Create the working directory and make it writable for the current user
sudo mkdir -p /opt/kube-data
sudo chown "$(whoami)" /opt/kube-data
cd /opt/kube-data
# KKZONE=cn makes the installer script download from mirrors inside China
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.6 sh -
chmod +x kk
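
Either way, you should end up with an executable kk in /opt/kube-data; a quick sanity check, assuming the paths used above:

cd /opt/kube-data
# Print the KubeKey version to confirm the binary runs
./kk version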

Creating the KubeKey configuration file

First, have KubeKey generate a sample configuration file:

./kk create config --with-kubesphere v3.3.0 --with-kubernetes v1.21.5
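
By default kk writes the sample to config-sample.yaml in the current directory; the deployment command later in this post references config.yaml, so rename the file (or keep the default name and adjust the command instead):

mv config-sample.yaml config.yaml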

Given production requirements, a relatively old but stable version is chosen here; adjust the versions to your own needs.
Then edit the generated file to match your environment. Here is my modified config:

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts: 
  - {name: zj-cluster-master-001, address: 192.168.100.3, internalAddress: 192.168.100.3, user: root}
  - {name: zj-cluster-master-002, address: 192.168.100.4, internalAddress: 192.168.100.4, user: root}
  - {name: zj-cluster-master-003, address: 192.168.100.5, internalAddress: 192.168.100.5, user: root}
  - {name: zj-cluster-worker-0, address: 192.168.100.7, internalAddress: 192.168.100.7, user: root}
  - {name: zj-cluster-worker-1, address: 192.168.100.6, internalAddress: 192.168.100.6, user: root}
  roleGroups:
    etcd:
    - zj-cluster-master-001
    - zj-cluster-master-002
    - zj-cluster-master-003
    master:
    - zj-cluster-master-001
    - zj-cluster-master-002
    - zj-cluster-master-003
    worker:
    - zj-cluster-worker-0
    - zj-cluster-worker-1
  controlPlaneEndpoint:
    ##Internal loadbalancer for apiservers
    ## Note: if there are fewer than three master nodes, you can skip deploying the internal LB
    internalLoadbalancer: haproxy

    ##If the external loadbalancer was used, 'address' should be set to loadbalancer's ip.
    domain: <set according to your needs; ideally a domain that does not resolve on the public internet, omitted here>
    address: ""
    port: 6443
  kubernetes:
    version: v1.21.5
    clusterName: <set as you like>
    proxyMode: ipvs
    masqueradeAll: false
    maxPods: 150
    nodeCidrMaskSize: 24
  network:
    plugin: calico
    kubePodsCIDR: 10.99.64.0/18
    kubeServiceCIDR: 10.99.0.0/18
  registry:
    privateRegistry: ""

---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.0
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: true
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #  resources: {}
    # controllerManager:
    #  resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 8Gi
    jenkinsMemoryReq: 5000Mi
    jenkinsVolumeSize: 40Gi
    jenkinsJavaOpts_Xms: 1200m
    jenkinsJavaOpts_Xmx: 1600m
    jenkinsJavaOpts_MaxRAM: 2g
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    #   operator:
    #     resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: true
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600
  telemetry_enabled: false

Then generate an SSH key pair on the machine where you will run KubeKey:

ssh-keygen -o

Once the key is generated, add the public key to the authorized_keys file on every machine in the cluster so that KubeKey can connect to them.
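
A small sketch of doing this with ssh-copy-id, using the node IPs from the config above and assuming password login for root is still available at this point:

# Push the public key to every node so KubeKey can log in as root without a password
for ip in 192.168.100.3 192.168.100.4 192.168.100.5 192.168.100.6 192.168.100.7; do
  ssh-copy-id root@"$ip"
done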

Deploying Kubernetes and KubeSphere

Now we can kick off the deployment:

./kk create cluster -f config.yaml

If everything above went well, KubeKey will connect to all machines in the cluster, run its checks, and, once you confirm the installation, carry out the deployment automatically.
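
Once the installer reports success, a quick check from a master node (or any machine holding the kubeconfig) confirms the cluster is healthy; the console port below is the NodePort configured earlier:

# All nodes should eventually report Ready
kubectl get nodes -o wide
# KubeSphere components live in the kubesphere-system namespace
kubectl -n kubesphere-system get pods
# The KubeSphere console should then be reachable at http://<any-node-ip>:30880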