Configuring Linux for Cloud Computing and Large-Scale Cluster Development

Author: sudo apt install bash    Published: 2025-04-16

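Basic System Configuration

1. Distribution Selection and Installation

  • Recommended distributions: RHEL 8+ (or rebuilds such as Rocky Linux / AlmaLinux; CentOS Linux 8 reached end of life at the end of 2021), Ubuntu Server 20.04 LTS+, Debian 11+
  • Minimal installation: install only the components you need, to reduce the attack surface
  • Partitioning scheme:
    • / (root): 50 GB+
    • /var: separate partition, 50 GB+ (logs, temporary files and, on container hosts, images under /var/lib/docker or /var/lib/containerd)
    • /home: size as needed
    • Swap: 1-2× physical RAM (at most 32 GB). On Kubernetes nodes, leave swap off: by default the kubelet refuses to start while swap is enabled.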
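The swap rule of thumb above can be turned into a small bash helper. This is a sketch only: the split (2× RAM up to 8 GB, 1× RAM above that, always capped at 32 GB) is an assumption of this example, not part of the guideline, and the function names are hypothetical.

```shell
# Hypothetical helper: suggested swap size in GiB for a given RAM size in GiB.
# Assumption: 2x RAM up to 8 GiB, 1x RAM above that, capped at 32 GiB.
swap_size_gb() {
  local ram_gb=$1 swap
  if (( ram_gb <= 8 )); then
    swap=$(( ram_gb * 2 ))
  else
    swap=$ram_gb
  fi
  if (( swap > 32 )); then
    swap=32
  fi
  echo "$swap"
}

# Hypothetical helper: physical RAM of this host in GiB, rounded up
# (MemTotal in /proc/meminfo is reported in kB).
host_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo
}
```

Usage on the target host: swap_size_gb "$(host_ram_gb)".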

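2. Kernel Parameter Tuning

Edit /etc/sysctl.conf, then load the new values with sudo sysctl -p:

# Network performance
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 8192
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
# On Kubernetes nodes, also keep the NodePort range out of the ephemeral port pool:
net.ipv4.ip_local_reserved_ports = 30000-32767

# Memory management
vm.swappiness = 10
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

# File system
fs.file-max = 2097152
fs.aio-max-nr = 1048576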
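After applying the file (for example with sudo sysctl -p), you can confirm that each value is actually live by reading it back from /proc/sys. A minimal bash sketch; the function names sysctl_path and check_sysctl are hypothetical:

```shell
# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
sysctl_path() {
  echo "/proc/sys/${1//.//}"
}

# Print OK or MISMATCH for one key. Multi-value keys such as
# net.ipv4.ip_local_port_range are compared after collapsing whitespace.
check_sysctl() {
  local key=$1 expected=$2 actual
  actual=$(tr -s '[:space:]' ' ' < "$(sysctl_path "$key")" | sed 's/ $//')
  expected=$(printf '%s\n' "$expected" | tr -s '[:space:]' ' ' | sed 's/ $//')
  if [[ "$actual" == "$expected" ]]; then
    echo "OK       $key = $actual"
  else
    echo "MISMATCH $key: expected '$expected', got '$actual'"
    return 1
  fi
}
```

Example: check_sysctl net.core.somaxconn 4096, or check_sysctl net.ipv4.ip_local_port_range '1024 65535'.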

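3. Resource Limits

Edit /etc/security/limits.conf (these limits apply to login sessions via PAM; services started by systemd ignore this file and take LimitNOFILE= / LimitNPROC= from their unit files instead):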

* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
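Because systemd services do not read limits.conf, it helps to check the limit a running daemon actually received. A minimal sketch; proc_nofile is a hypothetical name, and it simply reads the soft "Max open files" column from /proc/<pid>/limits:

```shell
# Print the soft "Max open files" limit of a running process.
# /proc/<pid>/limits columns: name (3 words), soft, hard, units.
proc_nofile() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}
```

Example: proc_nofile "$(pgrep -o dockerd)". If the value is too low for a service, raise it with a systemd drop-in (sudo systemctl edit docker, then add LimitNOFILE=65536 under [Service]).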

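Cloud Computing Environment Configuration

1. Container Support

Installing and configuring Docker (since Kubernetes 1.24 the kubelet can no longer use Docker Engine directly without the cri-dockerd adapter; kubeadm clusters normally run containerd or CRI-O)

# Ubuntu example
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker

# Configure daemon.json (overlay2.override_kernel_check was deprecated and
# later removed from Docker, so it is no longer set here)
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "storage-driver": "overlay2"
}
EOF
# dockerd only reads daemon.json at startup
sudo systemctl restart docker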
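A syntax error in daemon.json stops dockerd from starting, so it is worth validating the file before restarting. A minimal sketch, assuming python3 is available on the host; check_daemon_json is a hypothetical name, and it only catches JSON syntax errors, not unsupported options:

```shell
# Validate the JSON syntax of a Docker daemon config file (default path shown).
check_daemon_json() {
  local file=${1:-/etc/docker/daemon.json}
  if python3 -m json.tool "$file" > /dev/null; then
    echo "valid JSON: $file"
  else
    echo "invalid JSON: $file" >&2
    return 1
  fi
}
```

Example: check_daemon_json && sudo systemctl restart docker, then docker info --format '{{.CgroupDriver}}' should print systemd.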

Podman as an alternative

sudo apt-get install -y podman

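2. Virtualization Support

Installing KVM/QEMU

sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
sudo systemctl enable --now libvirtd
sudo usermod -aG libvirt "$USER"  # manage VMs without sudo (takes effect at next login)

Verifying virtualization support

grep -Ec '(vmx|svm)' /proc/cpuinfo  # should print a number greater than 0
sudo apt-get install -y cpu-checker  # provides kvm-ok
kvm-ok  # checks whether KVM acceleration can be used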

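Cluster Management Tools

1. Kubernetes Cluster Setup

Installing kubeadm, kubelet, and kubectl (from the community-owned pkgs.k8s.io repository; the legacy apt.kubernetes.io / packages.cloud.google.com repository used by older guides has been shut down)

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# v1.30 is an example; each Kubernetes minor version has its own repository path
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl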
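Before running kubeadm init, the node needs swap off, the overlay and br_netfilter modules loaded, and IPv4 forwarding enabled. The following is a read-only preflight sketch that only reports on these prerequisites and changes nothing; the function names are hypothetical:

```shell
# Number of active swap devices; /proc/swaps starts with one header line.
active_swaps() {
  local n
  n=$(wc -l < "${1:-/proc/swaps}")
  echo $(( n > 0 ? n - 1 : 0 ))
}

# True if /sys/module/<name> exists, which is the case for loaded modules.
module_present() {
  [[ -d "/sys/module/$1" ]]
}

k8s_preflight() {
  local rc=0 m
  if (( $(active_swaps) > 0 )); then
    echo "FAIL swap is on (sudo swapoff -a, then remove it from /etc/fstab)"; rc=1
  else
    echo "OK   swap is off"
  fi
  for m in overlay br_netfilter; do
    if module_present "$m"; then
      echo "OK   module $m"
    else
      echo "FAIL module $m not loaded (sudo modprobe $m)"; rc=1
    fi
  done
  if [[ "$(cat /proc/sys/net/ipv4/ip_forward 2>/dev/null)" == 1 ]]; then
    echo "OK   net.ipv4.ip_forward = 1"
  else
    echo "FAIL net.ipv4.ip_forward is not 1"; rc=1
  fi
  return "$rc"
}
```

Run k8s_preflight on every node; a non-zero exit status means at least one check failed.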

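Initializing the control-plane node

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Let the current user run kubectl against the new cluster
mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

2. Installing a Network Plugin (e.g., Calico)

# v3.26.1 is an example version; check the Calico documentation for the current release
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml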

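Storage Configuration

1. Distributed Storage Options

Ceph cluster setup (ceph-deploy is no longer maintained upstream and cephadm is the currently supported deployment tool; the classic ceph-deploy workflow is shown here)

sudo apt-get install -y ceph-deploy
ceph-deploy new node1 node2 node3
ceph-deploy install node1 node2 node3
ceph-deploy mon create-initial
ceph-deploy admin node1 node2 node3
ceph-deploy mgr create node1
# Without OSDs the cluster cannot store data; /dev/sdb is an example of an empty disk on each node
for n in node1 node2 node3; do ceph-deploy osd create --data /dev/sdb "$n"; done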

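2. NFS Shared Storage

sudo apt-get install -y nfs-kernel-server
sudo mkdir -p /mnt/nfs_share
sudo chown nobody:nogroup /mnt/nfs_share
sudo chmod 777 /mnt/nfs_share

# Add the export to /etc/exports, restricted to the cluster subnet (192.168.1.0/24 is an example).
# no_root_squash is left out: it gives root on any client full root access to the share.
echo "/mnt/nfs_share 192.168.1.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server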
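On each client, the share is usually mounted at boot through /etc/fstab. A minimal sketch of a helper that prints such an entry; nfs_fstab_line, the server address and the mount point are hypothetical examples, and _netdev delays the mount until the network is up:

```shell
# Print an /etc/fstab line that mounts an NFS export on a client.
nfs_fstab_line() {
  local server=$1 export_path=$2 mountpoint=$3
  printf '%s:%s %s nfs defaults,_netdev 0 0\n' "$server" "$export_path" "$mountpoint"
}
```

Example on a client: sudo apt-get install -y nfs-common, sudo mkdir -p /mnt/nfs, then nfs_fstab_line 192.168.1.101 /mnt/nfs_share /mnt/nfs | sudo tee -a /etc/fstab and sudo mount -a. showmount -e 192.168.1.101 lists what the server exports.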

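Monitoring and Logging

1. Prometheus + Grafana Monitoring

Installing Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
tar xvfz prometheus-2.30.3.linux-amd64.tar.gz
# "cd prometheus-*" would also match the .tar.gz file, so name the directory explicitly
cd prometheus-2.30.3.linux-amd64
./prometheus --config.file=prometheus.yml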
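The prometheus.yml shipped in the tarball only scrapes Prometheus itself. A minimal sketch of a replacement that also scrapes the cluster nodes, written in the extracted directory before starting Prometheus; it assumes node_exporter (not installed in this article) listens on its default port 9100 on the example IPs used in the Ansible inventory later in this article:

```shell
# Overwrite the sample config with one that scrapes Prometheus and node_exporter.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node
    static_configs:
      - targets: ["192.168.1.101:9100", "192.168.1.102:9100", "192.168.1.103:9100"]
EOF
```

Check the file with ./promtool check config prometheus.yml (promtool ships in the same tarball) before starting Prometheus.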

Installing Grafana

# packages.grafana.com and apt-key are deprecated; use apt.grafana.com with a dedicated keyring
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable --now grafana-server

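2. ELK Logging Stack

Installing Elasticsearch, Logstash, and Kibana

# Elasticsearch (apt-key is deprecated; store the key in a dedicated keyring)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install -y elasticsearch

# Logstash (enable its service once a pipeline exists in /etc/logstash/conf.d/)
sudo apt-get install -y logstash

# Kibana
sudo apt-get install -y kibana
sudo systemctl enable --now elasticsearch kibana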

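Security Configuration

1. Basic Hardening

# Disable root SSH login (stock Ubuntu only has a commented-out
# "#PermitRootLogin prohibit-password" line, so match commented lines too)
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sshd -t && sudo systemctl restart ssh  # the service is named sshd on RHEL

# Configure the firewall
sudo apt-get install -y ufw
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 6443/tcp         # Kubernetes API server
sudo ufw allow 2379:2380/tcp    # etcd (control-plane nodes)
sudo ufw allow 10250/tcp        # kubelet API
sudo ufw allow 30000:32767/tcp  # NodePort services
# CNI traffic must also be allowed (for Calico: BGP on 179/tcp plus IP-in-IP or VXLAN)
sudo ufw enable

# Automatic security updates
sudo apt-get install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades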
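For most keywords, sshd uses the first value it reads, so a later PermitRootLogin line cannot override an earlier one. A minimal sketch that reports the first uncommented PermitRootLogin value in a config file; permit_root_login is a hypothetical name, and it ignores files pulled in with Include (such as /etc/ssh/sshd_config.d/*.conf on recent Ubuntu):

```shell
# Print the first active PermitRootLogin value (keywords are case-insensitive).
permit_root_login() {
  awk 'tolower($1) == "permitrootlogin" { print $2; exit }' "${1:-/etc/ssh/sshd_config}"
}
```

For the value sshd actually uses, including Include'd files, run sudo sshd -T | grep -i permitrootlogin.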

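2. SELinux/AppArmor Configuration

AppArmor (Ubuntu)

sudo systemctl enable --now apparmor
sudo aa-status  # show status

SELinux (RHEL/CentOS)

sudo setenforce 1  # fails if SELinux is currently disabled; the config change below then applies after a reboot
sudo sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
sudo touch /.autorelabel  # if SELinux was disabled: relabel the file system at the next boot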

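Performance Tuning

1. CPU and I/O Scheduling

# Compute-intensive nodes: keep CPUs at maximum frequency (not persistent across reboots)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# I/O-intensive nodes: use the deadline I/O scheduler (named mq-deadline on blk-mq
# kernels; NVMe devices, which appear as nvme*, usually do best with "none")
echo mq-deadline | sudo tee /sys/block/sd*/queue/scheduler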

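2. Memory Management

# Disable transparent huge pages (often recommended for latency-sensitive databases such as Redis; not persistent across reboots)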
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
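Both the I/O scheduler files and the THP files in sysfs list every available option and mark the active one in brackets (for example "always madvise [never]"). A minimal sketch for reading back the active choice after the changes above; sysfs_active is a hypothetical name:

```shell
# Print the bracketed (active) option from a sysfs selector file.
# Prints nothing for plain files such as scaling_governor (read those with cat).
sysfs_active() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}
```

Example: sysfs_active /sys/kernel/mm/transparent_hugepage/enabled, or for f in /sys/block/*/queue/scheduler; do echo "$f: $(sysfs_active "$f")"; done. Since these writes do not survive a reboot, THP can instead be disabled persistently with the transparent_hugepage=never kernel command-line parameter.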

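Automation and Orchestration

1. Configuration Management with Ansible

sudo apt-get install -y ansible

# Example inventory file (newer Ansible packages may not create /etc/ansible)
sudo mkdir -p /etc/ansible
echo "[cluster_nodes]
node1 ansible_host=192.168.1.101
node2 ansible_host=192.168.1.102
node3 ansible_host=192.168.1.103" | sudo tee /etc/ansible/hosts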
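With the inventory in place, the kernel parameters from the beginning of this article can be rolled out to every node instead of being edited by hand. A minimal sketch that writes a playbook using the ansible.posix.sysctl module; the file name sysctl-playbook.yml, the drop-in path, and the subset of parameters shown are illustrative choices:

```shell
# Write a playbook that sets sysctl values on all hosts in [cluster_nodes].
cat > sysctl-playbook.yml <<'EOF'
- name: Apply cluster kernel parameters
  hosts: cluster_nodes
  become: true
  vars:
    cluster_sysctl:
      net.core.somaxconn: 4096
      vm.swappiness: 10
      fs.file-max: 2097152
  tasks:
    - name: Set each parameter and persist it in /etc/sysctl.d/99-cluster.conf
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        sysctl_file: /etc/sysctl.d/99-cluster.conf
        state: present
        reload: true
      loop: "{{ cluster_sysctl | dict2items }}"
EOF
```

Check connectivity with ansible cluster_nodes -m ping, preview with ansible-playbook sysctl-playbook.yml --check, then run it without --check. With ansible-core alone, install the collection first (ansible-galaxy collection install ansible.posix).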

2. Infrastructure as Code with Terraform

sudo apt-get install -y unzip
wget https://releases.hashicorp.com/terraform/1.0.10/terraform_1.0.10_linux_amd64.zip
unzip terraform_1.0.10_linux_amd64.zip
sudo mv terraform /usr/local/bin/
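HashiCorp publishes a SHA256SUMS file ("<sha256>  <filename>" per line) next to each release, so the archive can be checked before installing it. A minimal sketch; verify_download is a hypothetical name:

```shell
# Verify one downloaded file against a SHA256SUMS-style checksum file.
verify_download() {
  local file=$1 sums dir name
  sums=$(realpath "$2")
  dir=$(dirname "$file")
  name=$(basename "$file")
  # Keep only the line for this file and let sha256sum -c check it in place;
  # a missing line or a hash mismatch makes sha256sum exit non-zero.
  (cd "$dir" && awk -v f="$name" '$2 == f' "$sums" | sha256sum -c -)
}
```

Example: wget https://releases.hashicorp.com/terraform/1.0.10/terraform_1.0.10_SHA256SUMS, then verify_download terraform_1.0.10_linux_amd64.zip terraform_1.0.10_SHA256SUMS before unzipping.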

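Network Optimization

1. High-Performance Networking

# Install DPDK (optional; using it also requires hugepages and binding NICs to a DPDK-compatible driver)
sudo apt-get install -y dpdk dpdk-dev

# Increase network buffer sizes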
echo 'net.core.rmem_max=16777216' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max=16777216' | sudo tee -a /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem=4096 87380 16777216' | sudo tee -a /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem=4096 65536 16777216' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

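2. NIC Bonding

sudo apt-get install -y ifenslave

# Edit /etc/network/interfaces (ifupdown-based systems; Ubuntu 18.04+ uses netplan by default).
# eth0/eth1 are example names, and 802.3ad (LACP) mode must also be enabled on the switch.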
echo "auto bond0
iface bond0 inet dhcp
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-slaves eth0 eth1" | sudo tee -a /etc/network/interfaces
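On Ubuntu releases managed by netplan, the same bond is declared in YAML. A minimal sketch of a helper that prints a netplan equivalent of the stanza above (802.3ad, MII monitoring every 100 ms, fast LACP rate, DHCP on the bond); bond_netplan_yaml and the interface names are hypothetical:

```shell
# Print a netplan config for bond0 over two interfaces (defaults: eth0, eth1).
bond_netplan_yaml() {
  local if1=${1:-eth0} if2=${2:-eth1}
  cat <<EOF
network:
  version: 2
  ethernets:
    $if1:
      dhcp4: false
    $if2:
      dhcp4: false
  bonds:
    bond0:
      interfaces: [$if1, $if2]
      dhcp4: true
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
EOF
}
```

Example: bond_netplan_yaml ens1f0 ens1f1 | sudo tee /etc/netplan/60-bond0.yaml > /dev/null, then sudo chmod 600 /etc/netplan/60-bond0.yaml and sudo netplan try, which rolls back automatically unless you confirm the change.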

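Maintenance and Update Strategy

1. Rolling Updates

# kured (Kubernetes Reboot Daemon) drains and reboots nodes one at a time when the OS reports a pending reboot.
# The project has moved from weaveworks/kured to kubereboot/kured; check there for current release manifests.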
kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.10.0/kured-1.10.0-dockerhub.yaml
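By default kured watches for the sentinel file /var/run/reboot-required, which Ubuntu packages such as a new kernel create after installation (for example via unattended-upgrades). A minimal sketch of the same check for a single node; reboot_pending is a hypothetical name:

```shell
# Report whether the reboot sentinel exists and, if so, which packages requested it.
reboot_pending() {
  local sentinel=${1:-/var/run/reboot-required}
  if [[ -e "$sentinel" ]]; then
    echo "reboot required"
    # The .pkgs companion file, when present, lists the requesting packages.
    if [[ -f "$sentinel.pkgs" ]]; then
      sort -u "$sentinel.pkgs"
    fi
    return 0
  fi
  echo "no reboot required"
  return 1
}
```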

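2. Backup Strategy

# Back up the Kubernetes cluster with Velero (requires the velero CLI; this example targets a MinIO S3-compatible endpoint)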
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket my-backup-bucket \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.example.com:9000
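The --secret-file option above expects an AWS-style credentials file, which MinIO also accepts. A minimal sketch of a helper that creates it with owner-only permissions; write_velero_credentials and the example keys are hypothetical:

```shell
# Write an AWS-style credentials file readable only by its owner.
write_velero_credentials() {
  local file=$1 key_id=$2 secret=$3
  # umask in a subshell so the file is created with mode 600 from the start
  (umask 077
   printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
     "$key_id" "$secret" > "$file")
}
```

Example: write_velero_credentials ./credentials-velero minio minio123 before velero install. Afterwards, velero backup create cluster-backup takes a one-off backup, velero schedule create daily-backup --schedule="0 2 * * *" schedules a daily one, and velero backup get lists the results.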

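The configuration above is a broad baseline for Linux systems that support cloud computing and large-scale cluster development. Depending on your workloads and hardware, specific parameters may need further adjustment and tuning.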