Kubernetes HPA and Custom-Metrics HPA

The Kubernetes Horizontal Pod Autoscaler (HPA) is an extremely useful feature.

Our servers are hosted on Alibaba Cloud ACK. Based on CPU or memory usage, Kubernetes automatically scales the number of key pods up and down to cope with heavy traffic. Even better, the dynamically added pods don't run on our own fixed servers; they run on Alibaba's elastic ECI virtual nodes, so capacity really is created on demand and torn down when no longer needed. You pay for exactly the traffic you serve, with nothing wasted.

Let's first clarify the concepts:

Kubernetes obtains resource metrics through API endpoints, of which there are two kinds: core metrics and custom metrics.

  • Core metrics: served by metrics-server via the metrics.k8s.io API; it only provides CPU and memory usage for Nodes and Pods.

  • Custom metrics: served by the Prometheus Adapter via the custom.metrics.k8s.io (and external.metrics.k8s.io) APIs, which can expose any metric Prometheus has scraped. A quick way to check both is shown below.
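
You can see which of these APIs a cluster has registered with a quick query; the adapter service names in the comments are only what a typical install produces, so yours may differ:

# Aggregated metric APIs show up as APIService objects
kubectl get apiservices | grep metrics.k8s.io
# typically something like:
#   v1beta1.metrics.k8s.io          kube-system/metrics-server          True
#   v1beta1.custom.metrics.k8s.io   custom-metrics/prometheus-adapter   True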

1. Core metrics: metrics-server

Alibaba's ACK installs metrics-server by default. Let's check - there is a metrics-server pod in the kube-system namespace:

kubectl get pods -n kube-system

(Screenshot: metrics-server running in the kube-system namespace)

Now let's see what the core metrics API can give us, starting with the Node metrics:

kubectl get --raw "/apis/metrics.k8s.io" | jq .
kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

(Screenshot: node metrics returned by the API, including the ECI virtual node)

You can see the CPU and memory of the Alibaba ECI virtual node.

Now the Pod metrics:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .

(Screenshot: pod metrics, including pods scheduled onto the virtual node)

You can clearly see the CPU and memory usage of the Pods scheduled onto the virtual node. Note that only CPU and memory are reported - and for core-metrics HPA, that is enough.
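
The same metrics.k8s.io API is what backs kubectl top, so the quickest sanity check is:

# Both commands read Node/Pod usage from metrics-server via metrics.k8s.io
kubectl top nodes
kubectl top pods -n default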

Here is an example that scales entirely on our own worker nodes (no ECI):

php-hpa.yaml (it scales when average CPU utilization reaches 80% or average memory usage reaches 200Mi):

php-hpa governs how php-deploy scales: minimum 2 replicas, maximum 10:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-deploy
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
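
Assuming the manifest above is saved as php-hpa.yaml (the file name is just an example), applying it and watching the HPA is enough to confirm that the core-metrics pipeline works:

kubectl apply -f php-hpa.yaml
# TARGETS shows current/target CPU and memory once metrics-server has data
kubectl get hpa php-hpa -w
kubectl describe hpa php-hpa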

Now an example on Alibaba ACK that scales out onto ECI virtual nodes.

First we define a normal Deployment, php-deploy. There is nothing special about it - it looks like any other Deployment.
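
The post does not reproduce php-deploy itself, so here is a minimal sketch of what it might look like; the image, port and resource numbers are placeholders, not our real values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-deploy
  namespace: default
spec:
  replicas: 6                    # the six "fixed" replicas on our own worker nodes
  selector:
    matchLabels:
      app: php-deploy
  template:
    metadata:
      labels:
        app: php-deploy
    spec:
      containers:
      - name: php
        image: registry.example.com/php-app:latest   # placeholder image
        ports:
        - containerPort: 80
        resources:
          requests:              # requests must be set for utilization-based HPA to work
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi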

Then we define an ElasticWorkload, elastic-php, which controls how php-deploy spills over onto the ECI virtual nodes.

The configuration below gives 6 fixed replicas plus up to 24 elastic ones, 30 in total:

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: ElasticWorkload
metadata:
  name: elastic-php
spec:
  sourceTarget:
    name: php-deploy
    kind: Deployment
    apiVersion: apps/v1
    min: 0
    max: 6
  elasticUnit:
  - name: virtual-kubelet
    labels:
      virtual-kubelet: "true"
      alibabacloud.com/eci: "true"
    annotations:
      virtual-kubelet: "true"
    nodeSelector:
      type: "virtual-kubelet"
    tolerations:
    - key: "virtual-kubelet.io/provider"
      operator: "Exists"
    min: 0
    max: 24
  replicas: 30

Then we define an HPA, php-hpa, to autoscale elastic-php:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: ElasticWorkload
    name: elastic-php
  minReplicas: 6
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 90
  behavior:
    scaleUp:
      policies:
      #- type: Percent
      #  value: 500
      - type: Pods
        value: 5
        periodSeconds: 180
    scaleDown:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 600

The ElasticWorkload above deserves a careful explanation. php-deploy has 6 fixed pod replicas, all on our own nodes and thus already paid for. The ECI replica count ranges from 0 to 24, so the total is 6 + 24 = 30 pods, of which 24 can scale elastically on the virtual node. php-hpa in turn defines a scaling range of 6 to 30. In practice that means: with normal, low traffic only the 6 fixed pods on our own servers are used; when traffic surges the workload expands onto the ECI virtual node, up to 24 extra pods; and when traffic drops it shrinks back down to the 6 pods on our own servers. This gives very precise cost control.

The behavior section at the bottom of php-hpa says: when scaling up (CPU at 90%), add 5 pods at a time; when scaling down, remove them one at a time, which avoids traffic spikes caused by abrupt shrinkage.
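
A rough sketch of rolling this out and watching it work; the file names are hypothetical, and the elasticworkloads plural assumes the CRD's default naming (check kubectl api-resources if it differs):

kubectl apply -f php-deploy.yaml -f elastic-php.yaml -f php-hpa.yaml
# the status section shows how replicas are split between the source Deployment and the elastic unit
kubectl get elasticworkloads elastic-php -o yaml
kubectl get hpa php-hpa
# pods that spilled over to ECI show the virtual node's name in the NODE column
kubectl get pods -o wide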

2. Custom metrics: Prometheus

The setup above already covers most needs, but suppose we want to go further and drive HPA from metrics pulled out of Prometheus.

That is where things get more involved.

(Diagram: custom metrics pipeline - pods → Prometheus → Prometheus Adapter → custom metrics API → HPA)

As the diagram shows, the Prometheus instance managed by the Prometheus Operator scrapes pod metrics over HTTP, and the Prometheus Adapter then queries the data stored in Prometheus and exposes it through the custom metrics API. Why do we need these two pieces? Because the metrics Prometheus collects cannot be consumed by Kubernetes directly - the data formats are incompatible - so a second component, the Prometheus Adapter, translates Prometheus metrics into the format the Kubernetes metrics API expects. And because this is a custom API, it also has to be registered with the main API server through the Kubernetes aggregator so that other programs can reach it directly under /apis/.
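
Registration with the aggregator is nothing more than an APIService object, roughly like the sketch below; the backing service name and namespace depend on how the adapter is installed, so treat them as assumptions:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true
  service:
    name: prometheus-adapter        # assumed: the adapter's Service
    namespace: custom-metrics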

Let's first see how to install these two components.

First install the Prometheus Operator. It pulls in a whole bundle of things - Prometheus, Grafana, Alertmanager and so on - so it is best to give it its own namespace:

# Install (Helm 2 syntax, hence --name)
helm install --name prometheus --namespace monitoring  stable/prometheus-operator

# Port-forward to reach the Prometheus UI locally: curl http://localhost:9090
kubectl port-forward --namespace monitoring svc/prometheus-operator-prometheus 9090:9090


# See which pods we got
kubectl get pod -n monitoring
NAME                                                          READY   STATUS    RESTARTS   AGE
pod/alertmanager-prometheus-operator-alertmanager-0           2/2     Running   0          98m
pod/prometheus-operator-grafana-857dfc5fc8-vdnff              2/2     Running   0          99m
pod/prometheus-operator-kube-state-metrics-66b4c95cd9-mz8nt   1/1     Running   0          99m
pod/prometheus-operator-operator-56964458-8sspk               2/2     Running   0          99m
pod/prometheus-operator-prometheus-node-exporter-dcf5p        1/1     Running   0          99m
pod/prometheus-operator-prometheus-node-exporter-nv6ph        1/1     Running   0          99m
pod/prometheus-prometheus-operator-prometheus-0               3/3     Running   1          98m

# And which services
kubectl get svc -n monitoring
NAME                                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                          ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   100m
prometheus-operated                            ClusterIP   None           <none>        9090/TCP                     100m
prometheus-operator-alertmanager               NodePort    10.1.238.78    <none>        9093:31765/TCP               102m
prometheus-operator-grafana                    NodePort    10.1.125.228   <none>        80:30284/TCP                 102m
prometheus-operator-kube-state-metrics         ClusterIP   10.1.187.129   <none>        8080/TCP                     102m
prometheus-operator-operator                   ClusterIP   10.1.242.61    <none>        8080/TCP,443/TCP             102m
prometheus-operator-prometheus                 NodePort    10.1.156.181   <none>        9090:30268/TCP               102m
prometheus-operator-prometheus-node-exporter   ClusterIP   10.1.226.134   <none>        9100/TCP                     102m

Notice the prometheus-operated ClusterIP service. Given how Kubernetes CoreDNS naming works, and with this cluster's internal domain being hbb.local, the service's full hostname is prometheus-operated.monitoring.svc.hbb.local; inside the cluster it can also be reached as prometheus-operated.monitoring or prometheus-operated.monitoring.svc.

Next install the Prometheus Adapter, setting prometheus.url to match the service above:

helm install --name prometheus-adapter stable/prometheus-adapter --set prometheus.url="http://prometheus-operated.monitoring.svc",prometheus.port="9090" --set image.tag="v0.4.1" --set rbac.create="true" --namespace custom-metrics

Query the external and custom metrics APIs to verify:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "*/agent.googleapis.com|agent|api_request_count",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
[...lots more metrics...]
    {
      "name": "*/vpn.googleapis.com|tunnel_established",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

The real work starts now. The metrics the Prometheus Adapter picks up from Prometheus out of the box are limited; if you have custom metrics, or simply want more of them, you have to extend its configuration.

The quickest way is to edit the ConfigMap named prometheus-adapter in the custom-metrics namespace and add seriesQuery rules.

A seriesQuery rule looks like the following; this one sums the 5-minute rate of change across all pods labelled app=shopping-kart:

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus-adapter
    chart: prometheus-adapter-v1.2.0
    heritage: Tiller
    release: prometheus-adapter
  name: prometheus-adapter
data:
  config.yaml: |
    rules:
    - seriesQuery: '{app="shopping-kart",kubernetes_namespace!="",kubernetes_pod_name!=""}'
      seriesFilters: []
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_pod_name:
            resource: pod
      name:
        matches: ""
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)

The fields in the adapter ConfigMap mean the following:

  1. seriesQuery tells the adapter which Prometheus metrics (series) to fetch.

  2. resources tells it which Kubernetes resources each metric is associated with, i.e. which labels map to which resources - most commonly pod and namespace.

  3. metricsQuery is the actual Prometheus query, layered on top of seriesQuery, that computes the final metric values.

  4. name controls the name under which the metric is exposed through the custom metrics API.

An example: to work with container_network_receive_packets_total, in the Prometheus UI we would run a query like this:

sum(rate(container_network_receive_packets_total{namespace="default",pod=~"php-deploy.*"}[10m])) by (pod)

Translated into the adapter's metricsQuery template - admittedly hard to read - it becomes:

metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'

Another example:

rate(gorush_total_push_count{instance="push.server.com:80",job="push-server"}[5m])

(Screenshot: the gorush_total_push_count rate graphed in the Prometheus UI)

becomes this adapter ConfigMap:

apiVersion: v1
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"gorush_total_push_count"}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ""
        as: "gorush_push_per_second"
      metricsQuery: rate(<<.Series>>{<<.LabelMatchers>>}[5m])

After changing the ConfigMap, you must restart the prometheus-adapter pod so it reloads the configuration!
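
One way to force the reload; the deployment name and namespace follow the helm release used above, so adjust them if yours differ:

# either restart the deployment...
kubectl rollout restart deployment prometheus-adapter -n custom-metrics
# ...or simply delete the pod and let it be recreated
kubectl delete pod -n custom-metrics -l app=prometheus-adapter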

An HPA that uses this metric looks like this:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: gorush-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gorush
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: gorush_push_per_second
      targetAverageValue: 1m

One more: for a Prometheus metric named myapp_client_connected:

apiVersion: v1
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__="myapp_client_connected"}'
      seriesFilters: []
      resources:
        overrides:
          k8s_namespace:
            resource: namespace
          k8s_pod_name:
            resource: pod
      name:
        matches: "myapp_client_connected"
        as: ""
      metricsQuery: <<.Series>>{<<.LabelMatchers>>,container_name!="POD"}

And the HPA that consumes it:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-sim
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-sim
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: myapp_client_connected
      targetAverageValue: 20

Quite involved, right? Let's walk through a complete example.

3. A complete custom-metrics example

We start with a Deployment running an nginx-vts pod; this image already exposes Prometheus metrics by itself.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx-deploy
  template:
    metadata:
      labels:
        app: nginx-deploy
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "80"
        prometheus.io/path: "/status/format/prometheus"
    spec:
      containers:
      - name: nginx-deploy
        image: cnych/nginx-vts:v1.0
        resources:
          limits:
            cpu: 50m
          requests:
            cpu: 50m
        ports:
        - containerPort: 80
          name: http

Then a Service to expose port 80:

apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  ports:
  - port: 80
    targetPort: 80
    name: http
  selector:
    app: nginx-deploy
  type: ClusterIP

Prometheus does auto-discovery, so these annotations are what trigger it to start scraping the nginx metrics. (This assumes your Prometheus has a pod-annotation based scrape job; with a stock Prometheus Operator install you may instead need an additional scrape config or a ServiceMonitor.)
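
If your Prometheus is run by the Operator and does not honour these annotations, a ServiceMonitor is the usual alternative. A sketch, with two assumptions: the release: prometheus label matches the Operator's serviceMonitorSelector for a release named prometheus, and the nginx-svc Service itself carries an app: nginx-deploy label (add one if it does not):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-svc
  namespace: monitoring
  labels:
    release: prometheus              # must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx-deploy              # label on the Service, not the pods
  endpoints:
  - port: http
    path: /status/format/prometheus
    interval: 15s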

Start a shell inside the cluster and have a look:
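
One way to get such a shell; the curlimages/curl image is just a convenient choice:

kubectl run -it --rm curl-test --image=curlimages/curl --restart=Never -- sh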

$ curl nginx-svc.default.svc.hbb.local/status/format/prometheus
# HELP nginx_vts_info Nginx info
# TYPE nginx_vts_info gauge
nginx_vts_info{hostname="nginx-deployment-65d8df7488-c578v",version="1.13.12"} 1
# HELP nginx_vts_start_time_seconds Nginx start time
# TYPE nginx_vts_start_time_seconds gauge
nginx_vts_start_time_seconds 1574283147.043
# HELP nginx_vts_main_connections Nginx connections
# TYPE nginx_vts_main_connections gauge
nginx_vts_main_connections{status="accepted"} 215
nginx_vts_main_connections{status="active"} 4
nginx_vts_main_connections{status="handled"} 215
nginx_vts_main_connections{status="reading"} 0
nginx_vts_main_connections{status="requests"} 15577
nginx_vts_main_connections{status="waiting"} 3
nginx_vts_main_connections{status="writing"} 1
# HELP nginx_vts_main_shm_usage_bytes Shared memory [ngx_http_vhost_traffic_status] info
# TYPE nginx_vts_main_shm_usage_bytes gauge
nginx_vts_main_shm_usage_bytes{shared="max_size"} 1048575
nginx_vts_main_shm_usage_bytes{shared="used_size"} 3510
nginx_vts_main_shm_usage_bytes{shared="used_node"} 1
# HELP nginx_vts_server_bytes_total The request/response bytes
# TYPE nginx_vts_server_bytes_total counter
# HELP nginx_vts_server_requests_total The requests counter
# TYPE nginx_vts_server_requests_total counter
# HELP nginx_vts_server_request_seconds_total The request processing time in seconds
# TYPE nginx_vts_server_request_seconds_total counter
# HELP nginx_vts_server_request_seconds The average of request processing times in seconds
# TYPE nginx_vts_server_request_seconds gauge
# HELP nginx_vts_server_request_duration_seconds The histogram of request processing time
# TYPE nginx_vts_server_request_duration_seconds histogram
# HELP nginx_vts_server_cache_total The requests cache counter
# TYPE nginx_vts_server_cache_total counter
nginx_vts_server_bytes_total{host="_",direction="in"} 3303449
nginx_vts_server_bytes_total{host="_",direction="out"} 61641572
nginx_vts_server_requests_total{host="_",code="1xx"} 0
nginx_vts_server_requests_total{host="_",code="2xx"} 15574
nginx_vts_server_requests_total{host="_",code="3xx"} 0
nginx_vts_server_requests_total{host="_",code="4xx"} 2
nginx_vts_server_requests_total{host="_",code="5xx"} 0
nginx_vts_server_requests_total{host="_",code="total"} 15576
nginx_vts_server_request_seconds_total{host="_"} 0.000
nginx_vts_server_request_seconds{host="_"} 0.000
nginx_vts_server_cache_total{host="_",status="miss"} 0
nginx_vts_server_cache_total{host="_",status="bypass"} 0
nginx_vts_server_cache_total{host="_",status="expired"} 0
nginx_vts_server_cache_total{host="_",status="stale"} 0
nginx_vts_server_cache_total{host="_",status="updating"} 0
nginx_vts_server_cache_total{host="_",status="revalidated"} 0
nginx_vts_server_cache_total{host="_",status="hit"} 0
nginx_vts_server_cache_total{host="_",status="scarce"} 0
nginx_vts_server_bytes_total{host="*",direction="in"} 3303449
nginx_vts_server_bytes_total{host="*",direction="out"} 61641572
nginx_vts_server_requests_total{host="*",code="1xx"} 0
nginx_vts_server_requests_total{host="*",code="2xx"} 15574
nginx_vts_server_requests_total{host="*",code="3xx"} 0
nginx_vts_server_requests_total{host="*",code="4xx"} 2
nginx_vts_server_requests_total{host="*",code="5xx"} 0
nginx_vts_server_requests_total{host="*",code="total"} 15576
nginx_vts_server_request_seconds_total{host="*"} 0.000
nginx_vts_server_request_seconds{host="*"} 0.000
nginx_vts_server_cache_total{host="*",status="miss"} 0
nginx_vts_server_cache_total{host="*",status="bypass"} 0
nginx_vts_server_cache_total{host="*",status="expired"} 0
nginx_vts_server_cache_total{host="*",status="stale"} 0
nginx_vts_server_cache_total{host="*",status="updating"} 0
nginx_vts_server_cache_total{host="*",status="revalidated"} 0
nginx_vts_server_cache_total{host="*",status="hit"} 0
nginx_vts_server_cache_total{host="*",status="scarce"} 0

Then we hammer it with requests using wrk and check the Prometheus dashboard to see whether the metric is being collected.
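
For instance, something along these lines from a pod or node that can resolve the Service (2 threads, 50 connections, 5 minutes; the hostname assumes the hbb.local cluster domain mentioned earlier):

wrk -t2 -c50 -d5m http://nginx-svc.default.svc.hbb.local/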

(Screenshot: nginx_vts_server_requests_total climbing steeply in the Prometheus UI under the wrk load)

Quite a spike. Now edit the prometheus-adapter ConfigMap and add the following:

rules:
- seriesQuery: 'nginx_vts_server_requests_total'
  seriesFilters: []
  resources:
    overrides:
      kubernetes_namespace:
        resource: namespace
      kubernetes_pod_name:
        resource: pod
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: (sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>))

Then kill the prometheus-adapter pod so it restarts and reloads the configuration. After a short while, query the API and check the value - here it is 527m:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/nginx_vts_server_requests_per_second" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/nginx_vts_server_requests_per_second"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "hpa-prom-demo-755bb56f85-lvksr",
        "apiVersion": "/v1"
      },
      "metricName": "nginx_vts_server_requests_per_second",
      "timestamp": "2020-04-07T09:45:45Z",
      "value": "527m",
      "selector": null
    }
  ]
}

OK, it works. Now define an HPA that scales on this metric:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: nginx_vts_server_requests_per_second
      targetAverageValue: 10

And that's it.

If a pod cannot expose metrics by itself, we can run an exporter as a sidecar to collect the data and expose it.
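
For example, a stock nginx pod could get its metrics from the official nginx-prometheus-exporter running as a sidecar. A rough sketch; the image tag is illustrative, and nginx's stub_status endpoint must be enabled in its config for the exporter to have anything to read:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-with-exporter
spec:
  selector:
    matchLabels:
      app: nginx-with-exporter
  template:
    metadata:
      labels:
        app: nginx-with-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"     # scrape the exporter, not nginx itself
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
      - name: exporter                 # sidecar that turns stub_status into Prometheus metrics
        image: nginx/nginx-prometheus-exporter:0.10.0
        args:
        - -nginx.scrape-uri=http://127.0.0.1/stub_status
        ports:
        - containerPort: 9113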

