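Using a CronJob to automatically refresh the AWS ECR token in a K8S cluster
AWS ECR provides private image registries. You must authenticate against ECR before you can pull or push images. The plain docker login looks like this: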
aws ecr get-login-password --region cn-northwest-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn
After that, docker pull and docker push work against the registry.
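The registry hostname in the login command follows a fixed pattern: account ID, region, and a domain suffix that differs for the China regions. A minimal sketch of that pattern is below; the function name ecr_registry is hypothetical, and the account ID is the placeholder used in this article.

```shell
# ecr_registry ACCOUNT_ID REGION
# Prints the ECR registry hostname for an account and region.
# Regions starting with "cn-" use the amazonaws.com.cn domain.
ecr_registry() {
    case "$2" in
        cn-*) suffix="amazonaws.com.cn" ;;
        *)    suffix="amazonaws.com" ;;
    esac
    printf '%s.dkr.ecr.%s.%s\n' "$1" "$2" "$suffix"
}

# Example (not executed here):
#   aws ecr get-login-password --region cn-northwest-1 \
#     | docker login --username AWS --password-stdin "$(ecr_registry 123456789 cn-northwest-1)"
```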
In K8S we do not drive docker directly; the kubelet manages the containers. For registry authentication, K8S offers the Secret resource, which can store docker credentials.
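To make the idea of a secret holding docker credentials concrete, here is a rough sketch of the JSON payload that a kubernetes.io/dockerconfigjson secret carries. The function name docker_config_json is hypothetical, and the sketch assumes the server, username, and password contain no characters that need JSON escaping.

```shell
# docker_config_json SERVER USERNAME PASSWORD
# Prints the JSON document stored under the .dockerconfigjson key.
docker_config_json() {
    # "auth" is base64("username:password"). Newlines are stripped because
    # some base64 implementations wrap long output and ECR tokens are long.
    auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
    printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
        "$1" "$2" "$3" "$auth"
}
```

In practice kubectl create secret docker-registry builds this payload for you, so the helper is only meant for inspecting or understanding the format.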
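However, an ECR credential is valid for only 12 hours, so to keep image pulls working over the long run the credential has to be re-fetched periodically. A CronJob fits this task, which means we need an image with the AWS CLI and kubectl installed.
Building an image with the AWS CLI and kubectl
The Dockerfile is as follows: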
FROM alpine:latest
# Certificate and private key for IAM Roles Anywhere
COPY server.crt server.key /
# gcompat provides a glibc compatibility layer on Alpine;
# wget is only needed during the build and is removed at the end
RUN apk --no-cache add aws-cli wget gcompat \
    && wget "https://dl.k8s.io/release/v1.23.9/bin/linux/amd64/kubectl" \
    && wget "https://rolesanywhere.amazonaws.com/releases/1.3.0/X86_64/Linux/aws_signing_helper" \
    && mv kubectl /usr/local/bin/kubectl \
    && chmod +x /usr/local/bin/kubectl aws_signing_helper \
    && apk del wget
Note that I ran this build outside mainland China, because I pull the alpine image directly from Docker Hub, and apk add is also faster there.
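Since the Dockerfile downloads kubectl as a bare binary, it may be worth checking the download against the checksum published next to it on dl.k8s.io (the same URL with a .sha256 suffix). The sketch below is one way to do that, assuming sha256sum is available; the function name verify_sha256 is hypothetical and is not part of the original build.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Example (not executed here):
#   expected=$(wget -qO- "https://dl.k8s.io/release/v1.23.9/bin/linux/amd64/kubectl.sha256")
#   verify_sha256 kubectl "$expected" || exit 1
```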
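Fetching an ECR credential requires AWS credentials of some kind, such as an access key and secret key (AK/SK), but I did not want to store an AK/SK in YAML. I chose IAM Roles Anywhere, which issues temporary credentials based on a certificate. That is why the image also contains aws_signing_helper, which Roles Anywhere requires, along with the certificate file and the private key.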
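The AWS CLI picks up Roles Anywhere credentials through a credential_process entry in its config file. A minimal sketch of generating that file follows; the function name write_rolesanywhere_config and the short ARNs in the example are hypothetical placeholders, while the aws_signing_helper flags are the ones used later in this article.

```shell
# write_rolesanywhere_config FILE REGION HELPER CERT KEY TRUST_ANCHOR_ARN PROFILE_ARN ROLE_ARN
# Writes an AWS CLI config whose default profile gets its credentials
# by running aws_signing_helper.
write_rolesanywhere_config() {
    mkdir -p "$(dirname "$1")"
    cat > "$1" <<EOF
[default]
region = $2
credential_process = $3 credential-process --certificate $4 --private-key $5 --trust-anchor-arn $6 --profile-arn $7 --role-arn $8
EOF
}

# Example (not executed here; the ARNs are placeholders):
#   write_rolesanywhere_config ~/.aws/config cn-northwest-1 /aws_signing_helper \
#     /server.crt /server.key "$TRUST_ANCHOR_ARN" "$PROFILE_ARN" "$ROLE_ARN"
```

Using absolute paths for the helper, certificate, and key avoids depending on the working directory the AWS CLI happens to run in.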
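Creating the CronJob
The steps are:
- Decide which namespace needs the secret and get its name.
- After the container starts, use Roles Anywhere to obtain the credentials the AWS CLI needs.
- Run the aws ecr command to get the ECR credential.
- Run kubectl to create the secret.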
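The last two steps can be sketched as one shell function. This is an alternative shape and not the exact script used in this article: it checks that a token actually came back, and it renders the secret client-side and pipes it to kubectl apply, so an existing secret is updated in place and never deleted first. The function name refresh_ecr_secret is hypothetical.

```shell
# refresh_ecr_secret REGION REGISTRY SECRET_NAME NAMESPACE
refresh_ecr_secret() {
    token=$(aws ecr get-login-password --region "$1") || return 1
    # Stop before touching the cluster if no token came back.
    [ -n "$token" ] || return 1
    # Render the Secret without sending it, then apply it: this creates
    # the Secret if it is missing and updates it otherwise.
    kubectl create secret docker-registry "$3" \
        --docker-server="$2" \
        --docker-username=AWS \
        --docker-password="$token" \
        -n "$4" --dry-run=client -o yaml \
        | kubectl apply -n "$4" -f -
}

# Example (not executed here):
#   refresh_ecr_secret cn-northwest-1 123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn ecr ingress-nginx
```

With this shape the service account needs get, create, and patch on secrets in the target namespace; delete is not required.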
Here I create the secret in the ingress-nginx namespace so that the ingress nginx controller can use it, because I put the controller images in ECR. The YAML is as follows:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ingress-nginx
  name: ecr-auth
rules:
- apiGroups: [""]
  resources: ["secrets","serviceaccounts"]
  verbs: ["delete","create","get","patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bind-default-secret-deleter
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ecr-auth
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ecr-auth
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: job-ecr-auth
    spec:
      imagePullSecrets:
      - name: ecr
      containers:
      - command:
        - /bin/sh
        - -c
        - |-
          # Target registry, secret name and namespace
          REGION=cn-northwest-1
          ECRENDPOINT=123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn
          SECRET_NAME=ecr
          TARGET_NS=ingress-nginx
          # Configure the AWS CLI to get credentials through IAM Roles Anywhere
          mkdir -p ~/.aws && touch ~/.aws/config && echo -e "[default]\n region = cn-northwest-1 \n credential_process = ./aws_signing_helper credential-process --certificate server.crt --private-key server.key --trust-anchor-arn arn:aws-cn:rolesanywhere:cn-northwest-1:123456789:trust-anchor/6a6966c2-912d-44fb-a468-d6faf8437334 --profile-arn arn:aws-cn:rolesanywhere:cn-northwest-1:123456789:profile/b174ad67-e035-4f86-ac34-c68bb313e785 --role-arn arn:aws-cn:iam::123456789:role/test" > ~/.aws/config
          # Fetch the ECR password (valid for 12 hours)
          TOKEN=`aws ecr get-login-password --region ${REGION}`
          # Abort before deleting the existing secret if no token was returned
          [ -n "${TOKEN}" ] || { echo "Failed to get ECR token." >&2; exit 1; }
          echo "ENV variables setup done."
          # Recreate the image pull secret in the target namespace
          /usr/local/bin/kubectl delete secret ${SECRET_NAME} -n ${TARGET_NS} --ignore-not-found
          /usr/local/bin/kubectl create secret docker-registry ${SECRET_NAME} \
            --docker-server=${ECRENDPOINT} \
            --docker-username=AWS \
            --docker-password="${TOKEN}" -n ${TARGET_NS}
          echo "Secret created by name. $SECRET_NAME"
          #/usr/local/bin/kubectl -n ${TARGET_NS} patch serviceaccount ingress-nginx,ingress-nginx-admission -p '{"imagePullSecrets": [{"name": "'$SECRET_NAME'"}]}'
          echo "All done."
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: 123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn/test:ecrauth
        imagePullPolicy: IfNotPresent
        name: ecr-auth
        resources: {}
        securityContext:
          capabilities: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        value: ""
      dnsPolicy: Default
      hostNetwork: true
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  backoffLimit: 4
---
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
  name: ecr-auth
  namespace: default
spec:
  concurrencyPolicy: Allow
  schedule: "0 */11 * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          imagePullSecrets:
          - name: ecr
          containers:
          - command:
            - /bin/sh
            - -c
            - |-
              # Target registry, secret name and namespace
              REGION=cn-northwest-1
              ECRENDPOINT=123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn
              SECRET_NAME=ecr
              TARGET_NS=ingress-nginx
              # Configure the AWS CLI to get credentials through IAM Roles Anywhere
              mkdir -p ~/.aws && touch ~/.aws/config && echo -e "[default]\n region = cn-northwest-1 \n credential_process = ./aws_signing_helper credential-process --certificate server.crt --private-key server.key --trust-anchor-arn arn:aws-cn:rolesanywhere:cn-northwest-1:123456789:trust-anchor/6a6966c2-912d-44fb-a468-d6faf8437334 --profile-arn arn:aws-cn:rolesanywhere:cn-northwest-1:123456789:profile/b174ad67-e035-4f86-ac34-c68bb313e785 --role-arn arn:aws-cn:iam::123456789:role/test" > ~/.aws/config
              # Fetch the ECR password (valid for 12 hours)
              TOKEN=`aws ecr get-login-password --region ${REGION}`
              # Abort before deleting the existing secret if no token was returned
              [ -n "${TOKEN}" ] || { echo "Failed to get ECR token." >&2; exit 1; }
              echo "ENV variables setup done."
              # Recreate the image pull secret in the target namespace
              /usr/local/bin/kubectl delete secret ${SECRET_NAME} -n ${TARGET_NS} --ignore-not-found
              /usr/local/bin/kubectl create secret docker-registry ${SECRET_NAME} \
                --docker-server=${ECRENDPOINT} \
                --docker-username=AWS \
                --docker-password="${TOKEN}" -n ${TARGET_NS}
              echo "Secret created by name. $SECRET_NAME"
              #/usr/local/bin/kubectl -n ${TARGET_NS} patch serviceaccount ingress-nginx,ingress-nginx-admission -p '{"imagePullSecrets": [{"name": "'$SECRET_NAME'"}]}'
              echo "All done."
            env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            image: 123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn/test:ecrauth
            imagePullPolicy: IfNotPresent
            name: ecr-auth
            resources: {}
            securityContext:
              capabilities: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: "node-role.kubernetes.io/master"
            value: ""
          dnsPolicy: Default
          hostNetwork: true
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
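Some notes on the YAML:
- My environment is a bit unusual: the image built above is also stored in ECR, so pulling that image itself requires a secret to exist first.
- Because of #1, I put the CronJob in the default namespace and create secrets for other namespaces from there, which keeps the target namespace clean.
- #2 has a consequence, though: the default serviceaccount cannot operate on resources in another namespace, so a Role has to be created and bound to that serviceaccount.
- A CronJob only runs on its defined schedule. To see the result right away, I added a one-off Job so that the secret is created as soon as everything is deployed.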
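The schedule "0 */11 * * *" fires at hours 0, 11, and 22, so the runs are not evenly spaced. A small sketch to list the gaps for a given hour step, which helps confirm that no gap reaches the 12-hour token lifetime, is below; the function name step_schedule_gaps is hypothetical.

```shell
# step_schedule_gaps STEP
# Prints the gaps, in hours, between consecutive runs of "0 */STEP * * *",
# including the wrap-around from the last run of the day to midnight.
step_schedule_gaps() {
    prev=0
    h=$1
    gaps=""
    while [ "$h" -lt 24 ]; do
        gaps="$gaps$((h - prev)) "
        prev=$h
        h=$((h + $1))
    done
    echo "$gaps$((24 - prev))"
}

step_schedule_gaps 11   # prints "11 11 2"
```

The longest gap is 11 hours, which stays under the token lifetime as long as each run succeeds; the short 2-hour gap at the end of the day is harmless.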
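So this setup is somewhat messy, but it conveys the idea and exercises several resource types. If a similar need comes up later, it can be adapted.
Results
In the ingress-nginx YAML I had already configured the secret used to pull images: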
imagePullSecrets:
- name: ecr
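The ecr secret has not been created yet, though, so the pods fail with the error below, which is the same error you get when pulling the image without any credentials.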
[root@master ec2-user]# kubectl -n ingress-nginx get pod
NAME                                        READY   STATUS             RESTARTS   AGE
ingress-nginx-admission-create-2hkj6        0/1     ImagePullBackOff   0          27s
ingress-nginx-admission-patch-c6d87         0/1     ImagePullBackOff   0          27s
ingress-nginx-controller-5fdf68b6bc-nnlph   0/1     ImagePullBackOff   0          27s
*****
Warning Failed 6s (x2 over 19s) kubelet Failed to pull image "123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn/test:controller": rpc error: code = Unknown desc = Error response from daemon: Head "https://123456789.dkr.ecr.cn-northwest-1.amazonaws.com.cn/v2/test/manifests/controller": no basic auth credentials
Now deploy the credential-refresh YAML. The Job has completed, the pod log shows that the secret was created, and a secret named ecr now appears in the namespace.
[root@master ec2-user]# kubectl -n ingress-nginx get cj
No resources found in ingress-nginx namespace.
[root@master ec2-user]# kubectl get cronjob
NAME       SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
ecr-auth   0 */11 * * *   False     0        <none>          13s
[root@master ec2-user]#
[root@master ec2-user]# kubectl get job
NAME           COMPLETIONS   DURATION   AGE
job-ecr-auth   1/1           5s         15s
[root@master ec2-user]# kubectl get pod
NAME                 READY   STATUS      RESTARTS   AGE
job-ecr-auth-kzz69   0/1     Completed   0          19s
[root@master ec2-user]# kubectl logs job-ecr-auth-kzz69
ENV variables setup done.
secret/ecr created
Secret created by name. ecr
All done.
[root@master ec2-user]# kubectl -n ingress-nginx get secrets
NAME                  TYPE                                  DATA   AGE
default-token-8t7sg   kubernetes.io/service-account-token   3      44h
ecr                   kubernetes.io/dockerconfigjson        1      18m
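If you want to confirm what the secret actually contains, one option is to decode its .dockerconfigjson key. A minimal sketch follows; the function name show_pull_secret is hypothetical, and the output includes the ECR token, so treat it as sensitive.

```shell
# show_pull_secret SECRET_NAME NAMESPACE
# Prints the decoded docker config JSON stored in an image pull secret.
show_pull_secret() {
    kubectl get secret "$1" -n "$2" \
        -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
}

# Example (not executed here):
#   show_pull_secret ecr ingress-nginx
```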
After redeploying ingress-nginx, all images are pulled successfully this time.
[root@master ec2-user]# kubectl -n ingress-nginx get pod
NAME                                        READY   STATUS      RESTARTS   AGE
ingress-nginx-admission-create-75p46        0/1     Completed   0          36s
ingress-nginx-admission-patch-6hgvp         0/1     Completed   0          36s
ingress-nginx-controller-5fdf68b6bc-k9lwg   1/1     Running     0          36s