GitLab runners, EKS, IRSA and S3 caching
Posted on September 9, 2022 (Last modified on July 2, 2024) • 8 min read • 1,590 words

The goal: no more AWS_* credentials all over your gitlab pipelines. There's an official AWS solution for that: "IRSA", which stands for "IAM roles for k8s service accounts". (There is at least one non-AWS alternative, kube2iam, but that is not the subject of this post.)
IRSA assigns an AWS IAM role to a K8S serviceaccount, which in turn can be specified for a running pod. There are a couple of pages documenting this, but they are all either too much or too little. I'll now try to write what I wanted to have found.
Also, nobody seems to have documented how to use this for configuring S3 caching on a K8S GitLab runner, which is not easy, because the documentation could honestly be better.
You need (out of scope):
For every role you want to use in the cluster, you need …
That's what we will do here.
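Before that, one of the out-of-scope prerequisites is worth a quick check: the cluster needs an IAM OIDC provider. Roughly like this (the cluster name my-cluster is made up):
# does the cluster have an OIDC issuer?
aws eks describe-cluster --name my-cluster \
  --query "cluster.identity.oidc.issuer" --output text
# ... and is it registered as an IAM OIDC provider?
aws iam list-open-id-connect-providers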
You basically configure the role like any other one, but you specify a special trust policy. That policy references the OIDC provider, the k8s namespace, and the service account name in that namespace.
# terraform code - die hard terraform addict here, sorry.
locals {
  oidc_arn                 = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/deadbeefaffe1234deadbeefaffe1234"
  oidc_url                 = "https://oidc.eks.eu-central-1.amazonaws.com/id/deadbeefaffe1234deadbeefaffe1234"
  k8s_namespace            = "gitlab"
  k8s_service_account_name = "gitlab-runner"
}

# eks / pod / role stuff: https://is.gd/2skBH7
resource "aws_iam_role" "k8s_irsa_example_role" {
  name                = "k8s-irsa-example-role"
  assume_role_policy  = data.aws_iam_policy_document.trust_k8s_irsa_example_role.json
  managed_policy_arns = [aws_iam_policy.perms_k8s_irsa_example_role.arn]
}

# this is the PERMISSIONS policy
data "aws_iam_policy_document" "perms_k8s_irsa_example_role" {
  statement {
    actions = ["s3:*"]
    resources = [
      "arn:aws:s3:::my-super-duper-gitlab-runner-cache",
      "arn:aws:s3:::my-super-duper-gitlab-runner-cache/*",
    ]
  }
}

# wrap the permissions document in a policy so the role can attach it
resource "aws_iam_policy" "perms_k8s_irsa_example_role" {
  name   = "k8s-irsa-example-role-perms"
  policy = data.aws_iam_policy_document.perms_k8s_irsa_example_role.json
}

# this is the TRUST policy
# the TRUST policy references k8s namespace and service account
data "aws_iam_policy_document" "trust_k8s_irsa_example_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [local.oidc_arn]
    }
    condition {
      test     = "StringEquals"
      variable = "${replace(local.oidc_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:${local.k8s_namespace}:${local.k8s_service_account_name}"]
    }
  }
}
For completeness the trust policy as JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/deadbeefaffe1234deadbeefaffe1234"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/deadbeefaffe1234deadbeefaffe1234:sub": "system:serviceaccount:gitlab:gitlab-runner"
        }
      }
    }
  ]
}
This is, in fact, all you need from the AWS side of things.
Some notes:
Be absolutely aware that this trust policy references the service account name and the namespace in k8s. The service account, in turn, will reference this role's ARN. So if you change a name on either side, you always have to adjust the other side as well!
For GitLab, you might want to have two of those (you’ll see why later)
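Since both sides reference each other by name, a quick sanity check never hurts. A rough sketch (role and serviceAccount names as used in this post; the serviceAccount check obviously only works once it exists, see below):
# the trust policy on the AWS side
aws iam get-role --role-name k8s-irsa-example-role \
  --query "Role.AssumeRolePolicyDocument" --output json
# the annotation on the k8s side
kubectl get serviceaccount gitlab-runner -n gitlab \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'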
So this is what any serviceAccount connected to an IAM role looks like:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    # this is the magic
    # it references the role ARN, which contains the role's name
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/k8s-irsa-example-role
  # the name referenced in the AWS IAM trust policy
  # ("system:serviceaccount:<namespace>:<SERVICEACCOUNTNAME>")
  name: gitlab-runner
  # ("system:serviceaccount:<NAMESPACE>:<serviceaccountname>")
  # if you change one, you change both - they reference each other BY NAME
  namespace: gitlab
If you want to test this right now, feel free:
# $ kubectl apply -n gitlab -f THIS_FILE.yml
# $ kubectl exec -ti -n gitlab quick-debug-pod -- /bin/bash
# and then install awscli & ca-certificates to try accessing your s3 bucket :)
apiVersion: v1
kind: Pod
metadata:
  name: quick-debug-pod
  namespace: gitlab
spec:
  serviceAccountName: gitlab-runner
  containers:
    - name: shell
      image: "ubuntu:latest"
      args: [sleep, infinity]
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
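What that test inside the pod might look like - just a sketch, the bucket name is the example one from above:
# inside the debug pod:
apt-get update && apt-get install -y awscli ca-certificates
# which identity did we get? this should show the assumed k8s-irsa-example-role
aws sts get-caller-identity
# can we reach the bucket?
aws s3 ls s3://my-super-duper-gitlab-runner-cache/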
REMARK: If you only want to enable your jobs to access AWS resources, you’re in luck. You can skip straight to “Addendum - enabling gitlab JOBS” and you’re done.
If you want to use the S3 cache, well, … more work.
Let's start like this: WHAT YOU MUST KNOW, SUPER IMPORTANT: the S3 cache permissions belong to the runner itself, not to your pipeline jobs. So the IAM role that may access the cache bucket has to be attached to the runner's service account; giving your jobs access to AWS resources is a separate concern with a separate service account (see the addendum at the end).
Also, going forward I assume you deploy the gitlab runner using the GitLab runner helm chart.
If you …
… you’re almost done.
You just need to replace the serviceAccount name (gitlab-runner) with the actual serviceAccount name on your system, and change one thing in the helm chart:
## https://is.gd/Qu1gGv
rbac:
  create: true
  serviceAccountAnnotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/k8s-irsa-example-role
See further down for how to configure the cache itself - this is just the role association.
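Rolling that out is the usual helm dance; a minimal sketch (release name and values file name are assumptions):
helm repo add gitlab https://charts.gitlab.io
helm upgrade --install gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab -f values.yaml
# did the annotation land on the serviceAccount?
kubectl get serviceaccount -n gitlab -o yaml | grep role-arn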
In my case it was not so easy, so I had to do a bit more work.
To work in K8S, the GitLab runner needs a Role, a ServiceAccount and a RoleBinding.
We will create those three resources manually now, so we have control over naming and they can be re-used by several gitlab runners (e.g. fargate & cluster).
In K8S objects, that looks like this:
# the role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gitlab-runner-common
  namespace: gitlab
rules:
  - apiGroups:
      - ""
    resources:
      - "*"
    verbs:
      - "*"
---
# the service account
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/k8s-irsa-gitlab-runner-common
  name: gitlab-runner-common
  namespace: gitlab
---
# the role binding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitlab-runner-common
  namespace: gitlab
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: gitlab-runner-common
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-common
    namespace: gitlab
(Btw, I’m using the serviceaccount, role & rolebinding charts here to simplify things - I only want to deal with helm and not with “plain” k8s objects.)
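If you don't use those charts, applying the manifests above directly works just as well - a sketch, assuming you saved them into one file:
kubectl apply -n gitlab -f gitlab-runner-common-rbac.yml
kubectl get role,rolebinding,serviceaccount -n gitlab | grep gitlab-runner-common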
Let's recap: we now have an IAM role with the right trust and permissions policies, plus a ServiceAccount, Role and RoleBinding in the cluster that reference it. Now we need to make the GitLab runner helm chart use that pre-created ServiceAccount instead of creating its own. Here are the relevant values.yaml file parts:
## https://is.gd/Qu1gGv
rbac:
  # this is basically documented NOWHERE, except here:
  # https://gitlab.com/gitlab-org/gitlab-runner/-/issues/25972#note_477243643
  create: false
  serviceAccountName: gitlab-runner-common
Finally, still in the helm chart’s values file, we do this:
runners:
  name: "whatever-runner"
  config: |
    [[runners]]
      # ...
      [runners.cache]
        Type = "s3"
        Path = "runners-all"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "s3.amazonaws.com"
          BucketName = "my-super-duper-gitlab-runner-cache"
          BucketLocation = "eu-central-1"
          Insecure = false
    # ...
Yup, that’s actually it.
We should be done.
Enjoy :)
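If you want to see it with your own eyes, run a pipeline that uses the cache and then peek into the bucket (bucket name as assumed above); the archives should show up under the configured Path:
aws s3 ls --recursive s3://my-super-duper-gitlab-runner-cache/ | head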
Now, remember that anything from above will NOT give the actual pipeline jobs any additional AWS permissions.
If you want to access AWS resources from your jobs, just do this: create a second IAM role and a second annotated serviceAccount (exactly like above; here it's called gitlab-jobs), and tell the runner to use that serviceAccount for the job pods.
How? Simple. Modify the values file:
runners:
  config: |
    [[runners]]
      # ...
      [runners.kubernetes]
        # ...
        service_account = "gitlab-jobs"
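The gitlab-jobs serviceAccount is just a second annotated serviceAccount like the ones above. A quick sketch with kubectl (the IAM role name k8s-irsa-gitlab-jobs is made up - use whatever role you created for your jobs):
kubectl create serviceaccount gitlab-jobs -n gitlab
kubectl annotate serviceaccount gitlab-jobs -n gitlab \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/k8s-irsa-gitlab-jobs
For completeness, here is the little test pipeline I used to check that the cache actually travels between two different runners: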
# .gitlab-ci.yml
stages:
  - test

variables:
  CACHE_DIR: hey-ho-cache

.all: &all
  before_script:
    - mkdir -p "$CACHE_DIR"
    - NOWDATE=$(date +%Y%m%d_%H%M%S)
    - CACHEFILE="${CACHE_DIR}/heyho_${NOWDATE}"
    - set -x

.cache: &cache
  paths: [hey-ho-cache]

runner-test-create-cache:
  <<: *all
  image: alpine:edge
  stage: test
  script:
    - echo "waahwaahboogah $NOWDATE" > $CACHEFILE
  cache:
    <<: *cache
  tags:
    - fargate-small

runner-test-check-cache:
  <<: *all
  image: alpine:edge
  stage: test
  script:
    - cat $CACHE_DIR/*
  cache:
    <<: *cache
  needs:
    - runner-test-create-cache
  tags:
    - cluster
    - kubernetes
"It does not work!", you say. It does.
If it does not, you have a naming error in your references, or you're using the wrong serviceAccount for your pods, or you assigned the permissions to the wrong entity (jobs instead of runner, or vice versa).
First, check your annotations and that they reference existing things.
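A few concrete things worth looking at while you check (the pod name is a placeholder):
# which serviceAccount does the pod actually use?
kubectl get pod -n gitlab <some-pod> -o jsonpath='{.spec.serviceAccountName}'
# did the IRSA webhook inject the web identity environment?
kubectl exec -n gitlab <some-pod> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'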
It does. Your references are wrong.
Check again.
Really.
Believe me.
NO. REALLY.
And if not, I can’t help you, cause this works.
(If I have an error here, the principle should be clear - nevertheless I would appreciate a hint if you find one)