Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added AWS Credentials Support for Scanning Private Registry (ECR) #1103

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

VF-mbrauer
Copy link
Contributor

Added support for scanning the private registry of AWS-ECR.
Feature still in Beta-mode but usable when prerequisites are met.

Prerequisite:
K8S-Kiam or any other method IRSA (Future) to be used to allow Assuming Instance Roles.

Configurations:
Annotate the Starboard Namespace to be allowed to use KIAM:
Annotations: iam.amazonaws.com/permitted: .*

Add the Role you want to Assume which has the proper right to create ECR-Credentials

podAnnotations: 
    # iam.amazonaws.com/role: <yourRolewithPermissions>

Configuration in HELM value file to enable the feature
trivy.useEcrRoleCreds: false

@pschulten
Copy link

pschulten commented Apr 8, 2022

@VF-mbrauer there's an issue with your if config.UseECRCredentials guard, the bracketing is wrong. If the below patch is applied, it works flawlessly

Index: pkg/plugin/trivy/plugin.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/pkg/plugin/trivy/plugin.go b/pkg/plugin/trivy/plugin.go
--- a/pkg/plugin/trivy/plugin.go	(revision 1411d127246e8fe9e856dd436cdcee821ce1f1af)
+++ b/pkg/plugin/trivy/plugin.go	(date 1649427066287)
@@ -624,7 +624,7 @@
 				}
 			}
 
-			var creds (ecr_credentials) = ecr_credentials{aws_creds[0][1], aws_creds[0][2]}
+			var creds = ecr_credentials{aws_creds[0][1], aws_creds[0][2]}
 
 			env = append(env, corev1.EnvVar{
 				Name:  "TRIVY_USERNAME",
@@ -953,90 +953,90 @@
 					},
 				})
 			}
+		}
 
-			env, err = p.appendTrivyInsecureEnv(config, container.Image, env)
-			if err != nil {
-				return corev1.PodSpec{}, nil, err
-			}
+		env, err = p.appendTrivyInsecureEnv(config, container.Image, env)
+		if err != nil {
+			return corev1.PodSpec{}, nil, err
+		}
 
-			env, err = p.appendTrivyNonSSLEnv(config, container.Image, env)
-			if err != nil {
-				return corev1.PodSpec{}, nil, err
-			}
+		env, err = p.appendTrivyNonSSLEnv(config, container.Image, env)
+		if err != nil {
+			return corev1.PodSpec{}, nil, err
+		}
 
-			if config.GetServerInsecure() {
-				env = append(env, corev1.EnvVar{
-					Name:  "TRIVY_INSECURE",
-					Value: "true",
-				})
-			}
+		if config.GetServerInsecure() {
+			env = append(env, corev1.EnvVar{
+				Name:  "TRIVY_INSECURE",
+				Value: "true",
+			})
+		}
 
-			if config.IgnoreFileExists() {
-				volumes = []corev1.Volume{
-					{
-						Name: ignoreFileVolumeName,
-						VolumeSource: corev1.VolumeSource{
-							ConfigMap: &corev1.ConfigMapVolumeSource{
-								LocalObjectReference: corev1.LocalObjectReference{
-									Name: trivyConfigName,
-								},
-								Items: []corev1.KeyToPath{
-									{
-										Key:  keyTrivyIgnoreFile,
-										Path: ".trivyignore",
-									},
-								},
-							},
-						},
-					},
-				}
+		if config.IgnoreFileExists() {
+			volumes = []corev1.Volume{
+				{
+					Name: ignoreFileVolumeName,
+					VolumeSource: corev1.VolumeSource{
+						ConfigMap: &corev1.ConfigMapVolumeSource{
+							LocalObjectReference: corev1.LocalObjectReference{
+								Name: trivyConfigName,
+							},
+							Items: []corev1.KeyToPath{
+								{
+									Key:  keyTrivyIgnoreFile,
+									Path: ".trivyignore",
+								},
+							},
+						},
+					},
+				},
+			}
 
-				volumeMounts = []corev1.VolumeMount{
-					{
-						Name:      ignoreFileVolumeName,
-						MountPath: "/etc/trivy/.trivyignore",
-						SubPath:   ".trivyignore",
-					},
-				}
+			volumeMounts = []corev1.VolumeMount{
+				{
+					Name:      ignoreFileVolumeName,
+					MountPath: "/etc/trivy/.trivyignore",
+					SubPath:   ".trivyignore",
+				},
+			}
 
-				env = append(env, corev1.EnvVar{
-					Name:  "TRIVY_IGNOREFILE",
-					Value: "/etc/trivy/.trivyignore",
-				})
-			}
+			env = append(env, corev1.EnvVar{
+				Name:  "TRIVY_IGNOREFILE",
+				Value: "/etc/trivy/.trivyignore",
+			})
+		}
 
-			requirements, err := config.GetResourceRequirements()
-			if err != nil {
-				return corev1.PodSpec{}, nil, err
-			}
+		requirements, err := config.GetResourceRequirements()
+		if err != nil {
+			return corev1.PodSpec{}, nil, err
+		}
 
-			optionalMirroredImage, err := GetMirroredImage(container.Image, config.GetMirrors())
-			if err != nil {
-				return corev1.PodSpec{}, nil, err
-			}
+		optionalMirroredImage, err := GetMirroredImage(container.Image, config.GetMirrors())
+		if err != nil {
+			return corev1.PodSpec{}, nil, err
+		}
 
-			containers = append(containers, corev1.Container{
-				Name:                     container.Name,
-				Image:                    trivyImageRef,
-				ImagePullPolicy:          corev1.PullIfNotPresent,
-				TerminationMessagePolicy: corev1.TerminationMessageFallbackToLogsOnError,
-				Env:                      env,
-				Command: []string{
-					"trivy",
-				},
-				Args: []string{
-					"--quiet",
-					"client",
-					"--format",
-					"json",
-					"--remote",
-					trivyServerURL,
-					optionalMirroredImage,
-				},
-				VolumeMounts: volumeMounts,
-				Resources:    requirements,
-			})
-		}
+		containers = append(containers, corev1.Container{
+			Name:                     container.Name,
+			Image:                    trivyImageRef,
+			ImagePullPolicy:          corev1.PullIfNotPresent,
+			TerminationMessagePolicy: corev1.TerminationMessageFallbackToLogsOnError,
+			Env:                      env,
+			Command: []string{
+				"trivy",
+			},
+			Args: []string{
+				"--quiet",
+				"client",
+				"--format",
+				"json",
+				"--remote",
+				trivyServerURL,
+				optionalMirroredImage,
+			},
+			VolumeMounts: volumeMounts,
+			Resources:    requirements,
+		})
 	}
 
 	return corev1.PodSpec{

@VF-mbrauer
Copy link
Contributor Author

@pschulten, great catch. If have removed the 2 Redundant parentheses as they are not needed. And for the brackets, I fixed that in getPodSpecForClientServerMode. I simply forgot that somehow in this ClientServerMode but I did it correctly in the other area like getPodSpecForStandaloneMode. Therefore, my E2E test was OK at that time.

I have corrected that already and committed d361d7f
Thanks again

@danielpacak danielpacak self-requested a review April 13, 2022 06:52
@chen-keinan chen-keinan self-requested a review April 13, 2022 07:27
pkg/utils/durationutil.go Outdated Show resolved Hide resolved
pkg/utils/durationutil.go Outdated Show resolved Hide resolved
pkg/plugin/trivy/plugin.go Outdated Show resolved Hide resolved
pkg/plugin/trivy/plugin.go Outdated Show resolved Hide resolved
@VF-mbrauer VF-mbrauer requested a review from chen-keinan April 13, 2022 08:56
deploy/helm/Chart.yaml Outdated Show resolved Hide resolved
} else {
errormsg = err.Error()
}
return nil, errors.New((errormsg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, errors.New((errormsg))
return nil, errors.New(errormsg)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and pushed.

}
}

var creds (ecr_credentials) = ecr_credentials{aws_creds[0][1], aws_creds[0][2]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var creds (ecr_credentials) = ecr_credentials{aws_creds[0][1], aws_creds[0][2]}
var creds = ecr_credentials{aws_creds[0][1], aws_creds[0][2]}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and pushed.

@VF-mbrauer VF-mbrauer requested a review from chen-keinan April 13, 2022 12:56

if config.UseECRCredentials() && CheckAwsEcrPrivateRegistry(container.Image) != "" {

if utils.TokenTTLValidation(EcrTokenGen, TTLResult) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better not to have EcrTokenGen and aws_creds as global variables, I would, I would create ECRRegistry struct as a property of plugin which will encapsulate both properties (EcrTokenGen and aws_creds)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and fixed.

if config.UseECRCredentials() && CheckAwsEcrPrivateRegistry(container.Image) != "" {

if utils.TokenTTLValidation(EcrTokenGen, TTLResult) {
EcrTokenGen = time.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EcrTokenGen = time.Now()
EcrTokenGen = p.clock.Now()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and fixed.

# AWS authorization token is valid for 12 hours by default
# Unit is hours(h)
#
# ecrTokenRefreshTTL = 11h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ecrTokenRefreshTTL has to be shorter then 12h? as Trivy might want to authenticate with an expired credential

Copy link
Contributor

@chen-keinan chen-keinan Apr 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe a retry logic should be placed for GetAuthorizationToken in case tokens are expired unless we force refresh before 12h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Trivy will never use an expired token as in the original part I pushed, the Token creation was part of "each" individual request Trivy did per repository. The only reason we have this timer now in place is to reduce the number of requests (Throttling) against the ECR-API.

So, if you want to detect an expired token and you want to make it part of a retry to refresh the token now you need to get the first trivy job triggered which tries to get the image and will run into the error.
In this case, a Token Refresh needs to be triggered. But this is a completely new scenario, which needs to be plugged into the trivy process in combination with the PlugIn routines.
Also, a drawback and a negative aspect are here is, that each request needs to be sent towards API to check if still valid or not. But this will be contradicting the Throttling which I wanted to solve.

As of now, I see no point in that we need that. The Throttling does its job, and it is up to the implementer to configure the TTLs on their own. In my view, this is the best approach.

EcrTokenGen = time.Now()
aws_creds, err = GetAuthorizationToken(CheckAwsEcrPrivateRegistry(container.Image))
if err != nil {
return corev1.PodSpec{}, nil, err
Copy link
Contributor

@chen-keinan chen-keinan Apr 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EcrTokenGen should be set to p.clock.Now() only if the GetAuthorizationToken returns no error. to avoid index out of range error in the next authorization cycle as aws_creds will be nil (GetAuthorizationToken will not be called again until TTL expire)

suggestions:

    aws_creds, err = GetAuthorizationToken(CheckAwsEcrPrivateRegistry(container.Image))
    if err != nil {
	    return corev1.PodSpec{}, nil, err
		    }
    EcrTokenGen = p.clock.Now()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and fixed.

…al Variables and put it into struct. Changed Session NEW deprecation.
@VF-mbrauer VF-mbrauer requested a review from chen-keinan April 18, 2022 21:12
@danielpacak danielpacak removed their request for review April 22, 2022 09:08

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 0.15.0
appVersion: 0.15.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should be restored as it was , we only change versions when we release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed back to normal.

Back to normal before the change as this is part of the release process which will change this values.
@VF-mbrauer VF-mbrauer requested a review from chen-keinan April 28, 2022 08:00
@danielpacak danielpacak self-requested a review May 11, 2022 08:05
Copy link
Contributor

@danielpacak danielpacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a closer look at this PR from the integration stand point and it feels to me overly complex and not easy to maintain. Especially that it requires kiam, which is not actively maintained anymore, and even kiam maintainers suggested switching over to IAM roles for service accounts some time ago.

Besides, my concern is that we cannot provide a real support in case of problems with such implementation, because proposed solution requires configuring Starboard in many places: Trivy plugin config, custom annotation added to each scan job, annotation added the starboard-operator namespace, and IAM Roles.

I missed high level explanation how this solution works, but after spending some time reverse engineering the source code I realized it does the following:

  1. Retrieve an ECR authorization token based on AWS environment variables that kiam injects by looking at the custom iam.amazonaws.com/role annotation added to each scan job.
  2. Pass the authorization token to Trivy with TRIVY_USERNAME=AWS and TRIVY_PASSWROD=<token> environment variables:
    TRIVY_USERNAME="AWS" \
    TRIVY_PASSWORD="<token>" \
    trivy image <image ref>
    

However, we can accomplish the same by passing AWS environment variables directly to Trivy, which will exchange them for authorization token to pull down an image before scan:

AWS_ACCESS_KEY_ID=*** \
AWS_SECRET_ACCESS_KEY=*** \
AWS_DEFAULT_REGION=eu-central-1 \
trivy image <image ref>

This would not require kiam or any solution like that. The same capability is actually used when you deploy Starboard in EKS as explained on
https://aquasecurity.github.io/starboard/v0.15.4/vulnerability-scanning/managed-registries/#amazon-elastic-container-registry-ecr

Having said that, I would think of a solution that does not require installing and documenting any additional software that's not actively maintained. Ideally Starboard should be self-contained and self-sufficient to perform scans without overly complex configuration.

Notice also, that we've been experimenting with Trivy filesystem scanner to avoid cloud privider-specifi code in Starboard and scan without exchanging credentials. Check our design docs in this repo for more details.

deploy/helm/Chart.yaml Outdated Show resolved Hide resolved
@pschulten
Copy link

pschulten commented May 13, 2022

@danielpacak I very much share your reluctance to introduce complexity, but using the normal aws sdk credential auth flow for ECR pulls would be very helpful for people running non-EKS cluster (like us 😊)

This branch works great for us using our standard node role via the instance profile flow.
It would also work with any other mechanism like kube2iam (which isn't unmaintained) or a custom HashiCorp vault backed credential_process, etc.

Ideally we would like to run in trivy.mode=ClientServer and the server just receives the image URL, scans the image in a central place (and benefits a lot by container image layer caching) and sends the response report back to starboard, but I guess this would be a major architectural refactoring.

@VF-mbrauer
Copy link
Contributor Author

@chen-keinan I guess we can close? Do we?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants