Block extensions disallowed by policy #3259

Open
wants to merge 30 commits into
base: develop
Changes from 21 commits
c2cc2c6
Block disallowed extension processing
mgunnala Nov 8, 2024
151081d
Enable policy e2e tests
mgunnala Nov 8, 2024
edec2af
Pylint
mgunnala Nov 8, 2024
a37508f
Fix e2e test failures
mgunnala Nov 11, 2024
b0da554
Address review comments
mgunnala Nov 18, 2024
a4f5cab
Merge branch 'develop' into allowlist_2
mgunnala Nov 18, 2024
699b9ba
Address review comments
mgunnala Nov 20, 2024
86de0c5
Address test review comments
mgunnala Nov 21, 2024
c3e9b89
Remove status file for single-config
mgunnala Nov 22, 2024
65d7034
Add back status file for single-config
mgunnala Nov 22, 2024
95f247a
Run e2e tests on all endorsed
mgunnala Nov 22, 2024
3b18519
Fix UT failures
mgunnala Nov 23, 2024
63da127
Pylint
mgunnala Nov 26, 2024
471cd59
Merge branch 'develop' into allowlist_2
narrieta Nov 26, 2024
8ea989b
Address review comments for agent code
mgunnala Dec 3, 2024
83f6ff0
Tests
mgunnala Dec 3, 2024
b037e41
Revert "Tests"
mgunnala Dec 3, 2024
ba3869c
Address test comments
mgunnala Dec 6, 2024
dfcc158
Address test comments
mgunnala Dec 9, 2024
fe07ffa
Merge branch 'develop' into allowlist_2
mgunnala Dec 9, 2024
a31bdcf
Address test comments
mgunnala Dec 10, 2024
5198cf8
Cleanup existing extensions on test VMs
mgunnala Dec 12, 2024
4a0a4ef
Address comments and disable dependencies e2e tests
mgunnala Dec 16, 2024
daa8017
Merge branch 'develop' into allowlist_2
mgunnala Dec 16, 2024
bacc425
Add fixes for e2e tests
mgunnala Dec 17, 2024
3319916
Add back delete failure test case
mgunnala Dec 17, 2024
8c31798
Address comments round 3
mgunnala Dec 17, 2024
32ef5c1
Address comments
mgunnala Dec 17, 2024
f0895b7
Merge branch 'develop' into allowlist_2
mgunnala Dec 17, 2024
0c9f1c7
Pylint
mgunnala Dec 18, 2024
119 changes: 102 additions & 17 deletions azurelinuxagent/ga/exthandlers.py
@@ -39,6 +39,7 @@
from azurelinuxagent.common.agent_supported_feature import get_agent_supported_features_list_for_extensions, \
SupportedFeatureNames, get_supported_feature_by_name, get_agent_supported_features_list_for_crp
from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator
from azurelinuxagent.ga.policy.policy_engine import ExtensionPolicyEngine
from azurelinuxagent.common.datacontract import get_properties, set_properties
from azurelinuxagent.common.errorstate import ErrorState
from azurelinuxagent.common.event import add_event, elapsed_milliseconds, WALAEventOperation, \
@@ -86,6 +87,26 @@
# This is the default sequence number we use when there are no settings available for Handlers
_DEFAULT_SEQ_NO = "0"

# For policy-related errors, this mapping is used to generate user-friendly error messages and determine the appropriate
# terminal error code based on the blocked operation.
# Format: {<ExtensionRequestedState>: (<str>, <ExtensionErrorCodes>)}
# - The first element of the tuple is a user-friendly operation name included in error messages.
# - The second element of the tuple is the CRP terminal error code for the operation.
_POLICY_ERROR_MAP = \
{
ExtensionRequestedState.Enabled: ('run', ExtensionErrorCodes.PluginEnableProcessingFailed),
# Note: currently, when uninstall is requested for an extension, CRP polls until the agent does not
# report status for that extension, or until timeout is reached. In the case of a policy error, the
# agent reports failed status on behalf of the extension, which will cause CRP to poll for the full
# timeout, instead of failing fast.
#
# TODO: CRP does not currently have a terminal error code for uninstall. Once this code is added, use
# it instead of PluginDisableProcessingFailed below.
ExtensionRequestedState.Uninstall: ('uninstall', ExtensionErrorCodes.PluginDisableProcessingFailed),
# "Disable" is an internal operation, users are unaware of it. We surface the term "uninstall" instead.
ExtensionRequestedState.Disabled: ('uninstall', ExtensionErrorCodes.PluginDisableProcessingFailed),
}
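
A minimal sketch of how a lookup against a map of this shape behaves. It uses stand-in string constants rather than the agent's real `ExtensionRequestedState` and `ExtensionErrorCodes` classes, and the helper `lookup_operation_and_code` and its fallback value are hypothetical. Since `dict.get` returns `None` for an unknown key, unpacking the result directly would raise a `TypeError`; supplying a default is one way to guard against that.

```python
# Stand-in constants (assumptions); the real agent defines these in its own modules.
ENABLED, DISABLED, UNINSTALL = "enabled", "disabled", "uninstall"
ENABLE_FAILED, DISABLE_FAILED = "EnableProcessingFailed", "DisableProcessingFailed"

POLICY_ERROR_MAP = {
    ENABLED: ("run", ENABLE_FAILED),
    UNINSTALL: ("uninstall", DISABLE_FAILED),
    # "disable" is an internal operation, so the user-facing name is "uninstall"
    DISABLED: ("uninstall", DISABLE_FAILED),
}


def lookup_operation_and_code(state, default=("process", -1)):
    """Return (operation, error_code), falling back to a default for unknown states."""
    return POLICY_ERROR_MAP.get(state, default)


operation, error_code = lookup_operation_and_code(DISABLED)
print(operation, error_code)
```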


class ExtHandlerStatusValue(object):
"""
@@ -482,6 +503,15 @@ def handle_ext_handlers(self, goal_state_id):
depends_on_err_msg = None
extensions_enabled = conf.get_extensions_enabled()

# Instantiate policy engine, and use same engine to handle all extension handlers.
# If an error is thrown during policy engine initialization, we block all extensions and report the error via handler/extension status for
# each extension.
policy_error = None
try:
policy_engine = ExtensionPolicyEngine()
except Exception as ex:
policy_error = ex

for extension, ext_handler in all_extensions:

handler_i = ExtHandlerInstance(ext_handler, self.protocol, extension=extension)
@@ -498,11 +528,22 @@ def handle_ext_handlers(self, goal_state_id):
logger.info("{0}: {1}".format(ext_full_name, msg))
add_event(op=WALAEventOperation.ExtensionProcessing, message="{0}: {1}".format(ext_full_name, msg))
handler_i.set_handler_status(status=ExtHandlerStatusValue.not_ready, message=msg, code=-1)
handler_i.create_status_file_if_not_exist(extension,
status=ExtensionStatusValue.error,
code=-1,
operation=handler_i.operation,
message=msg)
handler_i.create_status_file(extension,
Member

minor: the previous code was very explicit (in the method's name) about not overwriting existing files. I think that is a good choice. We should probably be explicit in the new method as well, remove the default value and set overwrite=False here

Author

Updated to remove the default and explicitly set the value for overwrite in all calls

status=ExtensionStatusValue.error,
code=-1,
operation=handler_i.operation,
message=msg)
continue
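
To make the `overwrite` discussion in the thread above concrete, here is a small sketch of the two behaviours. It is a simplified stand-in, not the agent's `create_status_file`: the function takes a path and contents directly, and the file name `0.status` and the status payloads are hypothetical.

```python
import json
import os
import tempfile


def create_status_file(status_path, contents, overwrite):
    """Write contents to status_path. Returns True if the file was written."""
    if status_path is None:
        return False
    if not overwrite and os.path.exists(status_path):
        return False  # keep whatever status was already reported
    with open(status_path, "w") as f:
        json.dump(contents, f)
    return True


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "0.status")  # hypothetical file name

first = create_status_file(path, [{"status": "transitioning"}], overwrite=False)
second = create_status_file(path, [{"status": "error"}], overwrite=False)  # skipped
third = create_status_file(path, [{"status": "error"}], overwrite=True)    # replaces
```

Making `overwrite` a required argument, as the review suggests, forces each call site to state which of these behaviours it wants.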

# If an error was thrown during policy engine initialization, skip further processing of the extension.
# CRP is still waiting for status, so we report error status here.
policy_op, policy_err_code = _POLICY_ERROR_MAP.get(ext_handler.state)
Member

suggest removing 'policy' from policy_err_code. the error code is not related to policy

Member

'policy_op' is also kind of misleading, since it is not a policy operation... maybe just 'operation' and 'error_code'?

Author

Updated to "operation" and "error_code"

if policy_error is not None:
msg = "Extension will not be processed: {0}".format(ustr(policy_error))
self.__report_policy_error(ext_handler_i=handler_i, error_code=policy_err_code,
report_op=handler_i.operation, message=msg,
extension=extension)
continue
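
The pattern in this hunk (create the engine once, remember any initialization error, and report it for each extension instead of aborting the loop) can be sketched as follows. `FakePolicyEngine`, its `allowed` key, and `process_all` are hypothetical stand-ins for illustration; the real `ExtensionPolicyEngine` reads its policy from a file and is not shown here.

```python
class FakePolicyEngine(object):
    """Hypothetical stand-in for a policy engine; raises if the policy is invalid."""

    def __init__(self, policy):
        if "allowed" not in policy:
            raise ValueError("policy is missing the 'allowed' list")
        self._allowed = set(policy["allowed"])

    def should_allow_extension(self, name):
        return name in self._allowed


def process_all(extension_names, policy):
    """Return {name: outcome}; every extension is blocked if the engine fails to initialize."""
    policy_error = None
    engine = None
    try:
        engine = FakePolicyEngine(policy)  # created once, shared by all extensions
    except Exception as ex:
        policy_error = ex

    results = {}
    for name in extension_names:
        if policy_error is not None:
            # Report the same initialization error on behalf of each extension.
            results[name] = "Extension will not be processed: {0}".format(policy_error)
            continue
        results[name] = "allowed" if engine.should_allow_extension(name) else "blocked"
    return results
```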

# In case of depends-on errors, we skip processing extensions if there was an error processing dependent extensions.
@@ -516,18 +557,33 @@ def handle_ext_handlers(self, goal_state_id):
if handler_i.get_handler_status() is None:
handler_i.set_handler_status(message=depends_on_err_msg, code=-1)

handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=-1,
operation=WALAEventOperation.ExtensionProcessing,
message=depends_on_err_msg)
handler_i.create_status_file(extension, status=ExtensionStatusValue.error, code=-1,
operation=WALAEventOperation.ExtensionProcessing,
message=depends_on_err_msg)

# For SC extensions, overwrite the HandlerStatus with the relevant message
else:
handler_i.set_handler_status(message=depends_on_err_msg, code=-1)

continue

# Invoke policy engine to determine if extension is allowed. If disallowed, report an error on behalf of
# the extension and do not process the extension. Dependent extensions will also be blocked.
extension_allowed = policy_engine.should_allow_extension(ext_handler.name)
if not extension_allowed:
msg = (
"Extension will not be processed: failed to {0} extension '{1}' because it is not specified "
"in the allowlist. To {0}, add the extension to the allowed list in the policy file ('{2}')."
Author

Here, we use both the terms "allowlist" and "allowed list". Does this make sense?
Maybe something like "add <ext_name> to the list of allowed extensions in the policy file"?

Member

"list of allowed extensions", though verbose, looks good to me. maybe also use something similar for "allowList"? there is no "allowList" element in the policy file

Author

@narrieta how about something like this:
"Extension will not be processed: failed to run extension 'CustomScript' because it is not specified as an allowed extension. To run, add the extension to the list of allowed extensions in the policy file ('/etc/waagent_policy.json')."

Author

Updated the message as stated above

).format(policy_op, ext_handler.name, conf.get_policy_file_path())
self.__report_policy_error(handler_i, policy_err_code, report_op=handler_i.operation,
message=msg, extension=extension)

# Process extensions and get if it was successfully executed or not
extension_success = self.handle_ext_handler(handler_i, extension, goal_state_id)
# If extension was blocked by policy, treat the extension as failed and do not process the handler.
if not extension_allowed:
Member

merge this 'if not extension_allowed:' with the one just above it?

Author

Good point, made this change, thanks!

extension_success = False
else:
extension_success = self.handle_ext_handler(handler_i, extension, goal_state_id)
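
A sketch of the merged branch agreed on in the threads above, together with the reworded message. The helper names (`build_blocked_message`, `process_extension`) and the callback parameters are hypothetical; the message text and the example path are the ones quoted in the review discussion.

```python
POLICY_FILE = "/etc/waagent_policy.json"  # example path quoted in the review thread


def build_blocked_message(operation, extension_name, policy_file=POLICY_FILE):
    """Format the user-facing message for an extension blocked by policy."""
    return (
        "Extension will not be processed: failed to {0} extension '{1}' because it is not specified "
        "as an allowed extension. To {0}, add the extension to the list of allowed extensions in the "
        "policy file ('{2}')."
    ).format(operation, extension_name, policy_file)


def process_extension(name, operation, is_allowed, run_handler, report_error):
    """Single branch: a blocked extension is reported and treated as failed; otherwise it runs."""
    if not is_allowed:
        report_error(build_blocked_message(operation, name))
        return False
    return run_handler(name)


reported = []
blocked_result = process_extension("CustomScript", "run", False, lambda n: True, reported.append)
allowed_result = process_extension("CustomScript", "run", True, lambda n: True, reported.append)
```

Returning `False` for a blocked extension is what lets the existing dependency handling treat it as a failure, so dependent extensions are skipped as well.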

dep_level = self.__get_dependency_level((extension, ext_handler))
if 0 <= dep_level < max_dep_level:
@@ -642,8 +698,8 @@ def handle_ext_handler(self, ext_handler_i, extension, goal_state_id):
# This error is only thrown for enable operation on MultiConfig extension.
# Since these are maintained by the extensions, the expectation here is that they would update their status files appropriately with their errors.
# The extensions should already have a placeholder status file, but in case they don't, setting one here to fail fast.
ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,
operation=ext_handler_i.operation, message=err_msg)
ext_handler_i.create_status_file(extension, status=ExtensionStatusValue.error, code=error.code,
operation=ext_handler_i.operation, message=err_msg)
add_event(name=ext_name, version=ext_handler_i.ext_handler.version, op=ext_handler_i.operation,
is_success=False, log_event=True, message=err_msg)
except ExtensionsGoalStateError as error:
@@ -683,15 +739,42 @@ def __handle_and_report_ext_handler_errors(ext_handler_i, error, report_op, mess
# file with failure since the extensions wont be called where they can create their status files.
# This way we guarantee reporting back to CRP
if ext_handler_i.should_perform_multi_config_op(extension):
ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,
operation=report_op, message=message)
ext_handler_i.create_status_file(extension, status=ExtensionStatusValue.error, code=error.code,
operation=report_op, message=message)

if report:
name = ext_handler_i.get_extension_full_name(extension)
handler_version = ext_handler_i.ext_handler.version
add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
message=message)

@staticmethod
def __report_policy_error(ext_handler_i, error_code, report_op, message, extension=None):
Member

let's remove the default value for the 'extension' parameter

Author

Updated

# TODO: Consider merging this function with __handle_and_report_ext_handler_errors() above, after investigating
# the impact of this change.
#
# If extension status is present, CRP will ignore handler status and report extension status. In the case of policy errors,
# extensions are not processed, so collect_ext_status() reports transitioning status on behalf of the extension.
# However, extensions blocked by policy should fail fast, so agent should write a .status file for policy failures.
# Note that __handle_and_report_ext_handler_errors() does not create the file for single-config extensions, but changing
# it will require additional testing/investigation. As a temporary workaround, this separate function was created
# to write a status file for single-config extensions.

# Set handler status for all extensions (with and without settings)
ext_handler_i.set_handler_status(message=message, code=error_code)
Member

add comment pointing out that we are intentionally reporting the error at the handler and status level

Author

Do you mean something like "We report the same error at both the handler status and extension status level." ?

Member

Sorry, what I was trying to point out is that reporting the error at both the handler and status level is not needed (or should not be needed). For example, install errors are reported at the handler level, while single-config errors are reported at the status level.


# Create status file for extensions with settings (single and multi config).
# If status file already exists, overwrite it. If an extension was previously reporting status and is now
# blocked by a policy error, we should report the policy error.
if extension is not None:
ext_handler_i.create_status_file(extension, status=ExtensionStatusValue.error, code=error_code,
operation=report_op, message=message, overwrite=True)

name = ext_handler_i.get_extension_full_name(extension)
handler_version = ext_handler_i.ext_handler.version
add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
message=message)
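
The reporting flow in `__report_policy_error` as it stands in this diff can be sketched with a recording stub. `RecordingHandler` is hypothetical and much simpler than the agent's `ExtHandlerInstance` (its `create_status_file` takes fewer parameters, and the error code `-1` is a placeholder). The sketch mirrors the code under review, which sets both handler status and extension status; the thread above questions whether both are needed.

```python
class RecordingHandler(object):
    """Hypothetical stub that records what would be reported."""

    def __init__(self):
        self.handler_status = None
        self.status_files = {}

    def set_handler_status(self, message, code):
        self.handler_status = (code, message)

    def create_status_file(self, extension, code, message, overwrite):
        if overwrite or extension not in self.status_files:
            self.status_files[extension] = (code, message)


def report_policy_error(handler, error_code, message, extension):
    # Handler-level status is always set (covers extensions without settings).
    handler.set_handler_status(message=message, code=error_code)
    # Extension-level status replaces any earlier status so the policy failure is what gets reported.
    if extension is not None:
        handler.create_status_file(extension, code=error_code, message=message, overwrite=True)


handler = RecordingHandler()
handler.status_files["ext1"] = (0, "success")  # status from an earlier, allowed run
report_policy_error(handler, -1, "blocked by policy", "ext1")
```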

def handle_enable(self, ext_handler_i, extension):
"""
1- Ensure the handler is installed
@@ -988,8 +1071,8 @@ def report_ext_handler_status(self, vm_status, ext_handler, goal_state_changed):
# For MultiConfig, we need to report status per extension even for Handler level failures.
# If we have HandlerStatus for a MultiConfig handler and GS is requesting for it, we would report status per
# extension even if HandlerState == NotInstalled (Sample scenario: ExtensionsGoalStateError, DecideVersionError, etc)
# We also need to report extension status for an uninstalled handler if extensions are disabled because CRP
# waits for extension runtime status before failing the extension operation.
# We also need to report extension status for an uninstalled handler if extensions are disabled, or if the extension
# failed due to policy, because CRP waits for extension runtime status before failing the extension operation.
if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config or not conf.get_extensions_enabled():

# Since we require reading the Manifest for reading the heartbeat, this would fail if HandlerManifest not found.
@@ -1343,9 +1426,11 @@ def set_extension_resource_limits(self):
extension_name=extension_name, cpu_quota=resource_limits.get_extension_slice_cpu_quota())
CGroupConfigurator.get_instance().set_extension_services_cpu_memory_quota(resource_limits.get_service_list())

def create_status_file_if_not_exist(self, extension, status, code, operation, message):
def create_status_file(self, extension, status, code, operation, message, overwrite=False):
# Create status file for specified extension. If overwrite is true, overwrite any existing status file. If
# false, create a status file only if it does not already exist.
_, status_path = self.get_status_file_path(extension)
if status_path is not None and not os.path.exists(status_path):
if status_path is not None and (overwrite or not os.path.exists(status_path)):
now = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
status_contents = [
{
11 changes: 2 additions & 9 deletions azurelinuxagent/ga/policy/policy_engine.py
@@ -36,12 +36,6 @@
_MAX_SUPPORTED_POLICY_VERSION = "0.1.0"


class PolicyError(AgentError):
"""
Error raised during agent policy enforcement.
"""


class InvalidPolicyError(AgentError):
"""
Error raised if user-provided policy is invalid.
@@ -50,14 +44,13 @@ def __init__(self, msg, inner=None):
msg = "Customer-provided policy file ('{0}') is invalid, please correct the following error: {1}".format(conf.get_policy_file_path(), msg)
super(InvalidPolicyError, self).__init__(msg, inner)


Member

Can we add an INFO message just after the check for enabled stating that we are using Policy? This makes clearer the fact that we are now processing policies.

Author

__read_policy() is called right after the check for enabled, and it logs the following statement:

Policy enforcement is enabled. Enforcing policy using policy file found at '<path>'. File contents: <policy>

Is that sufficient, or do you think we need an additional log message?

Member

It's sufficient, but the message should probably be in the caller instead of read_policy. Who knows, as code evolves we may add other code before read_policy, or call read_policy multiple times.

Alternatively, the caller can log "Policy enforcement is enabled." and read_policy can log "Enforcing policy using policy file found at '<path>'. File contents: <policy>".

Author

Updated
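
One way the logging split discussed in this thread could look, as a sketch only. The function names `init_engine` and `read_policy`, the `loader` callback, and the injectable `log` parameter are assumptions made for illustration; the two message texts are the ones quoted in the thread.

```python
import logging

logger = logging.getLogger("policy_sketch")  # hypothetical logger name


def read_policy(path, loader, log=logger.info):
    """Load the policy and log where it came from and what it contains."""
    policy = loader(path)
    log("Enforcing policy using policy file found at '{0}'. File contents: {1}".format(path, policy))
    return policy


def init_engine(enabled, path, loader, log=logger.info):
    """Caller logs that enforcement is on exactly once, then delegates to read_policy."""
    if not enabled:
        return None
    log("Policy enforcement is enabled.")
    return read_policy(path, loader, log=log)


messages = []
policy = init_engine(True, "/tmp/policy.json", lambda p: {"allowed": []}, log=messages.append)
```

Keeping the "enabled" message in the caller means it stays accurate even if `read_policy` is later called more than once or other steps are added before it.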

class _PolicyEngine(object):
"""
Implements base policy engine API.
"""
def __init__(self):
# Set defaults for policy
self._policy_enforcement_enabled = self.__get_policy_enforcement_enabled()
self._policy_enforcement_enabled = self.get_policy_enforcement_enabled()
if not self.policy_enforcement_enabled:
return

@@ -76,7 +69,7 @@ def _log_policy_event(msg, is_success=True, op=WALAEventOperation.Policy, send_e
add_event(op=op, message=msg, is_success=is_success, log_event=False)

@staticmethod
def __get_policy_enforcement_enabled():
def get_policy_enforcement_enabled():
"""
Policy will be enabled if (1) policy file exists at the expected location and (2) the conf flag "Debug.EnableExtensionPolicy" is true.
"""
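
The body of this method is collapsed in the diff. A minimal sketch of the rule stated in the docstring (both conditions must hold) might look like the following, assuming the path and the flag value are passed in as plain arguments; the real method presumably reads them from `conf`, which is not shown here.

```python
import os
import tempfile


def policy_enforcement_enabled(policy_file_path, conf_flag_enabled):
    """True only if the conf flag is set and the policy file exists."""
    return bool(conf_flag_enabled) and os.path.exists(policy_file_path)


# Temporary file standing in for a policy file that exists on disk.
_tmp = tempfile.NamedTemporaryFile()
existing_path = _tmp.name
missing_path = existing_path + ".missing"

both = policy_enforcement_enabled(existing_path, True)
flag_off = policy_enforcement_enabled(existing_path, False)
no_file = policy_enforcement_enabled(missing_path, True)
```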