The worker nodes do not return to the Ready state in time after the restart, so the test case fails.
[2024-12-19T07:55:06.329Z]     def test_all_worker_nodes_short_network_failure(
[2024-12-19T07:55:06.329Z]         self, nodes, setup, mcg_obj, bucket_factory, node_restart_teardown
[2024-12-19T07:55:06.329Z]     ):
[2024-12-19T07:55:06.329Z]         """
[2024-12-19T07:55:06.329Z]         OCS-1432/OCS-1433:
[2024-12-19T07:55:06.329Z]         - Start DeploymentConfig based app pods
[2024-12-19T07:55:06.329Z]         - Make all the worker nodes unresponsive by doing abrupt network failure
[2024-12-19T07:55:06.329Z]         - Reboot the unresponsive node after short duration of ~300 seconds
[2024-12-19T07:55:06.329Z]         - When unresponsive node recovers, app pods and ceph cluster should recover
[2024-12-19T07:55:06.329Z]         - Again run IOs from app pods
[2024-12-19T07:55:06.329Z]         - Create OBC and read/write objects
[2024-12-19T07:55:06.329Z]         """
[2024-12-19T07:55:06.329Z]         pod_objs = setup
[2024-12-19T07:55:06.329Z]         worker_nodes = node.get_worker_nodes()
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         # Run IO on pods
[2024-12-19T07:55:06.329Z]         logger.info(f"Starting IO on {len(pod_objs)} app pods")
[2024-12-19T07:55:06.329Z]         with ThreadPoolExecutor() as executor:
[2024-12-19T07:55:06.329Z]             for pod_obj in pod_objs:
[2024-12-19T07:55:06.329Z]                 logger.info(f"Starting IO on pod {pod_obj.name}")
[2024-12-19T07:55:06.329Z]                 storage_type = (
[2024-12-19T07:55:06.329Z]                     "block" if pod_obj.pvc.get_pvc_vol_mode == "Block" else "fs"
[2024-12-19T07:55:06.329Z]                 )
[2024-12-19T07:55:06.329Z]                 executor.submit(
[2024-12-19T07:55:06.329Z]                     pod_obj.run_io,
[2024-12-19T07:55:06.329Z]                     storage_type=storage_type,
[2024-12-19T07:55:06.329Z]                     size="2G",
[2024-12-19T07:55:06.329Z]                     runtime=30,
[2024-12-19T07:55:06.329Z]                     fio_filename=f"{pod_obj.name}_io_f1",
[2024-12-19T07:55:06.329Z]                 )
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         logger.info(f"IO started on all {len(pod_objs)} app pods")
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         # Wait for IO results
[2024-12-19T07:55:06.329Z]         for pod_obj in pod_objs:
[2024-12-19T07:55:06.329Z]             pod.get_fio_rw_iops(pod_obj)
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         # Induce network failure on all worker nodes
[2024-12-19T07:55:06.329Z]         with ThreadPoolExecutor() as executor:
[2024-12-19T07:55:06.329Z]             for node_name in worker_nodes:
[2024-12-19T07:55:06.329Z]                 executor.submit(node.node_network_failure, node_name, False)
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         node.wait_for_nodes_status(
[2024-12-19T07:55:06.329Z]             node_names=worker_nodes, status=constants.NODE_NOT_READY
[2024-12-19T07:55:06.329Z]         )
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         logger.info(f"Waiting for {self.short_nw_fail_time} seconds")
[2024-12-19T07:55:06.329Z]         sleep(self.short_nw_fail_time)
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         # Reboot the worker nodes
[2024-12-19T07:55:06.329Z]         logger.info(f"Stop and start the worker nodes: {worker_nodes}")
[2024-12-19T07:55:06.329Z]         worker_node_objs = node.get_node_objs(worker_nodes)
[2024-12-19T07:55:06.329Z]         if config.ENV_DATA["platform"].lower() == constants.GCP_PLATFORM:
[2024-12-19T07:55:06.329Z]             nodes.restart_nodes_by_stop_and_start(worker_node_objs, force=False)
[2024-12-19T07:55:06.329Z]         else:
[2024-12-19T07:55:06.329Z]             nodes.restart_nodes_by_stop_and_start(worker_node_objs)
[2024-12-19T07:55:06.329Z]
[2024-12-19T07:55:06.329Z]         try:
[2024-12-19T07:55:06.329Z] >           node.wait_for_nodes_status(
[2024-12-19T07:55:06.329Z]                 node_names=worker_nodes, status=constants.NODE_READY
[2024-12-19T07:55:06.329Z]             )
[2024-12-19T09:17:40.774Z] E   ocs_ci.ocs.exceptions.TimeoutExpiredError: Timed out after 180s running get_node_objs(['ip-10-0-0-158.us-west-2.compute.internal', 'ip-10-0-0-187.us-west-2.compute.internal', 'ip-10-0-0-195.us-west-2.compute.internal', 'ip-10-0-0-224.us-west-2.compute.internal', 'ip-10-0-0-24.us-west-2.compute.internal', 'ip-10-0-0-76.us-west-2.compute.internal'])
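The failure above is a fixed 180s wait expiring before all six workers report Ready. As a rough illustration of that wait pattern (not the ocs-ci implementation), here is a minimal polling sketch with a configurable timeout. The names `wait_for_all_ready` and `get_status`, and the `clock`/`sleep` parameters, are hypothetical and exist only for this example; whether a longer timeout is the right fix for this test is an open question.

```python
import time
from typing import Callable, Iterable


def wait_for_all_ready(
    get_status: Callable[[str], str],  # hypothetical: returns a node's status string
    node_names: Iterable[str],
    timeout: float = 180.0,            # same budget as in the failing run
    interval: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until every node reports "Ready", else raise TimeoutError."""
    names = list(node_names)
    deadline = clock() + timeout
    while True:
        # Collect the nodes that have not reached Ready in this round.
        pending = [name for name in names if get_status(name) != "Ready"]
        if not pending:
            return
        if clock() >= deadline:
            # Report which nodes are still pending to make triage easier.
            raise TimeoutError(
                f"Timed out after {timeout}s; still not Ready: {pending}"
            )
        sleep(interval)
```

Listing the nodes that are still pending in the exception message would show whether all workers or only some of them failed to come back after the stop and start.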