Spark supports query trace #12084

Open
jinchengchenghh opened this issue Jan 14, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@jinchengchenghh
Contributor

jinchengchenghh commented Jan 14, 2025

Description

Query trace is a very useful feature, but I hit some exceptions when I tried to enable it in Gluten.

  1. The Gluten QueryCtx queryId is empty (""), so the generated trace directory is missing the queryId layer, which query trace requires. Because Gluten uses single-threaded (serial) execution and an auto-incrementing Velox task ID (vtId), the taskId alone is enough to distinguish each Velox plan.
```cpp
// Task ID
static std::atomic<uint32_t> vtId{0}; // Velox task ID to distinguish from Spark task ID.
  task_ = velox::exec::Task::create(
      fmt::format(
          "Gluten_Stage_{}_TID_{}_VTID_{}",
          std::to_string(taskInfo_.stageId),
          std::to_string(taskInfo_.taskId),
          std::to_string(vtId++)),
      std::move(planFragment),
      0,
      std::move(queryCtx),
      velox::exec::Task::ExecutionMode::kSerial);

// queryId is ""
std::shared_ptr<velox::core::QueryCtx> ctx = velox::core::QueryCtx::create(
      nullptr,
      facebook::velox::core::QueryConfig{getQueryContextConf()},
      connectorConfigs,
      gluten::VeloxBackend::get()->getAsyncDataCache(),
      memoryManager_->getAggregateMemoryPool(),
      spillExecutor_.get(),
      "");
```

The generated query trace directory:

```
/tmp/query_trace/
└── Gluten_Stage_0_TID_0_VTID_0
    ├── 7
    │   └── 0
    │       └── 0
    │           ├── op_input_trace.data
    │           └── op_trace_summary.json
    └── task_trace_meta.json
```

Running the replayer then fails with the following exception:

```
/mnt/DP_disk1/code/velox/build/velox/tool/trace# ./velox_query_replayer  --root_dir /tmp/query_trace --task_id Gluten_Stage_0_TID_0_VTID_0 --summary
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: --query_id must be provided
Retriable: False
Expression: !FLAGS_query_id.empty()
Function: init
File: /mnt/DP_disk1/code/velox/velox/tool/trace/TraceReplayRunner.cpp
Line: 241
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.

Aborted (core dumped)
```

Since QueryCtx does not require the queryId to be set, I think an empty queryId is reasonable, so query trace should support it.
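A minimal sketch of how the trace path could be built when the queryId is empty, assuming the layout `$traceRoot/$queryId/$taskId/...` that query trace normally produces. `taskTraceDir` is a hypothetical helper, not an existing Velox API; it simply skips the queryId layer when the ID is empty.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not a Velox API: returns the directory for one traced
// task. The queryId layer is added only when queryId is non-empty, so Gluten
// tasks (queryId == "") map to <traceRoot>/<taskId>.
fs::path taskTraceDir(const fs::path& traceRoot,
                      const std::string& queryId,
                      const std::string& taskId) {
  assert(!taskId.empty() && "the task ID is always required");
  fs::path dir = traceRoot;
  if (!queryId.empty()) {
    dir /= queryId;  // Regular layout: <root>/<queryId>/<taskId>.
  }
  dir /= taskId;
  return dir;
}
```

With a rule like this, the replayer could look up `Gluten_Stage_0_TID_0_VTID_0` directly under `/tmp/query_trace` when `--query_id` is left empty, instead of rejecting the run.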

  2. Register the Spark functions and distinguish them from the Presto functions via a FLAGS_xx flag. We cannot register both sets, because functions overwriting each other may trigger unexpected behavior.
  3. The Spark ValueStreamNode is hard to serialize and deserialize. We may not need to serialize the whole plan; instead, we could extract and serialize only the node that needs to be traced.
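A rough sketch of the flag-based selection in item 2, assuming a hypothetical string flag whose value is `spark` or `presto`. `FunctionDialect`, `parseDialect`, `registerDialect`, the toy `Registry` map, and the illustrative function names are all made up for this example; they are not Velox APIs. The point is only that a process registers one dialect, because a second registration silently replaces same-named entries.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: which function set the replayer should register.
enum class FunctionDialect { kPresto, kSpark };

// Maps the (hypothetical) flag value to a dialect; rejects unknown values.
FunctionDialect parseDialect(const std::string& flagValue) {
  if (flagValue == "presto") {
    return FunctionDialect::kPresto;
  }
  if (flagValue == "spark") {
    return FunctionDialect::kSpark;
  }
  throw std::invalid_argument("unknown function dialect: " + flagValue);
}

// Toy registry: function name -> which dialect's implementation is bound.
using Registry = std::map<std::string, std::string>;

// Binds a few illustrative names to the chosen dialect. Assigning an
// existing key overwrites it, which is how a second dialect would clobber
// the first one.
void registerDialect(FunctionDialect dialect, Registry& registry) {
  const std::string impl =
      dialect == FunctionDialect::kSpark ? "spark" : "presto";
  for (const char* name : {"substr", "concat"}) {
    registry[name] = impl;
  }
}
```

Registering Presto and then Spark into the same toy registry leaves every shared name bound to the Spark entry, which is the kind of silent overwrite item 2 wants to avoid.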
jinchengchenghh added the enhancement (New feature or request) label Jan 14, 2025
@jinchengchenghh
Contributor Author

@duanmeng Can you help take a look? Thanks!

duanmeng self-assigned this Jan 14, 2025
@duanmeng
Collaborator

There is no queryId in Gluten Spark, so we need to handle this case in the query trace replayer. cc @xiaoxmeng

@duanmeng
Collaborator

Is there any chance we could use the Spark application_id as the queryId? The path pattern $traceRoot/$StageID_$TID_$VID may be problematic, because multiple Spark applications can produce the same $StageID_$TID_$VID, especially when tracing data is stored on a remote file system, which is a common case. Otherwise, we must find and configure a different tracing root directory for each Spark application, which is inconvenient. WDYT @jinchengchenghh @xiaoxmeng
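A minimal sketch of the application-ID suggestion, assuming the Spark application ID is available on the native side and is passed in as the queryId layer. `appScopedTaskDir` is a hypothetical helper and `app-A`/`app-B` are placeholder IDs. Two applications that both produce `Gluten_Stage_0_TID_0_VTID_0` then write to different directories under one shared trace root.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical: scope each traced task by its Spark application ID, so that
// identical Gluten task IDs from different applications do not collide under
// a shared (possibly remote) trace root.
fs::path appScopedTaskDir(const fs::path& traceRoot,
                          const std::string& sparkAppId,
                          const std::string& veloxTaskId) {
  assert(!sparkAppId.empty() && !veloxTaskId.empty());
  return traceRoot / sparkAppId / veloxTaskId;
}
```

Without the application layer, both applications would map to `<traceRoot>/Gluten_Stage_0_TID_0_VTID_0`, which is the collision described above.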

jinchengchenghh added a commit to apache/incubator-gluten that referenced this issue Jan 22, 2025
Run the MicroBenchmark to generate the stage-level plan, then enable query trace in the benchmark to profile at the node level. With query trace enabled, the benchmark replaces ValueStreamNode, which is hard to serialize, with ValuesNode. This issue may be fixed by a plan serialization optimization in Velox query trace that serializes only the plan node being profiled. facebookincubator/velox#12084
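A toy sketch of the "serialize only the plan node to profile" idea, using a simplified `PlanNode` struct rather than Velox's real plan classes (every name here is hypothetical). It copies the target node and replaces its entire source subtree, including any ValueStream leaf, with a single Values leaf standing in for the traced input, assuming the replayer feeds the recorded operator input (`op_input_trace.data`) to that node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy plan node; NOT Velox's core::PlanNode.
struct PlanNode {
  std::string id;
  std::string kind;  // e.g. "ValueStream", "Values", "Project", "Aggregation".
  std::vector<std::shared_ptr<const PlanNode>> sources;
};
using PlanNodePtr = std::shared_ptr<const PlanNode>;

// Depth-first search for the node with the given ID; nullptr if absent.
PlanNodePtr findNode(const PlanNodePtr& root, const std::string& id) {
  assert(root != nullptr);
  if (root->id == id) {
    return root;
  }
  for (const auto& source : root->sources) {
    if (auto found = findNode(source, id)) {
      return found;
    }
  }
  return nullptr;
}

// Returns a copy of the target node whose inputs are replaced by one "Values"
// leaf, so the upstream subtree never needs to be serialized.
PlanNodePtr extractForTrace(const PlanNodePtr& root, const std::string& id) {
  PlanNodePtr target = findNode(root, id);
  if (!target) {
    return nullptr;
  }
  PlanNodePtr input =
      std::make_shared<PlanNode>(PlanNode{id + "_input", "Values", {}});
  return std::make_shared<PlanNode>(
      PlanNode{target->id, target->kind, {input}});
}
```

Only the target node and a trivially serializable leaf remain, so the hard-to-serialize ValueStream node drops out of what would need to be written.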
2 participants