Spark supports query trace #12084

jinchengchenghh · 2025-01-14T06:24:18Z

Description

Query trace is a very useful feature, but I hit some exceptions when trying to enable it in Gluten.

In Gluten, the QueryCtx queryId is the empty string "", so the generated directory is missing the queryId layer that query trace requires. Since Gluten uses single-threaded execution and an auto-incrementing Velox task ID (VTID), the taskId alone is enough to distinguish the Velox plan.

# TaskId
static std::atomic<uint32_t> vtId{0}; // Velox task ID to distinguish from Spark task ID.
  task_ = velox::exec::Task::create(
      fmt::format(
          "Gluten_Stage_{}_TID_{}_VTID_{}",
          std::to_string(taskInfo_.stageId),
          std::to_string(taskInfo_.taskId),
          std::to_string(vtId++)),
      std::move(planFragment),
      0,
      std::move(queryCtx),
      velox::exec::Task::ExecutionMode::kSerial);

# queryId is ""
std::shared_ptr<velox::core::QueryCtx> ctx = velox::core::QueryCtx::create(
      nullptr,
      facebook::velox::core::QueryConfig{getQueryContextConf()},
      connectorConfigs,
      gluten::VeloxBackend::get()->getAsyncDataCache(),
      memoryManager_->getAggregateMemoryPool(),
      spillExecutor_.get(),
      "");

The generated query trace directory:

/tmp/query_trace/
└── Gluten_Stage_0_TID_0_VTID_0
    ├── 7
    │   └── 0
    │       └── 0
    │           ├── op_input_trace.data
    │           └── op_trace_summary.json
    └── task_trace_meta.json

Running the replayer against it then throws this exception:

/mnt/DP_disk1/code/velox/build/velox/tool/trace# ./velox_query_replayer  --root_dir /tmp/query_trace --task_id Gluten_Stage_0_TID_0_VTID_0 --summary
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: --query_id must be provided
Retriable: False
Expression: !FLAGS_query_id.empty()
Function: init
File: /mnt/DP_disk1/code/velox/velox/tool/trace/TraceReplayRunner.cpp
Line: 241
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.

Aborted (core dumped)

Since QueryCtx does not require the queryId to be set, I think an empty queryId is reasonable, so QueryTrace needs to support it.
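One way to picture the requested behavior is a path builder that simply skips the queryId layer when the ID is empty. This is only a minimal sketch: `traceTaskDir` is a hypothetical helper written for illustration, not a function in Velox or Gluten, and the real trace code may lay out directories differently.

```cpp
#include <string>

// Hypothetical helper (not a Velox API): builds the directory that holds one
// task's trace data. If queryId is empty, as it is in Gluten, the queryId
// layer is omitted and the task directory sits directly under the root.
std::string traceTaskDir(
    const std::string& root,
    const std::string& queryId,
    const std::string& taskId) {
  std::string dir = root;
  // Drop a single trailing slash so the joins below do not double it.
  if (!dir.empty() && dir.back() == '/') {
    dir.pop_back();
  }
  if (!queryId.empty()) {
    dir += "/" + queryId;
  }
  dir += "/" + taskId;
  return dir;
}
```

With an empty queryId this yields the layout shown above, for example `/tmp/query_trace/Gluten_Stage_0_TID_0_VTID_0`. A replayer following the same rule would accept a missing `--query_id` instead of rejecting it.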

The replayer should register the Spark functions and distinguish them from the Presto functions by FLAGS_xx. We cannot register both, because one set overwriting the other may trigger unexpected behavior.
Spark's ValueStreamNode is hard to serialize and deserialize. We may not need to serialize the whole plan; we could extract and serialize only the nodes that are required.


jinchengchenghh · 2025-01-14T06:24:39Z

@duanmeng Can you help take a look? Thanks!

duanmeng · 2025-01-14T06:52:12Z

There is no queryId in Gluten Spark, hence we need to handle this case in the query trace replayer. cc @xiaoxmeng

duanmeng · 2025-01-15T11:56:44Z

Is there any chance of using the Spark application_id as the queryId? The path pattern $traceRoot/$StageID_$TID_$VID may be problematic because multiple Spark applications may produce the same $StageID_$TID_$VID, especially when the tracing data is stored on a remote file system, which is a common case. Otherwise we must find and configure a different tracing root directory for each Spark application, which is inconvenient. WDYT @jinchengchenghh @xiaoxmeng
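The collision concern can be made concrete with a small sketch. `glutenTaskId` mirrors the task ID format from the issue description, while `traceDirForApp` and the application IDs used with it are hypothetical, chosen only to show how an application-level layer would keep two applications apart under a shared trace root.

```cpp
#include <cstdint>
#include <string>

// Mirrors the "Gluten_Stage_{}_TID_{}_VTID_{}" format from the issue.
std::string glutenTaskId(int32_t stageId, int64_t taskId, uint32_t vtId) {
  return "Gluten_Stage_" + std::to_string(stageId) + "_TID_" +
      std::to_string(taskId) + "_VTID_" + std::to_string(vtId);
}

// Hypothetical layout: $traceRoot/$applicationId/$taskId. The application ID
// plays the role of the queryId layer.
std::string traceDirForApp(
    const std::string& traceRoot,
    const std::string& applicationId,
    const std::string& taskId) {
  return traceRoot + "/" + applicationId + "/" + taskId;
}
```

Two applications that both run stage 0, task 0 produce the same task ID, so without the extra layer they would write to the same directory. With it, `traceDirForApp(root, "app_a", id)` and `traceDirForApp(root, "app_b", id)` differ.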

Run the MicroBenchmark to generate a stage-level plan, then enable query trace in the benchmark to profile at the node level. With query trace enabled, the benchmark replaces ValueStreamNode, which is hard to serialize, with ValuesNode. This issue may be fixed by a plan serialization optimization in Velox query trace that serializes only the plan node being profiled. facebookincubator/velox#12084

jinchengchenghh added the enhancement New feature or request label Jan 14, 2025

duanmeng self-assigned this Jan 14, 2025

jinchengchenghh mentioned this issue Jan 14, 2025

Support Velox Query Trace apache/incubator-gluten#8379

Closed

jinchengchenghh mentioned this issue Jan 16, 2025

[GLUTEN-8379][VL] Support query trace apache/incubator-gluten#8380

Merged
