Cleanup CPU predict function. #11139

trivialfis · 2025-01-02T14:02:20Z

- Remove predict instance. It's dead code, as it can't be used outside of XGBoost even with the C++ headers included.
- Remove unroll. It has no performance benefit.
- Optimize dense QDM inference.
- Optimize data loading by copying data directly to the feature vector instead of going through a workspace.
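
As a conceptual illustration of the data-loading item above, the sketch below contrasts staging a row in a temporary workspace with writing it straight into the feature vector. It is not XGBoost's actual code: the function names `fill_via_workspace` and `fill_direct`, the list-based layout, and the `MISSING` placeholder are all assumptions made for illustration.

```python
import math

MISSING = math.nan  # placeholder for absent features (an assumption of this sketch)


def fill_via_workspace(entries, n_features):
    """Stage (index, value) pairs in a temporary buffer, then scatter them."""
    workspace = list(entries)          # first copy: row -> workspace
    fvec = [MISSING] * n_features
    for idx, value in workspace:       # second copy: workspace -> feature vector
        fvec[idx] = value
    return fvec


def fill_direct(dense_row):
    """Dense rows hold one value per feature, so copy them in a single pass."""
    return list(dense_row)             # single copy: row -> feature vector


row = [0.5, 1.25, 3.0]
staged = fill_via_workspace(enumerate(row), len(row))
direct = fill_direct(row)
print(staged == direct)
```

Both paths produce the same feature vector for a dense row; the direct path simply skips the intermediate copy, which is the kind of saving the item describes.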

Partially address #10793

The optimization mostly targets dense data, and the results vary between CPUs (timings in seconds, lower is better):

| Xeon(R) Gold 6128 |            DMatrix |    QuantileDMatrix |
|-------------------+--------------------+--------------------|
| Master            | 27.980122327804565 | 55.665775775909424 |
| PR                |  23.63674759864807 | 30.158272981643677 |

| Ryzen 9 7900X3D |            DMatrix |    QuantileDMatrix |
|-----------------+--------------------+--------------------|
| Master          | 24.764960527420044 | 31.460495710372925 |
| PR              | 22.532921314239502 | 21.412014961242676 |
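
To make the tables easier to compare, the following sketch computes the speedup (Master time divided by PR time) for each CPU and input type. The numbers are copied from the tables above; the `speedup` helper and the dictionary layout are illustrative assumptions, and the figures are treated as wall-clock times where lower is better.

```python
# Timings copied from the tables above, stored as (master, pr) per input type.
timings = {
    "Xeon(R) Gold 6128": {
        "DMatrix": (27.980122327804565, 23.63674759864807),
        "QuantileDMatrix": (55.665775775909424, 30.158272981643677),
    },
    "Ryzen 9 7900X3D": {
        "DMatrix": (24.764960527420044, 22.532921314239502),
        "QuantileDMatrix": (31.460495710372925, 21.412014961242676),
    },
}


def speedup(master, pr):
    """Return how many times faster the PR is than master."""
    return master / pr


for cpu, results in timings.items():
    for kind, (master, pr) in results.items():
        print(f"{cpu:18} {kind:16} {speedup(master, pr):.2f}x")
```

This works out to roughly 1.18x (DMatrix) and 1.85x (QuantileDMatrix) on the Xeon, and roughly 1.10x and 1.47x on the Ryzen, so the QuantileDMatrix path gains the most on both machines.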

trivialfis · 2025-01-06T13:50:37Z

@razdoburdin Could you please help take a look into the optimization?

I'm not an expert in CPU optimization. The changes in the predictor affect the Xeon much more significantly than the Ryzen. If I remove the dense optimization, it adds about 3 seconds on the Ryzen, but 20 seconds on the Xeon.

Looking at some profiling results on Ryzen, the bottleneck seems to be in data loading (movss/movl). Would love to get some opinions.

razdoburdin · 2025-01-09T10:26:37Z

> @razdoburdin Could you please help take a look into the optimization?
>
> I'm not an expert in CPU optimization. The changes in the predictor affect the Xeon much more significantly than the Ryzen. If I remove the dense optimization, it adds about 3 seconds on the Ryzen, but 20 seconds on the Xeon.
>
> Looking at some profiling results on Ryzen, the bottleneck seems to be in data loading (movss/movl). Would love to get some opinions.

It is hard to give an exact answer without a deep investigation of the changes. My hypotheses are:

- The Xeon benefits more from vectorization due to its AVX512 support.
- The Xeon has a much smaller L3 cache, which makes memory access optimizations more critical.
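
One cheap way to start checking the vectorization hypothesis on a Linux machine is to look for the AVX-512 Foundation flag (`avx512f`) in `/proc/cpuinfo`. The sketch below is only an illustration: the helper names `cpu_flags` and `has_avx512` are hypothetical, and the sample text is made up rather than taken from either benchmark machine.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """Report whether the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


# Made-up sample in the /proc/cpuinfo format, for illustration only.
sample = "model name : Example CPU\nflags : fpu sse sse2 avx avx2 avx512f avx512dq\n"
print(has_avx512(sample))
```

On a real Linux host the same check could be run as `has_avx512(open("/proc/cpuinfo").read())`. The presence of the flag only shows that the instructions are available, not that the compiled predictor actually uses them.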

trivialfis · 2025-01-09T10:46:05Z

@razdoburdin Thank you for sharing. Could you please help review the changes in the CPU predictor when you are available?

> It is hard to give the exact answer without deep investigation of the changes

Currently, the evaluation might be even more expensive than training for some datasets. Would be great if we could get some help on that.

razdoburdin

The PR looks good to me.
As for future prediction optimization, I plan to work on it, but later this year.

trivialfis · 2025-01-10T09:51:10Z

> As for future prediction optimization, I plan to work on it, but later this year.

Thank you for looking into it. Feel free to ping me if there's anything I can help with.

trivialfis · 2025-01-11T09:11:59Z

@hcho3 Could you please help approve the PR if there are no further change requests? The CI failure is unrelated (sklearn update).

Cleanup CPU predict function.

d40949e

trivialfis force-pushed the cleanup-predict branch from 7a9c90f to d40949e on January 2, 2025 14:07

trivialfis added 6 commits January 3, 2025 02:50

Optimize QDM inference.

e86e93d

Fixes, lint.

e51877b

cat.

06993ef

Direct fill.

b2caadb

Fixes.

d610455

lint.

25d1d7c

trivialfis mentioned this pull request Jan 6, 2025

Auto encoding for categorical data during inference. #11088

Merge branch 'master' into cleanup-predict

d32fdb8

razdoburdin approved these changes Jan 10, 2025

hcho3 requested changes Jan 10, 2025

include/xgboost/tree_model.h (outdated, resolved)

trivialfis added 2 commits January 11, 2025 01:19

Update comment.

ac79286

Merge branch 'master' into cleanup-predict

a484497

hcho3 approved these changes Jan 11, 2025

trivialfis merged commit 712e39d into dmlc:master Jan 11, 2025
57 of 59 checks passed

trivialfis deleted the cleanup-predict branch January 11, 2025 14:15
