Expected Behavior
Given an Entity whose join key column is named differently in the data source, a `field_mapping` can be set on the data source to map the source column name to the join key. `get_historical_features` should then recognize that the join key has a field mapping and generate the correct alias in the query.
For example:
```python
from feast import Entity, Field, FeatureStore, FeatureView
from feast.types import Float32, String
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource

# Initialize Feature Store.
store = FeatureStore(...)

# "driver_id" is used in the Feature Store.
driver = Entity(name="driver", join_keys=["driver_id"])

# Using SparkSource as an example, but this applies to other sources.
# Source data contains a primary key called "id". This is mapped to the join key "driver_id".
driver_stats_src = SparkSource(
    name="driver_stats",
    field_mapping={"id": "driver_id"},
    path=...,
    file_format=...,
)

driver_stats_fv = FeatureView(
    name="driver_stats",
    source=driver_stats_src,
    entities=[driver],
    schema=[
        # The join key must be specified in the schema, else it is not
        # included in driver_stats_fv.entity_columns.
        Field(name="driver_id", dtype=String),
        Field(name="stat1", dtype=Float32),
        Field(name="stat2", dtype=Float32),
    ],
)

# Get historical features.
store.get_historical_features(
    entity_df=...,
    features=[
        "driver_stats:stat1",
        "driver_stats:stat2",
    ],
)
```
When `get_historical_features` is run, the alias `id AS driver_id` should be included in the query. In the case of Spark, for example, the generated query should be:
```sql
driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,
        id AS driver_id,
        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01` WHERE event_timestamp <= '2025-01-05T14:00:00'
)
```
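The expected aliasing can be sketched as follows. This is an illustrative helper only, not Feast's internal code; the function name and signature are hypothetical:

```python
def build_select_list(join_keys, feature_cols, field_mapping):
    """Build the subquery SELECT aliases, mapping join keys back to source columns.

    field_mapping maps source column -> Feature Store name, e.g. {"id": "driver_id"},
    so it must be inverted to recover the source column for a given join key.
    """
    fs_to_source = {fs: src for src, fs in field_mapping.items()}
    select = ["event_timestamp AS event_timestamp", "created AS created_timestamp"]
    # Join keys: alias the source column to the Feature Store name.
    select += [f"{fs_to_source.get(k, k)} AS {k}" for k in join_keys]
    # Feature columns: pass through unchanged.
    select += [f"{c} AS {c}" for c in feature_cols]
    return select

build_select_list(["driver_id"], ["stat1", "stat2"], {"id": "driver_id"})
# -> ["event_timestamp AS event_timestamp", "created AS created_timestamp",
#     "id AS driver_id", "stat1 AS stat1", "stat2 AS stat2"]
```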
Current Behavior
This is what currently happens (Spark example):
```
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `driver_id` cannot be resolved. Did you mean one of the following? [`id`, `stat1`, `stat2`]
```
Underlying Spark query:
```sql
driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,
        -- Here is the problem.
        driver_id AS driver_id,
        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01` WHERE event_timestamp <= '2025-01-05T14:00:00'
)
```
Steps to reproduce
See example above.
Specifications
Possible Solution
See PR #4886
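For context, the essence of the change (a hedged sketch, not the actual code from the PR) is to invert `field_mapping` when aliasing the join key:

```python
field_mapping = {"id": "driver_id"}  # source column -> Feature Store name

# Buggy: the Feature Store name is aliased to itself ("driver_id AS driver_id"),
# which Spark cannot resolve because the source only has "id".
buggy_alias = "driver_id AS driver_id"

# Fixed: invert field_mapping to find the source column for the join key.
fs_to_source = {fs: src for src, fs in field_mapping.items()}
fixed_alias = f"{fs_to_source.get('driver_id', 'driver_id')} AS driver_id"
# fixed_alias == "id AS driver_id"
```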