
[Inference API] Fix Azure AI Studio Integration for Completions and Embeddings #119818

Draft · wants to merge 5 commits into base: main

Conversation

brendan-jugan-elastic

This draft PR fixes the Inference API integration with Azure AI Foundry (previously Azure AI Studio). The previous integration was broken for both completions and embeddings models due to API changes from Microsoft.

Core Changes:

  • the integration no longer references an approved list of providers
  • a required AzureAiStudioDeploymentType is introduced in the service settings
    • either azure_ai_model_inference_service or serverless_api
  • the auth configuration is modified for each deployment type
  • slight request-format modifications to match Microsoft's API changes
  • testing and rebranding changes are in progress; I wanted to get some eyes on the implementation while I complete them

Once testing is complete, I will add more detailed docs describing the deployment types and their configurations, and explaining how to use this integration, with screenshots from the Azure console.

Local Testing:

Embeddings:

PUT http://localhost:9200/_inference/text_embedding/cohere_serverless_embed

curl --location --request PUT 'http://localhost:9200/_inference/text_embedding/cohere_serverless_embed' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "*****",
    "target": "https://example-target.eastus.models.ai.azure.com/embeddings",
    "deployment_type": "serverless_api",
    "deployment_name": "Cohere-embed-v3-english-hmcek"
  }
}'

POST http://localhost:9200/_inference/text_embedding/cohere_serverless_embed

curl --location 'http://localhost:9200/_inference/text_embedding/cohere_serverless_embed' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "input": "What is Elastic?"
}'



PUT http://localhost:9200/_inference/text_embedding/cohere_amlis_embed

curl --location --request PUT 'http://localhost:9200/_inference/text_embedding/cohere_amlis_embed' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "*****",
    "target": "https://example-target/models/embeddings",
    "deployment_type": "azure_ai_model_inference_service",
    "deployment_name": "Cohere-embed-v3-english"
  }
}'

curl --location 'http://localhost:9200/_inference/text_embedding/cohere_amlis_embed' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "input": "What is Elastic?"
}'

Completions:

PUT http://localhost:9200/_inference/completion/cohere_serverless_completion

curl --location --request PUT 'http://localhost:9200/_inference/completion/cohere_serverless_completion' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "*****",
    "target": "https://example-target.eastus.models.ai.azure.com/chat/completions",
    "deployment_type": "serverless_api",
    "deployment_name": "Cohere-command-r"
  }
}'

POST http://localhost:9200/_inference/completion/cohere_serverless_completion

curl --location 'http://localhost:9200/_inference/completion/cohere_serverless_completion' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "input": "What is Elastic?"
}'



PUT http://localhost:9200/_inference/completion/cohere_amlis_completion

curl --location --request PUT 'http://localhost:9200/_inference/completion/cohere_amlis_completion' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "service": "azureaistudio",
  "service_settings": {
    "api_key": "*****",
    "target": "https://example-target.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview",
    "deployment_type": "azure_ai_model_inference_service",
    "deployment_name": "Cohere-command-r"
  }
}'

POST http://localhost:9200/_inference/completion/cohere_amlis_completion

curl --location 'http://localhost:9200/_inference/completion/cohere_amlis_completion' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic *****' \
--data '{
  "input": "What is Elastic?"
}'

Related Issues:

Helpful Links:

@brendan-jugan-elastic brendan-jugan-elastic changed the title WIP(azure_ai_foundry): fix implementation for completions and embeddings [(WIP) Inference API] Fix Azure AI Foundry Integration for Completions and Embeddings Jan 9, 2025
Contributor

@timgrein timgrein left a comment

Nice, good stuff! 👏 Gave it a first pass and left some comments; already looking good.

@brendan-jugan-elastic brendan-jugan-elastic changed the title [(WIP) Inference API] Fix Azure AI Foundry Integration for Completions and Embeddings [Inference API] Fix Azure AI Studio Integration for Completions and Embeddings Jan 10, 2025
@brendan-jugan-elastic
Author

Note: I'm waiting to complete the Azure AI Studio -> Azure AI Foundry renaming until tomorrow. The above commits contain all of the functional/test changes for this Inference API fix.

Contributor

@timgrein timgrein left a comment


Mainly left comments on the transport-level changes we need to address. We can also sync on that; it can be a bit confusing in the beginning.

out.writeEnum(provider);
out.writeEnum(endpointType);
if (out.getTransportVersion().before(AZURE_AI_FOUNDRY_INTEGRATION_FIX_1_10_25)) {
    out.writeEnum(AzureAiFoundryProvider.NONE);
}

The old node does not know about the enum value AzureAiFoundryProvider.NONE, as that is added in this PR. Enums are written by their ordinal values, so the old node will read the ordinal value for AzureAiFoundryProvider.NONE (let's say it is 2) but then throw an error because it does not know of any AzureAiFoundryProvider enum with ordinal value 2.

If there isn't a logical mapping from the old fields (provider, endpointType) to the new ones (deploymentType, model), then it is a question of how we want this to fail.

Inference endpoint creation is a master node action. The request will be serialised from whichever node it lands on to the master node, and the response in turn will be serialised back to the originating node. PutInferenceModelAction.Response contains the ServiceSettings (we return the new endpoint configuration); if the originating node is an old node that doesn't know about the new options, it won't return the proper config, as some fields will have been lost in serialisation. We can solve for that by not allowing new AzureAiFoundry inference endpoints to be created in a mixed cluster where some nodes do not know about this change.
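The ordinal mismatch described above can be illustrated with a small, self-contained sketch. The enums here are simplified, hypothetical stand-ins for the real AzureAiFoundryProvider, and readOnOldNode only models the by-ordinal read that the transport layer performs:

```java
// Simplified stand-ins for the provider enum before and after this PR
// (hypothetical names, not the real Elasticsearch classes).
public class EnumOrdinalBwcDemo {
    enum OldProvider { OPENAI, COHERE }          // an old node's view
    enum NewProvider { OPENAI, COHERE, NONE }    // a new node's view: NONE added

    // Models an old node reading an enum off the wire: enums travel as ordinals,
    // and an ordinal outside the known values() array cannot be resolved.
    static String readOnOldNode(int wireOrdinal) {
        OldProvider[] known = OldProvider.values();
        if (wireOrdinal < 0 || wireOrdinal >= known.length) {
            return "error: unknown ordinal " + wireOrdinal;
        }
        return known[wireOrdinal].name();
    }

    public static void main(String[] args) {
        // A new node writes NONE by its ordinal (2)...
        int wireOrdinal = NewProvider.NONE.ordinal();
        // ...and the old node cannot map it back to any constant it knows.
        System.out.println(readOnOldNode(wireOrdinal)); // error: unknown ordinal 2
    }
}
```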

namedWriteables.add(
    new NamedWriteableRegistry.Entry(
        ServiceSettings.class,
        AzureAiFoundryChatCompletionServiceSettings.NAME,
        AzureAiFoundryChatCompletionServiceSettings::new
    )
);

NAME has changed from azure_ai_studio_chat_completion_service_settings to azure_ai_foundry_chat_completion_service_settings. When an old node writes a named writeable with the old name azure_ai_studio_chat_completion_service_settings, this node will not know about it.

Because there is logic in the AzureAiFoundryChatCompletionServiceSettings serialisation code to handle backwards compatibility, it is better not to change the name. Just add a comment explaining that NAME hasn't changed, to maintain BWC.
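A minimal sketch of that suggestion (the class here is a hypothetical stand-in, and the comment wording is illustrative, not the actual Elasticsearch source):

```java
// Hypothetical sketch of the reviewer's suggestion: the class is rebranded
// from Studio to Foundry, but the wire name stays on the old string.
public class AzureAiFoundryChatCompletionServiceSettingsSketch {
    // NOTE: deliberately NOT renamed to "azure_ai_foundry_...". Nodes resolve
    // named writeables by this string, so changing it would break BWC with
    // old nodes that still write the azure_ai_studio_... name.
    public static final String NAME = "azure_ai_studio_chat_completion_service_settings";
}
```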
