-
Notifications
You must be signed in to change notification settings - Fork 647
Add Azure AI Foundry support #9974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Bug in the FoundryLocalManager where the model is always downloaded (not doing case-insensitive checks for local models) Bug that the WaitFor isn't propogating across and the web app in playground won't start
… requires Not sure this is 100% right as the ModelInfo value isn't available until the model is downloaded, so if you don't do WaitFor Aspire will build the connection string with some null values. Also, the WaitFor, while it pauses the web app from starting, doesn't then release it to start for some reason
This should be delegated to the hosting resource to provide a valid endpoint, either generated by something like Foundry Local, or provided as expected using an endpoint from the Azure resource
…nto sebros/aifoundry
<IsPackable>true</IsPackable> | ||
<PackageTags>aspire integration hosting azure openai ai aifoundry foundry</PackageTags> | ||
<Description>Azure AI Foundry resource types for .NET Aspire.</Description> | ||
<PackageIconFullPath>$(SharedDir)AzureOpenAI_256x.png</PackageIconFullPath> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong icon, need to use the foundry one
Apparently the SDK just got on NuGet yesterday: https://www.nuget.org/packages/Microsoft.AI.Foundry.Local/0.1.0 We should be able to remove the /Internal folder with the SDK files that were copied |
@@ -0,0 +1,552 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of untested logic...
/// <summary> | ||
/// Gets the "connectionString" output reference from the Azure AI Services resource. | ||
/// </summary> | ||
public BicepOutputReference AIFoundryApiEndpoint => new("aiFoundryApiEndpoint", this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to expose the "NameOutputReference" property like the other bicep resources
/// <summary> | ||
/// Gets the "connectionString" output reference from the Azure AI Services resource. | ||
/// </summary> | ||
public BicepOutputReference AIFoundryApiEndpoint => new("aiFoundryApiEndpoint", this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call this endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to just call this "Endpoint" because the way Azure does this is there are multiple endpoints on each CognitiveServices account.
There is a collection of endpoints that are indexed by name.
/// <summary> | ||
/// Gets or sets a value indicating whether the resource is local. | ||
/// </summary> | ||
public bool IsLocal { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have IsLocal, IsContainer, IsEmulator 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use IsEmulator
rather than IsLocal
, since it's pretty analogous to an emulator in the way we use Foundry Local.
public class AzureAIFoundryResource(string name, Action<AzureResourceInfrastructure> configureInfrastructure) : | ||
AzureProvisioningResource(name, configureInfrastructure), | ||
IResourceWithConnectionString, | ||
IResourceWithEndpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't do this anywhere. Usually you still need to get the underlying emulator to get the endpoints. I think if we do this for foundry local, we should consider doing it for the other resources that have emulators.
public class AzureAIFoundryResource(string name, Action<AzureResourceInfrastructure> configureInfrastructure) : | |
AzureProvisioningResource(name, configureInfrastructure), | |
IResourceWithConnectionString, | |
IResourceWithEndpoints | |
public class AzureAIFoundryResource(string name, Action<AzureResourceInfrastructure> configureInfrastructure) : | |
AzureProvisioningResource(name, configureInfrastructure), | |
IResourceWithConnectionString |
|
||
namespace Aspire.Hosting.Azure.AIFoundry; | ||
|
||
internal sealed class FoundryHealthCheck(FoundryLocalManager manager) : IHealthCheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this should be an http health check, but I understand the manager handles it.
QNNExecutionProvider | ||
} | ||
|
||
internal sealed partial class FoundryLocalManager : IDisposable, IAsyncDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of logic that doesn't seem to have any logging. Make me wonder if it should be a global resource that shows up in the dashboard.
|
||
resourceBuilder | ||
.WithHttpEndpoint(env: "PORT", isProxied: false, port: 6914, name: AzureAIFoundryResource.PrimaryEndpointName) | ||
.WithExternalHttpEndpoints() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?? Doesn't matter for local.
} | ||
else | ||
{ | ||
await rns.PublishUpdateAsync(resource, state => state with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No way to restart? No way to see logs as to why it's not starting...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my testing, there's no way to push logs from a "background task" such as this up to the parent resource log stream. But also, the SDK we use to manage Foundry Local allow us to hook into that anyway, as it manages the calls to the CLI or control plane (depending on the call you're making), so a failure is opaque to us.
As for the ability to restart/etc. let's tackle that via a future PR - this one can focus on MVP of Foundry Local + Azure AI Foundry, then add the nice-to-haves.
var rns = @event.Services.GetRequiredService<ResourceNotificationService>(); | ||
var manager = @event.Services.GetRequiredService<FoundryLocalManager>(); | ||
|
||
await rns.WaitForResourceAsync(@event.Resource.Name, KnownResourceStates.Finished, ct).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't do anything if you are using a custom resource. Who marks the resource as finished?
|
||
await rns.WaitForResourceAsync(@event.Resource.Name, KnownResourceStates.Finished, ct).ConfigureAwait(false); | ||
|
||
await manager.StopServiceAsync(ct).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this shared on the machine? What happens when you have multiple of these apps running at the same time?
logger.LogInformation("Failed to start {Model}. Error: {Error}", model, progress.ErrorMessage); | ||
await rns.PublishUpdateAsync(deployment, state => state with | ||
{ | ||
State = new ResourceStateSnapshot(KnownResourceStates.FailedToStart, KnownResourceStateStyles.Error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State = new ResourceStateSnapshot(KnownResourceStates.FailedToStart, KnownResourceStateStyles.Error) | |
State = KnownResourceStates.FailedToStart |
|
||
await rns.PublishUpdateAsync(deployment, state => state with | ||
{ | ||
State = new ResourceStateSnapshot(KnownResourceStates.Running, KnownResourceStateStyles.Success) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State = new ResourceStateSnapshot(KnownResourceStates.Running, KnownResourceStateStyles.Success) | |
State = KnownResourceStates.Running |
Properties = [.. state.Properties, new(CustomResourceKnownProperties.Source, model)] | ||
}).ConfigureAwait(false); | ||
|
||
var result = manager.DownloadModelWithProgressAsync(model, ct: ct) ?? throw new InvalidOperationException($"Failed to download model {model}."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When model download fails where will it show up? Who is logging this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to check the latest iteration of the management SDK (we've got a snapshot of the code until their NuGet package ships) but I think it can't return null
anymore, so it'll always return a result that we handle in the await foreach
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never throw from a background thread like this. Log to the resource logger.
|
||
resourceBuilder | ||
.WithHttpEndpoint(env: "PORT", isProxied: false, port: 6914, name: AzureAIFoundryResource.PrimaryEndpointName) | ||
.WithExternalHttpEndpoints() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.WithExternalHttpEndpoints() |
True, I think it was there from when I brute-forced some stuff originally.
} | ||
else | ||
{ | ||
await rns.PublishUpdateAsync(resource, state => state with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my testing, there's no way to push logs from a "background task" such as this up to the parent resource log stream. But also, the SDK we use to manage Foundry Local allow us to hook into that anyway, as it manages the calls to the CLI or control plane (depending on the call you're making), so a failure is opaque to us.
As for the ability to restart/etc. let's tackle that via a future PR - this one can focus on MVP of Foundry Local + Azure AI Foundry, then add the nice-to-haves.
/// </summary> | ||
/// <param name="resource">The resource builder for the Foundry Local resource.</param> | ||
/// <returns>The updated resource builder.</returns> | ||
private static IResourceBuilder<AzureAIFoundryResource> EnsureResourceStops(this IResourceBuilder<AzureAIFoundryResource> resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this method entirely. I originally wrote it with the theory that you could run multiple instances of Foundry Local (since it was an executable as the entry point) but have since learnt that that isn't possible, so trying to force a stop will just cause problems.
Properties = [.. state.Properties, new(CustomResourceKnownProperties.Source, model)] | ||
}).ConfigureAwait(false); | ||
|
||
var result = manager.DownloadModelWithProgressAsync(model, ct: ct) ?? throw new InvalidOperationException($"Failed to download model {model}."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to check the latest iteration of the management SDK (we've got a snapshot of the code until their NuGet package ships) but I think it can't return null
anymore, so it'll always return a result that we handle in the await foreach
below.
/// <summary> | ||
/// Gets or sets a value indicating whether the resource is local. | ||
/// </summary> | ||
public bool IsLocal { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use IsEmulator
rather than IsLocal
, since it's pretty analogous to an emulator in the way we use Foundry Local.
|
||
namespace Aspire.Hosting.Azure.AIFoundry; | ||
|
||
internal sealed class FoundryHealthCheck(FoundryLocalManager manager) : IHealthCheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal sealed class FoundryHealthCheck(FoundryLocalManager manager) : IHealthCheck | |
internal sealed class FoundryLocalHealthCheck(FoundryLocalManager manager) : IHealthCheck |
@@ -0,0 +1,552 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should note that this file comes from https://github.com/microsoft/Foundry-Local/tree/main/sdk/cs/src and is copied locally until the NuGet package for the management SDK ships.
|
||
namespace Aspire.Hosting.Azure.AIFoundry; | ||
|
||
internal sealed class ModelHealthCheck(string modelAlias, FoundryLocalManager manager) : IHealthCheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal sealed class ModelHealthCheck(string modelAlias, FoundryLocalManager manager) : IHealthCheck | |
internal sealed class LocalModelHealthCheck(string modelAlias, FoundryLocalManager manager) : IHealthCheck |
|
||
var localBuilder = resourceBuilder.RunAsFoundryLocal(); | ||
var localResource = Assert.Single(builder.Resources.OfType<AzureAIFoundryResource>()); | ||
Assert.True(localResource.IsLocal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to assert on the ApiKey?
/// <summary> | ||
/// Provides extension methods for configuring and managing Foundry Local resources. | ||
/// </summary> | ||
public static partial class AzureAIFoundryLocalResourceExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a separate class?
} | ||
|
||
var azureResource = builder.Resource; | ||
builder.ApplicationBuilder.Resources.Remove(azureResource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove and add here? Can't we just set IsLocal = true
on the current AzureAIFoundryResource?
// The .NET Foundation licenses this file to you under the MIT license. | ||
|
||
// -------------------------------------------------------------------------------------------------------------------- | ||
// <copyright company="Microsoft"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just add Microsoft copyrighted code to this repo?
resourceBuilder | ||
.WithHttpEndpoint(env: "PORT", isProxied: false, port: 6914, name: AzureAIFoundryResource.PrimaryEndpointName) | ||
.WithExternalHttpEndpoints() | ||
//.WithAIFoundryLocalDefaults() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it can be deleted.
var builder = DistributedApplication.CreateBuilder(args); | ||
|
||
var foundry = builder.AddAzureAIFoundry("foundry") | ||
.RunAsFoundryLocal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try to F5 this app, I'm getting an error that says my Azure provisioning information is missing.
We shouldn't require Azure provisioning info when the app is running as local.
|
||
<ItemGroup> | ||
<AspireProjectOrPackageReference Include="Aspire.Azure.AI.Inference" /> | ||
<AspireProjectOrPackageReference Include="Aspire.OpenAI" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is OpenAI used at all?
|
||
```csharp | ||
var chat = builder.AddAzureAIFoundry("foundry") | ||
.AddDeployment("chat", "phi-3.5-mini", "1", "Microsoft"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This didn't work for me. I got:
{"error":{"code":"InvalidTemplateDeployment","message":"The template deployment 'foundry' is not valid according to the validation procedure. The tracking id is 'a63a7811-3833-40bd-85fc-8df85c2360dc'. See inner errors for details.","details":[{"code":"DeploymentModelNotSupported","message":"The model 'Format:Microsoft,Name:phi-3.5-mini,Version:1' of account deployment is not supported."}]}}
@@ -103,17 +103,17 @@ void IConnectionStringSettings.ParseConnectionString(string? connectionString) | |||
ConnectionString = connectionString | |||
}; | |||
|
|||
if (connectionBuilder.TryGetValue(nameof(DeploymentId), out var modelId)) | |||
if (connectionBuilder.TryGetValue("DeploymentId", out var modelId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
@@ -113,15 +114,29 @@ protected override IAzureClientBuilder<ChatCompletionsClient, AzureAIInferenceCl | |||
} | |||
else | |||
{ | |||
var endpoint = settings.Endpoint; | |||
|
|||
if (endpoint.Host.EndsWith(".ai.azure.com", StringComparison.OrdinalIgnoreCase) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be the responsibility of the Hosting side. It should guarantee it uses the right URL and the client integration shouldn't need to fix the URL up.
return new ChatCompletionsClient(settings.Endpoint, settings.TokenCredential ?? new DefaultAzureCredential(), options); | ||
var credential = settings.TokenCredential ?? new DefaultAzureCredential(); | ||
|
||
BearerTokenAuthenticationPolicy tokenPolicy = new(credential, ["https://cognitiveservices.azure.com/.default"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this the default in the library?
Can we add a comment here?
Description
Add support for Azure AI Foundry models and Foundry Local.
Fixes #9012 #9568
Checklist
<remarks />
and<code />
elements on your triple slash comments?