
feat: working instrumentation with smolagents #1184

Merged
merged 11 commits into Arize-ai:main on Jan 15, 2025

Conversation

@aymeric-roucher (Contributor)

Fixes #1182
cc @harrisonchu I've copied the work you did for crewAI, and it ended up working really well!

I've put a test.py file at the root to let you try out the instrumentation.

Beware that I've not adapted the tests yet.

@aymeric-roucher requested a review from a team as a code owner January 9, 2025 20:20
@dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) label Jan 9, 2025
github-actions bot commented Jan 9, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@axiomofjoy self-requested a review January 9, 2025 21:02
@axiomofjoy (Contributor)

Excited to dig into this @aymeric-roucher!

```python
agent.model.last_input_token_count + agent.model.last_output_token_count
)
span.set_attribute("Observations", step_log.observations)
# if step_log.error is not None:
```
@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

@axiomofjoy As you can see, I've commented out these three lines. Currently their visualization in a platform like Arize Phoenix is unsatisfactory: an error in one step makes the whole run show as failed, even though the multi-step agent can hit an error in one step and then recover in later steps to successfully solve the task. Is there a way to display an error without propagating it upwards to the whole run?

@axiomofjoy (Contributor)

Interesting, can you give me a code snippet to reproduce this behavior?

@aymeric-roucher (Contributor, Author)

This is the code in the test.py script at the root of the repo.

@axiomofjoy (Contributor)

Will take a look and get back to you!

@axiomofjoy (Contributor)

I think this might be a bug in Phoenix. I have filed an issue here.

@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

Nice to meet you @axiomofjoy! 🤗 This is ready for review. The final things to do are:

  • figure out how not to propagate errors upwards (see above)
  • remove test.py, which is here only for manual testing
  • make the tests pass

@axiomofjoy (Contributor)

Great to meet you @aymeric-roucher and thanks so much for this contribution! I'm digging into the PR now. Let me know how I can help you get it over the line, e.g., if you need help writing tests. Our team would also love to dogfood the instrumentation once it's in a good state!

@aymeric-roucher (Contributor, Author)

@axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

@axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model, but since it overrides its parent's get_tool_call method, the wrapper targeting Model.get_tool_call() presumably does not apply to HfApiModel.

Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?

The __call__ method works for standard LLM calls (calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would feel like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.

@aymeric-roucher (Contributor, Author)

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Jan 9, 2025
@aymeric-roucher (Contributor, Author)

Also wondering how to produce the nice LLM messages format with system/user roles shown in this screenshot: [screenshot]

…s/pyproject.toml

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
@axiomofjoy (Contributor)

> @axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

Got it, no worries! I just got it running from dev 😄

@axiomofjoy (Contributor)

> Also wondering how to produce the nice LLM messages format with system/user roles shown in this screenshot: [screenshot]

You need to use our LLM message semantic conventions. These can be a bit tricky due to constraints on OTel attribute value types. You can see an example here.
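
To make that concrete, here is a minimal sketch of how flattened message attributes might be set on a span. The attribute keys reflect my reading of the OpenInference conventions and should be double-checked against the openinference-semantic-conventions package; the `messages` list is a hypothetical chat history.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Hypothetical chat history to be recorded on the LLM span.
messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "What is 2 + 2?"},
]

with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("openinference.span.kind", "LLM")
    # OTel attribute values must be primitives (or flat sequences of them),
    # so each message is flattened into indexed dotted keys.
    for i, message in enumerate(messages):
        span.set_attribute(f"llm.input_messages.{i}.message.role", message["role"])
        span.set_attribute(f"llm.input_messages.{i}.message.content", message["content"])
```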

@axiomofjoy (Contributor)

> @axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model, but since it overrides its parent's get_tool_call method, the wrapper targeting Model.get_tool_call() presumably does not apply to HfApiModel.
>
> Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?
>
> The __call__ method works for standard LLM calls (calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would feel like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.

Good catch! My first thought is that you might try instrumenting each subclass individually in addition to the base class. This can be accomplished by iterating over subclasses. We do something similar in DSPy here.
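
A rough sketch of that approach, assuming a wrapt-based patcher like the other OpenInference instrumentors use; the smolagents import path and the presence of get_tool_call on each class are assumptions here, not the final implementation:

```python
from inspect import getmodule

from wrapt import wrap_function_wrapper

from smolagents.models import Model  # assumed import path


def _get_tool_call_wrapper(wrapped, instance, args, kwargs):
    # A real wrapper would open an LLM span around the call; this sketch only delegates.
    return wrapped(*args, **kwargs)


def instrument_get_tool_call() -> None:
    # Note: __subclasses__() only lists direct subclasses that have already been
    # imported; recurse over it if deeper inheritance needs to be covered.
    for cls in [Model, *Model.__subclasses__()]:
        if "get_tool_call" in cls.__dict__:  # only patch classes defining their own override
            wrap_function_wrapper(
                module=getmodule(cls).__name__,
                name=f"{cls.__name__}.get_tool_call",
                wrapper=_get_tool_call_wrapper,
            )
```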

@axiomofjoy changed the title from "Working instrumentation with smolagents" to "feat: working instrumentation with smolagents" Jan 9, 2025
@axiomofjoy (Contributor) commented Jan 10, 2025

@aymeric-roucher It's looking really promising so far! A few findings from my initial testing:

  1. A few span kinds are missing inputs and outputs (agent spans are missing input, LLM spans are missing both). We typically add these values for all OpenInference span kinds so that they show in the spans and traces tables.
  2. Tool attributes are missing. These attributes are the tool conventions in this table (see the sketch below).
  3. Retriever tools might be best instrumented with the retriever span kind rather than as a tool, to give us rich UI for retrieved documents, scores, etc.

If you don't mind, I'll open a PR against your branch including some of the examples I used!
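
For item 2, a sketch of what populating tool attributes on a tool span could look like; the `tool.*` keys follow my understanding of the OpenInference tool conventions, and the tool metadata here is made up:

```python
import json

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Made-up tool metadata standing in for a smolagents Tool instance.
tool_name = "web_search"
tool_description = "Search the web and return the top results."
tool_parameters = {"query": {"type": "string", "description": "The search query."}}

with tracer.start_as_current_span(tool_name) as span:
    span.set_attribute("openinference.span.kind", "TOOL")
    span.set_attribute("tool.name", tool_name)
    span.set_attribute("tool.description", tool_description)
    # Nested structures are not valid OTel attribute values, so JSON-dump them.
    span.set_attribute("tool.parameters", json.dumps(tool_parameters))
```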

@aymeric-roucher (Contributor, Author) commented Jan 10, 2025

I've applied your suggestions @axiomofjoy and implemented a wrapper over all subclasses of Model, plus many other changes to mimic the DSPy implementation! Here's what my dashboard now looks like: [screenshot]
Addressing your points:

  1. Should be solved (cf. the screenshot).
  2. I think tool attributes are now set correctly. Is there any attribute you still see missing?
  3. We have no specific retriever tool (the community might build some, but we can't know in advance how they'll work; they could return a single big string report instead of individual docs), so this does not apply here!

@axiomofjoy (Contributor)

Hey @aymeric-roucher, awesome progress! I'm excited to test it out.

I opened a PR to your fork that sets up CI and adds examples here. This enables the following commands:

  • tox run -e mypy-smolagents (run type checks)
  • tox run -e ruff-smolagents (run formatters and linters)
  • tox run -e test-smolagents (run tests; my PR nixes the CrewAI tests since they are out of date with our current patterns)
  • tox run -e smolagents (run everything)

You can read about how to get set up with tox here.

I'll start dogfooding your changes and add some tests in a subsequent PR if you don't mind!

@aymeric-roucher (Contributor, Author)

Thanks a lot for your proposed changes @axiomofjoy! I'm so looking forward to enabling observability 🌟

@axiomofjoy (Contributor)

Began adding tests here @aymeric-roucher!

* initial test for llm spans

* llm input messages

* llm tool definitions

* llm tool calls

* record tests

* nix encoder
@dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) label and removed the size:XL label Jan 13, 2025
```python
span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
span.record_exception(exception)
raise
span.set_attribute("Tool calls", step_log.tool_calls)
```
@aymeric-roucher (Contributor, Author)

@axiomofjoy here's where I've logged the tool calls (it's a list of ToolCall objects; ToolCall is a dataclass, so it could just be JSON-dumped).
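
For instance, a small sketch of that JSON-dumping idea; the ToolCall fields below are illustrative, not the actual smolagents dataclass:

```python
import dataclasses
import json


@dataclasses.dataclass
class ToolCall:  # illustrative stand-in for the smolagents ToolCall dataclass
    name: str
    arguments: dict
    id: str


tool_calls = [ToolCall(name="web_search", arguments={"query": "weather in Paris"}, id="call_0")]

# OTel attributes only accept primitive values, so serialize the list to a JSON
# string before calling span.set_attribute("Tool calls", ...).
serialized = json.dumps([dataclasses.asdict(tc) for tc in tool_calls])
```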

@aymeric-roucher
Copy link
Contributor Author

Thank you for these changes @axiomofjoy!

Two questions from my side:

  1. I've added logging for step_log.tool_calls in _StepWrapper.
    Do you know of a Chain attribute for logging this list of tool calls (each tool call is a dataclass, but they could be JSON-dumped) in the Info tab, above any error or output?

  2. Is there a way to log errors in a less error-y way?
    Below is an example where I've introduced a deliberate error, but the agent then recovers and successfully solves the task.

In `smolagents`, having an error in one step is not much of an issue; it's normal for the LLM to produce incorrect tool calls and then recover in later steps.

So I want to show errors as part of the normal workflow (otherwise users open issues like "my agent fails" and think it's a workflow problem instead of seeing that it's normal).

Could there be a less alarming way to log errors? Not a frightening error status, but just a red-background message in the Info tab, similar to how an Input or Output would be logged? Having the red (!) element on the minified representation is fine, though.

@aymeric-roucher (Contributor, Author)

But with these comments we're really entering "small nits" territory; we can merge whenever you're ready!

@axiomofjoy (Contributor)

> Thank you for these changes @axiomofjoy!
>
> Two questions from my side:
>
>   1. I've added logging for step_log.tool_calls in _StepWrapper.
>     Do you know of a Chain attribute for logging this list of tool calls (each tool call is a dataclass, but they could be JSON-dumped) in the Info tab, above any error or output?
>   2. Is there a way to log errors in a less error-y way?
>     Below is an example where I've introduced a deliberate error, but the agent then recovers and successfully solves the task.
>
> In smolagents, having an error in one step is not much of an issue; it's normal for the LLM to produce incorrect tool calls and then recover in later steps.
> So I want to show errors as part of the normal workflow (otherwise users open issues like "my agent fails" and think it's a workflow problem instead of seeing that it's normal).
>
> Could there be a less alarming way to log errors? Not a frightening error status, but just a red-background message in the Info tab, similar to how an Input or Output would be logged? Having the red (!) element on the minified representation is fine, though.

Hey @aymeric-roucher! We don't support a mechanism to change the salience of error messages in the UI. If you want to log information that is similar to an error, another option would be to log an OTel event without setting an error status code. This would result in a message in the events tab, but no (!).
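
A quick sketch of that suggestion, recording the step error as a span event while leaving the span status untouched; the span and event names here are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("step") as span:
    try:
        raise ValueError("malformed tool call arguments")  # simulated per-step error
    except ValueError as exception:
        # Record the error as an event instead of setting StatusCode.ERROR,
        # so the run is not flagged as failed when the agent later recovers.
        span.add_event(
            "step_error",
            attributes={
                "exception.type": type(exception).__name__,
                "exception.message": str(exception),
            },
        )
```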

@aymeric-roucher (Contributor, Author) commented Jan 13, 2025

OK, thank you for the clarification. Then all's good on my side! Do you want any help, for instance making sure all tests pass or adding any more required tests?

@axiomofjoy (Contributor)

One more PR here with additional tests and tweaks to tool spans, CI, and smolagents-specific attributes. This should get us passing CI as well. If the telemetry still looks good in the UI to you @aymeric-roucher, I think we're ready for an initial release!

> OK, thank you for the clarification. Then all's good on my side! Do you want any help, for instance making sure all tests pass or adding any more required tests?

It would be helpful for maintainability to have a test case with a minimalistic agent that produces relatively consistent output, giving us coverage of _RunWrapper and _StepWrapper. This is not blocking the initial release, though 😁

@aymeric-roucher (Contributor, Author) commented Jan 14, 2025

@axiomofjoy just added tests with deterministic agent objects! I've only added smolagents-side tests for now, not the OTel-side tests (I figured you'd be faster than me at writing those), but I can do it if you want!

Again, this last commit needs the latest main branch of smolagents to work properly.

@aymeric-roucher (Contributor, Author) commented Jan 14, 2025

Also @axiomofjoy, do we have a way to change all errors into warnings? That might remove their scary aspect and show that errors are a normal part of an agent's run.

EDIT: I see no match for "warning" anywhere in the code, so it's probably not handled. No problem, let's merge this!

@axiomofjoy (Contributor)

> @axiomofjoy just added tests with deterministic agent objects! I've only added smolagents-side tests for now, not the OTel-side tests (I figured you'd be faster than me at writing those), but I can do it if you want!

Thanks so much @aymeric-roucher! We can add in a follow-up.

I have one more PR still open here if you can take a look!

* pass ci

* tools

* clean

* simplify exception handling

* clean smolagents attributes

* fix types
@aymeric-roucher (Contributor, Author)

It's merged @axiomofjoy!

@axiomofjoy merged commit a9b70ed into Arize-ai:main Jan 15, 2025
3 checks passed
@axiomofjoy (Contributor)

Amazing work @aymeric-roucher! So excited to get this merged.

I've filed an umbrella ticket here to track fast follows, including setting up the release pipeline and addressing some of the outstanding issues in this thread. Please feel free to add to the issue!

We'll get an initial version of the package released tomorrow! 🎉

@axiomofjoy (Contributor)

Initial release is out here!

@axiomofjoy (Contributor)

Noticed a few small bugs and filed issues in the ticket. @aymeric-roucher please let me know how it looks to you! In particular, I have not yet been able to test with the HfApiModel since I am having some trouble getting set up with Inferences. Definitely want to make sure that is working.

@aymeric-roucher (Contributor, Author)

@axiomofjoy I've updated our internal ChatMessages format to give it a model_dumps_json() method, and now it works!
I've now released the telemetry in this smolagents release: https://github.com/huggingface/smolagents/releases/tag/v1.3.0

@axiomofjoy (Contributor)

> @axiomofjoy I've updated our internal ChatMessages format to give it a model_dumps_json() method, and now it works! I've now released the telemetry in this smolagents release: https://github.com/huggingface/smolagents/releases/tag/v1.3.0

Amazing, thanks so much @aymeric-roucher!

Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[support new package] Hugging Face smolagents
2 participants