feat: working instrumentation with smolagents #1184
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
Excited to dig into this @aymeric-roucher!
agent.model.last_input_token_count + agent.model.last_output_token_count
)
span.set_attribute("Observations", step_log.observations)
# if step_log.error is not None:
@axiomofjoy As you can see, I've commented out these 3 lines.
It is because their visualization in a platform like Arize Phoenix is currently unsatisfactory: an error in one step shows the whole run as failing, when in fact the multi-step agent can hit an error at one step and then recover in later steps to successfully solve the task. Is there a way to display an error without propagating it upward to the whole run?
Interesting, can you give me a code snippet to reproduce this behavior?
This is the code in the test.py script.
Will take a look and get back to you!
I think this might be a bug in Phoenix. I have filed an issue here.
Nice to meet you @axiomofjoy! 🤗 This is ready for review. Final things to do are:
Great to meet you @aymeric-roucher and thanks so much for this contribution! I'm digging into the PR now. Let me know how I can help you get it over the line, e.g., if you need help writing tests. Our team would also love to dogfood the instrumentation once it's in a good state!
python/instrumentation/openinference-instrumentation-smolagents/pyproject.toml
@axiomofjoy sorry for the issue above; I had been working on a specific branch of smolagents to make it compatible, but now installing from main should work!
@axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM. Is there a way to create a wrapper that works for Method
I have read the CLA Document and I hereby sign the CLA
…s/pyproject.toml Co-authored-by: Xander Song <axiomofjoy@gmail.com>
Got it, no worries! I just got it running from dev 😄
You need to follow our LLM message semantic conventions. These can be a bit tricky due to constraints on OTel attribute value types. You can see an example here.
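Since OTel attribute values are limited to primitives (and homogeneous lists of primitives), chat messages generally get flattened into indexed attribute keys. Here is a minimal sketch of that flattening, assuming OpenInference-style key names; treat the exact strings as illustrative, since the canonical ones live in the semantic-conventions package:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def set_message_attributes(span, messages, prefix="llm.input_messages"):
    # Flatten a list of {"role": ..., "content": ...} dicts into scalar span
    # attributes, since OTel does not accept nested dicts as attribute values.
    for i, message in enumerate(messages):
        span.set_attribute(f"{prefix}.{i}.message.role", message["role"])
        span.set_attribute(f"{prefix}.{i}.message.content", message["content"])


with tracer.start_as_current_span("llm_call") as span:
    set_message_attributes(
        span,
        [
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
    )
```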
Good catch! My first thought is that you might try instrumenting each subclass individually in addition to the base class. This can be accomplished by iterating over subclasses. We do something similar in DSPy here.
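For reference, a rough sketch of that subclass-iteration idea; the wrapper body and the method name are placeholders, not the actual instrumentation code:

```python
from wrapt import wrap_function_wrapper


def _traced_call(wrapped, instance, args, kwargs):
    # Placeholder wrapper: real instrumentation would open a span around the call.
    return wrapped(*args, **kwargs)


def instrument_subclasses(base_cls, method_name):
    # Wrap the method on the base class and on every transitive subclass that
    # defines its own override, so e.g. HfApiModel's implementation is captured too.
    stack, seen = [base_cls], set()
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(cls.__subclasses__())
        if method_name in cls.__dict__:
            wrap_function_wrapper(cls.__module__, f"{cls.__name__}.{method_name}", _traced_call)
```

One caveat of this approach is that only subclasses already imported at instrumentation time are covered; classes defined later would be missed.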
@aymeric-roucher It's looking really promising so far! A few findings from my initial testing.
If you don't mind, I'll open a PR against your branch including some of the examples I used!
I've applied your suggestions @axiomofjoy and implemented a wrapper over all subclasses of Model, plus made many other changes to mimic the DSPy implementation! Here's what my dashboard now looks like.
Hey @aymeric-roucher, awesome progress! I'm excited to test it out. I opened a PR to your fork that sets up CI and adds examples here. This enables the following commands:
You can read about how to get set up in that PR. I'll start dogfooding your changes and add some tests in a subsequent PR if you don't mind!
* Add examples and pass CI
Thanks a lot for your proposed changes @axiomofjoy! I'm so looking forward to enabling observability 🌟
Began adding tests here @aymeric-roucher!
* initial test for llm spans
* llm input messages
* llm tool definitions
* llm tool calls
* record tests
* nix encoder
span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
span.record_exception(exception)
raise
span.set_attribute("Tool calls", step_log.tool_calls)
@axiomofjoy here's where I've logged the tool calls (it's a list of ToolCall objects; ToolCall is a dataclass, so it could just be JSON-dumped).
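For illustration, dumping such a list to a single JSON string (a valid OTel attribute value) could look like the sketch below; the ToolCall fields here are stand-ins, not smolagents' exact definition:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCall:
    # Stand-in fields for illustration; the real dataclass may differ.
    name: str
    arguments: dict = field(default_factory=dict)


tool_calls = [ToolCall(name="web_search", arguments={"query": "openinference"})]

# Serialize the whole list to one JSON string before attaching it to the span.
serialized = json.dumps([asdict(tc) for tc in tool_calls])
# span.set_attribute("Tool calls", serialized)
```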
Thank you for these changes @axiomofjoy! Two questions from my side:
So I want to show errors as part of the normal workflow (otherwise, users open issues like "my agent fails" and think it's a workflow issue instead of seeing that it's normal). Could there be a less alarming way to log errors? Like, not a frightening error status, but just a red-background message in the info tab, similar to how an Input or Output would be logged? Having the red (!) element on the minified representation is fine though.
But with these comments we're really entering the "small nits" realm; we can merge when you're ready!
Hey @aymeric-roucher! We don't support a mechanism to change the salience of error messages in the UI. If you want to log information that is similar to an error, another option would be to log an OTel event without setting an error status code. This would result in a message in the events tab, but no (!).
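A minimal sketch of that pattern, assuming a generic run_step callable (hypothetical) for the step being traced:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def run_step_with_soft_error(run_step):
    # run_step is any callable that executes one agent step.
    with tracer.start_as_current_span("agent_step") as span:
        try:
            return run_step()
        except Exception as exc:
            # Record the failure as an event only, without span.set_status(ERROR),
            # so a step the agent later recovers from does not flag the whole run.
            span.add_event(
                "step_error",
                attributes={
                    "exception.type": type(exc).__name__,
                    "exception.message": str(exc),
                },
            )
            return None
```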
Ok, thank you for the clarification. Then all's good on my side! Do you want any help, for instance to make sure all tests pass or to add any more required tests?
One more PR here with additional tests and tweaks to tool spans, CI, and
It would be helpful for maintainability if we had a test case with a minimalistic agent that produces relatively consistent output that would give us coverage on the
@axiomofjoy just added tests with deterministic agent objects! I've only put smolagents-side tests for now, not the OTel-side testing (I figured you would be faster than me at making these tests), but I can do it if you want! Again, this last commit needs the latest main branch of smolagents.
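For context, a deterministic fixture along these lines keeps recorded spans stable across test runs; the sketch below uses a simplified, hypothetical model interface rather than smolagents' actual API:

```python
class DeterministicModel:
    """Always returns the same canned completion so recorded spans are reproducible."""

    def __init__(self, canned_output: str):
        self.canned_output = canned_output
        # Fixed token counts keep token-count span attributes deterministic as well.
        self.last_input_token_count = 10
        self.last_output_token_count = 5

    def __call__(self, messages, **kwargs):
        # Ignore the incoming messages and return the canned answer every time.
        return self.canned_output
```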
Also @axiomofjoy, do we have a way to change all errors into warnings? This may solve the scary aspect of them, showing that errors are a normal part of an agent's run. EDIT: I see no match for "warning" anywhere in the code, so it's probably not handled; no problem, let's merge this!
Thanks so much @aymeric-roucher! We can add this in a follow-up. I have one more PR still open here if you can take a look!
* pass ci
* tools
* clean
* simplify exception handling
* clean smolagents attributes
* fix types
It's merged @axiomofjoy!
Amazing work @aymeric-roucher! So excited to get this merged. I've filed an umbrella ticket here to track fast follows, including setting up the release pipeline and addressing some of the outstanding issues in this thread. Please feel free to add to the issue! We'll get an initial version of the package released tomorrow! 🎉
Initial release is out here!
Noticed a few small bugs and filed issues in the ticket. @aymeric-roucher please let me know how it looks to you! In particular, I have not yet been able to test with the
@axiomofjoy I've updated our internal ChatMessages format to give it a
Amazing, thanks so much @aymeric-roucher!
Fixes #1182
cc @harrisonchu I've copied the work you did for crewAI; it ended up working really well!
I've put a test.py file at the root to let you try out the instrumentation. Beware that I've not adapted the tests yet.