
feat: working instrumentation with smolagents #1184

Merged
merged 11 commits into Arize-ai:main on Jan 15, 2025

Conversation

@aymeric-roucher (Contributor)

Fixes #1182
cc @harrisonchu I've copied the work you did for crewAI, and it ended up working really well!

I've put a test.py file at the root to let you try out the instrumentation.

Beware that I've not adapted the tests yet.

@aymeric-roucher requested a review from a team as a code owner January 9, 2025 20:20
@dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) label Jan 9, 2025
github-actions bot commented Jan 9, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@axiomofjoy self-requested a review January 9, 2025 21:02
@axiomofjoy (Contributor)

Excited to dig into this @aymeric-roucher!

```python
agent.model.last_input_token_count + agent.model.last_output_token_count
)
span.set_attribute("Observations", step_log.observations)
# if step_log.error is not None:
```
@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

@axiomofjoy As you can see, I've commented out these three lines. Currently their visualization in a platform like Arize Phoenix is unsatisfactory: an error in one step makes the whole run show as failed, even though the multi-step agent can hit an error in one step and then recover in later steps to successfully solve the task. Is there a way to display an error without propagating it upwards to the whole run?

@axiomofjoy (Contributor)

Interesting, can you give me a code snippet to reproduce this behavior?

@aymeric-roucher (Contributor, Author)

This is the code in the test.py script at the root of the repo.

@axiomofjoy (Contributor)

Will take a look and get back to you!

@axiomofjoy (Contributor)

I think this might be a bug in Phoenix. I have filed an issue here.

@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

Nice to meet you @axiomofjoy! 🤗 This is ready for review. The final things to do are:

  • figure out how not to propagate errors upwards (see above)
  • remove test.py, which is here only for manual testing
  • make the tests pass

@axiomofjoy (Contributor)

Great to meet you @aymeric-roucher and thanks so much for this contribution! I'm digging into the PR now. Let me know how I can help you get it over the line, e.g., if you need help writing tests. Our team would also love to dogfood the instrumentation once it's in a good state!

@aymeric-roucher (Contributor, Author)

@axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

@aymeric-roucher (Contributor, Author) commented Jan 9, 2025

@axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model, but since it overrides its parent's get_tool_call method, the wrapper targeting Model.get_tool_call() presumably does not apply to HfApiModel.

Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?

The __call__ method works for standard LLM calls (calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would feel like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.

@aymeric-roucher (Contributor, Author)

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Jan 9, 2025
@aymeric-roucher (Contributor, Author)

Also wondering how to produce the nice LLM messages format with system/user roles shown in this screenshot: [screenshot]

…s/pyproject.toml

Co-authored-by: Xander Song <axiomofjoy@gmail.com>
@axiomofjoy (Contributor)

> @axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

Got it, no worries! I just got it running from dev 😄

@axiomofjoy (Contributor)

> Also wondering how to produce the nice LLM messages format with system/user roles shown in this screenshot: [screenshot]

You need to use our LLM message semantic conventions. These can be a bit tricky due to constraints on OTel attribute value types. You can see an example here.
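
To make that concrete, here is a minimal sketch of how flattened message attributes might be set on a span. The attribute keys reflect my reading of the OpenInference conventions and should be double-checked against the openinference-semantic-conventions package; the `messages` list is a hypothetical chat history.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Hypothetical chat history to be recorded on the LLM span.
messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "What is 2 + 2?"},
]

with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("openinference.span.kind", "LLM")
    # OTel attribute values must be primitives (or flat sequences of them),
    # so each message is flattened into indexed dotted keys.
    for i, message in enumerate(messages):
        span.set_attribute(f"llm.input_messages.{i}.message.role", message["role"])
        span.set_attribute(f"llm.input_messages.{i}.message.content", message["content"])
```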

@axiomofjoy (Contributor)

> @axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model, but since it overrides its parent's get_tool_call method, the wrapper targeting Model.get_tool_call() presumably does not apply to HfApiModel.
>
> Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?
>
> The __call__ method works for standard LLM calls (calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would feel like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.

Good catch! My first thought is that you might try instrumenting each subclass individually in addition to the base class. This can be accomplished by iterating over subclasses. We do something similar in DSPy here.
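
A rough sketch of that approach, assuming a wrapt-based patcher like the other OpenInference instrumentors use; the smolagents import path and the presence of get_tool_call on each class are assumptions here, not the final implementation:

```python
from inspect import getmodule

from wrapt import wrap_function_wrapper

from smolagents.models import Model  # assumed import path


def _get_tool_call_wrapper(wrapped, instance, args, kwargs):
    # A real wrapper would open an LLM span around the call; this sketch only delegates.
    return wrapped(*args, **kwargs)


def instrument_get_tool_call() -> None:
    # Note: __subclasses__() only lists direct subclasses that have already been
    # imported; recurse over it if deeper inheritance needs to be covered.
    for cls in [Model, *Model.__subclasses__()]:
        if "get_tool_call" in cls.__dict__:  # only patch classes defining their own override
            wrap_function_wrapper(
                module=getmodule(cls).__name__,
                name=f"{cls.__name__}.get_tool_call",
                wrapper=_get_tool_call_wrapper,
            )
```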

@axiomofjoy changed the title from "Working instrumentation with smolagents" to "feat: working instrumentation with smolagents" Jan 9, 2025
@axiomofjoy (Contributor) commented Jan 10, 2025

@aymeric-roucher It's looking really promising so far! A few findings from my initial testing:

  1. A few span kinds are missing inputs and outputs (agent spans are missing input, LLM spans are missing both). We typically add these values for all OpenInference span kinds so that they show in the spans and traces tables.
  2. Tool attributes are missing. These attributes are the tool conventions in this table (see the sketch below).
  3. Retriever tools might be best instrumented with the retriever span kind rather than as a tool, to give us rich UI for retrieved documents, scores, etc.

If you don't mind, I'll open a PR against your branch including some of the examples I used!
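
For item 2, a sketch of what populating tool attributes on a tool span could look like; the `tool.*` keys follow my understanding of the OpenInference tool conventions, and the tool metadata here is made up:

```python
import json

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Made-up tool metadata standing in for a smolagents Tool instance.
tool_name = "web_search"
tool_description = "Search the web and return the top results."
tool_parameters = {"query": {"type": "string", "description": "The search query."}}

with tracer.start_as_current_span(tool_name) as span:
    span.set_attribute("openinference.span.kind", "TOOL")
    span.set_attribute("tool.name", tool_name)
    span.set_attribute("tool.description", tool_description)
    # Nested structures are not valid OTel attribute values, so JSON-dump them.
    span.set_attribute("tool.parameters", json.dumps(tool_parameters))
```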

@aymeric-roucher (Contributor, Author) commented Jan 10, 2025

I've applied your suggestions @axiomofjoy and implemented a wrapper over all subclasses of Model, plus many other changes to mimic the DSPy implementation! Here's what my dashboard now looks like: [screenshot]
Addressing your points:

  1. Should be solved (cf. the screenshot).
  2. I think tool attributes are now set correctly. Is there any attribute you still see missing?
  3. We have no specific retriever tool (the community might build some, but we can't know in advance how they'll work; they could return a single big string report instead of individual docs), so this does not apply here!

@axiomofjoy (Contributor)

Hey @aymeric-roucher, awesome progress! I'm excited to test it out.

I opened a PR to your fork that sets up CI and adds examples here. This enables the following commands:

  • tox run -e mypy-smolagents (run type checks)
  • tox run -e ruff-smolagents (run formatters and linters)
  • tox run -e test-smolagents (run tests; my PR nixes the CrewAI tests since they are out of date with our current patterns)
  • tox run -e smolagents (run everything)

You can read about how to get set up with tox here.

I'll start dogfooding your changes and add some tests in a subsequent PR if you don't mind!

@aymeric-roucher (Contributor, Author)

Thanks a lot for your proposed changes @axiomofjoy! I'm so looking forward to enabling observability 🌟

@axiomofjoy (Contributor)

Began adding tests here @aymeric-roucher!

* initial test for llm spans

* llm input messages

* llm tool definitions

* llm tool calls

* record tests

* nix encoder
@dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) label and removed the size:XL label Jan 13, 2025
```python
span.set_status(trace_api.Status(trace_api.StatusCode.ERROR, str(exception)))
span.record_exception(exception)
raise
span.set_attribute("Tool calls", step_log.tool_calls)
```
@aymeric-roucher (Contributor, Author)

@axiomofjoy here's where I've logged the tool calls (it's a list of ToolCall objects; ToolCall is a dataclass, so it could just be JSON-dumped).
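
For instance, a small sketch of that JSON-dumping idea; the ToolCall fields below are illustrative, not the actual smolagents dataclass:

```python
import dataclasses
import json


@dataclasses.dataclass
class ToolCall:  # illustrative stand-in for the smolagents ToolCall dataclass
    name: str
    arguments: dict
    id: str


tool_calls = [ToolCall(name="web_search", arguments={"query": "weather in Paris"}, id="call_0")]

# OTel attributes only accept primitive values, so serialize the list to a JSON
# string before calling span.set_attribute("Tool calls", ...).
serialized = json.dumps([dataclasses.asdict(tc) for tc in tool_calls])
```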

@aymeric-roucher
Copy link
Contributor Author

Thank you for these changes @axiomofjoy!

Two questions from my side:

  1. I've added logging for step_log.tool_calls in _StepWrapper.
    Do you know of a Chain attribute for logging this list of tool calls (each tool call is a dataclass, but they could be JSON-dumped) in the Info tab, above any error or output?

  2. Is there a way to log errors in a less error-y way?
    Below is an example where I've introduced a deliberate error, but the agent then recovers and successfully solves the task.

In `smolagents`, having an error in one step is not much of an issue; it's normal for the LLM to produce incorrect tool calls and then recover in later steps.

So I want to show errors as part of the normal workflow (otherwise users open issues like "my agent fails" and think it's a workflow problem instead of seeing that it's normal).

Could there be a less alarming way to log errors? Not a frightening error status, but just a red-background message in the Info tab, similar to how an Input or Output would be logged? Having the red (!) element on the minified representation is fine, though.

@aymeric-roucher (Contributor, Author)

But with these comments we're really entering "small nits" territory; we can merge whenever you're ready!

@axiomofjoy (Contributor)

> Thank you for these changes @axiomofjoy!
>
> Two questions from my side:
>
>   1. I've added logging for step_log.tool_calls in _StepWrapper.
>     Do you know of a Chain attribute for logging this list of tool calls (each tool call is a dataclass, but they could be JSON-dumped) in the Info tab, above any error or output?
>   2. Is there a way to log errors in a less error-y way?
>     Below is an example where I've introduced a deliberate error, but the agent then recovers and successfully solves the task.
>
> In smolagents, having an error in one step is not much of an issue; it's normal for the LLM to produce incorrect tool calls and then recover in later steps.
> So I want to show errors as part of the normal workflow (otherwise users open issues like "my agent fails" and think it's a workflow problem instead of seeing that it's normal).
>
> Could there be a less alarming way to log errors? Not a frightening error status, but just a red-background message in the Info tab, similar to how an Input or Output would be logged? Having the red (!) element on the minified representation is fine, though.

Hey @aymeric-roucher! We don't support a mechanism to change the salience of error messages in the UI. If you want to log information that is similar to an error, another option would be to log an OTel event without setting an error status code. This would result in a message in the events tab, but no (!).
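
A quick sketch of that suggestion, recording the step error as a span event while leaving the span status untouched; the span and event names here are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("step") as span:
    try:
        raise ValueError("malformed tool call arguments")  # simulated per-step error
    except ValueError as exception:
        # Record the error as an event instead of setting StatusCode.ERROR,
        # so the run is not flagged as failed when the agent later recovers.
        span.add_event(
            "step_error",
            attributes={
                "exception.type": type(exception).__name__,
                "exception.message": str(exception),
            },
        )
```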

@aymeric-roucher (Contributor, Author) commented Jan 13, 2025

OK, thank you for the clarification. Then all's good on my side! Do you want any help, for instance making sure all tests pass or adding any more required tests?

@axiomofjoy (Contributor)

One more PR here with additional tests and tweaks to tool spans, CI, and smolagents-specific attributes. This should get us passing CI as well. If the telemetry still looks good in the UI to you @aymeric-roucher, I think we're ready for an initial release!

> OK, thank you for the clarification. Then all's good on my side! Do you want any help, for instance making sure all tests pass or adding any more required tests?

It would be helpful for maintainability to have a test case with a minimalistic agent that produces relatively consistent output, giving us coverage of _RunWrapper and _StepWrapper. This is not blocking the initial release, though 😁

@aymeric-roucher (Contributor, Author) commented Jan 14, 2025

@axiomofjoy just added tests with deterministic agent objects! I've only added smolagents-side tests for now, not the OTel-side tests (I figured you'd be faster than me at writing those), but I can do it if you want!

Again, this last commit needs the latest main branch of smolagents to work properly.

@aymeric-roucher (Contributor, Author) commented Jan 14, 2025

Also @axiomofjoy, do we have a way to change all errors into warnings? That might remove their scary aspect and show that errors are a normal part of an agent's run.

EDIT: I see no match for "warning" anywhere in the code, so it's probably not handled. No problem, let's merge this!

@axiomofjoy (Contributor)

> @axiomofjoy just added tests with deterministic agent objects! I've only added smolagents-side tests for now, not the OTel-side tests (I figured you'd be faster than me at writing those), but I can do it if you want!

Thanks so much @aymeric-roucher! We can add in a follow-up.

I have one more PR still open here if you can take a look!

* pass ci

* tools

* clean

* simplify exception handling

* clean smolagents attributes

* fix types
@aymeric-roucher (Contributor, Author)

It's merged @axiomofjoy!

@axiomofjoy merged commit a9b70ed into Arize-ai:main Jan 15, 2025
3 checks passed
@axiomofjoy (Contributor)

Amazing work @aymeric-roucher! So excited to get this merged.

I've filed an umbrella ticket here to track fast follows, including setting up the release pipeline and addressing some of the outstanding issues in this thread. Please feel free to add to the issue!

We'll get an initial version of the package released tomorrow! 🎉

@axiomofjoy (Contributor)

Initial release is out here!

@axiomofjoy (Contributor)

Noticed a few small bugs and filed issues in the ticket. @aymeric-roucher please let me know how it looks to you! In particular, I have not yet been able to test with the HfApiModel since I am having some trouble getting set up with Inferences. Definitely want to make sure that is working.

@aymeric-roucher (Contributor, Author)

@axiomofjoy I've updated our internal ChatMessages format to give it a model_dumps_json() method, and now it works!
I've now released the telemetry in this smolagents release: https://github.com/huggingface/smolagents/releases/tag/v1.3.0

@axiomofjoy (Contributor)

> @axiomofjoy I've updated our internal ChatMessages format to give it a model_dumps_json() method, and now it works! I've now released the telemetry in this smolagents release: https://github.com/huggingface/smolagents/releases/tag/v1.3.0

Amazing, thanks so much @aymeric-roucher!

Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[support new package] Hugging Face smolagents
2 participants