[Bug]: running with asyncio loop in uvicorn causes WorkflowHandler.pending to not be handled correctly in Phoenix (Arize) #17436
Comments
@SvenWesterlaken Greetings! I'm here to assist you with any bugs, questions, or contributions you might have. Let me know how I can help! Here are a few steps you can take to troubleshoot and potentially resolve the issue:
If these steps do not resolve the issue, it may be necessary to further investigate the specific interactions between Uvicorn's asyncio loop and LlamaIndex's tracing mechanisms.
Ah yeah, I think I've heard this before. I need to look into it, but to be honest, I have no idea what the exact issue/fix will be here. Suggestions/help very welcome.
Every workflow step (and the
The workflow.run returns a custom subclassed asyncio future object, which I think is messing with the async context? I've dug into this before, and it's a bit confusing.
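To make that concrete, here's a tiny illustration (not llama-index's actual code; FakeWorkflowHandler is a made-up stand-in for the future-like handler workflow.run() returns): an asyncio.Future subclass reports itself as pending until a result is set, which is presumably the "<WorkflowHandler pending>" text that ends up in the trace when the handler is captured before the workflow finishes.

import asyncio

# Made-up stand-in for the future-like handler returned by workflow.run()
class FakeWorkflowHandler(asyncio.Future):
    pass

async def main():
    handler = FakeWorkflowHandler()
    # Until a result is set, repr() reports the future as pending
    print(repr(handler))   # e.g. <FakeWorkflowHandler pending>
    handler.set_result("done")
    print(await handler)   # done

asyncio.run(main())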
Thanks for the fast response @logan-markewich! I can only add from my side that the asyncio global loop is the cause in my case. Removing it solves it, so I would say it's something with the global async setup, but I think you already got that from my initial message haha. I might be able to look into it later this week, but so far I don't think I have enough experience to be of much help.
Don't know if this is correct @logan-markewich, but it seems that the problem mainly arises from the double context and thus resetting the context around the span id, which causes the span to exit prematurely and forgo the saved parent_id, which in turn detaches the steps.
It is worth noting, though, that the steps do combine correctly; only the first span, for the run of the workflow itself, looks like it has problems. Probably now that the global loop is asyncio, the context that is created doesn't need to be reset? Might be related: django/asgiref#267 and a related PR to fix it: instana/python-sensor#666
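To illustrate the suspected mechanism with plain contextvars (this is an assumption about the failure mode, not llama-index's or OpenInference's actual code): a token created in one Context cannot be reset from a copy of that Context, which is one way a span's parent can get detached when an extra context copy is wrapped around the request.

import contextvars

current_span_id = contextvars.ContextVar("current_span_id", default=None)

# "attach": set the var and keep the token so the parent can be restored later
token = current_span_id.set("parent-span")

def detach_in_copied_context():
    # "detach": fails because we're now inside a *copy* of the context,
    # not the context the token was created in
    try:
        current_span_id.reset(token)
    except ValueError as err:
        print(err)  # "... was created in a different Context"

# Simulates the extra context copy wrapped around the request/step
contextvars.copy_context().run(detach_in_copied_context)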
Thanks for the tip, this actually helps a lot. Let me see what I can cook up here.
@SvenWesterlaken Actually, I'm unable to reproduce using the code you gave 👀 What versions of the packages do you have? Here are my relevant versions (omg, so many otel packages lol)
Also running the latest Phoenix docker image (v7.3.2 is printed in the docker logs).
Let me try to create a local reproduction later today and send you the package versions. I was a bit cooked yesterday after spending so much time coding under pressure, so I couldn't get myself to put in the effort to create a full reproduction and test it. I also couldn't just copy-paste my repo code as it has some sensitive info etc., so I quickly grabbed the reproduction from the related issue in the hope that it would be enough for people to figure out the issue, as I normally expect people to react after a week haha. So once again, thanks for the quick responses and help @logan-markewich!
Packages

As for the packages @logan-markewich (I deleted quite a lot of rows, as the full list was thrice as long; in case you miss a package you need to verify the version of, let me know):

arize-phoenix==7.5.1
arize-phoenix-evals==0.18.0
arize-phoenix-otel==0.6.1
datasets==3.2.0
deepeval==0.10.7
fastapi==0.115.6
fastapi-cli==0.0.7
fastembed==0.5.0
huggingface-hub==0.23.5
langchain==0.3.14
langchain-community==0.3.14
langchain-core==0.3.29
langchain-openai==0.2.14
langchain-text-splitters==0.3.4
langcodes==3.5.0
langsmith==0.2.10
language_data==1.3.0
llama-cloud==0.1.7
llama-index==0.12.9
llama-index-agent-openai==0.4.1
llama-index-cli==0.4.0
llama-index-core==0.12.10.post1
llama-index-embeddings-fastembed==0.3.0
llama-index-embeddings-openai==0.3.1
llama-index-extractors-entity==0.3.0
llama-index-indices-managed-llama-cloud==0.6.3
llama-index-llms-openai==0.3.12
llama-index-multi-modal-llms-openai==0.4.2
llama-index-program-openai==0.3.1
llama-index-question-gen-openai==0.3.0
llama-index-readers-file==0.4.2
llama-index-readers-llama-parse==0.4.0
llama-index-storage-chat-store-redis==0.4.0
llama-index-storage-docstore-mongodb==0.3.0
llama-index-storage-index-store-mongodb==0.4.0
llama-index-storage-kvstore-mongodb==0.3.0
llama-index-vector-stores-qdrant==0.4.2
llama-parse==0.5.19
nest-asyncio==1.6.0
openai==1.59.3
openinference-instrumentation==0.1.20
openinference-instrumentation-llama-index==3.1.2
openinference-semantic-conventions==0.1.12
opentelemetry-api==1.29.0
opentelemetry-exporter-otlp==1.29.0
opentelemetry-exporter-otlp-proto-common==1.29.0
opentelemetry-exporter-otlp-proto-grpc==1.29.0
opentelemetry-exporter-otlp-proto-http==1.29.0
opentelemetry-instrumentation==0.50b0
opentelemetry-proto==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-semantic-conventions==0.50b0
python-dotenv==1.0.1
qdrant-client==1.12.2
ragas==0.2.9

Reproduction

1. Setup the workflow from the rag with reranking example
But without the ingestion to simplify it (a sketch of the events module it imports follows after step 3 below).

# workflow.py
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI
from app.models.events import RetrieverEvent, RerankEvent
import logging

logger = logging.getLogger('main_app')

class RAGWorkflow(Workflow):
    @step
    async def retrieve(self, ctx: Context, event: StartEvent) -> RetrieverEvent | None:
        query = event.get("query")
        index = event.get("index")

        if not query:
            return None

        logger.debug(f"Query the database with: {query}")

        # store the query in the global context
        await ctx.set("query", query)

        if index is None:
            logger.warning("Index is empty, load some documents before querying!")
            return None

        retriever = index.as_retriever(similarity_top_k=25)
        nodes = await retriever.aretrieve(query)
        logger.debug(f"Retrieved {len(nodes)} nodes.")
        return RetrieverEvent(nodes=nodes)

    @step
    async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
        # Rerank the nodes
        ranker = LLMRerank(
            choice_batch_size=5, top_n=5, llm=OpenAI(model="gpt-4o-mini")
        )
        query = await ctx.get("query", default=None)
        new_nodes = ranker.postprocess_nodes(
            ev.nodes,
            query_str=query
        )
        logger.debug(f"Reranked nodes to {len(new_nodes)}")
        return RerankEvent(nodes=new_nodes)

    @step
    async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
        """Return a streaming response using reranked nodes."""
        llm = OpenAI(model="gpt-4o-mini")
        summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
        query = await ctx.get("query", default=None)
        response = await summarizer.asynthesize(query, nodes=ev.nodes)
        return StopEvent(result=response)

2. Run the workflow from a basic endpoint using FastAPI & setup tracing
Make sure to add an

# main.py
from fastapi import FastAPI
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from .workflow import RAGWorkflow
tracer_provider = register(
    # project_name="default", # Default is 'default'
    endpoint="http://phoenix:6006/v1/traces",
)
# Instrument the application
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
# Setup fastapi
app = FastAPI()
# Note that this is just a simple debug endpoint
@app.get("/")
async def read_root():
    reader = SimpleDirectoryReader(
        input_files=["./app/test.txt"],
    )
    docs = reader.load_data()
    index = VectorStoreIndex.from_documents(docs, embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"))

    w = RAGWorkflow()
    result = await w.run(query="What does the context say?", index=index)

    async for chunk in result.async_response_gen():
        print(chunk, end="", flush=True)

    return 'completed'

3. Create a file called test.txt
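The reproduction imports RetrieverEvent and RerankEvent from app.models.events, but that file isn't shown above. Assuming it mirrors the events from the RAG-with-reranking workflow example the reproduction is based on (an assumption, since the actual file wasn't posted), it would look roughly like this:

# events.py (assumed contents)
from llama_index.core.schema import NodeWithScore
from llama_index.core.workflow import Event

class RetrieverEvent(Event):
    """Result of running retrieval."""
    nodes: list[NodeWithScore]

class RerankEvent(Event):
    """Result of running reranking on retrieved nodes."""
    nodes: list[NodeWithScore]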
Thank you! I will give this a shot. I'm actually just realizing that I ran without setting the loop type to asyncio (so using the default uvloop). I wonder if that's the key difference. I will find out.
Yes, that's the key difference actually haha. If I don't run with the asyncio loop, it works as expected. @logan-markewich Unfortunately, I can't take that out, as it will crash ragas, which needs the asyncio loop instead of uvloop. So that's also some background on how I stumbled upon this bug.
@SvenWesterlaken there is likely a way to structure your code without needing --loop asyncio
After consulting the Arize folks, I think this issue will be very hard to fix 😓 In the meantime, I'm happy to help you redesign your code so that you don't need --loop asyncio.
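One shape such a redesign could take (a sketch, under the assumption that only the ragas/asyncio-only calls need a plain asyncio loop; run_isolated and make_coro are made-up names for illustration): keep uvicorn on its default uvloop and give that code its own event loop in a worker thread.

import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)

def _run_on_fresh_asyncio_loop(make_coro):
    # Runs in the worker thread on a plain asyncio loop, independent of uvloop
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(make_coro())
    finally:
        loop.close()

async def run_isolated(make_coro):
    # Call from a FastAPI endpoint running on uvloop; the asyncio-only work
    # (e.g. a ragas evaluation) happens on the worker thread's loop instead
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, _run_on_fresh_asyncio_loop, make_coro)

Usage would then be something like result = await run_isolated(lambda: my_ragas_coroutine()), keeping the rest of the app on uvloop.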
Let me get back to you later today with some error messages and some context on the code that causes it. Thank you already for your time! If you want, we can leave this issue open and continue communication in another form @logan-markewich? If not, I will post the stuff here.
Are you on our discord? Feel free to ping me there (you'll see my name in every channel lol).
Bug Description
I am running my Python environment for LlamaIndex inside a Docker container. To prevent errors on startup after introducing ragas into the project, I added
--loop asyncio
to the start command. Everything seems to be working fine except for one thing: the traces of a workflow don't combine into a single grouped trace, which is basically the same problem as #16283 - showing "<WorkflowHandler pending>" and splitting up the steps.

The docker image start command before (and screenshot of output in phoenix):
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload"]
The docker image start command after
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload", "--loop", "asyncio"]
Note the last two added arguments.
I want to note that I don't get any weird logs; probably there's a mismatch in context due to the async behaviour that is now wrapped around the application. For further context about the application: I use FastAPI as the main entrypoint to access my code and execute functions.
Version
0.12.10
Steps to Reproduce
1. Set up a simple workflow
2. Run the workflow with the span handler
3. Set up Docker with Phoenix
Or use the setup from #16283 as I think it will yield the same output, but I haven't confirmed this.
This will expose Phoenix on localhost:6006
4. Run the application with uvicorn
Make sure it's installed
run
uvicorn main:app --loop asyncio
5. Execute a GET request to FastAPI
Call it in a web browser or with
wget
localhost:8000/ and it should execute the workflow

Relevant Logs/Tracebacks
No response