
[Bug]: running with asyncio loop in uvicorn causes WorkflowHandler.pending to not be handled correctly in Phoenix (Arize) #17436

Open
SvenWesterlaken opened this issue Jan 6, 2025 · 15 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

SvenWesterlaken commented Jan 6, 2025

Bug Description

I am running my Python environment for LlamaIndex inside a Docker container. To prevent startup errors after introducing ragas into the project, I added --loop asyncio to the start command. Everything seems to be working fine except for one thing: the traces of a workflow don't combine into a single grouped trace. It's basically the same problem as #16283 - Phoenix shows "<WorkflowHandler pending>" and splits up the steps.
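(For context on the symptom: "<WorkflowHandler pending>" is simply the default repr of an asyncio Future subclass that hasn't finished yet, so it shows up whenever the handler is rendered before it is awaited. A minimal sketch; the class below is a stand-in, not the real WorkflowHandler:)

import asyncio

class WorkflowHandler(asyncio.Future):  # stand-in for the real handler class
    pass

async def main():
    handler = WorkflowHandler()
    print(handler)           # -> <WorkflowHandler pending>
    handler.set_result("Hi, world!")
    print(await handler)     # -> Hi, world!

asyncio.run(main())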

The docker image start command before (and screenshot of output in phoenix):

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload"]

[screenshot: Phoenix trace view with the workflow steps grouped under one trace]

The docker image start command after

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload", "--loop", "asyncio"]
Note the two arguments added at the end

[screenshots: Phoenix shows "<WorkflowHandler pending>" and the workflow steps split into separate traces]

I want to note that I don't get any unusual logs; there's probably a mismatch in context due to the async behaviour that is now wrapped around the application. For further context: I use FastAPI as the main entry point to access my code and execute functions.

Version

0.12.10

Steps to Reproduce

1. Set up a simple workflow

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
import time

class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        time.sleep(3)
        return StopEvent(result="Hi, world!")

w = MyWorkflow(timeout=10, verbose=False)

2. Run the workflow with the span handler

from fastapi import FastAPI
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register

tracer_provider = register(
  # project_name="default", # Default is 'default'
  endpoint="http://localhost:6006/v1/traces",
)

# Instrument the application
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

app = FastAPI()

@app.get("/")
async def read_root():
    result = await w.run()  # `w` is the MyWorkflow instance from step 1
    return result

3. Set up Docker with Phoenix

Pull & run latest Phoenix image from Docker Hub:

docker pull arizephoenix/phoenix:latest
docker run -p 6006:6006 arizephoenix/phoenix:latest 

Or use the setup from #16283, as I think it will yield the same output, but I haven't confirmed this.

This will expose Phoenix on localhost:6006.

4. Run the application with uvicorn

Make sure it's installed

Run: uvicorn main:app --loop asyncio

5. Execute a GET request to the FastAPI app

Call localhost:8000/ in a web browser or with wget and it should execute the workflow.

Relevant Logs/Tracebacks

No response

SvenWesterlaken added the bug and triage labels on Jan 6, 2025

dosubot bot commented Jan 6, 2025

@SvenWesterlaken Greetings! I'm here to assist you with any bugs, questions, or contributions you might have. Let me know how I can help!

The issue you're experiencing with the WorkflowHandler.pending status not being handled correctly when using --loop asyncio in Uvicorn might be related to how asynchronous contexts are managed in LlamaIndex version 0.12.10. This version may have introduced changes in async handling that affect workflow tracing, particularly when combined with the --loop asyncio option.

Here are a few steps you can take to troubleshoot and potentially resolve the issue:

  1. Check Async Context Management: Ensure that the async context is correctly managed throughout your application. The use of --loop asyncio can sometimes lead to context mismatches if the application or its dependencies are not fully compatible with this event loop.

  2. Review Tracing Configuration: Double-check your tracing setup with Phoenix and ensure that the LlamaIndexInstrumentor is correctly configured. Make sure that the tracer provider is properly set up to handle async workflows.

  3. Test Without --loop asyncio: Temporarily remove the --loop asyncio option to see if the issue persists. This can help determine if the problem is specifically related to the asyncio loop.

  4. Update Dependencies: Ensure all related packages, including LlamaIndex, FastAPI, and Uvicorn, are up to date. Sometimes, compatibility issues arise from outdated dependencies.

  5. Consult Documentation: Review any available documentation or release notes for LlamaIndex version 0.12.10 to identify any known issues or changes related to async handling and tracing.

If these steps do not resolve the issue, it may be necessary to further investigate the specific interactions between Uvicorn's asyncio loop and LlamaIndex's tracing mechanisms.


@logan-markewich (Collaborator)

Ah yeah, I think I've heard of this before. I need to look into it, but to be honest, I have no idea what the exact issue/fix will be here. Suggestions/help very welcome.

@logan-markewich (Collaborator)

Every workflow step (and the workflow.run() function) is wrapped by this span wrapper.

workflow.run() returns a custom subclassed asyncio future object, which I think is messing with the async context? I've dug into this before, and it's a bit confusing.
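A minimal sketch (not LlamaIndex's actual code) of the kind of context mismatch being suspected: an asyncio task snapshots the contextvars Context at the moment it is created, so anything the caller sets or resets afterwards is invisible inside the task, and vice versa.

import asyncio
import contextvars

active_span_id = contextvars.ContextVar("active_span_id", default=None)

async def workflow_step():
    # The task copied the context when it was created, so it still sees "run-span".
    print("step sees:", active_span_id.get())

async def main():
    active_span_id.set("run-span")
    pending = asyncio.ensure_future(workflow_step())  # context snapshot taken here
    active_span_id.set(None)                          # later changes are invisible to the task
    await pending
    print("caller sees:", active_span_id.get())

asyncio.run(main())
# step sees: run-span
# caller sees: None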

SvenWesterlaken (Author) commented Jan 6, 2025

Thanks for the fast response @logan-markewich! I can only add from my side that the asyncio global loop is the cause in my case. Removing it solves it, so I'd say it's something with the global async loop, but I think you already got that from my initial message haha. I might be able to look into it later this week, but so far I don't think I have enough experience to be of much help.

SvenWesterlaken (Author) commented Jan 6, 2025

Don't know if this is correct @logan-markewich, but it seems that the problem mainly arises from the double context and thus resetting the context around the span id. This causes the span to exit prematurely and forgo the saved parent_id, which detaches the steps.

context.run(active_span_id.reset, token)

It is worth noting though that the steps themselves do combine correctly; only the span for the run of the workflow itself looks like it has problems.

Probably now that the global loop is the plain asyncio loop, the context that gets created doesn't need to be reset?

Might be related: django/asgiref#267 and a related PR to fix it: instana/python-sensor#666
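To illustrate the "double context" idea with a minimal, self-contained sketch (this is not the dispatcher's actual code): bookkeeping done inside a copied contextvars Context, the way context.run(...) does, only mutates the copy, so the caller's context and the copy can end up disagreeing about the active span.

import contextvars

active_span_id = contextvars.ContextVar("active_span_id", default=None)

def exit_span():
    # Pretend this is the span-exit bookkeeping that clears the active span.
    active_span_id.set(None)
    return active_span_id.get()

active_span_id.set("workflow-span")            # span "entered" in the caller's context

snapshot = contextvars.copy_context()          # the extra context a loop integration may hold
print("inside the copy:", snapshot.run(exit_span))   # -> None
print("caller still sees:", active_span_id.get())    # -> workflow-span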

@logan-markewich (Collaborator)

Thanks for the tip, this actually helps a lot. Let me see what I can cook up here

logan-markewich (Collaborator) commented Jan 6, 2025

@SvenWesterlaken Actually, I'm unable to reproduce using the code you gave 👀 What versions of the packages do you have? Here are my relevant versions (omg, so many otel packages lol)

python==3.10
llama-index-core==0.12.10
arize-phoenix-otel==0.6.1
openinference-instrumentation==0.1.20
openinference-instrumentation-llama-index==3.1.2
openinference-semantic-conventions==0.1.12
opentelemetry-api==1.29.0
opentelemetry-exporter-otlp==1.29.0
opentelemetry-exporter-otlp-proto-common==1.29.0
opentelemetry-exporter-otlp-proto-grpc==1.29.0
opentelemetry-exporter-otlp-proto-http==1.29.0
opentelemetry-instrumentation==0.50b0
opentelemetry-proto==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-semantic-conventions==0.50b0

Also running the latest Phoenix docker image (v7.3.2 is printed in the docker logs)

[screenshot of the resulting Phoenix trace]

@SvenWesterlaken (Author)

Let me try to create a local reproduction later today and send you the package versions.

I was a bit cooked yesterday after spending so much time coding under pressure, so I couldn't get myself to put in the effort to create a full reproduction and test it. I also couldn't just copy-paste my repo code as it has some sensitive info etc., so I quickly grabbed the reproduction from the related issue in the hope that it would be enough for people to figure out the issue, as I normally expect reactions after a week haha. So once again, thanks for the quick responses and help @logan-markewich!

@SvenWesterlaken (Author)

Packages

As for the packages @logan-markewich (I deleted quite a lot of rows, as the full list was three times as long; in case a package you need to verify is missing, let me know):

arize-phoenix==7.5.1
arize-phoenix-evals==0.18.0
arize-phoenix-otel==0.6.1
datasets==3.2.0
deepeval==0.10.7
fastapi==0.115.6
fastapi-cli==0.0.7
fastembed==0.5.0
huggingface-hub==0.23.5
langchain==0.3.14
langchain-community==0.3.14
langchain-core==0.3.29
langchain-openai==0.2.14
langchain-text-splitters==0.3.4
langcodes==3.5.0
langsmith==0.2.10
language_data==1.3.0
llama-cloud==0.1.7
llama-index==0.12.9
llama-index-agent-openai==0.4.1
llama-index-cli==0.4.0
llama-index-core==0.12.10.post1
llama-index-embeddings-fastembed==0.3.0
llama-index-embeddings-openai==0.3.1
llama-index-extractors-entity==0.3.0
llama-index-indices-managed-llama-cloud==0.6.3
llama-index-llms-openai==0.3.12
llama-index-multi-modal-llms-openai==0.4.2
llama-index-program-openai==0.3.1
llama-index-question-gen-openai==0.3.0
llama-index-readers-file==0.4.2
llama-index-readers-llama-parse==0.4.0
llama-index-storage-chat-store-redis==0.4.0
llama-index-storage-docstore-mongodb==0.3.0
llama-index-storage-index-store-mongodb==0.4.0
llama-index-storage-kvstore-mongodb==0.3.0
llama-index-vector-stores-qdrant==0.4.2
llama-parse==0.5.19
nest-asyncio==1.6.0
openai==1.59.3
openinference-instrumentation==0.1.20
openinference-instrumentation-llama-index==3.1.2
openinference-semantic-conventions==0.1.12
opentelemetry-api==1.29.0
opentelemetry-exporter-otlp==1.29.0
opentelemetry-exporter-otlp-proto-common==1.29.0
opentelemetry-exporter-otlp-proto-grpc==1.29.0
opentelemetry-exporter-otlp-proto-http==1.29.0
opentelemetry-instrumentation==0.50b0
opentelemetry-proto==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-semantic-conventions==0.50b0
python-dotenv==1.0.1
qdrant-client==1.12.2
ragas==0.2.9

Reproduction

1. Set up the workflow from the RAG-with-reranking example

But without the ingestion step, to simplify it.

# workflow.py
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core.postprocessor.llm_rerank import LLMRerank
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)

from llama_index.llms.openai import OpenAI
from app.models.events import RetrieverEvent, RerankEvent
import logging

logger = logging.getLogger('main_app')

class RAGWorkflow(Workflow):
  @step
  async def retrieve(self, ctx: Context, event: StartEvent) -> RetrieverEvent | None:
    query = event.get("query")
    index = event.get("index")

    if not query:
      return None

    logger.debug(f"Query the database with: {query}")

    # store the query in the global context
    await ctx.set("query", query)

    if index is None:
      logger.warning("Index is empty, load some documents before querying!")
      return None

    retriever = index.as_retriever(similarity_top_k=25)
    nodes = await retriever.aretrieve(query)
    
    logger.debug(f"Retrieved {len(nodes)} nodes.")
    return RetrieverEvent(nodes=nodes)

  @step
  async def rerank(self, ctx: Context, ev: RetrieverEvent) -> RerankEvent:
      # Rerank the nodes
      
      ranker = LLMRerank(
          choice_batch_size=5, top_n=5, llm=OpenAI(model="gpt-4o-mini")
      )
      
      query = await ctx.get("query", default=None)
      
      new_nodes = ranker.postprocess_nodes(
          ev.nodes, 
          query_str=query
      )
      
      logger.debug(f"Reranked nodes to {len(new_nodes)}")
      return RerankEvent(nodes=new_nodes)

  @step
  async def synthesize(self, ctx: Context, ev: RerankEvent) -> StopEvent:
      """Return a streaming response using reranked nodes."""
      llm = OpenAI(model="gpt-4o-mini")
      summarizer = CompactAndRefine(llm=llm, streaming=True, verbose=True)
      query = await ctx.get("query", default=None)

      response = await summarizer.asynthesize(query, nodes=ev.nodes)
      return StopEvent(result=response)

2. Run the workflow from a basic endpoint using FastAPI & set up tracing

Make sure to add an __init__.py so the relative import works.

# main.py
from fastapi import FastAPI
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from .workflow import RAGWorkflow

tracer_provider = register(
  # project_name="default", # Default is 'default'
  endpoint="http://phoenix:6006/v1/traces",
)

# Instrument the application
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

# Setup fastapi
app = FastAPI()

# Note that this is just a simple debug endpoint
@app.get("/")
async def read_root():
  
  reader = SimpleDirectoryReader(
    input_files=["./app/test.txt"],
  )
  
  docs = reader.load_data()
  
  index = VectorStoreIndex.from_documents(docs, embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"))
  
  w = RAGWorkflow()
  result = await w.run(query="What does the context say?", index=index)
  
  async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)
    
  return 'completed'

3. Create a file called test.txt

Just create a txt file with a simple sentence to be indexed; in my case the content didn't matter, as long as there is something to index.

4. Run Phoenix in a local Docker container

docker pull arizephoenix/phoenix:latest
docker run -p 6006:6006 arizephoenix/phoenix:latest 

This will expose Phoenix on localhost:6006; it should pull version 7.5.1 of Phoenix.

5. Run the app using uvicorn

Make sure uvicorn is installed globally.

Run: uvicorn main:app --loop asyncio --reload

6. Access the API endpoint to start the workflow

Call localhost:8000/ in a web browser, Postman, or with wget and it should execute the workflow.

It should return the string "completed" on completion.

7. Go to Phoenix to see the result

[screenshot of the resulting Phoenix trace]


In case it still can't be reproduced, there might be something with the Docker container?
What you can try is to replace step 5 with running the app inside a Docker container:

FROM python:3.10

WORKDIR /code

COPY ./requirements.txt ./
RUN pip install -r requirements.txt

COPY ./app ./app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload", "--loop", "asyncio"]

The code lives in a subfolder called 'app' next to the Dockerfile.

@logan-markewich (Collaborator)

Thank you! I will give this a shot.

I'm actually just realizing that I ran without setting the loop type to asyncio (so using the default uvloop). I wonder if that's the key difference. I will find out

SvenWesterlaken (Author) commented Jan 7, 2025

> Thank you! I will give this a shot.
>
> I'm actually just realizing that I ran without setting the loop type to asyncio (so using the default uvloop). I wonder if that's the key difference. I will find out

Yes, that's the key difference actually haha. If I don't run with the asyncio loop it works as expected. @logan-markewich

Unfortunately, I can't take that out, as it will crash ragas, which needs the asyncio loop instead of uvloop. So that's also some background on how I stumbled upon this bug.
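(A plausible explanation for that constraint, assuming the crash comes from nest_asyncio, which ragas uses to re-enter a running loop: nest_asyncio can only patch the pure-Python asyncio event loop and refuses uvloop's loop. A quick check:)

import asyncio
import nest_asyncio
import uvloop

def try_patch(loop):
    try:
        nest_asyncio.apply(loop)
        print("patched:", type(loop).__name__)
    except ValueError as exc:
        print("cannot patch:", type(loop).__name__, "->", exc)
    finally:
        loop.close()

try_patch(asyncio.new_event_loop())   # plain asyncio loop: patching works
try_patch(uvloop.new_event_loop())    # uvloop loop: nest_asyncio raises ValueError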

@logan-markewich (Collaborator)

@SvenWesterlaken there is likely a way to structure your code without needing nest_asyncio -- but yea ok, let me see if I can do something for this

@logan-markewich (Collaborator)

After consulting the arize folks, I think this issue will be very hard to fix 😓 In the meantime, I'm happy to help you redesign your code so that you don't need nest_asyncio -- is there a specific block of code that is requiring this?
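(For reference, the usual redesign being offered looks something like the sketch below; the names are purely illustrative and not taken from the issue's code. nest_asyncio is typically only needed when synchronous code calls asyncio.run() or loop.run_until_complete() while a loop is already running, so keeping the call chain async end to end removes the need for it.)

import asyncio
from fastapi import FastAPI

app = FastAPI()

async def evaluate_async(answer: str) -> float:
    # Stand-in for whatever async work (e.g. an evaluation call) needs the loop.
    await asyncio.sleep(0)
    return 1.0

def evaluate_blocking(answer: str) -> float:
    # Problematic pattern: re-entering the loop from sync code is what forces nest_asyncio.
    return asyncio.run(evaluate_async(answer))

@app.get("/score")
async def score():
    # Preferred: stay async all the way down and just await.
    return {"score": await evaluate_async("some answer")}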

@SvenWesterlaken (Author)

Let me get back to you later today with some error messages and some context on the code that causes it. Thank you already for your time! If you want, we can leave this issue open and continue communication elsewhere @logan-markewich? If not, I will post the details here.

logan-markewich (Collaborator) commented Jan 7, 2025

Are you on our Discord? Feel free to ping me there (you'll see my name, Logan M, in every channel lol) -- if not, we can just continue here.
