Random KeyError happening during code execution with correct code #1544

Open
hbrantner-auvaria opened this issue Jan 23, 2025 · 16 comments
Labels
bug Something isn't working


@hbrantner-auvaria

System Info

PandasAI 2.4.2
macOS 15.2
Python 3.12.8
AWS Bedrock as LLM with Claude 3.5 Sonnet v2

🐛 Describe the bug

@ArslanSaleem @gventuri

This is the log I get in my console when I ask my agent "What is the current date?" with no data sources attached.
Why do I get the KeyError here? The code seems fine, and I do not understand why such a simple use case fails.

You generated this python code:
current_date = datetime.datetime.now(datetime.timezone.utc)
formatted_date = current_date.strftime('%A, %B %d, %Y')
result = {'type': 'string', 'value': f"Today's date is {formatted_date}."}
It fails with the following error:
Traceback (most recent call last):
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/freezegun/api.py", line 885, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 174, in execute_code
exec(code, environment)
File "<string>", line 2, in <module>
KeyError: '__import__'

Do you have an idea why this happens?
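
A minimal sketch of one likely cause, assuming the execution environment hands `exec` a `__builtins__` dictionary that omits `__import__`. The `restricted` and `permissive` dictionaries and the `run` helper are illustrative and are not PandasAI's actual environment. In CPython, some C-implemented methods such as `datetime.strftime` import helper modules internally, and that lookup goes through the `__import__` entry of the calling frame's builtins, so code without any import statement can still fail:

```python
import builtins
import datetime

# The same snippet the LLM generated in the report above.
CODE = (
    "current_date = datetime.datetime.now(datetime.timezone.utc)\n"
    "formatted_date = current_date.strftime('%A, %B %d, %Y')\n"
    "result = {'type': 'string', 'value': f\"Today's date is {formatted_date}.\"}\n"
)


def run(code, allowed_builtins):
    """Execute code with a custom builtins dict; return (result, error name)."""
    environment = {"__builtins__": allowed_builtins, "datetime": datetime}
    try:
        exec(code, environment)
    except Exception as exc:
        return None, type(exc).__name__
    return environment["result"], None


restricted = {}  # hypothetical whitelist without __import__
permissive = {"__import__": builtins.__import__}  # same whitelist plus __import__

restricted_outcome = run(CODE, restricted)
permissive_outcome = run(CODE, permissive)
print("without __import__:", restricted_outcome)
print("with __import__:   ", permissive_outcome)
```

Whether the restricted run fails depends on the interpreter build, which may explain why the error looks random; the permissive run should succeed either way.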

dosubot bot commented Jan 23, 2025

Hey @hbrantner-auvaria! I'm here to help you with your bug. I'm a bot that can assist you with bug fixes, answer your questions, and help you become a contributor. Let's tackle this issue together!

A similar issue has been reported where a KeyError: '__import__' occurs during code execution in PandasAI. The suggested solution is to ensure that the code does not contain any references to dangerous built-ins, including __import__. The CodeCleaning class in PandasAI is designed to check for such dangerous built-ins and remove them. Make sure that the _is_jailbreak method is correctly identifying and removing any references to __import__ in the generated code. Additionally, verify that the CodeCleaning step is properly integrated into the GenerateChatPipeline [1].

To continue talking to Dosu, mention @dosu.



@hbrantner-auvaria
Author

Also, when the agent is given information about a dataframe, the common type pd.StringDtype() is displayed as object. In my opinion it should be displayed as string. A nullable property for each column would also be nice:

name: PortName
type: object
samples:
- Helsinki
- Stockholm
- Visby (Gotland)

@hbrantner-auvaria
Author

I also found that timestamps are presented incorrectly to the LLM in the samples:

name: EstimatedArrivalTime
type: datetime64[ns, UTC]
samples:
- &id002 !!python/object/apply:pandas._libs.tslibs.nattype.__nat_unpickle
- null
- !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
- 1719210600000000000
- null
- *id001
- 10
- !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
- 1719471600000000000
- null
- *id001
- 10
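
The `!!python/object/apply` tags above suggest that the sample values reached the serializer as raw objects. A minimal sketch of one way to avoid that, assuming samples can be converted to plain values before serialization; `to_plain_sample` is a hypothetical helper and not part of PandasAI. It relies on `isoformat()`, which `pandas.Timestamp` inherits as a subclass of `datetime.datetime`; missing values such as `NaT` would likely need their own branch:

```python
import datetime


def to_plain_sample(value):
    """Return a prompt-friendly representation of one sample value."""
    if value is None:
        return None
    if isinstance(value, (datetime.datetime, datetime.date)):
        # ISO 8601 text serializes as an ordinary string in YAML, JSON, or CSV.
        return value.isoformat()
    return value


# 1719210600 is the first epoch value from the dump above, in seconds.
arrival = datetime.datetime.fromtimestamp(1719210600, tz=datetime.timezone.utc)
plain_samples = [to_plain_sample(v) for v in [None, arrival, "Helsinki"]]
print(plain_samples)
```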

@hbrantner-auvaria
Author

If you point me to the relevant locations, I may be able to fix it myself. I think the dataframe is inserted through the templates, but I do not yet have a good overview of the project.

dosubot bot commented Jan 23, 2025

The conversion of pandas data types to string representations for the LLM is handled by the DataframeSerializer class. This class provides methods to serialize a pandas DataFrame into different string formats, such as JSON, YML, CSV, and SQL [1].

For the issue with pd.StringDtype() being displayed as object, it seems there isn't specific handling for StringDtype in the current codebase. You might need to modify the serialization logic to correctly identify and represent StringDtype as string.

Regarding the incorrect presentation of timestamps, the TimestampMixin in server/core/database/mixins/timestamp.py is responsible for handling and formatting timestamp data for the LLM [2]. You may need to adjust how timestamps are serialized to ensure they are presented correctly.

The templates used to insert dataframe information into the chat pipeline are located in pandasai/ee/agents/semantic_agent/prompts/templates/shared/dataframe.tmpl [3]. You can start by looking into these areas to make the necessary adjustments.


@hbrantner-auvaria
Author

@dosu Can I skip the code cleaning step?

dosubot bot commented Jan 23, 2025

Skipping the code cleaning step is not recommended, as it helps prevent the execution of potentially dangerous code by removing unsafe built-ins like __import__. However, if you still want to bypass it, you would need to modify the code execution pipeline to exclude the CodeCleaning step. This involves altering the GenerateChatPipeline to remove or bypass the code cleaning logic. Be cautious with this approach, as it may expose your system to security risks.


@hbrantner-auvaria
Author

Furthermore, the following kinds of errors occur, @dosu:

  • the "Unfortunately, I was not able to answer your question, because of the following error:\n\n'__import__'\n" error from before
  • this return error: "Unfortunately, I was not able to answer your question, because of the following error:\n\n'return' outside function (<string>, line 10)\n" (probably fixable with a system prompt)
  • this wrong result type error: "Unfortunately, I was not able to answer your question, because of the following error:\n\n'str' object has no attribute 'get'\n" (should also be fixable with a system prompt)

What do you suggest?

dosubot bot commented Jan 23, 2025

For the errors you're encountering:

  1. KeyError: '__import__': This error is likely due to the presence of unsafe built-ins in the code. The CodeCleaning step in the GenerateChatPipeline is designed to remove such references. Ensure that this step is correctly identifying and removing __import__ from the generated code. Skipping this step is not recommended due to security concerns [1].

  2. 'return' outside function: This error can be addressed by using system prompts to guide the LLM in generating correct code structures. The ErrorPromptGeneration class can be configured to handle such errors by generating prompts that correct the code structure [2].

  3. 'str' object has no attribute 'get': This error suggests a type mismatch or incorrect handling of string objects. Similar to the previous error, system prompts can be used to guide the LLM in generating the correct code. The ErrorPromptGeneration class can help by providing context-specific prompts to correct this issue [2].

Consider reviewing the ErrorPromptGeneration class and its associated prompt classes to ensure they are configured to handle these specific errors effectively.
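
As a complement to prompt changes, the second and third errors can be caught before and after execution. This is a minimal sketch under stated assumptions: `check_generated_code` and `check_result` are hypothetical helpers, not PandasAI functions, and the expected result shape (a dict with `type` and `value` keys) is inferred from the generated code shown in this thread.

```python
def check_generated_code(code):
    """Return a syntax error message for the snippet, or None if it compiles."""
    try:
        # Compiling in "exec" mode rejects a top-level return statement.
        compile(code, "<string>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None


def check_result(result):
    """Return a description of what is wrong with result, or None if it looks valid."""
    if not isinstance(result, dict):
        return f"result must be a dict, got {type(result).__name__}"
    missing = {"type", "value"} - result.keys()
    if missing:
        return f"result is missing keys: {sorted(missing)}"
    return None


print(check_generated_code("return {'type': 'string', 'value': 'x'}"))
print(check_result("Today's date is Monday."))
print(check_result({"type": "string", "value": "Today's date is Monday."}))
```

A message returned by either check could be fed back to the LLM in a retry prompt instead of surfacing as a raw exception.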


@hbrantner-auvaria
Author

@dosu What is wrong here, I do not get it ...

You generated this python code:
current_date = datetime.datetime.now(datetime.timezone.utc).strftime('%A, %B %d, %Y')
result = {'type': 'string', 'value': f"Today's date is {current_date}."}
It fails with the following error:
Traceback (most recent call last):
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/freezegun/api.py", line 885, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 174, in execute_code
exec(code, environment)
File "<string>", line 1, in <module>
KeyError: '__import__'
Fix the python code above and return the new python code:

dosubot bot commented Jan 23, 2025

The KeyError: '__import__' you're encountering suggests that the code execution environment is trying to access the __import__ function, which is likely being blocked by the security measures in place. The CodeCleaning class is designed to prevent the use of unsafe built-ins like __import__ to ensure security [1].

To address this issue, you can try the following:

  1. Review the CodeCleaning Process: Ensure that the CodeCleaning step is correctly identifying and removing any references to __import__ in the generated code. This step is crucial for maintaining security, so it's important to verify that it's functioning as intended.

  2. Modify the Code Execution Environment: If you need to use certain built-ins that are being blocked, you might consider adjusting the environment setup to allow them, but be cautious as this can introduce security risks.

  3. Error Handling with Prompts: Use the ErrorPromptGeneration class to handle and correct errors. This class can generate prompts to guide the LLM in producing code that avoids such errors [2].

If the code itself is correct and the error persists, it might be necessary to look into how the execution environment is configured and whether any security settings are overly restrictive for your use case.


@hbrantner-auvaria
Author

Another example @dosu

You generated this python code:
current_date = datetime.datetime.now(datetime.timezone.utc)
current_date_start = current_date.replace(hour=0, minute=0, second=0, microsecond=0)
current_date_end = current_date.replace(hour=23, minute=59, second=59, microsecond=999999)
daily_events_df = dfs[5]
todays_workshops = daily_events_df[(daily_events_df['EventStart'] >= current_date_start) & (daily_events_df['EventStart'] <= current_date_end) & daily_events_df['EventCategory'].str.contains('kreativ', case=False, na=False)]
if len(todays_workshops) > 0:
    workshop_info = []
    for _, workshop in todays_workshops.iterrows():
        start_time = workshop['EventStart'].strftime('%H:%M')
        workshop_info.append(f"- {workshop['EventName']} at {start_time} in {workshop['EventLocation']}")
    response = "Today's workshops:\n" + '\n'.join(workshop_info)
else:
    response = 'There are no workshops scheduled for today.'
result = {'type': 'string', 'value': response}
It fails with the following error:
Traceback (most recent call last):
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/freezegun/api.py", line 885, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 174, in execute_code
exec(code, environment)
File "<string>", line 9, in <module>
File "pandas/_libs/tslibs/timestamps.pyx", line 1496, in pandas._libs.tslibs.timestamps.Timestamp.strftime
KeyError: '__import__'

dosubot bot commented Jan 23, 2025

The KeyError: '__import__' error you're encountering suggests that the code execution environment is attempting to use the __import__ function, which is likely being blocked by the security measures in place. This is part of the security setup to prevent the use of unsafe built-ins [1].

To address this issue, consider the following steps:

  1. Ensure Code Cleaning: Verify that the CodeCleaning class is correctly identifying and removing any references to __import__ in the generated code. This step is crucial for maintaining security and should be functioning as intended [2].

  2. Review Execution Environment: If you need to use certain built-ins that are being blocked, you might consider adjusting the environment setup to allow them. However, be cautious as this can introduce security risks. The environment uses a whitelist of allowed built-ins, and you may need to modify this list if necessary [1].

  3. Error Handling with Prompts: Use the ErrorPromptGeneration class to handle and correct errors. This class can generate prompts to guide the LLM in producing code that avoids such errors [1].

If the code itself is correct and the error persists, it might be necessary to look into how the execution environment is configured and whether any security settings are overly restrictive for your use case.


@hbrantner-auvaria
Author

hbrantner-auvaria commented Jan 23, 2025

I cannot import modules inside exec, even after disabling the code cleaning step and setting security to 'none'. I do not understand where the problem comes from, @ArslanSaleem @gventuri:

Generate python code and return full updated code:

INFO:pandasai.helpers.logger:Code generated:

import pandas as pd
import datetime
import pytz
def format_time(dt):
    if pd.isna(dt):
        return "Not specified"
    return dt.strftime("%H:%M")
# Get current date in UTC
current_date = datetime.datetime.now(pytz.UTC)
current_date_start = current_date.replace(hour=0, minute=0, second=0, microsecond=0)
current_date_end = current_date.replace(hour=23, minute=59, second=59, microsecond=999999)
# Filter events for today
daily_events_df = dfs[5]  # DailyEvents table
todays_events = daily_events_df[
    (daily_events_df['EventStart'] >= current_date_start) &
    (daily_events_df['EventStart'] <= current_date_end)
]
if len(todays_events) == 0:
    response = "There are no events scheduled for today."
else:
    # Sort events by start time
    todays_events = todays_events.sort_values('EventStart')
    
    # Create formatted event list
    event_list = []
    for _, event in todays_events.iterrows():
        event_str = (f"{format_time(event['EventStart'])} - {format_time(event['EventEnd'])}: "
                    f"{event['EventName']} ({event['EventLocation']})")
        if pd.notna(event['EventPriceInEur']):
            event_str += f" - {event['EventPriceInEur']}€"
        event_list.append(event_str)
    
    response = "Today's events:\n\n" + "\n".join(event_list)
result = {
    "type": "string",
    "value": response
}

INFO:pandasai.helpers.logger:Executing Step 4: CachePopulation
INFO:pandasai.helpers.logger:Executing Step 5: CodeExecution
ERROR:pandasai.helpers.logger:Failed with error: Traceback (most recent call last):
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
result = self.execute_code(code_to_run, code_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/freezegun/api.py", line 885, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/hannes_brantner/Library/Caches/pypoetry/virtualenvs/chatbot-fXCp8fhr-py3.12/lib/python3.12/site-packages/pandasai/pipelines/chat/code_execution.py", line 174, in execute_code
exec(code, environment)
File "<string>", line 1, in <module>
ImportError: __import__ not found

@hbrantner-auvaria
Author

hbrantner-auvaria commented Jan 23, 2025

This seemed to fix it, after digging in the library code:
WHITELISTED_BUILTINS.append("__import__")

But that should not be so complicated ... or so error-prone in the default config
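
Whitelisting the full `__import__` lets generated code import any module. A narrower option, sketched here under stated assumptions, is to expose a guarded import function that only admits an allowlist. `ALLOWED_MODULES`, `guarded_import`, and `run_restricted` are hypothetical names, and how such a function would be wired into PandasAI's environment is not shown in this thread.

```python
import builtins

# Hypothetical allowlist; a real deployment would choose its own entries.
ALLOWED_MODULES = {"datetime", "time", "math"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Delegate to the real __import__ only for allowlisted top-level modules."""
    root = name.partition(".")[0]
    if root not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_restricted(code):
    """Execute code whose only builtin is the guarded import function."""
    environment = {"__builtins__": {"__import__": guarded_import}}
    exec(code, environment)
    return environment


env = run_restricted("import math\nvalue = math.sqrt(16)")
print(env["value"])

try:
    run_restricted("import os")
except ImportError as exc:
    print("blocked:", exc)
```

Imports triggered from C code called by the snippet pass through the same guard, so the allowlist may need the helper modules that libraries load lazily, such as `time` for `strftime`.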

@gventuri
Collaborator

Hi @hbrantner-auvaria, we are aware of some stability issues with datetimes and are investigating a robust way to fix them!

Thanks a lot for the very detailed logs, much appreciated, we'll investigate further from there!

2 participants