[Bug]: Controlled generation fails truncating response when it includes code markdown #1372

Closed
1 task done
LBUPD33 opened this issue Nov 4, 2024 · 7 comments

LBUPD33 commented Nov 4, 2024

File Name

GCP Vertex Studio & Google Colab

What happened?

Context
I am trying to build a Gemini chatbot for internal use, with dedicated system instructions and restricted access (using gauth).

Technical constraints

  1. The output needs to be consistent JSON.
  2. The output needs to include a response with Markdown-formatted content.

Bug

The output gets truncated at the point where the model tries to include a Markdown code block.

Tests done

  • No controlled generation (plain text), with the JSON schema included in the prompt: works, but the output format is inconsistent
  • Controlled generation (application/json), with a response schema and without system instructions: the response is still truncated

Vertex Studio Params

  • Model: Gemini 1.5 (both Flash and Pro are affected)
  • Type: Chat
  • System instruction: NONE
  • Generation Config:
    • Output format / response_mime_type: application/json
    • response_schema: {"type":"OBJECT","properties":{"response_content":{"type":"STRING"}},"required":["response_content"]}
  • Safety Settings: OFF
  • Prompt: "Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example?"

Python code

import vertexai
from vertexai.generative_models import GenerativeModel, SafetySetting

# Placeholder: replace with your own GCP project ID.
gcp_project = "your-project-id"

def multiturn_generate_content():
    vertexai.init(project=gcp_project, location="europe-west9")
    model = GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=[textsi_1],
        )
    chat = model.start_chat()
    print(chat.send_message(
        ["""Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example?"""],
        generation_config=generation_config,
        safety_settings=safety_settings
    ))

textsi_1 = """"""

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 0.5,
    "top_p": 0.95,
    "response_mime_type": "application/json",
    "response_schema": {"type":"OBJECT","properties":{"response_content":{"type":"STRING"}},"required":["response_content"]},
    }

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
]

multiturn_generate_content()

Output

1.5 Flash 001

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \"The `map()` function in JavaScript is a powerful tool for transforming arrays. It allows you to apply a given function to each element in an array, creating a new array with the transformed results.  \\n\\n**Here\'s how it works:**\\n\\n1. **The `map()` function takes a callback function as an argument.** This callback function is executed for each element in the original array. The callback function usually takes three parameters:\\n    * `currentValue`: The current element being processed.\\n    * `index`: The index of the current element in the array.\\n    * `array`: The original array itself. \\n\\n2. **The callback function performs some transformation on the `currentValue`.** This could be anything: adding a value, changing the data type, or applying a complex calculation.\\n\\n3. **The `map()` function returns a new array containing the transformed elements.** The original array remains unchanged.\\n\\n**Example:**\\n\\nLet\'s say you have an array of numbers and you want to double each number:\\n\\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.35739438961713743
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 234
  total_token_count: 264
}
model_version: "gemini-1.5-flash-001"

1.5 Flash 002

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \"The `map()` function in JavaScript is a higher-order function that allows you to iterate over an array and transform each element into a new value.  It returns a new array containing the transformed elements, leaving the original array unchanged. \\n\\n**Syntax:**\\n\\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.4850025475025177
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 64
  total_token_count: 94
}
model_version: "gemini-1.5-flash-002"

1.5 Pro 001

candidates {
  content {
    role: "model"
    parts {
      text: "{\n\"response_content\": \""
    }
  }
  finish_reason: STOP
  citation_metadata {
    citations {
      start_index: 1064
      end_index: 1187
      uri: "https://www.spritely.net/how-to-put-multiple-objects-in-one-var-array-javascript/"
    }
  }
  avg_logprobs: -3.1444194316864014
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 8
  total_token_count: 38
}
model_version: "gemini-1.5-pro-001"

1.5 Pro 002

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \""
    }
  }
  finish_reason: STOP
  citation_metadata {
    citations {
      start_index: 220
      end_index: 343
      uri: "https://falytom.com/javascript-array-methods-examples/"
    }
  }
  avg_logprobs: -11.301958719889322
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 6
  total_token_count: 36
}
model_version: "gemini-1.5-pro-002"
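
Note that finish_reason is STOP in all four logs, so it cannot be used to detect the truncation. A minimal sketch of a client-side check, assuming the response text is available as a plain string: try to parse it as JSON and treat a parse failure as a truncated response. The helper name is_complete_json and the "complete" sample string are illustrative and not part of the original report.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if text parses as a complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Text returned by gemini-1.5-pro-002 in the log above.
truncated = '{"response_content": "'
# Hypothetical complete response, for comparison only.
complete = '{"response_content": "Example:\\n\\n```js\\n[1, 2].map(x => x * 2)\\n```"}'

print(is_complete_json(truncated))
print(is_complete_json(complete))
```

A check like this only detects the problem; it does not recover the missing content.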

Code of Conduct

  • I agree to follow this project's Code of Conduct
LBUPD33 changed the title from "[Bug]: Controlled generation Fails with Gemini on Vertex AI truncating response" to "[Bug]: Controlled generation fails truncating response when it includes code markdown" on Nov 4, 2024
@holtskinner
Collaborator

Does this occur with gemini-1.5-pro as well? Or just gemini-1.5-flash?

holtskinner self-assigned this on Nov 4, 2024
@LBUPD33
Author

LBUPD33 commented Nov 4, 2024

Hi @holtskinner, it occurs on both. I updated the Relevant log output section to include the gemini-1.5-pro-002, gemini-1.5-flash, and gemini-1.5-pro responses. I used the Python code provided and changed only the model name.

@holtskinner
Collaborator

Thanks for your feedback. I think this could be a bug in Controlled Generation related to the Markdown output. I'm going to report it to the product team.

@holtskinner
Collaborator

@LBUPD33 A question about your use case: why do you need the model output to be JSON, since you're only reading the one field response_content?

@LBUPD33
Author

LBUPD33 commented Nov 11, 2024

@holtskinner The provided code only demonstrates the bug. I reduced it to the simplest way to reproduce the problem.

The real use case includes multiple fields, such as response_content and response_type. Based on the response_type value, the frontend would behave differently.
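
To make the multi-field use case concrete, here is a rough sketch of what such a schema and the frontend dispatch might look like. The enum values ("markdown", "plain_text") and the render function are assumptions made up for illustration; the thread does not say which response_type values or frontend behaviours are actually used.

```python
# Hypothetical two-field schema, written in the same style as the
# single-field schema in the issue. The enum values are assumptions.
response_schema = {
    "type": "OBJECT",
    "properties": {
        "response_content": {"type": "STRING"},
        "response_type": {"type": "STRING", "enum": ["markdown", "plain_text"]},
    },
    "required": ["response_content", "response_type"],
}


def render(payload: dict) -> str:
    """Pick a (hypothetical) frontend behaviour based on response_type."""
    kind = payload["response_type"]
    if kind == "markdown":
        return "render-as-markdown: " + payload["response_content"]
    if kind == "plain_text":
        return "render-as-text: " + payload["response_content"]
    raise ValueError(f"Unsupported response_type: {kind!r}")


print(render({"response_type": "markdown", "response_content": "**hi**"}))
```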

@holtskinner
Collaborator

Thanks for the context. This issue seems to be with how the Controlled Generation backend parses the Markdown-formatted JSON blocks created by Gemini. The product team is working on a fix.

In the meantime, I found a workaround that should handle your use case. It skips controlled generation and parses the Markdown manually:

import json
import re

import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    GenerationConfig,
    SafetySetting,
)

# Placeholder: replace with your own GCP project ID.
gcp_project = "your-project-id"


def extract_json_block(markdown_string: str) -> str | None:
    """
    Extracts the outermost JSON code block from a markdown string.

    Args:
      markdown_string: The markdown string to extract the code block from.

    Returns:
      The outer JSON code block if found, otherwise None.
    """
    pattern = r"```json\n(\{\n.*?\n\})\n```"
    match = re.search(pattern, markdown_string, re.DOTALL | re.MULTILINE)
    return match.group(1) if match else None


def multiturn_generate_content():
    vertexai.init(project=gcp_project, location="us-central1")
    model = GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=[textsi_1],
    )
    chat = model.start_chat()
    response = chat.send_message(
        [
            """Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example? Respond in the following JSON structure:
                {
                    "response_content": "Insert Answer Here",
                    "response_type": "markdown" # Or whatever this should be
                }
            """
        ],
        generation_config=generation_config,
        safety_settings=safety_settings,
    )
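    json_block = extract_json_block(response.text)
    if json_block is None:
        # The model did not return a fenced JSON block matching the pattern.
        raise ValueError("No JSON code block found in the model response")
    json_object = json.loads(json_block)
    print(json_object["response_content"])
    print(json_object["response_type"])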


textsi_1 = """"""

generation_config = GenerationConfig(
    max_output_tokens=8192, temperature=0.5, top_p=0.95
)

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
]

multiturn_generate_content()
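
The regex above expects the JSON object to be pretty-printed inside a json-labelled fence. As a possible alternative, assuming the model sometimes returns single-line JSON or omits the fence, the standard library's json.JSONDecoder.raw_decode can parse the first JSON object it finds and ignore the surrounding text. The function name extract_first_json_object and the sample string are illustrative only; this sketch is not part of the original workaround.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text such as a closing code fence.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Hypothetical model output: single-line JSON inside a fenced block, with a
# nested code fence inside the string value.
sample = 'Here you go:\n```json\n{"response_content": "Use `map()`:\\n```js\\n[1, 2].map(x => x * 2)\\n```", "response_type": "markdown"}\n```'

result = extract_first_json_object(sample)
print(result["response_type"])
```

Because the JSON parser tracks string boundaries itself, code fences nested inside response_content do not confuse the extraction.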

@LBUPD33
Author

LBUPD33 commented Nov 11, 2024

@holtskinner Thank you, it's good news that the product team is working on a fix!
And thanks for the workaround.

LBUPD33 closed this as completed on Nov 11, 2024