[Bug]: Controlled generation fails truncating response when it includes code markdown #1372

LBUPD33 · 2024-11-04T13:25:17Z

File Name

GCP Vertex Studio & Google Colab

What happened?

Context
Trying to reproduce a Gemini Chatbot for internal use: dedicated system instructions, restricted access (using gauth).

Technical constraints

Output needs to be consistent JSON.
Output needs to include a response with Markdown-formatted content.

Bug

Output gets truncated when the response tries to include a Markdown code block.

Tests done

No controlled generation (plain text), with the JSON schema included in the prompt = works, but the output format is inconsistent
Controlled generation (application/json), with a response schema and without system instructions = response is still truncated

Vertex Studio Params

Model: Gemini 1.5 (both Flash and Pro are affected)
Type: Chat
System instruction: NONE
Generation Config:
- Output format / response_mime_type: application/json
- response_schema: {"type":"OBJECT","properties":{"response_content":{"type":"STRING"}},"required":["response_content"]}
Safety Settings: OFF
Prompt: "Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example?"

Python code

import base64
import vertexai
from vertexai.generative_models import GenerativeModel, SafetySetting, Part

def multiturn_generate_content():
    vertexai.init(project=gcp_project, location="europe-west9")
    model = GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=[textsi_1],
        )
    chat = model.start_chat()
    print(chat.send_message(
        ["""Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example?"""],
        generation_config=generation_config,
        safety_settings=safety_settings
    ))

textsi_1 = """"""

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 0.5,
    "top_p": 0.95,
    "response_mime_type": "application/json",
    "response_schema": {"type":"OBJECT","properties":{"response_content":{"type":"STRING"}},"required":["response_content"]},
    }

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
]

multiturn_generate_content()

Output

1.5 Flash 001

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \"The `map()` function in JavaScript is a powerful tool for transforming arrays. It allows you to apply a given function to each element in an array, creating a new array with the transformed results.  \\n\\n**Here\'s how it works:**\\n\\n1. **The `map()` function takes a callback function as an argument.** This callback function is executed for each element in the original array. The callback function usually takes three parameters:\\n    * `currentValue`: The current element being processed.\\n    * `index`: The index of the current element in the array.\\n    * `array`: The original array itself. \\n\\n2. **The callback function performs some transformation on the `currentValue`.** This could be anything: adding a value, changing the data type, or applying a complex calculation.\\n\\n3. **The `map()` function returns a new array containing the transformed elements.** The original array remains unchanged.\\n\\n**Example:**\\n\\nLet\'s say you have an array of numbers and you want to double each number:\\n\\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.35739438961713743
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 234
  total_token_count: 264
}
model_version: "gemini-1.5-flash-001"

1.5 Flash 002

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \"The `map()` function in JavaScript is a higher-order function that allows you to iterate over an array and transform each element into a new value.  It returns a new array containing the transformed elements, leaving the original array unchanged. \\n\\n**Syntax:**\\n\\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.4850025475025177
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 64
  total_token_count: 94
}
model_version: "gemini-1.5-flash-002"

1.5 Pro 001

candidates {
  content {
    role: "model"
    parts {
      text: "{\n\"response_content\": \""
    }
  }
  finish_reason: STOP
  citation_metadata {
    citations {
      start_index: 1064
      end_index: 1187
      uri: "https://www.spritely.net/how-to-put-multiple-objects-in-one-var-array-javascript/"
    }
  }
  avg_logprobs: -3.1444194316864014
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 8
  total_token_count: 38
}
model_version: "gemini-1.5-pro-001"

1.5 Pro 002

candidates {
  content {
    role: "model"
    parts {
      text: "{\"response_content\": \""
    }
  }
  finish_reason: STOP
  citation_metadata {
    citations {
      start_index: 220
      end_index: 343
      uri: "https://falytom.com/javascript-array-methods-examples/"
    }
  }
  avg_logprobs: -11.301958719889322
}
usage_metadata {
  prompt_token_count: 30
  candidates_token_count: 6
  total_token_count: 36
}
model_version: "gemini-1.5-pro-002"
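
All four outputs above report finish_reason: STOP even though the JSON text is cut off, so the finish reason alone does not reveal the truncation. A minimal sketch of a client-side check, assuming that a failed parse is a good enough signal (the helper name `is_complete_json` and both sample strings are illustrative, not part of the SDK):

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example embeds cleanly


def is_complete_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False if it is cut off or malformed."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Abbreviated copy of the truncated Flash 002 output: the string value never closes.
truncated = '{"response_content": "The `map()` function returns a new array. \\n\\n**Syntax:**\\n\\n'

# What a complete response with the same schema could look like (illustrative only).
complete = json.dumps(
    {"response_content": f"**Syntax:**\n\n{FENCE}js\narr.map(fn)\n{FENCE}"}
)

print(is_complete_json(truncated))  # False
print(is_complete_json(complete))   # True
```

A check like this only detects the problem; it does not recover the missing content.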

Code of Conduct

I agree to follow this project's Code of Conduct

holtskinner · 2024-11-04T18:13:51Z

Does this occur with gemini-1.5-pro as well? Or just gemini-1.5-flash?

LBUPD33 · 2024-11-04T18:17:35Z

Hi @holtskinner, it occurs on both. I updated the Relevant log output section to include the gemini-1.5-pro-002, gemini-1.5-flash, and gemini-1.5-pro responses. I used the Python code provided and only changed the model name.

holtskinner · 2024-11-11T16:20:02Z

Thanks for your feedback, I think this could be a bug in Controlled Generation relating to the markdown output. I'm going to report it to the product team.

holtskinner · 2024-11-11T17:59:47Z

@LBUPD33 Question about your use case: why do you need the model output to be JSON, since you're only reading the one field response_content?

LBUPD33 · 2024-11-11T18:19:59Z

@holtskinner The provided code only demonstrates the bug. I reduced it to the simplest way to reproduce it.

The real use case included multiple fields, such as response_content and response_type. Based on the response_type value, the frontend would behave differently.
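
The multi-field use case described here could look roughly like the following. This is a sketch under assumptions: the two-field schema simply extends the single-field one from the report, and the response_type values ("markdown", "plain_text") and the handler functions are hypothetical, since the thread does not list the real ones.

```python
import json

# Two-field schema in the same format as the single-field one in this issue.
response_schema = {
    "type": "OBJECT",
    "properties": {
        "response_content": {"type": "STRING"},
        "response_type": {"type": "STRING"},
    },
    "required": ["response_content", "response_type"],
}


def render_markdown(content: str) -> str:
    # Stand-in for a real Markdown renderer on the frontend.
    return f"[markdown] {content}"


def render_plain(content: str) -> str:
    return f"[plain] {content}"


# Hypothetical response_type values; the issue does not list the real ones.
HANDLERS = {"markdown": render_markdown, "plain_text": render_plain}


def handle_response(raw_json: str) -> str:
    """Parse the model output and dispatch on its response_type field."""
    payload = json.loads(raw_json)
    missing = [key for key in response_schema["required"] if key not in payload]
    if missing:
        raise ValueError(f"Response is missing required fields: {missing}")
    # Unknown types fall back to plain rendering.
    handler = HANDLERS.get(payload["response_type"], render_plain)
    return handler(payload["response_content"])


example = json.dumps({"response_content": "**Hello**", "response_type": "markdown"})
print(handle_response(example))  # [markdown] **Hello**
```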

holtskinner · 2024-11-11T18:47:46Z

Thanks for the context. This issue seems to be with how the Controlled Generation backend parses the Markdown-formatted JSON blocks created by Gemini. The product team is working on a fix.

In the meantime, I found a workaround that should handle your use case (not using controlled generation and doing manual markdown parsing):

import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    GenerationConfig,
    SafetySetting,
    Part,
)
import json
import re


def extract_json_block(markdown_string: str) -> str | None:
    """
    Extracts the outermost JSON code block from a markdown string.

    Args:
      markdown_string: The markdown string to extract the code block from.

    Returns:
      The outer JSON code block if found, otherwise None.
    """
    pattern = r"```json\n(\{\n.*?\n\})\n```"
    match = re.search(pattern, markdown_string, re.DOTALL | re.MULTILINE)
    return match.group(1) if match else None


def multiturn_generate_content():
    vertexai.init(project=gcp_project, location="us-central1")
    model = GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=[textsi_1],
    )
    chat = model.start_chat()
    response = chat.send_message(
        [
            """Use Markdown to format your answer. Could you explain me what is map() function in javascript and give me an example? Respond in the following JSON structure:
                {
                    "response_content": "Insert Answer Here",
                    "response_type": "markdown" # Or whatever this should be
                }
            """
        ],
        generation_config=generation_config,
        safety_settings=safety_settings,
    )
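    json_block = extract_json_block(response.text)
    if json_block is None:
        # The model did not wrap its answer in a fenced json block, so there is nothing to parse.
        raise ValueError("No JSON code block found in the model response")
    json_object = json.loads(json_block)
    print(json_object["response_content"])
    print(json_object["response_type"])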


textsi_1 = """"""

generation_config = GenerationConfig(
    max_output_tokens=8192, temperature=0.5, top_p=0.95
)

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF,
    ),
]

multiturn_generate_content()
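
Because the workaround depends on the model wrapping its answer in a fenced json block, a slightly more tolerant parser may be useful. The following is a sketch, not part of the SDK: `parse_model_json` is a hypothetical helper that tries the raw text first, falls back to a fenced block, and then checks that the expected fields are present.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example embeds cleanly

# Matches an optional "json" tag after the opening fence, then the outermost {...}.
FENCED_JSON = re.compile(rf"{FENCE}(?:json)?\s*(\{{.*\}})\s*{FENCE}", re.DOTALL)


def parse_model_json(text: str, required_keys: tuple[str, ...]) -> dict:
    """Parse a JSON object from model output, with or without a code fence."""
    candidate = text.strip()
    try:
        payload = json.loads(candidate)
    except json.JSONDecodeError:
        fenced = FENCED_JSON.search(candidate)
        if fenced is None:
            raise ValueError("No JSON object found in model output")
        payload = json.loads(fenced.group(1))
    if not isinstance(payload, dict):
        raise ValueError("Model output is not a JSON object")
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise ValueError(f"Model output is missing fields: {missing}")
    return payload


body = json.dumps(
    {"response_content": "Use `arr.map(fn)`.", "response_type": "markdown"}, indent=4
)
fenced_reply = f"Here is the answer:\n{FENCE}json\n{body}\n{FENCE}"
keys = ("response_content", "response_type")

print(parse_model_json(fenced_reply, keys)["response_type"])  # markdown
print(parse_model_json(body, keys)["response_type"])          # markdown
```

The greedy match is deliberate: it runs to the last closing brace before a fence, so code fences embedded in response_content do not end the match early.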

LBUPD33 · 2024-11-11T19:10:01Z

@holtskinner Thank you, good news that the product team is working on a fix!
And thanks for the workaround.

LBUPD33 changed the title ~~[Bug]: Controlled generation Fails with Gemini on Vertex AI truncating response~~ [Bug]: Controlled generation fails truncating response when it includes code markdown Nov 4, 2024

holtskinner self-assigned this Nov 4, 2024

LBUPD33 closed this as completed Nov 11, 2024
