IndexError: index 0 is out of bounds for axis 0 with size 0 #8
Comments
Hi @skuma307 thanks for reaching out. Let me ask two quick questions:
|
Thanks for your reply, @kamil-kaczmarek! I am using the code base below:
import numpy as np
from embeddings import LocalHuggingFaceEmbeddings
# To download the files locally for processing, here's the command line
# wget -e robots=off --recursive --no-clobber --page-requisites --html-extension \
# --convert-links --restrict-file-names=windows \
# --domains docs.ray.io --no-parent https://docs.ray.io/en/master/
FAISS_INDEX_PATH = "faiss_index_fast"
loader = ReadTheDocsLoader("docs.ray.io/en/master/")
text_splitter = RecursiveCharacterTextSplitter(
@ray.remote(num_gpus=1)
# Stage one: read all the docs, split them into chunks.
st = time.time()
# Theoretically, we could use Ray to accelerate this, but it's fast enough as is.
chunks = text_splitter.create_documents(
# Stage two: embed the docs.
print(f"Loading chunks into vector store ... using {db_shards} shards")
st = time.time()
# Straight serial merge of others into results[0]
db = results[0]
st = time.time()
I have created a virtual env with Python 3.9 on Windows. |
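One thing worth ruling out before the embedding stage is whether the wget mirror actually left HTML files at the path passed to ReadTheDocsLoader. The following is only a minimal sketch: `count_html_files` is a hypothetical helper (not part of langchain or Ray), and it assumes the mirror sits under `docs.ray.io/en/master/` relative to the working directory.

```python
from pathlib import Path


def count_html_files(root: str) -> int:
    """Count .html files below root; return 0 if root is not a directory."""
    base = Path(root)
    if not base.is_dir():
        return 0
    return sum(1 for p in base.rglob("*.html") if p.is_file())


if __name__ == "__main__":
    # Path copied from the script above; adjust if the mirror was saved elsewhere.
    n = count_html_files("docs.ray.io/en/master/")
    print(f"Found {n} HTML files")
    if n == 0:
        print("Nothing to load, so no chunks and no embeddings will be produced.")
```

If this reports zero files, the problem is in the download step or the working directory rather than in FAISS.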
Hi, you need to make sure that you build the DB first. Have a look at this script: https://github.com/ray-project/langchain-ray/blob/main/open_source_LLM_retrieval_qa/build_vector_store.py |
Thanks for your reply, but I am already using the same code, as pasted above. Am I missing anything? I would appreciate your help. @kamil-kaczmarek |
@skuma307 you need to create the embeddings store first. Please check these instructions for more details. |
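A small pre-flight check can make the "build the store first" step explicit. This is a sketch under assumptions: it supposes the build script saved the store with langchain's `FAISS.save_local` using its default index name, which writes `index.faiss` and `index.pkl` into the target folder. `index_is_built` is a hypothetical helper, not something the repo provides.

```python
from pathlib import Path

FAISS_INDEX_PATH = "faiss_index_fast"  # same folder name as in the build script


def index_is_built(folder: str, index_name: str = "index") -> bool:
    """Return True only if both files of a saved FAISS store are present."""
    base = Path(folder)
    # Assumption: save_local wrote <index_name>.faiss and <index_name>.pkl.
    return (base / f"{index_name}.faiss").is_file() and (base / f"{index_name}.pkl").is_file()


if __name__ == "__main__":
    if index_is_built(FAISS_INDEX_PATH):
        print("Vector store found; it can be loaded.")
    else:
        print("Vector store missing; run build_vector_store.py first.")
```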
@kamil-kaczmarek , when I run
|
Following the guidance in langchain-ai/chat-langchain#26 (comment), I fixed this error by:
diff --git a/open_source_LLM_retrieval_qa/build_vector_store.py b/open_source_LLM_retrieval_qa/build_vector_store.py
index e530b54..9a519a8 100644
--- a/open_source_LLM_retrieval_qa/build_vector_store.py
+++ b/open_source_LLM_retrieval_qa/build_vector_store.py
@@ -4,7 +4,7 @@ from typing import List
import numpy as np
import ray
-from langchain.document_loaders import ReadTheDocsLoader
+from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings.base import Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
@@ -21,7 +21,7 @@ FAISS_INDEX_PATH = "faiss_index_fast"
db_shards = 8
ray.init()
-loader = ReadTheDocsLoader("docs.ray.io/en/master/")
+loader = UnstructuredURLLoader(urls=["https://docs.ray.io/en/master/"])
text_splitter = RecursiveCharacterTextSplitter(
# Set a really small chunk size, just to show. |
I see a lot of users following the tutorial are getting the same error. It would be better to move the requirements.txt file from /langchain-ray/open_source_LLM_retrieval_qa/requirements.txt one level up and put the setup instructions at the repo level rather than inside retrieval_qa; a lot of new users will face the same issue. Also, please include documentation links on how to spin up a Ray cluster on all cloud platforms, whether via cluster.yaml or any other way. Writing that it's a hefty setup does not tell a user how to do it. |
Hi, thanks for the great work in the open-source space. I am facing the below error:
index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: index 0 is out of bounds for axis 0 with size 0
The FAISS index is empty. Does that mean there are no embeddings?
Can you help me debug this? I really appreciate any help you can provide.
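The traceback points at `len(embeddings[0])`, which fails whenever the collection of embeddings is empty, that is, when no chunks reached the embedding step. A guard placed before the index is created turns this into a clearer message. This is a minimal sketch; `embedding_dimension` is a hypothetical helper and not part of langchain or faiss.

```python
from typing import Sequence


def embedding_dimension(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the vector length, failing early with a readable error if empty."""
    if len(embeddings) == 0:
        raise ValueError(
            "No embeddings were produced. Check that the document loader "
            "returned documents and that the splitter produced chunks."
        )
    return len(embeddings[0])


if __name__ == "__main__":
    print(embedding_dimension([[0.1, 0.2, 0.3]]))
    try:
        embedding_dimension([])
    except ValueError as err:
        print(f"Caught: {err}")
```

The returned value is what would be passed to `faiss.IndexFlatL2`, so the check costs nothing when the embeddings are present.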