import os
import openai
"""
Your role: expert Python programmer, LLM expert, LlamaIndex expert.
Let's create a simple LlamaIndex-based Python application with the following requirements:
For the data loader we will use the JSON reader: from llama_index.readers.json import JSONReader
The JSON Structure is a list of dicts with the following:
[
    {
        "start": 1.0999999999999996,
        "end": 4.32,
        "speaker": "SPEAKER_00",
        "text": " ...fungerar, den tar upp allt ljud, schysst."
    },
    ...
]
Map start, end, and speaker to document metadata, and text to the document text
(the sample text above is Swedish, roughly: "...works, it picks up all the sound, nice.").
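A minimal loading sketch (an assumption, not the final implementation): since each
utterance should become its own document with start/end/speaker metadata, plain json
plus llama_index Document objects may be simpler than pushing everything through the
stock JSONReader:

    import json
    from llama_index.core import Document

    def load_transcript(path: str) -> list[Document]:
        # One Document per utterance; start/end/speaker become metadata.
        with open(path, encoding="utf-8") as f:
            utterances = json.load(f)
        return [
            Document(
                text=u["text"].strip(),
                metadata={"start": u["start"], "end": u["end"], "speaker": u["speaker"]},
            )
            for u in utterances
        ]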
For the LLM we will use together.ai: from llama_index.llms.together import TogetherLLM
For embeddings we will use Nomic: from llama_index.embeddings.nomic import NomicEmbedding
For vector storage we will use Chroma: from llama_index.vector_stores.chroma import ChromaVectorStore
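A configuration sketch for those three components (the model names are placeholders,
not a prescription; newer llama_index releases configure this through the global
Settings object rather than a ServiceContext):

    import os
    import chromadb
    from llama_index.core import Settings
    from llama_index.llms.together import TogetherLLM
    from llama_index.embeddings.nomic import NomicEmbedding
    from llama_index.vector_stores.chroma import ChromaVectorStore

    Settings.llm = TogetherLLM(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder model choice
        api_key=os.environ["TOGETHER_API_KEY"],
    )
    Settings.embed_model = NomicEmbedding(
        model_name="nomic-embed-text-v1",  # placeholder model choice
        api_key=os.environ["NOMIC_API_KEY"],
    )

    # Chroma persists to ./chroma_db on disk (see requirement 1 below).
    chroma_client = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = chroma_client.get_or_create_collection("transcript")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)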
When retrieving documents from the vector store, we would like to ALSO collect nearby documents based on the following logic:
All adjacent documents with the same speaker should be included, plus the first document with a different speaker on each end, so that the documents or chunks we pull look as follows (see the expansion sketch after this example):
{
Doc1: Speaker_01 - text
Doc2: Speaker_00 - text
Doc3: Speaker_00 - text
Doc4: Speaker_00 - text (target doc)
Doc5: Speaker_00 - text
Doc6: Speaker_02 - text
}
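One way to sketch this expansion (a hypothetical helper, assuming we keep all nodes in
transcript order and store each node's position in metadata["index"] at ingest time):

    from llama_index.core.schema import NodeWithScore, TextNode

    def expand_speaker_window(hit: NodeWithScore, all_nodes: list[TextNode]) -> list[TextNode]:
        # Grow the window over every adjacent node with the same speaker...
        i = hit.node.metadata["index"]
        speaker = hit.node.metadata["speaker"]
        lo, hi = i, i
        while lo > 0 and all_nodes[lo - 1].metadata["speaker"] == speaker:
            lo -= 1
        while hi < len(all_nodes) - 1 and all_nodes[hi + 1].metadata["speaker"] == speaker:
            hi += 1
        # ...then include one different-speaker neighbour on each end.
        return all_nodes[max(lo - 1, 0) : min(hi + 1, len(all_nodes) - 1) + 1]

This could live in a custom BaseRetriever subclass or a node postprocessor; either fits
the retrieval step described above.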
Once the documents are retrieved we will use the COMPACT response mode to query the LLM through a chat-like interface (see the wiring sketch after the link below).
https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/root.html
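A sketch of wiring the compact mode in (the synthesizer name follows the page above;
index is the vector index built under requirement 1 below):

    from llama_index.core import get_response_synthesizer
    from llama_index.core.query_engine import RetrieverQueryEngine

    query_engine = RetrieverQueryEngine(
        retriever=index.as_retriever(similarity_top_k=4),  # top_k is an arbitrary choice
        response_synthesizer=get_response_synthesizer(response_mode="compact"),
    )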
## Additional requirements:
1) The initial document intake/chunking/embedding/vector storage should only be performed once. The resulting vector store should persist to local storage and be reloaded on subsequent runs when the same target file is used (see the sketch below the links).
https://github.com/run-llama/llama_index/blob/main/docs/examples/ingestion/document_management_pipeline.ipynb (example cache)
https://github.com/run-llama/llama_index/blob/main/docs/examples/ingestion/ingestion_gdrive.ipynb Ingestion from Gdrive (redis / cache)
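Because the Chroma collection above already persists to disk, a simple version of
"ingest once, reload afterwards" can just check whether the collection is empty (a
lighter-weight assumption than the ingestion-cache examples linked above):

    from llama_index.core import StorageContext, VectorStoreIndex

    if chroma_collection.count() == 0:
        # First run: embed the transcript and store it in Chroma.
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        index = VectorStoreIndex.from_documents(
            load_transcript("transcript.json"),  # hypothetical input path
            storage_context=storage_context,
        )
    else:
        # Later runs: reuse the persisted embeddings without re-ingesting.
        index = VectorStoreIndex.from_vector_store(vector_store)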
2) Once loaded, the main loop should let the user keep chatting with the document, including memory of prior responses (sketch below).
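A chat-loop sketch with conversation memory (the chat mode and token limit are
assumptions; any of llama_index's memory-backed chat modes would do):

    from llama_index.core.memory import ChatMemoryBuffer

    memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
    chat_engine = index.as_chat_engine(chat_mode="condense_plus_context", memory=memory)

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in {"exit", "quit"}:
            break
        print("Assistant:", chat_engine.chat(user_input))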
## Documents and resources:
Concepts high level overview of llama index https://docs.llamaindex.ai/en/stable/understanding/understanding.html
API Reference: https://docs.llamaindex.ai/en/stable/api_reference/index.html
The ServiceContext is important when customizing configuration (such as the LLM and vector storage), though newer llama_index releases replace it with the global Settings object: https://docs.llamaindex.ai/en/stable/api_reference/service_context.html
3) Document summary index - create a summary on top of each document which can be used for querying first (sketch below): https://github.com/run-llama/llama_index/blob/main/docs/examples/index_structs/doc_summary/DocSummary.ipynb
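A sketch following that notebook (tree_summarize is the synthesizer the notebook uses):

    from llama_index.core import DocumentSummaryIndex, get_response_synthesizer

    summary_index = DocumentSummaryIndex.from_documents(
        load_transcript("transcript.json"),  # hypothetical input path
        response_synthesizer=get_response_synthesizer(response_mode="tree_summarize"),
    )
    summary_engine = summary_index.as_query_engine()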
4) Fine-tuning / training - https://github.com/run-llama/llama_index/blob/main/docs/examples/llama_dataset/downloading_llama_datasets.ipynb Run with GPT-4, then fine-tune a smaller model (download sketch below).
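Downloading one of the published llama-datasets for evaluation before fine-tuning, as
in the linked notebook (the dataset name is the notebook's example, not a requirement):

    from llama_index.core.llama_dataset import download_llama_dataset

    # Fetches the dataset and its source documents into ./data.
    rag_dataset, documents = download_llama_dataset("PaulGrahamEssayDataset", "./data")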
5) Semantic chunking - https://github.com/run-llama/llama_index/blob/main/docs/examples/node_parsers/semantic_chunking.ipynb (sketch below)
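A semantic-chunking sketch with the notebook's default parameters (for this transcript,
where each utterance is already its own document, this step may be optional):

    from llama_index.core.node_parser import SemanticSplitterNodeParser

    splitter = SemanticSplitterNodeParser(
        buffer_size=1,
        breakpoint_percentile_threshold=95,
        embed_model=Settings.embed_model,  # the Nomic embeddings configured above
    )
    nodes = splitter.get_nodes_from_documents(load_transcript("transcript.json"))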
6) LlamaIndex Python package registry - https://pretty-sodium-5e0.notion.site/ce81b247649a44e4b6b35dfb24af28a6?v=53b3c2ced7bb4c9996b81b83c9f01139
7) JSON loader alternative (LangChain) - https://python.langchain.com/docs/modules/data_connection/document_loaders/json
"""