To do - TAI Course #1

Open
18 of 22 tasks
louisfb01 opened this issue Dec 7, 2023 · 1 comment

louisfb01 commented Dec 7, 2023

  • Create GitHub repo: https://github.com/towardsai/ai-tutor-rag-system
  • Create a paid account with OpenAI and get your API key.
    1. Add the key to your environment variables
  • 1- Create an initial script for answering questions with OpenAI GPT-3.5: a basic function that takes an input, wraps it in a prompt, and answers the question.
    1. Code from scratch
  • 2- Add a very basic RAG using NumPy, cosine similarity, the Ada embeddings function, and a small example dataset in JSON.
  • 3- Then compare with LlamaIndex to introduce it and explain why we use it going forward
    1. Update the function to do RAG with LlamaIndex: the same basic example, but built on LlamaIndex.
  • 4- Replace the JSON database with Chroma or another vector store
      • Script to embed and create a vector store from CSVs
      • with a basic chunking script (by number of characters)
  • 5- Improve prompting for the question and add sources (references)
  • 6- Advanced: write a script to create questions for the dataset with GPT-4
    1. Evaluation script using Ragas or another framework, with LlamaIndex or another library
      1. Run the evaluation
  • 7- Improve chunking (sections, titles in sections…) - re-evaluate
  • 8- Script for fine-tuning the embedding model based on the GPT-4-generated questions above.
  • 9- Replace Ada with Cohere, a Hugging Face (HF) model, or another better embedding model - re-evaluate
  • 10- Add reranking or open-source model (Cohere?) - re-evaluate
  • 11- Add hybrid search - re-evaluate
  • 12- Improve query (reformulation, more details…) - re-evaluate
  • 13- Add router for data source optimization - re-evaluate
  • 14- Add chat feature to go back and forth with the bot ([llamaindex](https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/root.html))
  • 15- Replace OpenAI with open-source models - re-evaluate
  • 16- ??? Fine-tune an open-source model (but it cannot be used commercially, since the dataset was built using GPT-4)
  • 17- GPT-4 (or GPT-3.5) "judge" for re-ranking.
  • 18- Add visualization lessons? Based on https://itnext.io/visualize-your-rag-data-eda-for-retrieval-augmented-generation-0701ee98768f, and https://github.com/Renumics/rag-demo/blob/main/notebooks/visualize_rag_tutorial.ipynb
  • 19- Write the lessons from the code we built (notebooks initially, then we need to teach students to replicate our full repo without giving it to them. They need to learn and work!). Refer to the full syllabus (https://www.notion.so/seldonia/Full-syllabus-564070f715b2455d9a6b945b0b470c6b). Don't forget to re-use parts of the ebook we just did to save time.
  • 20- Add a multilingual section with an image showing that embeddings are the same across languages, so it is valuable to have content in different languages, etc.
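
For steps 2 and 4, here is a rough sketch of what the from-scratch pieces might look like: character-count chunking plus cosine-similarity retrieval with NumPy. It assumes the embeddings have already been computed elsewhere (for example with the Ada embeddings function) and uses tiny hand-written vectors instead of real ones. The names `chunk_text` and `cosine_top_k` are placeholders for illustration, not code from the repo.

```python
import numpy as np


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks (the 'nb char' chunking of step 4)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def cosine_top_k(query_vec, doc_vecs, k: int = 2) -> list[tuple[int, float]]:
    """Return (index, score) for the k document vectors most similar to the query.

    Assumes no vector is all zeros; a zero vector would cause a division by zero.
    """
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    # Sort by descending score and keep the first k indices.
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


# Toy stand-ins for real embeddings (hypothetical values, 2 dimensions only).
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query_vec = [1.0, 0.0]
best_matches = cosine_top_k(toy_query_vec, toy_doc_vecs, k=2)
```

In the real script, the retrieved indices would point back to the chunks stored in the JSON dataset, and the matching chunk texts would be pasted into the prompt sent to the model.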
@louisfb01 louisfb01 assigned AlaFalaki and omar-sol and unassigned AlaFalaki and omar-sol Dec 20, 2023
@louisfb01

cc @AlaFalaki @omar-sol

@louisfb01 louisfb01 changed the title from "To do" to "To do - TAI Course" Feb 29, 2024