add txt loader

Lin-jun-xiang · Oct 2, 2023 · ca06de7
1 parent 7dcb0e8
commit ca06de7

Showing 4 changed files with 31 additions and 21 deletions.
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
-Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`), without the need for any keys or fees.
+Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`), without the need for any keys or fees.
 
 Additionally, you can deploy the app anywhere based on the document.
 
@@ -27,7 +27,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 
 ### 📚Introduction
 
-* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`, `.txt`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
 
 * Provide two models:
   * `gpt4free`
@@ -46,8 +46,8 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 ### 🧨Features
 
 - **`gpt4free` Integration**: Everyone can use `docGPT` for **free** without needing an OpenAI API key.
-- **Support docx, pdf file**: Users can upload PDF or Word file.
-- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf`, `.docx` or `.csv` files.
+- **Support docx, pdf, csv, txt files**: Users can upload PDF, Word, CSV, or TXT files.
+- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading document files (see the demo).
 - **Langchain Agent**: Enables AI to answer current questions and achieve Google search-like functionality.
 - **User-Friendly Environment**: Easy-to-use interface for simple operations.
 
@@ -93,7 +93,7 @@ Through LangChain, you can create a universal AI model or tailor it for business
    - `SERPAPI API KEY`: Required if you want to query content not present in the Document.
 
 3. 📁Upload a Document file (choose one method)
-    * Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
+    * Method 1: Browse and upload your own `.pdf`, `.docx`, `.csv`, `.txt` file from your local machine.
     * Method 2: Enter the Document `URL` link directly.
 
 4. 🚀Start asking questions!

diff --git a/README.zh-TW.md b/README.zh-TW.md
@@ -4,7 +4,7 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
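-Free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`), with no keys or fees required.
+Free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`), with no keys or fees required.
 
 In addition, you can follow this document to deploy the app anywhere.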
 
@@ -26,7 +26,7 @@
 
 ### 📚Introduction
 
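-* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and ask `docGPT` about the Document's content. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`, `.txt`) and ask `docGPT` about the Document's content. For example, you can ask GPT to summarize an article.
 * Choose between two models:
   * `gpt4free`
     * **Completely free: "lets users use the app without entering an API key or paying"**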
@@ -44,8 +44,8 @@
 ### 🧨Features
 
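 - **`gpt4free` integration**: Anyone can use GPT4 for free without entering an OpenAI API key.
-- **Supports docx, pdf files**: You can upload PDF or Word files.
-- **Direct Document URL input**: Users can enter a Document URL for parsing without uploading `.pdf`, `.docx` or `.csv` files from their local machine.
+- **Supports docx, pdf, csv, txt files**: You can upload PDF, Word, CSV, or TXT files.
+- **Direct Document URL input**: Users can enter a Document URL for parsing without uploading a file from their local machine (as shown in the demo below).
 - **Langchain Agent**: The AI can answer current questions, providing Google search-like functionality.
 - **Easy-to-use environment**: A friendly interface with simple operation.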
 
@@ -92,7 +92,7 @@ LangChain fills the gaps left by ChatGPT. Through the following examples, you can
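     * `SERPAPI API KEY`: Required if you want to query content that is not in the Document.
 
 3. 📁Upload a Document file from your local machine (choose one method)
-    * Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
+    * Method 1: Browse and upload your own `.pdf`, `.docx`, `.csv` or `.txt` file from your local machine.
     * Method 2: Enter the Document URL link.
 
 4. 🚀Start asking questions!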

diff --git a/app.py b/app.py
@@ -39,12 +39,14 @@ def theme() -> None:
         with st.expander(':orange[How to use?]'):
             st.markdown(
                 """
-                1. Enter your API keys: (You can choose to skip it and use the `gpt4free` free model)
+                1. Enter your API keys: (You can use the `gpt4free` free model **without API keys**)
                     * `OpenAI API Key`: Make sure you still have usage left
                     * `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
-                2. Upload a Document file (choose one method):
-                    * method1: Browse and upload your own `.pdf or .docx` file from your local machine.
-                    * method2: Enter the PDF or DOCX `URL` link directly.
+                2. **Upload a Document** file (choose one method):
+                    * method1: Browse and upload your own document file from your local machine.
+                    * method2: Enter the document URL link directly.
+                    
+                    (**support documents**: `.pdf`, `.docx`, `.csv`, `.txt`)
                 3. Start asking questions!
                 4. More details.(https://github.com/Lin-jun-xiang/docGPT-streamlit)
                 5. If you have any questions, feel free to leave comments and engage in discussions.(https://github.com/Lin-jun-xiang/docGPT-streamlit/issues)
@@ -108,22 +110,22 @@ def load_api_key() -> None:
 
 
 def upload_and_process_document() -> list:
-    st.write('#### Upload a Document file (PDF, DOCX, CSV)')
+    st.write('#### Upload a Document file')
     browse, url_link = st.tabs(
         ['Drag and drop file (Browse files)', 'Enter document URL link']
     )
     with browse:
         upload_file = st.file_uploader(
-            'Browse file (.pdf, .docx, .csv)',
-            type=['pdf', 'docx', 'csv'],
+            'Browse file (.pdf, .docx, .csv, .txt)',
+            type=['pdf', 'docx', 'csv', 'txt'],
             label_visibility='hidden'
         )
         filetype = os.path.splitext(upload_file.name)[1].lower() if upload_file else None
         upload_file = upload_file.read() if upload_file else None
 
     with url_link:
         doc_url = st.text_input(
-            "Enter document URL Link (.pdf, .docx, .csv)",
+            "Enter document URL Link (.pdf, .docx, .csv, .txt)",
             placeholder='https://www.xxx/uploads/file.pdf',
             label_visibility='hidden'
         )
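
The hunk above derives `filetype` from the uploaded file's name with `os.path.splitext(...)[1].lower()`. A minimal sketch of that step, extended to the URL tab, might look like the following. `SUPPORTED_TYPES`, `filetype_from_name`, and `filetype_from_url` are hypothetical names that do not appear in the commit, and this diff does not show how the app actually resolves the type of a URL document.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Extensions accepted by the uploader after this commit (taken from the diff).
SUPPORTED_TYPES = {'.pdf', '.docx', '.csv', '.txt'}


def filetype_from_name(name: str) -> Optional[str]:
    """Return the lowercased extension if it is supported, else None."""
    ext = os.path.splitext(name)[1].lower()
    return ext if ext in SUPPORTED_TYPES else None


def filetype_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: read the extension from the URL path only,
    so query strings and fragments do not end up in the extension."""
    return filetype_from_name(urlparse(url).path)
```

Returning `None` for an unknown extension lets the caller show a message before any loader runs, rather than relying on the loader to fail.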

diff --git a/model/data_connection.py b/model/data_connection.py
@@ -3,7 +3,12 @@
 
 import requests
 import streamlit as st
-from langchain.document_loaders import CSVLoader, Docx2txtLoader, PyMuPDFLoader
+from langchain.document_loaders import (
+    CSVLoader,
+    Docx2txtLoader,
+    PyMuPDFLoader,
+    TextLoader,
+)
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 
 
@@ -22,7 +27,7 @@ def get_files(path: str, filetype: str = '.pdf') -> Iterator[str]:
     def load_documents(
         file: str,
         filetype: str = '.pdf'
-    ) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader]:
+    ) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader]:
         """Load a PDF, DOCX, CSV, or TXT document."""
         try:
             if filetype == '.pdf':
@@ -31,15 +36,18 @@ def load_documents(
                 loader = Docx2txtLoader(file)
             elif filetype == '.csv':
                 loader = CSVLoader(file, encoding='utf-8')
+            elif filetype == '.txt':
+                loader = TextLoader(file, encoding='utf-8')
 
             return loader.load()
+
         except Exception as e:
             print(f'\033[31m{e}')
             return []
 
     @staticmethod
     def split_documents(
-        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader],
+        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader],
         chunk_size: int=2000,
         chunk_overlap: int=0
     ) -> list:
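
In `load_documents` above, a `filetype` that matches none of the `if`/`elif` branches leaves `loader` unassigned, so `loader.load()` raises and the broad `except` returns `[]`. One possible alternative is a dispatch table that makes the unsupported case explicit. The sketch below is not the commit's code: `LOADERS` and `_load_text` are hypothetical, and `_load_text` is a plain-Python stand-in for `TextLoader(file, encoding='utf-8').load()` so the example runs without langchain installed.

```python
from typing import Callable, Dict, List


def _load_text(path: str) -> List[dict]:
    """Stand-in for a text loader: read the whole file as one document."""
    with open(path, encoding='utf-8') as fh:
        return [{'page_content': fh.read(), 'metadata': {'source': path}}]


# Map each extension to a callable that loads that kind of file.
# Only '.txt' is registered here; real loaders would be added the same way.
LOADERS: Dict[str, Callable[[str], List[dict]]] = {
    '.txt': _load_text,
}


def load_documents(file: str, filetype: str = '.pdf') -> List[dict]:
    """Load a file with the loader registered for its extension."""
    loader = LOADERS.get(filetype.lower())
    if loader is None:
        # Unsupported type: report it instead of hitting an unbound name.
        print(f'Unsupported file type: {filetype}')
        return []
    try:
        return loader(file)
    except (OSError, UnicodeDecodeError) as e:
        # Missing file or wrong encoding: keep the original "return []" contract.
        print(e)
        return []
```

With this shape, supporting another format is a one-line addition to `LOADERS`, and the function signature stays the same.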