Commit

add txt loader
Lin-jun-xiang committed Oct 2, 2023
1 parent 7dcb0e8 commit ca06de7
Showing 4 changed files with 31 additions and 21 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -5,7 +5,7 @@

[English](./README.md) | [中文版](./README.zh-TW.md)

-Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`), without the need for any keys or fees.
+Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`), without the need for any keys or fees.

Additionally, you can deploy the app anywhere based on the document.

@@ -27,7 +27,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~

### 📚Introduction

-* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`, `.txt`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.

* Provide two models:
* `gpt4free`
@@ -46,8 +46,8 @@ If you like this project, please give it a ⭐`Star` to support the developers~
### 🧨Features

- **`gpt4free` Integration**: Everyone can use `docGPT` for **free** without needing an OpenAI API key.
-- **Support docx, pdf file**: Users can upload PDF or Word file.
-- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf`, `.docx` or `.csv` files.
+- **Support docx, pdf, csv, txt file**: Users can upload PDF, Word, CSV, txt file.
+- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading document files(see the demo).
- **Langchain Agent**: Enables AI to answer current questions and achieve Google search-like functionality.
- **User-Friendly Environment**: Easy-to-use interface for simple operations.

@@ -93,7 +93,7 @@ Through LangChain, you can create a universal AI model or tailor it for business
- `SERPAPI API KEY`: Required if you want to query content not present in the Document.

3. 📁Upload a Document file (choose one method)
-* Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
+* Method 1: Browse and upload your own `.pdf`, `.docx`, `.csv`, `.txt` file from your local machine.
* Method 2: Enter the Document `URL` link directly.

4. 🚀Start asking questions!
10 changes: 5 additions & 5 deletions README.zh-TW.md
@@ -4,7 +4,7 @@

[English](./README.md) | [中文版](./README.zh-TW.md)

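-Free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`), with no keys or fees required.
+Free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`), with no keys or fees required.

In addition, you can follow this document to deploy the app anywhere.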

@@ -26,7 +26,7 @@

### 📚Introduction

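-* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and ask `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`, `.txt`) and ask `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article
* Two models are available:
* `gpt4free`
* **Completely free: "lets users use the app without entering an API key or paying"**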
@@ -44,8 +44,8 @@
### 🧨Features

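- **`gpt4free` integration**: Anyone can use GPT4 for free, without entering an OpenAI API key.
-- **Supports docx, pdf files**: You can upload PDF or Word files
-- **Direct Document URL input**: Users can enter a Document URL directly for parsing, without uploading `.pdf`, `.docx` or `.csv` files from their local machine
+- **Supports docx, pdf, csv, txt files**: You can upload PDF, Word, CSV, txt
+- **Direct Document URL input**: Users can enter a Document URL directly for parsing, without uploading files from their local machine (as shown in the demo below)
- **Langchain Agent**: The AI can answer questions about current events, providing Google search-like functionality.
- **Easy-to-use environment**: A friendly interface that is simple to operate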

@@ -92,7 +92,7 @@ LangChain fills the gaps left by ChatGPT. Through the following example, you can
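* `SERPAPI API KEY`: Required if you want to query content that is not in the Document.

3. 📁Upload a Document file from your local machine (choose one method)
-* Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` from your local machine
+* Method 1: Browse and upload your own `.pdf`, `.docx`, `.csv` or `.txt` from your local machine
* Method 2: Enter the Document URL link

4. 🚀Start asking questions!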
18 changes: 10 additions & 8 deletions app.py
@@ -39,12 +39,14 @@ def theme() -> None:
     with st.expander(':orange[How to use?]'):
         st.markdown(
             """
-            1. Enter your API keys: (You can choose to skip it and use the `gpt4free` free model)
+            1. Enter your API keys: (You can use the `gpt4free` free model **without API keys**)
                 * `OpenAI API Key`: Make sure you still have usage left
                 * `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
-            2. Upload a Document file (choose one method):
-                * method1: Browse and upload your own `.pdf or .docx` file from your local machine.
-                * method2: Enter the PDF or DOCX `URL` link directly.
+            2. **Upload a Document** file (choose one method):
+                * method1: Browse and upload your own document file from your local machine.
+                * method2: Enter the document URL link directly.
+                (**support documents**: `.pdf`, `.docx`, `.csv`, `.txt`)
             3. Start asking questions!
             4. More details.(https://github.com/Lin-jun-xiang/docGPT-streamlit)
             5. If you have any questions, feel free to leave comments and engage in discussions.(https://github.com/Lin-jun-xiang/docGPT-streamlit/issues)
@@ -108,22 +110,22 @@ def load_api_key() -> None:


 def upload_and_process_document() -> list:
-    st.write('#### Upload a Document file (PDF, DOCX, CSV)')
+    st.write('#### Upload a Document file')
     browse, url_link = st.tabs(
         ['Drag and drop file (Browse files)', 'Enter document URL link']
     )
     with browse:
         upload_file = st.file_uploader(
-            'Browse file (.pdf, .docx, .csv)',
-            type=['pdf', 'docx', 'csv'],
+            'Browse file (.pdf, .docx, .csv, `.txt`)',
+            type=['pdf', 'docx', 'csv', 'txt'],
             label_visibility='hidden'
         )
         filetype = os.path.splitext(upload_file.name)[1].lower() if upload_file else None
         upload_file = upload_file.read() if upload_file else None

     with url_link:
         doc_url = st.text_input(
-            "Enter document URL Link (.pdf, .docx, .csv)",
+            "Enter document URL Link (.pdf, .docx, .csv, .txt)",
             placeholder='https://www.xxx/uploads/file.pdf',
             label_visibility='hidden'
         )
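
The `filetype` expression in the hunk above derives a lowercase, dot-prefixed extension from the uploaded file's name. A minimal sketch of that step in isolation follows. The helper `detect_filetype` and the constant `SUPPORTED_TYPES` are hypothetical names used for illustration only; they are not part of this commit, and the set simply mirrors the `type=[...]` list passed to `st.file_uploader`.

```python
import os
from typing import Optional

# Hypothetical constant: the extensions accepted after this commit.
SUPPORTED_TYPES = {'.pdf', '.docx', '.csv', '.txt'}


def detect_filetype(filename: Optional[str]) -> Optional[str]:
    """Return the lowercase extension of `filename`, or None if unsupported."""
    if not filename:
        return None
    # Same expression as in app.py: splitext keeps the leading dot.
    ext = os.path.splitext(filename)[1].lower()
    return ext if ext in SUPPORTED_TYPES else None
```

Because the extension is lowercased before the lookup, a name such as `Notes.TXT` resolves to `.txt`, which matches the string that `load_documents` compares against.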
14 changes: 11 additions & 3 deletions model/data_connection.py
@@ -3,7 +3,12 @@

 import requests
 import streamlit as st
-from langchain.document_loaders import CSVLoader, Docx2txtLoader, PyMuPDFLoader
+from langchain.document_loaders import (
+    CSVLoader,
+    Docx2txtLoader,
+    PyMuPDFLoader,
+    TextLoader,
+)
 from langchain.text_splitter import RecursiveCharacterTextSplitter


@@ -22,7 +27,7 @@ def get_files(path: str, filetype: str = '.pdf') -> Iterator[str]:
     def load_documents(
         file: str,
         filetype: str = '.pdf'
-    ) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader]:
+    ) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader]:
         """Loading PDF, Docx, CSV"""
         try:
             if filetype == '.pdf':
@@ -31,15 +36,18 @@ def load_documents(
                 loader = Docx2txtLoader(file)
             elif filetype == '.csv':
                 loader = CSVLoader(file, encoding='utf-8')
+            elif filetype == '.txt':
+                loader = TextLoader(file, encoding='utf-8')

             return loader.load()

         except Exception as e:
             print(f'\033[31m{e}')
             return []

     @staticmethod
     def split_documents(
-        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader],
+        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader],
         chunk_size: int=2000,
         chunk_overlap: int=0
     ) -> list:
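
The `if`/`elif` chain in `load_documents` maps an extension to a loader and returns `[]` on any failure. One alternative way to express the same dispatch is a lookup table. The sketch below is not the commit's implementation: it uses stdlib stand-ins (`read_text`, `read_csv_rows`) in place of the LangChain loaders so that it runs without extra dependencies, and every name in it (`LOADERS`, `read_text`, `read_csv_rows`, and this simplified `load_documents`) is hypothetical.

```python
import csv
from typing import Callable, Dict, List


def read_text(path: str) -> List[str]:
    """Stand-in for a text loader: the whole file as a single document."""
    with open(path, encoding='utf-8') as f:
        return [f.read()]


def read_csv_rows(path: str) -> List[str]:
    """Stand-in for a CSV loader: one document per data row."""
    with open(path, encoding='utf-8', newline='') as f:
        return [
            '\n'.join(f'{key}: {value}' for key, value in row.items())
            for row in csv.DictReader(f)
        ]


# Hypothetical registry: extension -> loader function.
LOADERS: Dict[str, Callable[[str], List[str]]] = {
    '.txt': read_text,
    '.csv': read_csv_rows,
}


def load_documents(path: str, filetype: str = '.txt') -> List[str]:
    """Load `path` with the loader registered for `filetype`; [] on failure."""
    loader = LOADERS.get(filetype)
    if loader is None:
        # Unknown extensions are rejected explicitly instead of by exception.
        print(f'Unsupported file type: {filetype}')
        return []
    try:
        return loader(path)
    except (OSError, UnicodeDecodeError) as e:
        print(f'Failed to load {path}: {e}')
        return []
```

With a table, supporting another format means adding one entry rather than another `elif` branch. In the hunk above, an unrecognized `filetype` appears to leave `loader` unassigned, so the empty result comes from the broad `except`; the lookup makes that case explicit.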