docs: Features into their own markdown pages (#881)
Co-authored-by: Arpit Gaur <arpitgaur@microsoft.com>
ross-p-smith and gaurarpit authored May 13, 2024
1 parent 221ea4a commit cacb7c3
Showing 58 changed files with 89 additions and 96 deletions.
60 changes: 14 additions & 46 deletions README.md
@@ -24,17 +24,14 @@ urlFragment: chat-with-your-data-solution-accelerator

# Chat with your data - Solution accelerator

<!-- [**USER STORY**](#user-story) | [**DEPLOY**](#Deploy) | [**SUPPORTING DOCUMENTATION**](#supporting-documentation) | [**CUSTOMER TRUTH**](#customer-truth)\ -->

##### Table of Contents
* [User story](#user-story)
+ [About this repo](#about-this-repo)
+ [When should you use this repo](#when-should-you-use-this-repo)
+ [Key features](#key-features)
+ [Supported file types](#supported-file-types)
+ [Target end users](#target-end-users)
+ [Industry scenario](#industry-scenario)
+ [Speech-to-text functionality](#speech-to-text-functionality)
+ [Teams extension](#teams-extension)
* [Deploy](#deploy)
+ [Pre-requisites](#pre-requisites)
+ [Products used](#products-used)
@@ -51,11 +48,11 @@ urlFragment: chat-with-your-data-solution-accelerator
* [Disclaimers](#disclaimers)
\
\
![User Story](/media/userStory.png)
![User Story](/docs/images/userStory.png)
## User story
Welcome to the *Chat with your data* Solution accelerator repository! The *Chat with your data* Solution accelerator is a powerful tool that combines the capabilities of Azure AI Search and Large Language Models (LLMs) to create a conversational search experience. This solution accelerator uses an Azure OpenAI GPT model and an Azure AI Search index generated from your data, which is integrated into a web application to provide a natural language interface, including speech-to-text functionality, for search queries. Users can drag and drop files, point to storage, and take care of technical setup to transform documents. There is a web app that users can create in their own subscription with security and authentication.
Welcome to the *Chat with your data* Solution accelerator repository! The *Chat with your data* Solution accelerator is a powerful tool that combines the capabilities of Azure AI Search and Large Language Models (LLMs) to create a conversational search experience. This solution accelerator uses an Azure OpenAI GPT model and an Azure AI Search index generated from your data, which is integrated into a web application to provide a natural language interface, including [speech-to-text](docs/speech_to_text.md) functionality, for search queries. Users can drag and drop files, point to storage, and take care of technical setup to transform documents. Everything can be deployed in your own subscription to accelerate your use of this technology.

![Solution Architecture - Chat with your data](/media/cwyd-solution-architecture.png)
![Solution Architecture - Chat with your data](/docs/images/cwyd-solution-architecture.png)

### About this repo

@@ -77,6 +74,7 @@ The accelerator presented here provides several options, for example:
* A backend that mimics the [On Your Data](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data) flow, with the ability to switch to a custom backend
* Advanced prompt engineering capabilities
* An admin site for ingesting/inspecting/configuring your dataset on the fly
* Push or Pull model for data ingestion: See [integrated vectorization](./docs/integrated_vectorization.md) documentation for more details
* Running a Retrieval Augmented Generation (RAG) solution locally

*Have you seen [ChatGPT + Enterprise data with Azure OpenAI and AI Search demo](https://github.com/Azure-Samples/azure-search-openai-demo)? If you would like to experiment (play with prompts, understand different implementation approaches to the RAG pattern, see how different features interact with the RAG pattern, and choose the best options for your RAG deployments), take a look at that repo.
@@ -94,25 +92,14 @@ Here is a comparison table with a few features offered by Azure, an available Gi

### Key features
- **Private LLM access on your data**: Get all the benefits of ChatGPT on your private, unstructured data.
- **Single application access to your full data set**: Minimize endpoints required to access internal company knowledgebases
- **Natural language interaction with your unstructured data**: Use natural language to quickly find the answers you need and ask follow-up queries to get the supplemental details.
- **Single application access to your full data set**: Minimize endpoints required to access internal company knowledgebases. Reuse the same backend with the [Microsoft Teams Extension](docs/teams_extension.md)
- **Natural language interaction with your unstructured data**: Use natural language to quickly find the answers you need and ask follow-up queries to get the supplemental details, including [Speech-to-text](docs/speech_to_text.md).
- **Easy access to source documentation when querying**: Review referenced documents in the same chat window for additional context.
- **Data upload**: Batch upload documents
- **Data upload**: Batch upload documents of [various file types](docs/supported_file_types.md)
- **Accessible orchestration**: Prompt and document configuration (prompt engineering, document processing, and data retrieval)

**Note**: The current model allows users to ask questions about unstructured data, such as PDF, text, and docx files.

### Supported file types

Out-of-the-box, you can upload the following file types:
* PDF
* JPEG
* JPG
* PNG
* TXT
* HTML
* MD (Markdown)
* DOCX
**Note**: The current model allows users to ask questions about unstructured data, such as PDF, text, and docx files. See the [supported file types](docs/supported_file_types.md).

### Target end users
Company personnel (employees, executives) looking to research against internal unstructured company data would leverage this accelerator using natural language to find what they need quickly.
@@ -130,25 +117,7 @@ Now that the financial advisor is more informed about Woodgrove’s Emerging Mar

Note: Some of the sample data included with this accelerator was generated using AI and is for illustrative purposes only.

### Integrated Vectorization
The Chat with your data accelerator supports [Integrated Vectorization](https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization), an Azure AI Search feature that pulls, processes and indexes data directly from Azure Storage. This greatly simplifies the code that needs to be maintained by developers deploying the accelerator since the chunking, vectorization and indexing of data is handled by Azure AI's built-in pull-indexers. Learn more about deploying with Integrated Vectorization [here](./docs/INTEGRATED_VECTORIZATION.md).

### Speech-to-text functionality
Many users are used to the convenience of speech-to-text functionality in their consumer products. With hybrid work increasing, speech-to-text supports a more flexible way for users to chat with their data, whether they’re at their computer or on the go with their mobile device. The speech-to-text capability is combined with NLP capabilities to extract intent and context from spoken language, allowing the chatbot to understand and respond to user requests more intelligently.

![Web - Chat with unstructured data](/media/web-unstructureddata.png)Chat with your unstructured data

![Web - Get responses using natural language](/media/web-nlu.png)Get responses using natural language

### [Teams extension](./docs/TEAMS_EXTENSION.md)
By bringing the Chat with your data experience into Teams, users can stay within their current workflow and get the answers they need without switching platforms. Rather than building the Chat with your data accelerator within Teams from scratch, the same underlying backend used for the web application is leveraged within Teams.

Learn more about deploying the Teams extension [here](./docs/TEAMS_EXTENSION.md).

![Teams - Chat with your Data](/media/teams-cwyd.png)
\
\
![One-click Deploy](/media/oneClickDeploy.png)
![One-click Deploy](/docs/images/oneClickDeploy.png)
## Deploy
### Pre-requisites
- Azure subscription - [Create one for free](https://azure.microsoft.com/free/) with owner access.
@@ -202,19 +171,19 @@ switch to a lower version. To find out which versions are supported in different

Where `{RESOURCE_TOKEN}` is uniquely generated during deployment. This is a combination of your subscription and the name of the resource group. Then select **Ingest Data** and add your data. You can find sample data in the `/data` directory.

![A screenshot of the admin site.](./media/admin-site.png)
![A screenshot of the admin site.](./docs/images/admin-site.png)


2. Navigate to the web app to start chatting on top of your data. The web app can be found at:

`https://web-{RESOURCE_TOKEN}.azurewebsites.net/`


![A screenshot of the chat app.](./media/web-unstructureddata.png)
![A screenshot of the chat app.](./docs/images/web-unstructureddata.png)
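
The URL pattern above can be sketched in code. This is a minimal illustration, assuming the resource token is already known (for example, read from the deployed resource names); the function name `web_app_url` and the sample token `abc123` are hypothetical and not part of the accelerator.

```python
def web_app_url(resource_token: str) -> str:
    """Build the chat web app URL from a deployment's resource token.

    Follows the pattern https://web-{RESOURCE_TOKEN}.azurewebsites.net/
    quoted in the README. The token value itself is deployment-specific.
    """
    token = resource_token.strip()
    if not token:
        raise ValueError("resource_token must not be empty")
    return f"https://web-{token}.azurewebsites.net/"


if __name__ == "__main__":
    # "abc123" is a made-up token used only for illustration.
    print(web_app_url("abc123"))
```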

\
\
![Supporting documentation](/media/supportingDocuments.png)
![Supporting documentation](/docs/images/supportingDocuments.png)
## Supporting documentation

### Resource links
@@ -235,8 +204,7 @@ This solution accelerator deploys the following resources. It's critical to comp
This repository is licensed under the [MIT License](LICENSE.md).

The data set under the /data folder is licensed under the [CDLA-Permissive-2 License](CDLA-Permissive-2.md).
\
\

## Disclaimers
This Software requires the use of third-party components which are governed by separate proprietary or open-source licenses as identified below, and you must comply with the terms of each applicable license in order to use the Software. You acknowledge and agree that this license does not grant you a license or other right to use any such third-party proprietary or open-source components.

6 changes: 3 additions & 3 deletions docs/TEAMS_LOCAL_DEPLOYMENT.md
@@ -24,7 +24,7 @@ Or use the [Azure Functions VS Code extension](https://marketplace.visualstudio.
### Develop & run the Teams Frontend
1. Open the “extensions/teams” folder with Visual Studio Code

![Teams](/media/teams.png)
![Teams](images/teams.png)

2. Open the file env\\.env.local
3. Locate the environment variable _AZURE_FUNCTION_URL_.
@@ -33,14 +33,14 @@ Or use the [Azure Functions VS Code extension](https://marketplace.visualstudio.
```env
AZURE_FUNCTION_URL=http://localhost:7071/api/GetConversationResponse
```
![Env](/media/teams-local-3.png)
![Env](images/teams-local-3.png)
5. Save the file.
6. Select Teams Toolkit from the navigation panel.
7. Verify your signed into O365 and Azure with sideloading enabled.
8. Select the "play" button next to Local.
![Teams Toolkit](/media/teams-local-2.png)
![Teams Toolkit](images/teams-local-2.png)
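
Before pressing "play", it can help to confirm that the env file really points at the local Functions host. The following is a minimal sketch, assuming the file uses plain `KEY=value` lines; the helper names `read_env_value` and `is_local_function_url` are hypothetical, and the port check simply mirrors the `http://localhost:7071` value shown in the steps above.

```python
from typing import Optional
from urllib.parse import urlparse


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value for `key` from KEY=value text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


def is_local_function_url(url: str) -> bool:
    """Check that the URL targets http://localhost:7071 (assumed local host)."""
    parsed = urlparse(url)
    return (
        parsed.scheme == "http"
        and parsed.hostname == "localhost"
        and parsed.port == 7071
    )


if __name__ == "__main__":
    sample = "AZURE_FUNCTION_URL=http://localhost:7071/api/GetConversationResponse\n"
    url = read_env_value(sample, "AZURE_FUNCTION_URL")
    print(url, is_local_function_url(url or ""))
```

To check a real file, read its contents (for example with `pathlib.Path(...).read_text()`) and pass the text to `read_env_value`.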
## Environment variables
2 changes: 1 addition & 1 deletion docs/best_practices.md
@@ -1,6 +1,6 @@
[Back to *Chat with your data* README](../README.md)

![Supporting documentation](../media/supportingDocuments.png)
![Supporting documentation](images/supportingDocuments.png)
# Best practices

**Evaluate your data first**
2 changes: 1 addition & 1 deletion docs/customer_truth.md
@@ -1,5 +1,5 @@
[Back to *Chat with your data* README](../README.md)

![Customer truth](../media/customerTruth.png)
![Customer truth](images/customerTruth.png)
# Customer truth
Customer stories coming soon. For early access, contact: fabrizio.ruocco@microsoft.com
File renamed without changes
@@ -4,7 +4,7 @@
[**USER STORY**](#user-story) | [**DEPLOYMENT INSTRUCTIONS**](#local-deployment-instructions)
\
\
![User Story](/media/userStory.png)
![User Story](images/userStory.png)
## User Story
This feature allows chunking and vectorization of data during ingestion into Azure AI Search through built-in pull-indexers. It supports automatic processing of data directly from storage - meaning the user can just upload their data to Azure Blob Storage and the built-in pull-indexers will do the chunking, vectorization and indexing. This removes the need for Chat With Your Data to explicitly perform chunking, vectorization and pushing to the search index. Read [more](https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization).
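
To make the chunking step concrete, here is a toy illustration of the kind of work the pull-indexers take over. It is a minimal sketch, assuming fixed-size character chunks with overlap; the function name `chunk_text` and the default sizes are hypothetical, and this is not the strategy used by Azure AI Search or by the accelerator's own code.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap.

    Illustrative only: real pipelines usually split on tokens, sentences,
    or document layout rather than raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

With Integrated Vectorization enabled, this kind of splitting (plus embedding and indexing) happens inside the search service instead of in application code.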

@@ -15,7 +15,7 @@ This feature allows chunking and vectorization of data during ingestion into Azu

## Using the Deploy to Azure button
When you click the "Deploy to Azure" button on the repo's main page, you will be taken to the Azure portal, where you can select "true" for the option "Azure Search Use Integrated Vectorization".
![Integrated Vectorization](/media/azure-search-use-iv.png)
![Integrated Vectorization](images/azure-search-use-iv.png)


## Local Deployment - If deploying for the first time
@@ -30,7 +30,7 @@ If you have previously deployed Chat With Your Data without Integrated Vectoriza

1. On your Azure portal, navigate to the resource group which has your Chat With Your Data deployment.
1. Delete the existing search index.
![Delete Search Index](/media/delete-search-index.png)
![Delete Search Index](images/delete-search-index.png)
1. Run the command `azd env set AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION true`
1. Run `azd up`

@@ -40,12 +40,12 @@ If you have a deployment with Integrated Vectorization enabled, and you want to

1. On your Azure portal, navigate to the resource group which has your Chat With Your Data deployment.
1. Delete the existing search index.
![Delete Search Index](/media/delete-search-index.png)
![Delete Search Index](images/delete-search-index.png)
1. Delete the existing indexer.
![Delete Search Index](/media/delete-search-indexer.png)
![Delete Search Index](images/delete-search-indexer.png)
1. Delete the existing skillset.
![Delete Search Index](/media/delete-search-skillset.png)
![Delete Search Index](images/delete-search-skillset.png)
1. Delete the existing datasource.
![Delete Search Index](/media/delete-search-datasource.png)
![Delete Search Index](images/delete-search-datasource.png)
1. Run the command `azd env set AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION false`
1. Run `azd up`
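
The two `azd` commands that close each procedure above can be scripted. This is a minimal sketch, assuming the Azure Developer CLI (`azd`) is installed and on the `PATH`; the helper names are hypothetical, and the portal steps (deleting the index, indexer, skillset and datasource) still have to be done by hand first. The script defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def integrated_vectorization_commands(enabled: bool) -> List[List[str]]:
    """Return the azd commands from the steps above for the chosen setting."""
    value = "true" if enabled else "false"
    return [
        ["azd", "env", "set", "AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION", value],
        ["azd", "up"],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed without calling azd.
    run_commands(integrated_vectorization_commands(enabled=False))
```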
9 changes: 9 additions & 0 deletions docs/speech_to_text.md
@@ -0,0 +1,9 @@
[Back to *Chat with your data* README](../README.md)

![User Story](images/userStory.png)
# Speech-to-text functionality
Many users are used to the convenience of speech-to-text functionality in their consumer products. With hybrid work increasing, speech-to-text supports a more flexible way for users to chat with their data, whether they’re at their computer or on the go with their mobile device. The speech-to-text capability is combined with NLP capabilities to extract intent and context from spoken language, allowing the chatbot to understand and respond to user requests more intelligently.

![Web - Chat with unstructured data](images/web-unstructureddata.png)Chat with your unstructured data

![Web - Get responses using natural language](images/web-nlu.png)Get responses using natural language
14 changes: 14 additions & 0 deletions docs/supported_file_types.md
@@ -0,0 +1,14 @@
[Back to *Chat with your data* README](../README.md)

![User Story](images/userStory.png)
# Supported file types

Out-of-the-box, you can upload the following file types:
* PDF
* JPEG
* JPG
* PNG
* TXT
* HTML
* MD (Markdown)
* DOCX
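
A client could pre-filter uploads against this list before sending them. The following is a minimal sketch, assuming the check is based on file extension only; the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical and do not come from the accelerator's code.

```python
from pathlib import PurePath

# Extensions taken from the list above; matching is case-insensitive.
SUPPORTED_EXTENSIONS = frozenset(
    {".pdf", ".jpeg", ".jpg", ".png", ".txt", ".html", ".md", ".docx"}
)


def is_supported(filename: str) -> bool:
    """Return True if the file's extension appears in the supported list."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.PDF", "notes.md", "archive.zip"):
        print(name, is_supported(name))
```

An extension check says nothing about the file's actual contents, so server-side validation would still be needed.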