LegalLLM has successfully implemented its core functionalities:
- Similar Case Retrieval (SCR): Retrieves relevant cases based on an input query or case details.
- Precedent Case Recommendation (PCR): Identifies and explains the most applicable precedent cases.
- Legal Judgment Prediction (LJP): Predicts judicial outcomes from historical case data.
Support for uploading PDFs or images of case documents for analysis is under development.
Similar Case Retrieval (SCR):
- Purpose: Identifies cases similar to the user-provided input.
- Implementation: Applies semantic similarity search over the CaseLaw dataset.
- Output: A ranked list of similar cases, each with a summary and metadata.
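The ranking step can be sketched as a cosine-similarity comparison between the query and each case. This is a minimal illustration using toy bag-of-words vectors; the project's actual embedding model and the CaseLaw index are not shown, and the case texts below are invented.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system would use a
    # neural embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_similar(query: str, cases: dict, top_k: int = 3) -> list:
    # Score every case against the query and return the top_k case ids.
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), case_id) for case_id, text in cases.items()),
        reverse=True,
    )
    return [case_id for _, case_id in scored[:top_k]]

cases = {  # invented stand-ins for CaseLaw entries
    "A": "breach of contract damages awarded",
    "B": "criminal sentencing guidelines applied",
    "C": "contract dispute over damages",
}
print(retrieve_similar("contract damages", cases, top_k=2))  # → ['C', 'A']
```

The same ranking logic applies unchanged when the toy `embed` is swapped for a neural embedding model.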
Precedent Case Recommendation (PCR):
- Purpose: Recommends precedent cases relevant to the input context.
- Implementation: A Llama 3.0 model fine-tuned on legal texts identifies the critical precedents.
- Output: The top precedents, each with a detailed relevance explanation.
Legal Judgment Prediction (LJP):
- Purpose: Predicts potential case outcomes from the input.
- Implementation: Uses transformer models from the Hugging Face library to make predictions.
- Output: Predicted verdicts with associated confidence scores.
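The confidence scores are typically softmax probabilities over the classifier's raw logits. A minimal sketch, assuming a hypothetical two-label verdict head (the labels and logits here are invented, not the project's model):

```python
from math import exp

LABELS = ["plaintiff", "defendant"]  # hypothetical verdict labels

def softmax(logits: list) -> list:
    # Shift by the max logit for numerical stability, then normalize.
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_verdict(logits: list) -> tuple:
    # Convert raw logits to probabilities and pick the best label.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = predict_verdict([2.0, 0.5])  # invented logits
print(label, round(confidence, 3))  # → plaintiff 0.818
```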
Resources:
- CaseLaw Dataset
- Hugging Face Transformers Library
- Llama 3.0
- Revolutionizing Legal Research Paper
- Legal NLP Research Paper
Challenges:
- Problem: The model required for training and inference exceeded the computational capacity of standard local hardware.
- Reason: Running the entire system on a local machine resulted in slow processing times and limited scalability.
- Problem: Creating embeddings for the legal dataset was extremely time-consuming.
- Reason: The dataset is large, and generating high-quality embeddings with large models is computationally expensive.
- Problem: Ensuring the chatbot consistently produced accurate and relevant responses.
- Reason: Limitations of the underlying model and inconsistent quality of the retrieved legal documents.
Solutions:
- Solution: Generated embeddings with the Llama 3.1 model and stored them efficiently in ChromaDB.
- Impact: Significantly reduced embedding-creation time and enabled faster retrieval of relevant documents.
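The resulting flow — embed each document once at ingest, then reuse the stored vectors at query time — can be sketched as below. A plain in-memory class stands in for a ChromaDB collection, and a deterministic toy function stands in for the Llama 3.1 embedding model; only the data flow is being illustrated, not the real APIs.

```python
from math import sqrt

def fake_embed(text: str, dim: int = 8) -> list:
    # Deterministic toy stand-in for the Llama 3.1 embedding model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec

class VectorStore:
    """In-memory stand-in for a ChromaDB collection."""

    def __init__(self):
        self._docs = {}  # doc_id -> (embedding, text)

    def add(self, doc_id: str, text: str) -> None:
        # Embed once at ingest time so queries can reuse the vector.
        self._docs[doc_id] = (fake_embed(text), text)

    def query(self, text: str, n_results: int = 2) -> list:
        q = fake_embed(text)

        def score(item):
            vec, _ = item[1]
            dot = sum(a * b for a, b in zip(q, vec))
            norm = sqrt(sum(a * a for a in q)) * sqrt(sum(b * b for b in vec))
            return dot / norm if norm else 0.0

        ranked = sorted(self._docs.items(), key=score, reverse=True)
        return [doc_id for doc_id, _ in ranked[:n_results]]

store = VectorStore()
store.add("case-1", "negligence claim dismissed")  # invented documents
store.add("case-2", "contract breach damages")
print(store.query("breach of contract damages", n_results=1))  # → ['case-2']
```

Persisting the vectors (as ChromaDB does) is what removes the repeated embedding cost: each document is embedded exactly once, in `add`.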
- Solution: Applied prompt engineering techniques to refine the chatbot's responses.
- Impact: Enhanced the accuracy and relevance of responses by ensuring the model adhered to specific instructions and context.
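In practice, this kind of prompt engineering amounts to wrapping the retrieved context and the user's question in a fixed instruction template before calling the model. The template wording below is illustrative, not the project's actual prompt:

```python
PROMPT_TEMPLATE = """You are a legal research assistant.
Answer ONLY from the context below. If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context_chunks: list, question: str) -> str:
    # Join the retrieved chunks and slot them into the fixed template.
    context = "\n---\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["Case A: the court capped damages at the contract value."],  # invented chunk
    "What is the damages cap?",
)
print(prompt.splitlines()[0])  # → You are a legal research assistant.
```

Pinning the instructions and the allowed context in one template is what keeps the model's answers grounded in the retrieved documents.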
Future Work:
- Plan: Use OCR tools (e.g., Tesseract, Google Cloud Vision) to extract and process text from uploaded documents.
- Next Steps: Build pre-processing pipelines to clean and format the extracted text.
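A first cleaning pass for OCR output might de-hyphenate words split across line breaks and collapse stray whitespace. The rules below are assumptions about what such a pipeline will need, not the implemented pipeline:

```python
import re

def clean_ocr_text(raw: str) -> str:
    # Re-join words hyphenated across line breaks, e.g. "judg-\nment".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines and runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "The judg-\nment was   entered\nfor the plaintiff. "  # invented OCR output
print(clean_ocr_text(raw))  # → The judgment was entered for the plaintiff.
```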
- Plan: Migrate the application to a cloud-based infrastructure (e.g., AWS, Google Cloud) to handle large-scale operations.
- Next Steps: Set up auto-scaling instances to manage computational loads and optimize resource utilization.
- Plan: Fine-tune the Llama 3.1 model using domain-specific legal datasets for better contextual understanding.
- Next Steps: Collect additional labeled data, perform transfer learning, and validate performance improvements.
Contributions are welcome!
- Submit issues or feature requests via the GitHub Issues page.
Contact Information:
For queries, reach out to the team:
- Danishbir Singh Bhatti (017521647): danishbirsingh.bhatti@sjsu.edu
- Jay Shon (017553289): seojun.shon@sjsu.edu
- Anthony Kommareddy (017506957): anthonysandeshreddy.kommareddy@sjsu.edu