LegalLLM has successfully implemented its core functionalities:
- Similar Case Retrieval (SCR): Retrieves relevant cases based on an input query or case details.
- Precedent Case Recommendation (PCR): Identifies and explains the most applicable precedent cases.
- Legal Judgment Prediction (LJP): Predicts judicial outcomes from historical case data.
Support for uploading PDFs or images of case documents for analysis is under development.
Similar Case Retrieval (SCR):
- Purpose: Identifies cases similar to the user-provided input.
- Implementation: Applies semantic similarity search over the CaseLaw dataset.
- Output: A ranked list of similar cases, each with a summary and metadata.
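The ranking step can be sketched as a cosine-similarity comparison between the query and each case. This is a minimal illustration using toy bag-of-words vectors; the project's actual embedding model and the CaseLaw index are not shown, and the case texts below are invented.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system would use a
    # neural embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_similar(query: str, cases: dict, top_k: int = 3) -> list:
    # Score every case against the query and return the top_k case ids.
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), case_id) for case_id, text in cases.items()),
        reverse=True,
    )
    return [case_id for _, case_id in scored[:top_k]]

cases = {  # invented stand-ins for CaseLaw entries
    "A": "breach of contract damages awarded",
    "B": "criminal sentencing guidelines applied",
    "C": "contract dispute over damages",
}
print(retrieve_similar("contract damages", cases, top_k=2))  # → ['C', 'A']
```

The same ranking logic applies unchanged when the toy `embed` is swapped for a neural embedding model.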
Precedent Case Recommendation (PCR):
- Purpose: Recommends precedent cases relevant to the input context.
- Implementation: A Llama 3.0 model fine-tuned on legal texts identifies the critical precedents.
- Output: The top precedents, each with a detailed relevance explanation.
Legal Judgment Prediction (LJP):
- Purpose: Predicts potential case outcomes from the input.
- Implementation: Uses transformer models from the Hugging Face library to make predictions.
- Output: Predicted verdicts with associated confidence scores.
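The confidence scores are typically softmax probabilities over the classifier's raw logits. A minimal sketch, assuming a hypothetical two-label verdict head (the labels and logits here are invented, not the project's model):

```python
from math import exp

LABELS = ["plaintiff", "defendant"]  # hypothetical verdict labels

def softmax(logits: list) -> list:
    # Shift by the max logit for numerical stability, then normalize.
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_verdict(logits: list) -> tuple:
    # Convert raw logits to probabilities and pick the best label.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = predict_verdict([2.0, 0.5])  # invented logits
print(label, round(confidence, 3))  # → plaintiff 0.818
```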
Resources:
- CaseLaw Dataset
- Hugging Face Transformers Library
- Llama 3.0
- Revolutionizing Legal Research Paper
- Legal NLP Research Paper
Challenges:
- Problem: The model required for training and inference exceeded the computational capacity of standard local hardware.
- Reason: Running the entire system on a local machine resulted in slow processing times and limited scalability.
- Problem: Creating embeddings for the legal dataset was extremely time-consuming.
- Reason: The dataset is large, and generating high-quality embeddings with large models is computationally expensive.
- Problem: Ensuring the chatbot consistently produced accurate and relevant responses.
- Reason: Limitations of the underlying model and inconsistent quality of the retrieved legal documents.
Solutions:
- Solution: Generated embeddings with the Llama 3.1 model and stored them efficiently in ChromaDB.
- Impact: Significantly reduced embedding-creation time and enabled faster retrieval of relevant documents.
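The resulting flow — embed each document once at ingest, then reuse the stored vectors at query time — can be sketched as below. A plain in-memory class stands in for a ChromaDB collection, and a deterministic toy function stands in for the Llama 3.1 embedding model; only the data flow is being illustrated, not the real APIs.

```python
from math import sqrt

def fake_embed(text: str, dim: int = 8) -> list:
    # Deterministic toy stand-in for the Llama 3.1 embedding model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec

class VectorStore:
    """In-memory stand-in for a ChromaDB collection."""

    def __init__(self):
        self._docs = {}  # doc_id -> (embedding, text)

    def add(self, doc_id: str, text: str) -> None:
        # Embed once at ingest time so queries can reuse the vector.
        self._docs[doc_id] = (fake_embed(text), text)

    def query(self, text: str, n_results: int = 2) -> list:
        q = fake_embed(text)

        def score(item):
            vec, _ = item[1]
            dot = sum(a * b for a, b in zip(q, vec))
            norm = sqrt(sum(a * a for a in q)) * sqrt(sum(b * b for b in vec))
            return dot / norm if norm else 0.0

        ranked = sorted(self._docs.items(), key=score, reverse=True)
        return [doc_id for doc_id, _ in ranked[:n_results]]

store = VectorStore()
store.add("case-1", "negligence claim dismissed")  # invented documents
store.add("case-2", "contract breach damages")
print(store.query("breach of contract damages", n_results=1))  # → ['case-2']
```

Persisting the vectors (as ChromaDB does) is what removes the repeated embedding cost: each document is embedded exactly once, in `add`.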
- Solution: Applied prompt engineering techniques to refine the chatbot's responses.
- Impact: Enhanced the accuracy and relevance of responses by ensuring the model adhered to specific instructions and context.
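In practice, this kind of prompt engineering amounts to wrapping the retrieved context and the user's question in a fixed instruction template before calling the model. The template wording below is illustrative, not the project's actual prompt:

```python
PROMPT_TEMPLATE = """You are a legal research assistant.
Answer ONLY from the context below. If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context_chunks: list, question: str) -> str:
    # Join the retrieved chunks and slot them into the fixed template.
    context = "\n---\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["Case A: the court capped damages at the contract value."],  # invented chunk
    "What is the damages cap?",
)
print(prompt.splitlines()[0])  # → You are a legal research assistant.
```

Pinning the instructions and the allowed context in one template is what keeps the model's answers grounded in the retrieved documents.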
Future Work:
- Plan: Use OCR tools (e.g., Tesseract, Google Cloud Vision) to extract and process text from uploaded documents.
- Next Steps: Build pre-processing pipelines to clean and format the extracted text.
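A first cleaning pass for OCR output might de-hyphenate words split across line breaks and collapse stray whitespace. The rules below are assumptions about what such a pipeline will need, not the implemented pipeline:

```python
import re

def clean_ocr_text(raw: str) -> str:
    # Re-join words hyphenated across line breaks, e.g. "judg-\nment".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines and runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "The judg-\nment was   entered\nfor the plaintiff. "  # invented OCR output
print(clean_ocr_text(raw))  # → The judgment was entered for the plaintiff.
```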
- Plan: Migrate the application to a cloud-based infrastructure (e.g., AWS, Google Cloud) to handle large-scale operations.
- Next Steps: Set up auto-scaling instances to manage computational loads and optimize resource utilization.
- Plan: Fine-tune the Llama 3.1 model using domain-specific legal datasets for better contextual understanding.
- Next Steps: Collect additional labeled data, perform transfer learning, and validate performance improvements.
Contributions are welcome!
- Submit issues or feature requests via the GitHub Issues page.
Contact Information:
For queries, reach out to the team:
- Danishbir Singh Bhatti (017521647): danishbirsingh.bhatti@sjsu.edu
- Jay Shon (017553289): seojun.shon@sjsu.edu
- Anthony Kommareddy (017506957): anthonysandeshreddy.kommareddy@sjsu.edu