
Task: Implement LangGraph Information Extraction Specific Case #2

Open
6 tasks
ibevers opened this issue Oct 18, 2024 · 1 comment
ibevers commented Oct 18, 2024

Description

Implement LangGraph information extraction for a specific use case as a proof of concept.
Parameters: data template, prompt, documents
Output: Set of objects such that each object is valid according to the input data template
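The "valid according to the input data template" requirement could be as simple as a field-to-type mapping plus a type check. A minimal sketch of that validation step (all names here are hypothetical, not part of the actual implementation):

```python
# Minimal sketch: a data template as a field -> type mapping, plus a
# validator that checks extracted objects against it.
# All names are hypothetical placeholders.

PAPER_TEMPLATE = {
    "title": str,
    "abstract": str,
    "keywords": list,
    "should_include": bool,
}

def is_valid(obj: dict, template: dict) -> bool:
    """True if obj has exactly the template's fields with matching types."""
    if set(obj) != set(template):
        return False
    return all(isinstance(obj[field], typ) for field, typ in template.items())
```

A library such as pydantic would give richer validation, but a plain mapping is enough to pin down the contract between extractor and validator.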

Tasks

  • Define specific case (coordinate with group)
  • PDF parsing
  • Data template
  • Prompting
  • Information extractor
  • Validator
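One way the tasks above could fit together, sketched as a plain-Python pipeline. The real implementation would wire these steps as LangGraph nodes; every function below is a hypothetical stub:

```python
# Hypothetical sketch of how the task list composes into a pipeline:
# PDF parsing -> prompting -> information extractor -> validator.
# Each step is a stub standing in for the real implementation.

def parse_pdf(path: str) -> str:
    """PDF parsing step: return the document's raw text (stub)."""
    return f"text of {path}"

def build_prompt(text: str) -> str:
    """Prompting step: wrap the document text in an extraction prompt (stub)."""
    return "Extract title/abstract/keywords from:\n" + text

def extract(prompt: str) -> dict:
    """Information extractor: would call an LLM; stubbed with a fixed object."""
    return {"title": "stub", "abstract": "stub", "keywords": []}

def validate(obj: dict) -> bool:
    """Validator: check the extracted object against the data template (stub)."""
    return {"title", "abstract", "keywords"} <= set(obj)

def run(path: str) -> dict:
    """Chain the steps; fail loudly if the output violates the template."""
    obj = extract(build_prompt(parse_pdf(path)))
    if not validate(obj):
        raise ValueError("extraction did not match data template")
    return obj
```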

Freeform Notes

No response

@fabiocat93
Collaborator

**Project Update: @ibevers @hvgazula @puja-trivedi @tekrajchhetri**

  • Ran preliminary experiments using LangGraph for RAG, search, and multi-agent interaction to strengthen domain knowledge. My take: these approaches may be too complex and advanced for automating portions of the PRISMA framework in systematic literature reviews.
  • Defined a minimal use case, drawn from my systematic literature review on AI-driven autism assessment using behavioral data. The objective is to predict whether a paper should be included in the review based on its title, abstract, and keywords.

INCLUSION CRITERIA:

  • Relevance to Assessment: Papers must directly address assessment, screening, diagnosis, or detection methods related to autism.
  • Primary Resources: Only primary research articles will be considered, excluding secondary sources like meta-analyses and systematic reviews.
  • Target Population: Subjects should include individuals diagnosed with autism spectrum disorder (ASD) or suspected to have ASD.
  • AI Application: The study must involve applying artificial intelligence (AI) techniques for behavioral analysis.
  • Recent Publications: Articles must have been published within the last 10 years to ensure currency (2013-2024).
  • Study Design: Papers are included regardless of study design, setting, or duration of follow-up.

EXCLUSION CRITERIA:

  • Treatment Focus: Papers only focusing on treatment approaches will be excluded to maintain the assessment-oriented scope.
  • Literature Reviews: Studies that are literature reviews or systematic reviews will not be included to maintain a primary research focus.
  • Theoretical Papers: Studies with only theoretical content and no empirical data will be excluded.
  • Non-Human Subjects: Studies involving animal subjects or computational simulations without human data will be excluded to ensure human relevance.
  • Subjects only suspected to have autism: high-risk subjects (e.g., siblings) will not be included in the literature review, unless they are assessed/diagnosed.
  • AI application on manually extracted features: Studies feeding ML models with manually extracted features are excluded to focus on entirely automatic processes.
  • Lack of Peer Review: Papers lacking peer review (e.g., book chapters) will be excluded to ensure a certain level of research quality.
  • Abstract Papers: Abstract-only papers will be excluded.
  • Lack of Behavioral Data: Papers that lack behavioral data or analysis will not be included to ensure relevance to the behavioral analysis aspect.
  • Non-English Papers: Papers not in English will be excluded due to language limitations.
  • Non-AI-based Assessment: Studies that use AI-based agents to interact with subjects, but not for assessment, will be excluded.

PAPERS:
Let's consider the following arXiv papers as a starting point:
{'arxiv_id': '2406.13470', 'should_include': True}
{'arxiv_id': '2201.00927', 'should_include': True}
{'arxiv_id': '2401.04088', 'should_include': False}
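For these IDs, title and abstract can be pulled from the public arXiv Atom API at export.arxiv.org. A sketch of that fetch step (function names are hypothetical; this is not the script in the experiments branch):

```python
# Sketch: fetch title/abstract for a batch of arXiv IDs via the public
# arXiv Atom API (export.arxiv.org). Function names are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def build_query_url(arxiv_ids: list) -> str:
    """Build the export.arxiv.org query URL for a batch of IDs."""
    return "http://export.arxiv.org/api/query?" + urlencode(
        {"id_list": ",".join(arxiv_ids)}
    )

def fetch_metadata(arxiv_ids: list) -> list:
    """Fetch and parse title/abstract for each entry (makes a network call)."""
    with urlopen(build_query_url(arxiv_ids)) as resp:
        root = ET.fromstring(resp.read())
    return [
        {
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
        }
        for entry in root.iter(f"{ATOM}entry")
    ]
```

Note the API does not expose author-supplied keywords, only subject categories, which is one reason starting from arXiv may limit the project's utility.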

  • Created a script for extracting titles, abstracts, and keywords from arXiv (following @satra's suggestion to start there). While I understand the reasoning, this approach may limit the project's utility; I'm proceeding with it, at least as a first step.
  • Tested specific inclusion/exclusion questions with several open-source models (e.g., smollm, phi-3), but their responses weren't consistent or even contextually relevant. GPT4-o1-mini provided better results. Concern: how much can we afford to pay for that?
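The screening questions above could be framed as a single yes/no prompt over the criteria, with a strict parser so malformed replies fail loudly rather than silently. A sketch under those assumptions (the criteria summary and all names are hypothetical; the actual prompts live in the experiments branch):

```python
# Sketch: turn the inclusion/exclusion criteria into a binary screening
# prompt for an LLM, plus a strict parser for the model's reply.
# The criteria text below is a condensed paraphrase, not the real prompt.

CRITERIA = """\
Include only primary, peer-reviewed, English-language studies (2013-2024)
applying AI to behavioral data for autism assessment, screening, diagnosis,
or detection in human subjects diagnosed with (not merely suspected of) ASD.
Exclude reviews, theoretical papers, treatment-only papers, abstract-only
papers, and pipelines built on manually extracted features."""

def build_screening_prompt(title: str, abstract: str, keywords: list) -> str:
    """Assemble the criteria and paper metadata into one yes/no question."""
    return (
        f"Screening criteria:\n{CRITERIA}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n"
        f"Keywords: {', '.join(keywords)}\n\n"
        "Should this paper be included? Answer exactly INCLUDE or EXCLUDE."
    )

def parse_decision(reply: str) -> bool:
    """Map the model's reply to should_include; raise on anything else."""
    verdict = reply.strip().upper()
    if verdict not in {"INCLUDE", "EXCLUDE"}:
        raise ValueError(f"unparseable reply: {reply!r}")
    return verdict == "INCLUDE"
```

Constraining the answer format this way also makes it cheap to compare models (smollm, phi-3, GPT4-o1-mini) on the three labeled papers above.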

Code is available in the respective branch under the experiments folder (including the preliminary experiments).
