Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

Paper • HuggingFace • Project Page • Poster • Video

Authors: Rose E. Wang and Dorottya Demszky

In the Proceedings of Innovative Use of NLP for Building Educational Applications 2023

Selected as the Ambassador Paper for BEA 2023! 🎉 To be presented at AIED 2024.

If you find our work useful or interesting, please consider citing it!

@inproceedings{wang2023chatgpt,
  title = {Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction},
  year = {2023},
  month = jun,
  author = {Wang, Rose and Demszky, Dorottya},
  booktitle = {18th Workshop on Innovative Use of NLP for Building Educational Applications},
  month_numeric = {6}
}

Repository Structure

.
├── data                    # Outcome data and annotation data
├── prompts                 # Prompts used for Tasks 
├── results                 # Result plots used in paper
├── scripts                 # Python scripts for analysis
├── requirements.txt        # Install requirements for running code
├── analyze_experiments.sh  # Complete analysis script
├── LICENSE
└── README.md

Transcripts

NOTE: The classroom transcripts are not needed to run the code. However, if you would like to see the transcripts, they can be downloaded following this repository's instructions: NCTE repository.

The file of interest is called single_utterances.csv. You may put that file under this repository as data/ncte_single_utterances.csv.

Installation

To install the required libraries:

conda create -n zero-shot python=3
conda activate zero-shot
pip install -r requirements.txt

Experiments (Replicating Paper Figures)

TLDR: Run source analyze_experiments.sh which will launch all the scripts to replicate the paper's figures. The results will be populated under the results/

Task A: Scoring Transcripts

In Table 1, we report the Spearman correlation values between the human scores and model predictions on CLASS and MQI dimensions. We also report the distribution over scores in Figure 4. To re-produce those plots, run To re-produce, run

# Task A: Scoring transcripts
# numerical = DA, numerical_reasoning = RA, numerical_descriptions = DA+
output_types=("numerical" "numerical_reasoning" "numerical_descriptions")

for output_type in "${output_types[@]}"; do
    echo "CLASS on $output_type"
    python3 scripts/class_obs/compare.py --rating_output=$output_type

    echo "MQI on $output_type"
    python3 scripts/mqi_obs/compare.py --rating_output=$output_type
done

Task B: Identify highlights and missed opportunities

In Figure a and b, we report the math teachers' evaluations for highlights and missed opportunities on CLASS and MQI. To re-produce those plots, run

# Task B: Identify highlights and missed opportunities
python3 scripts/plot_task_bc_results.py --key=class_examples
python3 scripts/plot_task_bc_results.py --key=mqi_examples

Task C: Provide actionable suggestions for eliciting student reasoning

In Figure c, we report the math teachers' evaluations for the actionable suggestions on eliciting student reasoning. To re-produce those plots, run

# Task C: Provide suggestions
python3 scripts/plot_task_bc_results.py --key=suggestions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

Repository Structure

Transcripts

Installation

Experiments (Replicating Paper Figures)

Task A: Scoring Transcripts

Task B: Identify highlights and missed opportunities

Task C: Provide actionable suggestions for eliciting student reasoning

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
assets		assets
data		data
prompts		prompts
results		results
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
analyze_experiments.sh		analyze_experiments.sh
requirements.txt		requirements.txt

License

rosewang2008/zero-shot-teacher-feedback

Folders and files

Latest commit

History

Repository files navigation

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

Repository Structure

Transcripts

Installation

Experiments (Replicating Paper Figures)

Task A: Scoring Transcripts

Task B: Identify highlights and missed opportunities

Task C: Provide actionable suggestions for eliciting student reasoning

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages