Evidence for hierarchical representations of written and spoken words from an open-science human neuroimaging dataset

Code, results, and manuscript drafts for Banerjee et al.

Tables and text files

dutch_celex_database_updatedv2.csv Contains phonetic pronunciations of Dutch words in the CELEX database. For more information, see Sun & Poeppel 2023.

SUBTLEX-NL with pos and Zipf.xlsx Contains word frequency measures for Dutch words from SUBTLEX-NL, a subtitle-based frequency database that is more recent than CELEX. Zipf contains log-transformed values of FREQCOUNT (the number of occurrences of the word in the corpus) on the Zipf scale, i.e. log10 of the frequency per billion words. For more information, visit OSF.
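As a worked example of the Zipf transform, the sketch below converts a raw count into a Zipf value using the standard definition: log10 of the per-million frequency plus 3. The function name and the corpus-size argument are illustrative assumptions, and the published SUBTLEX-NL Zipf column may include a small smoothing term that this sketch omits.

```python
import math


def zipf(freq_count: int, corpus_tokens: int) -> float:
    """Zipf value = log10(occurrences per million words) + 3.

    Hypothetical helper: `freq_count` is a raw FREQCOUNT value and
    `corpus_tokens` is the total number of word tokens in the corpus.
    No smoothing is applied, so both numbers must be positive.
    """
    if freq_count <= 0 or corpus_tokens <= 0:
        raise ValueError("counts must be positive")
    per_million = freq_count / (corpus_tokens / 1_000_000)
    return math.log10(per_million) + 3


# A word seen 100 times in a 1-million-token corpus has Zipf 5.0.
print(zipf(100, 1_000_000))
```

Because the scale is logarithmic, each step of 1 Zipf unit corresponds to a tenfold change in frequency.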

MOUS_audio_onset_offsets.xlsx Onset and offset times of the words in each audio file played during the speech-listening part of the experiment.

subtlex_phonetics.xlsx The intersection of the CELEX and SUBTLEX databases; contains phonetic transcriptions and occurrence counts for the Dutch words present in both.
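A minimal sketch of what "intersection" means here, assuming each database has been loaded as a word-keyed dictionary and that words are matched after lowercasing (both assumptions; the actual join key and cleaning steps are not documented in this README). The transcriptions and counts in the usage lines are placeholders, not real CELEX or SUBTLEX values.

```python
def intersect_lexicons(celex: dict[str, str],
                       subtlex: dict[str, int]) -> dict[str, tuple[str, int]]:
    """Keep only words present in both lexicons.

    Hypothetical sketch: `celex` maps word -> phonetic transcription and
    `subtlex` maps word -> occurrence count. Matching is case-insensitive.
    """
    celex_lc = {w.lower(): phon for w, phon in celex.items()}
    subtlex_lc = {w.lower(): n for w, n in subtlex.items()}
    shared = celex_lc.keys() & subtlex_lc.keys()
    return {w: (celex_lc[w], subtlex_lc[w]) for w in sorted(shared)}


celex = {"Huis": "PHON_A", "boom": "PHON_B"}  # placeholder transcriptions
subtlex = {"huis": 5000, "fiets": 300}        # made-up counts
print(intersect_lexicons(celex, subtlex))     # {'huis': ('PHON_A', 5000)}
```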

MOUS_word_syllable_frequencies.csv Contains the occurrence counts of each syllable in every word presented in the MOUS experiment.

stimuli.txt The sentences and word lists used in both the reading and speech listening experiments of the MOUS study.

bigram_counts.csv Cumulative bigram occurrences (per million) in the SUBTLEX text corpus.
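One plausible way to compute cumulative (token-weighted) letter-bigram frequencies is sketched below. It assumes that every adjacent letter pair in a word is credited with that word's corpus count, including repeated pairs within a word, and that totals are scaled to occurrences per million word tokens; the exact definition used to build bigram_counts.csv may differ.

```python
from collections import Counter


def bigram_counts_per_million(word_counts: dict[str, int]) -> dict[str, float]:
    """Token-weighted letter-bigram frequencies, per million word tokens.

    Assumed definition: each adjacent letter pair in a word is credited with
    that word's corpus count, and totals are scaled by the corpus size.
    """
    total_tokens = sum(word_counts.values())
    if total_tokens == 0:
        return {}
    counts = Counter()
    for word, n in word_counts.items():
        w = word.lower()
        for i in range(len(w) - 1):
            counts[w[i:i + 2]] += n
    scale = 1_000_000 / total_tokens
    return {bg: c * scale for bg, c in counts.items()}


# Toy corpus of 4 tokens: "de" x3 and "ede" x1.
print(bigram_counts_per_million({"de": 3, "ede": 1}))
# {'de': 1000000.0, 'ed': 250000.0}
```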

syllable_counts.csv Cumulative syllable occurrences (per million) in the SUBTLEX text corpus.
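Syllable counts can be accumulated the same way, provided each word has a syllabification. The sketch assumes a word-to-syllable-list mapping and credits each syllable with the count of every word containing it; the syllable strings below are orthographic placeholders, not the phonetic units the repository may actually use.

```python
from collections import Counter


def syllable_counts_per_million(syllabified: dict[str, list[str]],
                                word_counts: dict[str, int]) -> dict[str, float]:
    """Token-weighted syllable frequencies, per million word tokens.

    Assumed inputs: `syllabified` maps word -> list of syllables and
    `word_counts` maps word -> corpus count. Words without a
    syllabification still contribute to the corpus size.
    """
    total_tokens = sum(word_counts.values())
    if total_tokens == 0:
        return {}
    counts = Counter()
    for word, n in word_counts.items():
        for syl in syllabified.get(word, []):
            counts[syl] += n
    scale = 1_000_000 / total_tokens
    return {syl: c * scale for syl, c in counts.items()}


syllables = {"water": ["wa", "ter"], "later": ["la", "ter"]}
print(syllable_counts_per_million(syllables, {"water": 2, "later": 1, "en": 1}))
# {'wa': 500000.0, 'ter': 750000.0, 'la': 250000.0}
```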

Code

master_table.ipynb Generates bigram, syllable, and word frequency statistics for every word presented in the MOUS experiments.
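The notebook's exact logic is not described here. As a hypothetical sketch, assembling such a master table amounts to joining precomputed per-word lookups (Zipf, minimum bigram frequency, syllable frequencies) on the stimulus word; the function name, column names, and lowercase matching are all assumptions.

```python
def build_master_table(words: list[str],
                       zipf: dict[str, float],
                       min_bigram: dict[str, float],
                       syllable_freqs: dict[str, list[float]]) -> list[dict]:
    """Assemble one row per stimulus word from precomputed lookups.

    Each lookup is keyed by lowercase word; missing entries become None
    so that gaps remain visible in later analysis steps.
    """
    rows = []
    for word in words:
        key = word.lower()
        rows.append({
            "word": key,
            "zipf": zipf.get(key),
            "min_bigram_per_million": min_bigram.get(key),
            "syllable_freqs": syllable_freqs.get(key),
        })
    return rows
```

The resulting list of dictionaries can be written out with `csv.DictWriter` or loaded into a data frame.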

Auditory

source_auditory_transcription.py Takes an auditory subject's events.tsv file and an output filename, and tabulates the onset times and words played during that subject's scan. The resulting transcription files are saved in each subject's source subdirectory, e.g. sub-A2002_transcription.csv.
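BIDS events.tsv files are tab-separated and always include `onset` and `duration` columns. The sketch below shows the general shape of such a transcription step using only the standard library. The defaults `type_col="type"`, `word_col="value"`, and `word_type="word"` are hypothetical guesses about how word events are labeled in the MOUS files, not taken from this repository.

```python
import csv


def tabulate_words(events_tsv: str, output_csv: str,
                   type_col: str = "type",    # assumed column name
                   word_col: str = "value",   # assumed column name
                   word_type: str = "word",   # assumed event label
                   ) -> list[tuple[float, str]]:
    """Write (onset, word) rows for word events in a BIDS events.tsv file."""
    rows = []
    with open(events_tsv, newline="") as f:
        for event in csv.DictReader(f, delimiter="\t"):
            if event.get(type_col) == word_type:
                rows.append((float(event["onset"]), event[word_col].strip()))
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["onset", "word"])
        writer.writerows(rows)
    return rows
```

Check the column names against an actual events.tsv before relying on the filter.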

source_auditory_transcription_loop.ipynb Runs the above over all auditory subjects.
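A sketch of the looping step, assuming a BIDS-style layout in which auditory subjects are named `sub-A*` (as in sub-A2002) and their events files sit one folder below the subject directory. The glob pattern and the output location are assumptions; the function only pairs inputs with output paths, and each pair can then be passed to the transcription script.

```python
from pathlib import Path


def plan_transcriptions(bids_root: str, out_root: str) -> list[tuple[Path, Path]]:
    """Pair each auditory subject's events.tsv with an output transcription path.

    Assumptions: subjects match sub-A*, events files match
    <subject>/<datatype>/*_events.tsv, and outputs are written to
    <out_root>/<subject>/<subject>_transcription.csv.
    """
    pairs = []
    for events in sorted(Path(bids_root).glob("sub-A*/*/*_events.tsv")):
        subject = events.parent.parent.name
        out = Path(out_root) / subject / f"{subject}_transcription.csv"
        pairs.append((events, out))
    return pairs
```

If a subject has several runs, this naming scheme would map them to the same output file, so a run suffix would be needed.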

calculate_syllable_frequencies.m Takes a transcription generated by the above script and returns the syllable frequencies. It creates two .csv files, e.g. sub-A2002_transcription_syllables_raw.csv, which contains all onset times (including words for which no syllable segmentation could be found), and sub-A2002_transcription_syllables_processed.csv, which keeps only the onset times and frequencies of words with available syllable segmentations.
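The raw/processed distinction can be illustrated with a short Python sketch (the original script is MATLAB). It assumes the transcription is a list of (onset, word) pairs and that syllable frequencies are available as a word-keyed dictionary; both structures are hypothetical stand-ins for the actual files.

```python
def split_syllable_rows(transcription: list[tuple[float, str]],
                        syllables: dict[str, list[float]]):
    """Build 'raw' and 'processed' syllable-frequency rows.

    Raw rows keep every onset, with None where no segmentation exists;
    processed rows keep only words that have a segmentation.
    """
    raw = [(onset, word, syllables.get(word)) for onset, word in transcription]
    processed = [row for row in raw if row[2] is not None]
    return raw, processed
```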

SPM_auditory_word_frequency_1st_level.m Runs SPM12 first-level analysis for word frequency across all auditory subjects. For a primer on this technique, see Andy's Brain Book.
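Assuming the first-level scripts enter word frequency as a parametric modulator of word onsets (the usual SPM approach, but not stated in this README), the regressor is a stick function at each onset scaled by the mean-centered frequency value and convolved with the hemodynamic response. The sketch below illustrates that idea outside SPM with a double-gamma approximation of SPM's canonical HRF (response shape 6, undershoot shape 16, ratio 1/6). The function names, the 0.1 s resolution, and sampling at the start of each scan are illustrative choices; SPM's microtime settings differ in detail.

```python
import math


def canonical_hrf(dt: float, length: float = 32.0) -> list[float]:
    """Double-gamma HRF sampled every `dt` seconds and scaled to sum to 1.

    Approximates SPM's canonical shape: a gamma response (shape 6, scale 1 s)
    minus a gamma undershoot (shape 16, scale 1 s) weighted by 1/6.
    """
    def gamma_pdf(t: float, shape: float) -> float:
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    n = int(round(length / dt))
    h = [gamma_pdf(i * dt, 6) - gamma_pdf(i * dt, 16) / 6 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def modulated_regressor(onsets: list[float], values: list[float],
                        n_scans: int, tr: float, dt: float = 0.1) -> list[float]:
    """Parametric-modulator regressor, sampled at the start of each scan.

    Each word adds a stick at its onset whose height is its mean-centered
    value (SPM also mean-centers modulators); sticks are convolved with the HRF.
    """
    mean_v = sum(values) / len(values)
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset, v in zip(onsets, values):
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += v - mean_v
    hrf = canonical_hrf(dt)
    conv = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for j, h in enumerate(hrf[: n_fine - i]):
            conv[i + j] += s * h
    step = int(round(tr / dt))
    return conv[::step][:n_scans]
```

The number of scans and the TR are study-specific and are not given in this README; pass the values from the scan metadata.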

SPM_auditory_word_frequency_2nd_level.m Runs SPM12 group-level analysis for word frequency.
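Assuming the group-level scripts use a one-sample t-test on each subject's first-level contrast (a common SPM second-level design, but not stated here), the statistic computed at every voxel is the mean contrast divided by its standard error. The sketch applies it to a single list of per-subject values, which is an illustrative simplification of the voxelwise analysis.

```python
import math
import statistics


def one_sample_t(contrast_values: list[float]) -> tuple[float, int]:
    """One-sample t statistic against zero, with its degrees of freedom."""
    n = len(contrast_values)
    if n < 2:
        raise ValueError("need at least two subjects")
    mean = statistics.mean(contrast_values)
    se = statistics.stdev(contrast_values) / math.sqrt(n)
    return mean / se, n - 1
```

SPM additionally corrects for the many voxels tested (e.g. family-wise error), which this single-value sketch does not address.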

SPM_auditory_syllable_frequency_1st_level.m Runs SPM12 first-level analysis for syllable frequency across all auditory subjects.

SPM_auditory_syllable_frequency_2nd_level.m Runs SPM12 group-level analysis for syllable frequency.

Visual

source_visual_transcription.m Converts an events.tsv file to a cleaned CSV containing the onset time and the word presented.
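What "cleaned" involves is not specified. A plausible minimal rule, shown here in Python as an assumption rather than as the MATLAB script's behavior, is to lowercase each presented token, strip surrounding punctuation, and drop tokens that end up empty so that words can be matched against the frequency tables.

```python
import string


def clean_word(token: str) -> str:
    """Lowercase a token and strip surrounding whitespace and punctuation."""
    return token.strip().strip(string.punctuation).lower()


def clean_events(rows: list[tuple[str, str]]) -> list[tuple[float, str]]:
    """Keep (onset, cleaned word) pairs, dropping tokens that become empty."""
    out = []
    for onset, token in rows:
        word = clean_word(token)
        if word:
            out.append((float(onset), word))
    return out
```

Note that this rule would also strip the leading apostrophe from Dutch forms such as 's, which may or may not be desirable.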

source_visual_transcription_loop.m Runs the above function in a loop over all visual subjects.

calculate_word_frequencies_visual.ipynb Generates CSV files containing word frequency and minimum bigram frequency information for all words in the study.
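Minimum bigram frequency is the frequency of the rarest adjacent letter pair in a word. A minimal sketch, assuming a bigram-to-frequency lookup such as the one stored in bigram_counts.csv; treating bigrams absent from the table as 0.0 and returning None for one-letter words are assumptions.

```python
from typing import Optional


def min_bigram_frequency(word: str, bigram_freq: dict[str, float]) -> Optional[float]:
    """Lowest frequency among a word's adjacent letter bigrams.

    Bigrams missing from `bigram_freq` count as 0.0 (an assumption);
    words shorter than two letters have no bigrams and return None.
    """
    w = word.lower()
    bigrams = [w[i:i + 2] for i in range(len(w) - 1)]
    if not bigrams:
        return None
    return min(bigram_freq.get(bg, 0.0) for bg in bigrams)
```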

SPM_visual_word_frequency_1st_level.m Runs SPM12 first-level analysis for word frequency across all visual subjects.

SPM_visual_word_frequency_2nd_level.m Runs SPM12 group-level analysis for word frequency.

SPM_visual_bigram_frequency_1st_level.m Runs SPM12 first-level analysis for bigram frequency across all visual subjects.

SPM_visual_bigram_frequency_2nd_level.m Runs SPM12 group-level analysis for bigram frequency.

suneelbanerjee/MOUS_hierarchical-representations
