A tipping point in word recognition? Investigating the relationship between root and form frequency across visual and auditory modalities.

Müller, H.M. (Hanno)
Bosch, L.F.M. ten (Louis)
Ernestus, M.T.C. (Mirjam)

For theories of human word recognition, it is highly relevant whether the recognition of suffixed words is influenced by the morphological root or by the surface form of the word. According to many theories (e.g., Baayen et al., 1997), the morphological root predominantly guides the recognition process unless the word has a (relatively) high frequency of occurrence. We tested this "tipping point" hypothesis by comparing a statistical model based on it with two alternative statistical models: one assuming that word recognition is always root-driven and another assuming that it is always form-driven. To this end, we modeled response time distributions from two large-scale lexical decision experiments in Dutch, one visual and one auditory, focusing on three suffixes: the plural suffix -en for nouns, the derivational suffix -heid for nominalisations, and -t as the second/third person singular present tense suffix for verbs. Our results indicate that words with the suffixes -t and -heid are retrieved as whole forms in both visual and auditory word recognition. In contrast, words with the suffix -en are best accounted for by both the root-driven and the form-driven models in auditory word recognition, while in visual word recognition they support the tipping point hypothesis. Taken together, our findings suggest that both root-driven and form-driven principles are relevant for word recognition, while the assumption of a categorical tipping point is less tenable. This study contributes to our understanding of word recognition mechanisms in both localist and distributional-connectionist theoretical frameworks.

This Data Sharing Collection (DSC) includes:
a) Predictions of reaction times (RTs) in BALDEY from the root-driven, form-driven, and tipping point models.
b) BALDEY and DLP subsets enriched with root and form frequency, as well as relative frequency and the control predictors trial number and length/duration.
c) All scripts necessary to derive the above materials.

Please note that the collection does not include the CELEX or CGN databases used for computing frequencies, as we do not have the license to share them. Researchers with access to CELEX and CGN can use the provided scripts to recreate the predictors. Those without access will need to use the enriched materials provided in this collection.
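
As an illustration of how the three model assumptions differ, the following is a minimal Python sketch and not the code shipped in this collection. It assumes that relative frequency is the ratio of form frequency to root frequency and that the tipping point is a single threshold on that ratio. The function name `driving_log_frequency`, the `threshold` parameter, and the add-one smoothing inside the logarithm are all hypothetical choices made for this example.

```python
import math


def driving_log_frequency(form_freq, root_freq, model, threshold=0.5):
    """Return the log frequency assumed to drive recognition under one model.

    Assumptions of this sketch (not taken from the collection's scripts):
    - relative frequency = form_freq / root_freq
    - the tipping point is one fixed threshold on relative frequency
    - log(freq + 1) is used so that zero counts do not break the logarithm
    """
    if root_freq <= 0:
        raise ValueError("root_freq must be positive")

    log_root = math.log(root_freq + 1)
    log_form = math.log(form_freq + 1)

    if model == "root":
        # Root-driven: the root frequency always guides recognition.
        return log_root
    if model == "form":
        # Form-driven: the surface form frequency always guides recognition.
        return log_form
    if model == "tipping":
        # Tipping point: root-driven unless the form is relatively frequent.
        relative = form_freq / root_freq
        return log_form if relative >= threshold else log_root
    raise ValueError(f"unknown model: {model!r}")


# Hypothetical counts for a relatively rare and a relatively frequent form.
for form_freq, root_freq in [(50, 1000), (800, 1000)]:
    for model in ("root", "form", "tipping"):
        value = driving_log_frequency(form_freq, root_freq, model)
        print(form_freq, root_freq, model, round(value, 3))
```

In this sketch the tipping point model coincides with the root-driven model for the relatively rare form and with the form-driven model for the relatively frequent one. The resulting value would then enter a response time model alongside control predictors such as trial number and length/duration.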