This project was funded by the National Science Foundation (Language Across Cultures: The Communication Styles of World Leaders, #2117009) from 2021 to 2025.
Project team
Leah C. Windsor, PI
Director, Institute for Intelligent Systems (IIS)
Associate Professor of Applied Linguistics
Institute for Intelligent Systems + Department of English
Alistair Windsor, co-PI
Associate Professor
Department of Mathematical Sciences
Faculty Affiliate, IIS
Miriam Van Mersbergen, co-PI
Associate Professor
School of Communication Sciences and Disorders
Faculty Affiliate, IIS
Nicholas Simon, co-PI
Associate Professor
Department of Psychology
Faculty Affiliate, IIS
J. Elliott Casal
Assistant Professor of Applied Linguistics
Department of English
Faculty Affiliate, IIS
Deborah Tollefsen
Dean, Graduate School
Professor
Department of Psychology
Faculty Affiliate, IIS
Shaun Gallagher
Professor
Department of Psychology
Faculty Affiliate, IIS
James “Rusty” Haner
Senior Software Developer, IIS
August White
Senior Software Developer, IIS
Dashboard features
This dashboard provides linguistic features from three analytic approaches: Coh-Metrix (107 features), LIWC (118 features), and LDA (50 features). Coh-Metrix provides text-level analysis of lexical and semantic word features.1,2 LIWC is a word-counting program that gives precise rates of word usage.3,4 LDA (also known as “topic modeling”) assigns words to groups based on their proximity and co-occurrence in the corpus.5,6
What the linguistic features MEAN
These three approaches to analyzing language span both word-order dependent (Coh-Metrix) and bag-of-words (LIWC, LDA) approaches.
Word-order dependent
Word-order dependent approaches preserve the relationships between words in a text (sentence, paragraph, document, etc.) and can be understood in the context of syntax parse trees. Click below for an example.
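To make the contrast with bag-of-words concrete, here is a minimal Python sketch. The two trees are written by hand as nested tuples purely for illustration (they are not Coh-Metrix output, and a real analysis would obtain them from a syntactic parser). Both sentences contain exactly the same words, so a bag-of-words count cannot tell them apart, while the tree structure still records which noun phrase is the subject.

```python
from collections import Counter

# Hand-written constituency trees as nested tuples: (label, child, child, ...).
# Illustrative only; real trees would come from a syntactic parser.
TREE_A = ("S", ("NP", "leaders"), ("VP", ("V", "address"), ("NP", "citizens")))
TREE_B = ("S", ("NP", "citizens"), ("VP", ("V", "address"), ("NP", "leaders")))


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def subject(tree):
    """Return the words under the first NP that sits directly below the root."""
    for child in tree[1:]:
        if not isinstance(child, str) and child[0] == "NP":
            return leaves(child)
    return []


# Same words, so the bag-of-words view is identical for both sentences.
same_bag = Counter(leaves(TREE_A)) == Counter(leaves(TREE_B))

# Different structure, so the word-order dependent view distinguishes them.
different_subject = subject(TREE_A) != subject(TREE_B)

print(same_bag, different_subject)
```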
The 2011 article on Coh-Metrix provides a description of the features generated by this software. Coh-Metrix was developed at The University of Memphis in the Institute for Intelligent Systems by Art Graesser, Max Louwerse, and Danielle McNamara. A principal components analysis (PCA) yields five high-level aggregate features: syntax simplicity, word concreteness, narrativity, deep cohesion, and shallow cohesion. Syntactic features have been used in many disciplines to understand contextual aspects of language, such as more concrete or abstract word choices and the use of more or less complex phrases.
Bag-of-words
LIWC (Linguistic Inquiry and Word Count) is a dictionary-based program developed at UT Austin by James Pennebaker. It offers a quick and powerful approach to analyzing language by counting the words in a text that match its dictionary categories, primarily closed-class words such as pronouns, conjunctions, articles, and prepositions.
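The general idea of dictionary-based counting can be sketched in a few lines of Python. The mini-dictionary below is hypothetical and covers only a handful of words; it is not the LIWC dictionary, and the function name `category_rates` is our own. The sketch reports each category as a percentage of all words in the text.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; NOT the LIWC dictionary.
CATEGORIES = {
    "pronoun": {"i", "we", "you", "they", "it"},
    "article": {"a", "an", "the"},
    "conjunction": {"and", "but", "or"},
    "preposition": {"in", "on", "of", "to", "with"},
}


def category_rates(text):
    """Return each category's share of all words in the text, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, members in CATEGORIES.items():
            if word in members:
                counts[category] += 1
    total = len(words)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in CATEGORIES}


rates = category_rates("We will work with the people and the leaders of the world.")
for category, rate in rates.items():
    print(f"{category}: {rate:.1f}%")
```

Because the counts ignore where each word occurs, shuffling the sentence would give the same rates, which is what makes this a bag-of-words approach.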
LDA (latent Dirichlet allocation) is a method with open-source implementations in R and Python, as well as many off-the-shelf programs. LDA groups words based on how they co-occur in the corpus, and these groups form “topics”. Researchers then name each topic qualitatively, based on the relationships among its words and their own expertise in the subject matter.
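For readers curious about the mechanics, the following is a toy sketch of LDA fitted with a collapsed Gibbs sampler in plain Python. It is not the pipeline used to produce the dashboard's topics; the four short documents, the choice of two topics, and the values of `alpha` and `beta` are all assumptions made for illustration. In practice an established library would be used on a much larger corpus.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                              # all tokens in topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words per topic (ties broken alphabetically)."""
    return [sorted(tw, key=lambda w: (-tw[w], w))[:n] for tw in topic_word]


# Hypothetical toy corpus; word order inside each document is irrelevant to LDA.
docs = [
    "economy trade jobs economy growth".split(),
    "trade jobs growth economy".split(),
    "security defense military security".split(),
    "defense military security alliance".split(),
]
doc_topic, topic_word = fit_lda(docs)
print(top_words(topic_word))
```

The model returns only word groupings; deciding that one group is about, say, the economy is the qualitative naming step carried out by the researcher.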
How to use the dashboard
1. Determine which features you would like to graph from Coh-Metrix, LIWC, and/or LDA.

2. Select the appropriate X-axis variable and Y-axis variable, chart type, and chart aggregation (if applicable). You can select a line chart, bar graph, or world map.

3. You can download the data as a CSV, or as a chart or map.
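Once downloaded, the CSV can be analyzed outside the dashboard. The sketch below shows one way to average a feature by year using only the Python standard library. The column names (`country`, `year`, `narrativity`) and the values are hypothetical stand-ins, since the actual layout of the exported file may differ; adjust them to match the header row of your download.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for a downloaded file; column names and values are hypothetical.
SAMPLE_CSV = """country,year,narrativity
A,2019,0.42
B,2019,0.58
A,2020,0.40
B,2020,0.60
"""


def mean_by(rows, group_col, value_col):
    """Average value_col within each distinct value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


# For a real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
yearly = mean_by(rows, "year", "narrativity")
for year, value in sorted(yearly.items()):
    print(year, round(value, 3))
```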
