This project was funded by the National Science Foundation (Language Across Cultures: The Communication Styles of World Leaders, #2117009) from 2021 to 2025.

Project team

Leah C. Windsor, PI

Director, Institute for Intelligent Systems (IIS)

Associate Professor of Applied Linguistics

Institute for Intelligent Systems + Department of English

Alistair Windsor, co-PI

Associate Professor

Department of Mathematical Sciences

Faculty Affiliate, IIS

Miriam Van Mersbergen, co-PI

Associate Professor

School of Communication Sciences and Disorders

Faculty Affiliate, IIS

Nicholas Simon, co-PI

Associate Professor

Department of Psychology

Faculty Affiliate, IIS

J. Elliott Casal

Assistant Professor of Applied Linguistics

Department of English

Faculty Affiliate, IIS

Deborah Tollefsen

Dean, Graduate School

Professor

Department of Psychology

Faculty Affiliate, IIS

Shaun Gallagher

Professor

Department of Psychology

Faculty Affiliate, IIS

James “Rusty” Haner

Senior Software Developer, IIS

August White

Senior Software Developer, IIS

Dashboard features

This dashboard provides linguistic features from three analytic approaches: Coh-Metrix (107 features), LIWC (118 features), and LDA (50 features). Coh-Metrix provides text-level analysis of lexical and semantic word features.^1,2 LIWC is a word-counting program that reports the rates at which categories of words are used.^3,4 LDA (latent Dirichlet allocation, also known as “topic modeling”) groups words into topics based on their co-occurrence in the corpus.^5,6

What the linguistic features MEAN

These three approaches to analyzing language span both word-order dependent (Coh-Metrix) and bag-of-words (LIWC, LDA) approaches.
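The difference between the two families can be illustrated with a minimal Python sketch. It is not part of the dashboard or of any of the three tools; the two sentences and the `bag_of_words` helper are made up for illustration. A bag-of-words view keeps only how often each word occurs, so two sentences with opposite meanings can look identical, while a word-order dependent view tells them apart.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, discarding word order (hypothetical helper)."""
    return Counter(text.lower().split())


# Two invented sentences that use the same words in a different order.
sentence_a = "the leader praised the rival"
sentence_b = "the rival praised the leader"

# Bag-of-words: only the counts matter, so the two sentences are identical.
same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)

# Word-order dependent: the token sequences differ.
same_order = sentence_a.split() == sentence_b.split()

print(same_bag, same_order)
```

Here `same_bag` is true and `same_order` is false, which is why approaches that need syntax or cohesion information cannot rely on word counts alone.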

Word-order dependent

Word-order dependent approaches preserve the relationships between words in a text (sentence, paragraph, document, etc.) and can be understood in the context of syntax parse trees. Click below for an example.

Example

The 2011 article on Coh-Metrix^1 describes the features generated by this software. Coh-Metrix was developed at The University of Memphis in the Institute for Intelligent Systems by Art Graesser, Max Louwerse, and Danielle McNamara. It generates five high-level aggregate features through a principal components analysis (PCA): syntax simplicity, word concreteness, narrativity, deep cohesion, and shallow cohesion. Syntactic features have been used in many disciplines to understand contextual aspects of language, such as more concrete or abstract word choices and the use of more or less complex phrases.

Bag-of-words

LIWC (Linguistic Inquiry and Word Count) is a dictionary-based program developed at UT Austin by James Pennebaker. It is a quick and powerful approach to analyzing language: it counts the words in a text that belong to predefined categories, primarily closed-class words such as pronouns, conjunctions, articles, and prepositions.
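The dictionary-based counting idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not LIWC itself: the `CATEGORIES` mini-dictionary, the `category_rates` function, and the sample sentence are all hypothetical, and the real LIWC dictionaries are far larger. The sketch assumes rates are expressed as a percentage of all words in the text.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; NOT the LIWC dictionary.
CATEGORIES = {
    "pronoun": {"i", "we", "you", "they", "it"},
    "article": {"a", "an", "the"},
    "conjunction": {"and", "but", "or"},
}


def category_rates(text):
    """Return, per category, the percentage of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    return {
        name: (100.0 * counts[name] / total) if total else 0.0
        for name in CATEGORIES
    }


# Invented example sentence.
rates = category_rates("We will rebuild the economy, and we will do it together.")
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of words")
```

Because the result is a rate rather than a raw count, texts of different lengths can be compared directly.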

LDA (latent Dirichlet allocation) is an openly available method that can be run with many tools, including R, Python, and many off-the-shelf programs. LDA groups words into categories based on their co-occurrence in the corpus, and these groups form “topics”. Researchers then qualitatively assign a name to each topic, based on the relationships among its words and their own expertise in the subject matter.
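For readers curious about how words end up grouped into topics, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written with only the Python standard library. It is not the tool or the settings used to produce the dashboard's topics. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the four tiny documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling (illustrative sketch)."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Four invented "documents", already split into words.
docs = [
    "economy jobs trade economy".split(),
    "trade jobs growth economy".split(),
    "security border army security".split(),
    "army border defense security".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
top_words = [
    [word for word, count in counts.most_common(3) if count > 0]
    for counts in topic_word
]
print(top_words)
```

The model returns only numbered word groups; deciding that one group is about, say, the economy and another about security is the qualitative naming step performed by the researcher.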

Open the dashboard

How to use the dashboard

1. Determine which features you would like to graph from Coh-Metrix, LIWC, and/or LDA.

2. Select the appropriate X-axis variable and Y-axis variable, chart type, and chart aggregation (if applicable). You can select a line chart, bar graph, or world map.

3. You can download the data as a CSV, or download the chart or map.

4. To reset the parameters and model a new feature or representation, click “Clear ISO filter.”
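Once the data has been downloaded as a CSV (step 3), it can be aggregated outside the dashboard as well. The sketch below shows one possible way to average a feature by year in Python. The column names `iso`, `year`, and `feature_value`, the helper `mean_by`, and every value in `SAMPLE_CSV` are invented placeholders; the actual export may use different columns, so adjust the names to match the downloaded file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented placeholder data standing in for a downloaded file.
# Column names and values are assumptions, not the dashboard's real export.
SAMPLE_CSV = """iso,year,feature_value
USA,2019,0.42
USA,2020,0.48
FRA,2019,0.35
FRA,2020,0.31
"""


def mean_by(rows, key, value):
    """Average the numeric `value` column within each group of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {group: mean(values) for group, values in groups.items()}


# For a real download, replace io.StringIO(SAMPLE_CSV) with open("your_file.csv").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_year = mean_by(rows, "year", "feature_value")
by_country = mean_by(rows, "iso", "feature_value")
print(by_year)
print(by_country)
```

Grouping by `year` corresponds to a line chart over time, while grouping by `iso` corresponds to a per-country view such as the world map.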

References

1. Graesser, A. C., McNamara, D. S. & Kulikowich, J. M. Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher 40, 223–234 (2011).

2. Graesser, A. & Windsor, L. A perspective on the psychological analysis of language. in TBD (TBD, 2019).

3. Boyd, R. L., Ashokkumar, A., Seraj, S. & Pennebaker, J. W. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin 1–47 (2022).

4. Pennebaker, J. W. The secret life of pronouns: How our words reflect who we are. New York, NY: Bloomsbury (2011).

5. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003).

6. Enderle, J. S. Topic Modeling Tool. (2022).