align is a Python library for extracting quantitative, reproducible
metrics of multi-level alignment between two speakers in naturalistic
language corpora. The method was introduced in "ALIGN: Analyzing
Linguistic Interactions with Generalizable techNiques" (Duran, Paxton, &
Fusaroli, 2019; Psychological Methods).
Examples of papers relying on the ALIGN library:
align may be downloaded directly using pip.
To download the stable version released on PyPI:
pip install align
Or to update:
pip install align --upgrade
It is good practice to install a package like
align, which has several dependencies (see
requirements.txt), in a virtual environment.
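Before installing, it can help to confirm that the active interpreter really belongs to an isolated environment. The following is a minimal sketch using only the standard library. The function names are hypothetical and not part of align, and the conda check assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


def in_conda_environment() -> bool:
    """Return True when a conda environment appears to be activated."""
    # Assumption: activation exports CONDA_DEFAULT_ENV. Conda environments
    # are not detected by the sys.prefix comparison above.
    return "CONDA_DEFAULT_ENV" in os.environ


print("venv active: ", in_virtual_environment())
print("conda active:", in_conda_environment())
```

If both values are False, pip would install align and its dependencies into the base Python installation.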
Anaconda users: The above should work in the vast majority of cases. However, if you prefer an easy way to install
align within a virtual environment in one go, or you are experiencing problems updating
align, a YAML file has been provided to streamline things. Just follow these simple steps:
- Download the
environment.yml file and navigate to the folder where it has been downloaded
- Run the following command in Terminal:
conda env create -f environment.yml
- Be sure to activate the new environment (i.e.,
conda activate align0.1.1) before running any
align analyses (such as the tutorials; see below)
If you experience any problems, please report them in the "Issues" section of this repository.
ALIGN consists of two primary modules for conducting analyses,
prepare_transcripts and calculate_alignment. To get a quick glance at the functions contained within each module, please check out the following:
The Google News pre-trained word2vec vectors
and the Stanford part-of-speech tagger
are required for some optional
align parameters but must be downloaded
separately. Please see the tutorials for more information.
Google News: https://code.google.com/archive/p/word2vec/ (page) or https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing (direct download)
Stanford POS tagger: https://nlp.stanford.edu/software/tagger.shtml#Download (page) or https://nlp.stanford.edu/software/stanford-tagger-4.2.0.zip (direct download)
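Once the Google News file has been downloaded, a quick way to confirm that it is usable is to load it in Python. The sketch below assumes that gensim is installed. The function name and the path are hypothetical, and this is not necessarily how align loads the vectors internally; the tutorials remain the authoritative guide.

```python
from pathlib import Path


def load_google_news_vectors(path):
    """Load binary word2vec-format vectors from `path` using gensim."""
    vector_path = Path(path)
    if not vector_path.is_file():
        # Fail early with a clear message instead of a long loading attempt.
        raise FileNotFoundError(f"word2vec file not found: {vector_path}")
    # Imported lazily so the path check works even without gensim installed.
    from gensim.models import KeyedVectors
    # binary=True because the Google News vectors ship in binary format.
    return KeyedVectors.load_word2vec_format(str(vector_path), binary=True)


# Usage (hypothetical path; loading the full file needs several GB of RAM):
# vectors = load_google_news_vectors("path/to/downloaded-vectors.bin")
# vectors.most_similar("language", topn=3)
```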
We created Jupyter Notebook tutorials to provide an easily accessible
step-by-step walkthrough on how to use
align. Below are descriptions of the
current tutorials that can be found in the
examples directory within this
repository. If you are unfamiliar with Jupyter Notebooks, instructions for installing
and running them can be found here: http://jupyter.org/install. We recommend installing
Jupyter using Anaconda. Anaconda is a widely used Python data science platform
that helps streamline workflows.
Jupyter Notebook 1: CHILDES
This tutorial walks users through an analysis of conversations from a single English corpus in the CHILDES database (MacWhinney, 2000): Kuczaj’s Abe corpus (Kuczaj, 1976). We analyze the last 20 conversations in the corpus to explore how ALIGN can track multi-level linguistic alignment between a parent and child over time, which may be of interest to developmental language researchers. Specifically, we examine how alignment between a parent and a child changes over a brief span of development.
Jupyter Notebook 2: Devil's Advocate
We are in the process of adding more tutorials and would welcome additional tutorials by interested contributors.
If you find the package useful, please cite our manuscript:
Duran, N., Paxton, A., & Fusaroli, R. (2019). ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques. Psychological Methods. http://dynamicog.org/papers/
Example corpus "Kuczaj Corpus" by Stan Kuczaj is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (https://childes.talkbank.org/access/Eng-NA/Kuczaj.html):
Kuczaj, S. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16, 589–600.
The complete de-identified dataset of raw conversational transcripts is hosted on a secure protected-access repository provided by the Inter-university Consortium for Political and Social Research (ICPSR). Please click on the link to access: http://dx.doi.org/10.3886/ICPSR37124.v1. Due to the requirements of our IRB, please note that users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reason for the request, and obtain IRB approval or notice of exemption for their research.
Duran, Nicholas, Alexandra Paxton, and Riccardo Fusaroli. Conversational Transcripts of Truthful and Deceptive Speech Involving Controversial Topics, Central California, 2012. ICPSR37124-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-08-29.
I wonder which code overrides the automatic spelling correction function, especially for conversational scripts with meaningful sounds or local English variants (e.g., "amm amm" for eating). The current automatic spelling correction function auto-corrects these words to standard English (e.g., "am"), leading to inaccurate tokens and lemmas. I hope to receive some assistance and insight on working with this issue in adult-child interactions.
Hi, I couldn't generate surrogate data. Can I get some advice, please?
The error message says "index out of range".
In the prep phase, group the different types of POS tags and collapse them to a single tag, e.g., NN, NNS, NN* becomes NN.
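The proposed collapsing could look roughly like the sketch below. It is an illustration of the idea only and is not part of align: the function names are hypothetical, and the choice of which Penn Treebank tag families to collapse (nouns, verbs, adjectives, adverbs) is an assumption.

```python
# Tag families to collapse, identified by their shared prefix (assumption).
COLLAPSE_PREFIXES = ("NN", "VB", "JJ", "RB")


def collapse_tag(tag):
    """Map a fine-grained tag such as NNS or VBD to its family prefix."""
    for prefix in COLLAPSE_PREFIXES:
        if tag.startswith(prefix):
            return prefix
    # Tags outside the listed families (e.g., DT, PRP$) are left unchanged.
    return tag


def collapse_tagged_tokens(tagged_tokens):
    """Apply collapse_tag to every (token, tag) pair in a tagged utterance."""
    return [(token, collapse_tag(tag)) for token, tag in tagged_tokens]


tagged = [("the", "DT"), ("dogs", "NNS"), ("ran", "VBD"), ("quickly", "RB")]
print(collapse_tagged_tokens(tagged))
```

Matching on an explicit prefix list, instead of truncating every tag to two characters, keeps unrelated tags such as PRP and PRP$ distinct.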
In 'An Evaluation and Comparison of Linguistic Alignment Measures', Xu and Reitter (2015) evaluate Spearman's correlation coefficient, LLA, and RepDecay. They report LLA has the best performance with respect to normality and sensitivity, and 'yields more significant between-individual differences and has better within-individual stability'. It would be interesting to repeat the experiments with cosine similarity, and see how it compares to LLA. If the two exhibit complementary advantages, it may be interesting to incorporate an LLA implementation.
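As a reference point for such a comparison, cosine similarity between the n-gram count vectors of two turns can be sketched as follows. This is a minimal illustration, not code taken from align; the function names and the example utterances are made up for the sketch.

```python
import math
from collections import Counter


def ngram_counts(tokens, n=1):
    """Count the n-grams (as tuples) occurring in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two sparse count vectors."""
    # Only n-grams present in both turns contribute to the dot product.
    dot = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty turn shares nothing with its partner.
        return 0.0
    return dot / (norm_a * norm_b)


prime = "i want the red ball".split()
target = "you want the ball".split()
print(cosine_similarity(ngram_counts(prime), ngram_counts(target)))
print(cosine_similarity(ngram_counts(prime, n=2), ngram_counts(target, n=2)))
```

The same two functions accept sequences of POS tags in place of words, which gives a syntactic counterpart to the lexical score.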
Hi, I tried to run binder in Chromium and Firefox and I get the attached error. Any idea?
Updated to allow user to toggle the spell-correction feature on and off within the
prepare_transcripts process. Please see tutorials for details.
Full Changelog: https://github.com/nickduran/align-linguistic-alignment/compare/v0.0.11...v0.1.0
Nicholas Duran is an associate professor in the Social and Behavioral Sciences division of the New College of Interdisciplinary Arts and Sciences at ASU.
python notebooks linguistic-alignment linguistic-analysis ngram-analysis nltk word2vec corpus-tools text-analysis conversation-analysis