Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora.

nickduran, updated 2023-03-05 20:51:27

ALIGN, a computational tool for multi-level language analysis (optimized for Python 3.10)

align is a Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora. The method was introduced in "ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques" (Duran, Paxton, & Fusaroli, 2019; Psychological Methods).

Examples of papers relying on the ALIGN library:

  • Dideriksen, C., Christiansen, M. H., Tylén, K., Dingemanse, M., & Fusaroli, R. (in press). Quantifying the interplay of conversational devices in building mutual understanding. Journal of Experimental Psychology: General. Pre-print: https://doi.org/10.31234/osf.io/a5r74
  • Dideriksen, C., Christiansen, M. H., Dingemanse, M., et al. (2022). Language specific constraints on conversation: Evidence from Danish and Norwegian. PsyArXiv. Pre-print: https://doi.org/10.31234/osf.io/t3s6c
  • Fusaroli, R., Weed, E., Fein, D., & Naigles, L. (in press). Caregiver linguistic alignment to autistic and typically developing children. Cognition. Pre-print: https://doi.org/10.31234/osf.io/ysjec
  • Fusaroli, R., Weed, E., Rocca, R., Fein, D., & Naigles, L. (under review). Repeat after me? Both children with and without autism commonly align their language with that of their caregivers. PsyArXiv (2023). Pre-print: https://doi.org/10.31234/osf.io/m8fhk

Installation

align can be installed directly with pip.

To install the stable version released on PyPI:

pip install align

To upgrade an existing installation:

pip install align --upgrade

Because align has several dependencies (see requirements.txt), we recommend installing it in a virtual environment.

Anaconda users: the pip commands above should work in most cases. However, if you would prefer to create a virtual environment and install align in one step, or if you run into problems updating align, we provide a YAML file to streamline the process:

  1. Download the environment.yml file and navigate to the folder where it has been downloaded
  2. Run the following command in Terminal: conda env create -f environment.yml
  3. Be sure to activate the new environment (i.e., conda activate align0.1.1) before running any align analyses, such as the tutorials described below
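After installing with either method (and activating the environment, if you used conda), it can help to confirm which Python interpreter is active and whether align is visible to it. The check below is a minimal sketch that uses only the standard library's importlib.metadata; it assumes the installed distribution is named align, as in the pip command above, and the helper name installed_version is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "align") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Inside an activated virtual/conda environment, the executable path
    # should point into that environment rather than the system Python.
    print("Python executable:", sys.executable)
    print("Python version:", sys.version.split()[0])
    found = installed_version("align")
    print("align version:", found if found else "not installed in this environment")
```

If align reports as not installed even though pip succeeded, the most likely explanation is that pip and the interpreter you are running belong to different environments.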

If you experience any problems, please report them in the Issues section of this repository.

Quick documentation

ALIGN consists of two primary modules for conducting analyses: prepare_transcripts and calculate_alignment. For a quick overview of the functions in each module, see:

  • prepare_transcripts: https://nickduran.github.io/align-linguistic-alignment/prepare_transcripts.html

  • calculate_alignment: https://nickduran.github.io/align-linguistic-alignment/calculate_alignment.html
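To give a rough sense of the kind of score calculate_alignment produces, the sketch below computes turn-by-turn alignment as the cosine similarity between the n-gram count vectors of two adjacent turns, which is how we understand the lexical and syntactic measures described in Duran, Paxton, and Fusaroli (2019). It is not align's API: the function names are hypothetical, and align itself handles tokenization, lemmatization, and tagging in prepare_transcripts and offers options this sketch ignores. The same function works on sequences of words or of part-of-speech tags.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(items: Sequence[str], n: int) -> Counter:
    """Count the contiguous n-grams in a sequence of tokens or POS tags."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def turn_alignment(prev_turn: Sequence[str], next_turn: Sequence[str], n: int = 1) -> float:
    """Alignment of one turn to the preceding turn as n-gram cosine similarity."""
    return cosine(ngrams(prev_turn, n), ngrams(next_turn, n))


if __name__ == "__main__":
    parent = ["do", "you", "want", "the", "ball"]
    child = ["want", "the", "ball"]
    print("unigram alignment:", round(turn_alignment(parent, child, n=1), 3))
    print("bigram alignment:", round(turn_alignment(parent, child, n=2), 3))
```

Cosine similarity ignores turn length differences in overall scale, so a short reply that reuses the partner's words can still score highly.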

Additional tools required for some align options

The Google News pre-trained word2vec vectors (GoogleNews-vectors-negative300.bin) and the Stanford part-of-speech tagger (stanford-postagger-full-2020-11-17) are required for some optional align parameters but must be downloaded separately. Please see the tutorials for more information.

  • Google News: https://code.google.com/archive/p/word2vec/ (page) or https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing (direct download)

  • Stanford POS tagger: https://nlp.stanford.edu/software/tagger.shtml#Download (page) or https://nlp.stanford.edu/software/stanford-tagger-4.2.0.zip (direct download)
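The word2vec vectors support a semantic (conceptual) level of alignment. As we understand the approach, each turn is represented by combining the vectors of its words, and adjacent turns are compared by cosine similarity. The sketch below illustrates that idea with a tiny hypothetical vector table standing in for the Google News vectors; the function names are ours, not align's. Summing and averaging word vectors give the same cosine, since cosine ignores vector length.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def turn_vector(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Optional[List[float]]:
    """Sum the vectors of the in-vocabulary tokens in a turn (None if none are covered)."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return None
    # zip(*found) groups the i-th component of every word vector together.
    return [sum(component) for component in zip(*found)]


def semantic_alignment(prev_turn: Sequence[str], next_turn: Sequence[str],
                       vectors: Dict[str, Vector]) -> Optional[float]:
    """Cosine similarity between the summed word vectors of two adjacent turns."""
    a = turn_vector(prev_turn, vectors)
    b = turn_vector(next_turn, vectors)
    if a is None or b is None:
        return None
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else None


if __name__ == "__main__":
    # Hypothetical 2-dimensional vectors; real word2vec vectors have 300 dimensions.
    toy_vectors = {"dog": [1.0, 0.0], "puppy": [0.9, 0.1], "car": [0.0, 1.0]}
    print(semantic_alignment(["dog"], ["puppy"], toy_vectors))
    print(semantic_alignment(["dog"], ["car"], toy_vectors))
```

With the downloaded file, the real vectors can be loaded with gensim via KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True); the resulting object supports `word in kv` and `kv[word]`, so it should be usable in place of the toy dictionary here.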

Tutorials

We provide Jupyter Notebook tutorials that walk step by step through how to use align. The tutorials described below can be found in the examples directory of this repository. If you are unfamiliar with Jupyter Notebooks, installation and usage instructions are available at http://jupyter.org/install. We recommend installing Jupyter through Anaconda, a widely used Python data science platform.

  • Jupyter Notebook 1: CHILDES

  • This tutorial walks users through an analysis of conversations from a single English corpus in the CHILDES database (MacWhinney, 2000), specifically Kuczaj’s Abe corpus (Kuczaj, 1976). We analyze the last 20 conversations in the corpus to show how ALIGN can track multi-level linguistic alignment between a parent and child over time, which may interest developmental language researchers. Specifically, we explore how parent-child alignment changes over a brief span of the child's development.

  • Jupyter Notebook 2: Devil's Advocate

  • This tutorial walks users through the analysis reported in Duran, Paxton, and Fusaroli (2019). The corpus consists of 94 written transcripts of eight-minute conversations collected in an experimental study of truthful and deceptive communication. The study examined interpersonal linguistic alignment between dyads across two conversations in which participants either agreed or disagreed with each other (a randomly assigned between-dyads condition) and in which one conversation involved the truth and the other deception (a within-subjects condition).

We are in the process of adding more tutorials and would welcome additional tutorials by interested contributors.

Attribution

If you find the package useful, please cite our manuscript:

Duran, N., Paxton, A., & Fusaroli, R. (2019). ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques. Psychological Methods. http://dynamicog.org/papers/

Licensing of example data

  • CHILDES

  • Example corpus "Kuczaj Corpus" by Stan Kuczaj is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (https://childes.talkbank.org/access/Eng-NA/Kuczaj.html):

Kuczaj, S. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16, 589–600.

  • Devil's Advocate

  • The complete de-identified dataset of raw conversational transcripts is hosted on a secure, protected-access repository provided by the Inter-university Consortium for Political and Social Research (ICPSR): http://dx.doi.org/10.3886/ICPSR37124.v1. Because of our IRB's requirements, users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reason for the request, and obtain IRB approval or notice of exemption for their research.

Duran, Nicholas, Alexandra Paxton, and Riccardo Fusaroli. Conversational Transcripts of Truthful and Deceptive Speech Involving Controversial Topics, Central California, 2012. ICPSR37124-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-08-29.

Issues

Bump ipython from 8.4.0 to 8.10.0

opened on 2023-02-11 01:56:35 by dependabot[bot]

Bumps ipython from 8.4.0 to 8.10.0.

Commits
  • 15ea1ed release 8.10.0
  • 560ad10 DOC: Update what's new for 8.10 (#13939)
  • 7557ade DOC: Update what's new for 8.10
  • 385d693 Merge pull request from GHSA-29gw-9793-fvw7
  • e548ee2 Swallow potential exceptions from showtraceback() (#13934)
  • 0694b08 MAINT: mock slowest test. (#13885)
  • 8655912 MAINT: mock slowest test.
  • a011765 Isolate the attack tests with setUp and tearDown methods
  • c7a9470 Add some regression tests for this change
  • fd34cf5 Swallow potential exceptions from showtraceback()
  • Additional commits viewable in compare view



Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



Possibility to Override the Automatic Spelling Correction Function

opened on 2022-08-23 12:07:00 by AdrianaChieng

Is there a way to override the automatic spelling correction, especially for conversational transcripts that contain meaningful sounds or local English variants (e.g., "amm amm" for eating)? The current automatic spelling correction changes these words to standard English (e.g., "am"), producing inaccurate tokens/lemmas. I would appreciate any assistance and insight on working with these issues (adult-child interactions).

Thanks
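According to the v0.1.1 release notes, prepare_transcripts lets users switch spell-correction on or off entirely. If only some forms (child speech, sound words, or dialect variants such as "amm") need protecting, one option is to run your own correction step before passing transcripts to align with its built-in spell-correction turned off. The wrapper below is a minimal, hypothetical sketch of such a step: correct_tokens and toy_corrector are illustrative names, not part of align, and any real corrector could be passed in.

```python
from typing import Callable, Iterable, List, Set


def correct_tokens(tokens: Iterable[str],
                   corrector: Callable[[str], str],
                   keep: Set[str]) -> List[str]:
    """Apply `corrector` to each token, except tokens listed in `keep` (case-insensitive)."""
    protected = {w.lower() for w in keep}
    return [tok if tok.lower() in protected else corrector(tok) for tok in tokens]


def toy_corrector(word: str) -> str:
    """Hypothetical stand-in for a real spell corrector."""
    return {"amm": "am", "teh": "the"}.get(word, word)


if __name__ == "__main__":
    utterance = ["amm", "amm", "teh", "soup"]
    print(correct_tokens(utterance, toy_corrector, keep=set()))
    print(correct_tokens(utterance, toy_corrector, keep={"amm"}))
```

Keeping the protected list in a separate file per corpus makes it easy to document which forms were deliberately left uncorrected.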

Surrogate Data

opened on 2021-10-03 13:04:36 by AdrianaChieng

Hi, I couldn't generate surrogate data. Can I get some advice, please?


The error message says "index out of range".
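Surrogate (baseline) alignment compares real conversational partners with speakers who never talked to each other. The sketch below shows one way such pseudo-dialogues could be built, by interleaving one speaker's turns from one conversation with the other speaker's turns from a different conversation. It is not align's implementation: the function name and the speaker labels "A" and "B" are hypothetical. It does check up front that at least two conversations are available, since a surrogate pair needs speakers from different conversations.

```python
import itertools
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker label, utterance); labels "A"/"B" are assumed


def surrogate_pairs(conversations: Dict[str, List[Turn]]) -> Dict[str, List[Turn]]:
    """Interleave speaker A's turns from one conversation with speaker B's
    turns from every other conversation, producing pseudo-dialogues."""
    if len(conversations) < 2:
        raise ValueError("need at least two conversations to build surrogate pairs")
    surrogates: Dict[str, List[Turn]] = {}
    for id_a, id_b in itertools.permutations(conversations, 2):
        a_turns = [t for t in conversations[id_a] if t[0] == "A"]
        b_turns = [t for t in conversations[id_b] if t[0] == "B"]
        interleaved: List[Turn] = []
        # zip stops at the shorter list, so the pseudo-dialogue is truncated
        # rather than indexing past the end of either speaker's turns.
        for a, b in zip(a_turns, b_turns):
            interleaved.extend([a, b])
        surrogates[f"{id_a}-A_{id_b}-B"] = interleaved
    return surrogates


if __name__ == "__main__":
    convs = {
        "c1": [("A", "hi"), ("B", "hello"), ("A", "bye")],
        "c2": [("A", "yo"), ("B", "sup")],
    }
    for name, turns in surrogate_pairs(convs).items():
        print(name, turns)
```

Alignment scores computed on these pseudo-dialogues give a chance-level reference against which real dyads can be compared.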

grouping of POS on lemmatized utterances

opened on 2019-10-07 15:51:23 by nickduran

In the preparation phase, group related POS tags and collapse them into a single tag, e.g., NN, NNS, and other NN* tags become NN.
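A minimal sketch of the proposed grouping, assuming Penn Treebank tags (the tag set used by the Stanford tagger's English models). The prefix list and function names are hypothetical and would need to be chosen for a given analysis; note that this particular list also merges proper nouns (NNP, NNPS) into NN, consistent with the NN* example above.

```python
from typing import List, Sequence, Tuple

# Hypothetical coarse classes: any tag starting with one of these prefixes is
# collapsed to the prefix (e.g. NNS -> NN, VBD -> VB, PRP$ -> PRP).
COARSE_PREFIXES = ("NN", "VB", "JJ", "RB", "PRP", "WP")


def collapse_tag(tag: str) -> str:
    """Map a fine-grained tag to its coarse prefix; leave other tags unchanged."""
    for prefix in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return prefix
    return tag


def collapse_tags(tagged: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse the tags in a list of (word, tag) pairs."""
    return [(word, collapse_tag(tag)) for word, tag in tagged]


if __name__ == "__main__":
    print(collapse_tags([("dogs", "NNS"), ("ran", "VBD"), ("quickly", "RB"), ("the", "DT")]))
```

Collapsing tags before building POS n-grams makes syntactic alignment less sensitive to inflectional differences, at the cost of some grammatical detail.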

possible LLA implementation

opened on 2019-08-13 14:49:49 by jseale-asapp

In 'An Evaluation and Comparison of Linguistic Alignment Measures', Xu and Reitter (2015) evaluate Spearman's correlation coefficient, LLA, and RepDecay. They report LLA has the best performance with respect to normality and sensitivity, and 'yields more significant between-individual differences and has better within-individual stability'. It would be interesting to repeat the experiments with cosine similarity, and see how it compares to LLA. If the two exhibit complementary advantages, it may be interesting to incorporate an LLA implementation.
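For such a comparison, the sketch below places a simple word-level LLA next to unigram cosine similarity. The LLA formula here (the number of target words that also occur in the prime, divided by the product of the prime and target lengths) is our reading of the formulation used in that literature and should be checked against Xu and Reitter (2015) before use; neither function is part of align.

```python
import math
from collections import Counter
from typing import Sequence


def lla(prime: Sequence[str], target: Sequence[str]) -> float:
    """Local linguistic alignment (as we read it): target tokens that also
    occur in the prime, normalized by the product of the two lengths."""
    if not prime or not target:
        return 0.0
    prime_vocab = set(prime)
    repeated = sum(1 for w in target if w in prime_vocab)
    return repeated / (len(prime) * len(target))


def unigram_cosine(prime: Sequence[str], target: Sequence[str]) -> float:
    """Cosine similarity between the unigram count vectors of two utterances."""
    a, b = Counter(prime), Counter(target)
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    prime = ["do", "you", "want", "the", "ball"]
    target = ["want", "the", "ball"]
    print("LLA:", lla(prime, target))
    print("cosine:", round(unigram_cosine(prime, target), 3))
```

Because LLA divides by the product of utterance lengths, it shrinks quickly for long turns, whereas cosine similarity does not; that difference is one reason the two measures could behave differently across corpora.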

Binder doesn't work

opened on 2019-05-08 05:53:22 by pavelgold

Hi, I tried to run Binder in Chromium and Firefox and got the attached error. Any idea?

Releases

ALIGN v0.1.1 2022-08-29 00:12:22

Updated to allow users to toggle the spell-correction feature on and off within the prepare_transcripts process. Please see the tutorials for details.

ALIGN update for Python 3.10 2022-07-05 01:10:01

What's Changed

  • Fix: ADD HERE.

Full Changelog: https://github.com/nickduran/align-linguistic-alignment/compare/v0.0.11...v0.1.0

Release for PyPI 2020-02-04 23:26:35

Updating release number and copyright dates 2020-02-04 23:10:49

Implementing features: n-gram default update and expanded output 2020-02-04 23:02:39

Squeaky-clean align version compatible with Python 3 2020-01-16 22:40:45

Nicholas Duran

Nicholas Duran is an associate professor in the Social and Behavioral Sciences division of the New College of Interdisciplinary Arts and Sciences at ASU.


python notebooks linguistic-alignment linguistic-analysis ngram-analysis nltk word2vec corpus-tools text-analysis conversation-analysis