(Lightly edited from an assignment in Professor Lisa Schweitzer’s Planning Theory class, with full intention of further pursuing this question at some point)
Articles in academic journals often include a series of keywords, meant to give readers a sense of what they are about to read and to offer links to more articles on related topics. So far, so good. But are these keywords chosen in a manner that actually helps the reader?
To answer this question, I examine the frequency of keywords in Planning Theory & Practice (henceforth “PT&P”) and check whether the year 2016 stands out in any way relative to other years of the same journal. In the journal’s online form, keywords are displayed beneath each article’s abstract and signify which subfields the article touches on. On Taylor & Francis Online, they are clickable links, each taking the reader to a set of supposedly related articles. Specifically, this exercise uses scraped text data to evaluate how well keywords help readers find related articles, by determining whether keywords are general enough that two articles are likely to share one.
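One way to make "likely to share a keyword" concrete is to count how many pairs of articles have at least one keyword in common. The following is a minimal Python sketch of that idea, not the R code used for this post; the toy `articles` data and the function name `share_of_linked_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each article is represented by its set of
# lowercased keywords.
articles = [
    {"planning", "housing"},
    {"planning", "governance"},
    {"asian coalition for housing rights"},
    {"governance", "power"},
]


def share_of_linked_pairs(keyword_sets):
    """Return the fraction of article pairs with at least one shared keyword."""
    pairs = list(combinations(keyword_sets, 2))
    if not pairs:
        return 0.0
    # A non-empty set intersection means the two articles share a keyword.
    linked = sum(1 for a, b in pairs if a & b)
    return linked / len(pairs)


linked_share = share_of_linked_pairs(articles)
print(round(linked_share, 3))
```

Note that the very specific keyword “asian coalition for housing rights” does not link its article to the one tagged “housing”, because keywords are compared as whole strings.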
Data
My data consist of all articles with abstracts published since 2005, the first year in which PT&P included keywords with each article. I scraped them from Taylor & Francis Online using R and the RSelenium package. The only transformation applied to keywords was to convert them to lowercase; I did not employ stemming or any other word regularization technique common in Natural Language Processing.
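To illustrate what a lowercase-only transformation does and does not merge, here is a minimal Python sketch (the analysis itself was done in R; the function name `normalize_keywords` and the sample keywords are hypothetical).

```python
def normalize_keywords(raw_keywords):
    """Lowercase each keyword and change nothing else.

    No stemming or lemmatization is applied, so variants such as
    "participation" and "participatory planning" stay distinct.
    """
    return [kw.lower() for kw in raw_keywords]


# Hypothetical sample keywords as they might appear on an article page.
sample = ["Urban Planning", "urban planning", "Participation", "Participatory planning"]
normalized = normalize_keywords(sample)
distinct = sorted(set(normalized))
print(distinct)
```

Under this scheme, differences in capitalization disappear, while differences in wording remain, which is one reason related keywords may fail to match each other.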
Most Common Keywords, 2005-2019
The most common keywords for all 300 articles published between 2005 and 2019 are as follows:
Rank | Keyword | Frequency
1 | planning | 26
2 | urban planning | 22
3 | planning theory | 10
4 – 5 (tie) | housing | 9
4 – 5 (tie) | participation | 9
6 – 9 (tie) | collaborative planning | 8
6 – 9 (tie) | governance | 8
6 – 9 (tie) | participatory planning | 8
6 – 9 (tie) | power | 8
10 – 14 (tie) | complexity | 7
10 – 14 (tie) | networks | 7
10 – 14 (tie) | planning practice | 7
10 – 14 (tie) | public participation | 7
10 – 14 (tie) | spatial planning | 7
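A ranking with tied positions like the one above can be built from per-article keyword lists roughly as follows. This is a minimal Python sketch rather than the R code behind the table; the function name `ranked_keyword_table` and the toy input are hypothetical, and the labels use a plain hyphen.

```python
from collections import Counter


def ranked_keyword_table(keyword_lists):
    """Return (rank_label, keyword, frequency) rows, grouping ties."""
    counts = Counter(kw for kws in keyword_lists for kw in kws)
    # Sort by descending frequency, then alphabetically within a frequency.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    i = 0
    while i < len(ordered):
        # Find the end of the run of keywords sharing this frequency.
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        # Positions i+1 through j share one frequency.
        label = str(i + 1) if j - i == 1 else f"{i + 1} - {j} (tie)"
        for kw, freq in ordered[i:j]:
            rows.append((label, kw, freq))
        i = j
    return rows


# Hypothetical toy data: one keyword list per article.
toy_articles = [
    ["planning", "housing"],
    ["planning", "participation"],
    ["planning", "housing", "participation"],
]
table_rows = ranked_keyword_table(toy_articles)
for row in table_rows:
    print(" | ".join(str(cell) for cell in row))
```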
Even the most frequent keywords appear less often than one would expect in a sample of 300 articles. I therefore examine how often each keyword appears in the data, and find the following distribution:
Many keywords are highly specific, as compound keywords such as “integrated river basin planning and management”, “asian coalition for housing rights”, or “integration and governance” make clear. The word counts[1] within keywords from the 300 articles PT&P published between 2005 and 2019 are distributed as follows:
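For readers who want to compute both kinds of distribution themselves, the following minimal Python sketch shows one approach. It makes two assumptions that should be checked against the actual analysis: word counts are taken over distinct keywords, and words are split on whitespace with `str.split()`, which is only a rough stand-in for `stri_word_count()` and may count some keywords differently. The toy `keywords` list is hypothetical.

```python
from collections import Counter

# Hypothetical flat list of keyword occurrences across all articles.
keywords = [
    "planning",
    "planning",
    "housing",
    "integration and governance",
    "asian coalition for housing rights",
]

# Number of times each distinct keyword occurs.
keyword_counts = Counter(keywords)

# How many distinct keywords occur exactly once, twice, and so on.
frequency_distribution = Counter(keyword_counts.values())

# How many distinct keywords consist of one word, two words, and so on
# (assumption: whitespace splitting approximates the word count).
word_count_distribution = Counter(len(kw.split()) for kw in keyword_counts)

print(dict(frequency_distribution))
print(dict(word_count_distribution))
```

A distribution dominated by keywords that occur only once is what a “long right tail” of keywords looks like in this form.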
Most Common Keywords in 2016
To evaluate whether one can identify topics trending in a specific year, I look at the most frequent keywords included in the 23 articles published in 2016. These are as follows:
Based on keywords alone, it appears that property rights, gentrification, and community development stand out as being particularly specific to 2016 relative to other years of publication. However, no keyword appeared more than three times in that year, suggesting that this approach may skip over many important trends within the discipline that occurred in this year.
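One possible way to quantify how specific a keyword is to a given year is to compare its count in that year with its count across all years. The Python sketch below illustrates this; the share measure, the function name `year_specific_keywords`, and the toy `records` are my own illustrative assumptions, not necessarily the comparison used above.

```python
from collections import Counter

# Hypothetical records: (publication year, lowercased keywords).
records = [
    (2015, ["planning", "housing"]),
    (2016, ["gentrification", "planning"]),
    (2016, ["gentrification", "property rights"]),
    (2017, ["planning", "governance"]),
]


def year_specific_keywords(records, year):
    """Return (keyword, count_in_year, share_of_all_occurrences) rows."""
    in_year = Counter(kw for y, kws in records if y == year for kw in kws)
    overall = Counter(kw for _, kws in records for kw in kws)
    # Share of each keyword's total occurrences that fall in the given year.
    rows = [(kw, n, n / overall[kw]) for kw, n in in_year.items()]
    # Most frequent first, then most year-specific, then alphabetical.
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))


rows_2016 = year_specific_keywords(records, 2016)
for kw, n, share in rows_2016:
    print(f"{kw} | {n} | {share:.2f}")
```

With counts as small as those reported for 2016, such shares are based on very few articles and should be read with caution.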
Conclusion and Further Research
This exercise suggests to me that either Planning Theory & Practice covers a particularly wide spectrum of topics, or the journal and its authors could benefit from choosing more general keywords. More general keywords would be more useful for pointing readers to articles covering similar topics, rather than unintentionally identifying individual articles.
For further research, I propose repeating this exercise using more sophisticated word regularization techniques, and expanding the analysis to other journals.[2] Such an expansion would show both whether the observed phenomena of a “long right tail of keywords” and needlessly specific compound keywords are common within the broader urban planning field, and whether other academic disciplines suffer from the same issues in their use of keywords.
[1] As defined by the function “stri_word_count()” from the stringi R software package, without performing any text transformations. At least one compound keyword (“planning history; strategic spatial planning; western australia”) appears to have become a compound keyword in error due to incorrect use of keyword separators; however, this error is also present on the Taylor & Francis Online web site.
[2] In the interest of reusing existing code, an initial set might consist of other journals published on Taylor & Francis Online.