The Topic Overview tab contains a visual representation of the topic model used for this study. A topic model is a statistical model that uses machine learning to estimate which topics are present in a collection of documents and how strongly each document relates to each topic. The model generates these topics on the basis of word co-occurrence. After running the model, we sorted the topics into four categories: general research topics (using NSF categories), specific subfield topics, water budget topics, and method topics. These topics allow us to interpret which fields of water science have the most and the least comprehensive research, or in other words, which are the “bright” and “blind” spots of water science in Latin America.
Important note: the Spanish and Portuguese topic models rely on far smaller corpora than the English topic model. Because of this, those two models are less comprehensive, so the other visualizations on the platform rely on the topics generated from the English corpus.
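To give a feel for the word co-occurrence signal mentioned above, here is a minimal sketch that counts how often pairs of words appear in the same document. It is not the model used in this study: a real topic model is probabilistic and does far more than count pairs, and the algorithm behind the platform is not specified here. The documents and the `cooccurrence_counts` function are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents, invented for illustration only.
docs = [
    "groundwater recharge aquifer",
    "aquifer groundwater pumping",
    "glacier melt runoff",
    "runoff glacier streamflow",
]

def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents containing both words."""
    counts = Counter()
    for text in documents:
        # Use the set of distinct words so a pair is counted once per document.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(docs)

# Pairs that share many documents hint at a common theme.
for pair, n in counts.most_common(2):
    print(pair, n)
```

In this toy corpus, the groundwater and glacier vocabularies never appear together, which is the kind of separation a topic model would turn into two distinct topics.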
This searchable table contains labeling information for individual topics in the model. Each topic label corresponds to a specific topic shown above. Irrelevant topics (“noise”) are unlabeled. In the Spanish and Portuguese Topic Labels, there is a column giving the corresponding topic in the English model. We include this data because we based the rest of our study on the English model, given the limited data in the Spanish and Portuguese corpora. This is also noted in the Topic Model description.
Topic Number | Topic Label | NSF Specific | NSF General | Description | Topic in English Model |
---|---|---|---|---|---|
This searchable table contains information about the articles used in the model. With it, users can construct queries to find information about authors, publishers, and, in some cases, specific geographic features or areas. Irrelevant topics (“noise”) are unlabeled. We include this article listing as a supplemental resource. It lacks some of the data fields available for the English corpus because there were not enough articles in Spanish and Portuguese to reliably identify the number and type of topic labels. All labels were therefore derived from the English corpus.
Author(s) | Title | Year | Source | DOI | URL |
---|---|---|---|---|---|
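For readers who export the listing and want to query it outside the platform, the following is a minimal sketch of the kind of author and publisher query described above. It assumes the rows have been loaded as dictionaries whose keys mirror the table columns. The records, the key names, and the `search` function are hypothetical and are not part of the platform.

```python
# Hypothetical records, invented for illustration; not rows from the actual listing.
articles = [
    {"authors": "Autor, A.; Escritor, B.", "title": "Example title one",
     "year": 2012, "source": "Example Journal A", "doi": "", "url": ""},
    {"authors": "Escritor, B.", "title": "Example title two",
     "year": 2018, "source": "Example Journal B", "doi": "", "url": ""},
    {"authors": "Pesquisador, C.", "title": "Example title three",
     "year": 2020, "source": "Example Journal A", "doi": "", "url": ""},
]

def search(records, author=None, source=None, year_from=None, year_to=None):
    """Return records matching every supplied filter (case-insensitive substrings)."""
    results = []
    for rec in records:
        if author and author.lower() not in rec["authors"].lower():
            continue
        if source and source.lower() not in rec["source"].lower():
            continue
        if year_from is not None and rec["year"] < year_from:
            continue
        if year_to is not None and rec["year"] > year_to:
            continue
        results.append(rec)
    return results

# Articles with a given author, then articles from one source since 2015.
by_author = search(articles, author="escritor")
recent_in_a = search(articles, source="journal a", year_from=2015)
print([r["title"] for r in by_author])
print([r["title"] for r in recent_in_a])
```

Filters that are left out are ignored, so calling `search(articles)` with no arguments returns the full listing.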