The authors have declared that no competing interests exist.
Conceived and designed the experiments: RAP TS DK BB TY. Performed the experiments: RAP TY. Analyzed the data: RAP JAM TY. Contributed reagents/materials/analysis tools: RAP JAM DK BB TY. Wrote the paper: RAP JAM TS TY.
Neuroimaging research has largely focused on the identification of associations between brain activation and specific mental functions. Here we show that data mining techniques applied to a large database of neuroimaging results can be used to identify the conceptual structure of mental functions and their mapping to brain systems. This analysis confirms many current ideas regarding the neural organization of cognition, but also provides some new insights into the roles of particular brain systems in mental function. We further show that the same methods can be used to identify the relations between mental disorders. Finally, we show that these two approaches can be combined to empirically identify novel relations between mental disorders and mental functions via their common involvement of particular brain networks. This approach has the potential to discover novel endophenotypes for neuropsychiatric disorders and to better characterize the structure of these disorders and the relations between them.
One of the major challenges of neuroscience research is to integrate the results of the large number of published research studies in order to better understand how psychological functions are mapped onto brain systems. In this research, we take advantage of a large database of neuroimaging studies, along with text mining methods, to extract information about the topics that are found in the brain imaging literature and their mapping onto reported brain activation data. We also show that this method can be used to identify new relations between psychological functions and mental disorders, through their shared brain activity patterns. This work provides a new way to discover the underlying structure that relates brain function and mental processes.
The search for clues regarding the underlying causes of mental disorders has led to the notion that these disorders may be best understood in terms of a set of underlying psychological and/or neural mechanisms that stand between genes and environment on the one hand and psychiatric diagnoses on the other hand. Such intermediate phenotypes, or “endophenotypes”, may provide the traction that has eluded research using diagnostic categories as primary phenotypes
The identification of endophenotypes requires an understanding of the basic structure of mental functions and their associated brain networks. For more than 30 years, cognitive neuroscientists have used neuroimaging methods (including EEG/MEG, PET, and fMRI) in an attempt to address this question. This work has led to a large body of knowledge about associations between specific psychological processes or tasks and activity in brain regions or networks. However, this knowledge has not led to a commensurate improvement in our understanding of the basic mental operations that may be subserved by particular brain systems. Instead, diverse literatures often assign widely varying functions to the same networks. A prime example is the anterior cingulate cortex, which has been associated with such widespread functions as conflict monitoring, error processing, pain, and interoceptive awareness. In order to understand the unique functions that are subserved by brain regions or networks, a different approach is necessary; namely, we need to analyze data obtained across a broad range of mental domains and understand how these domains are organized with regard to neural function and structure.
The identification of basic operations can be understood statistically as a problem of latent structure identification; that is, what are the latent underlying mental functions and brain networks that give rise to the broad range of observed behaviors and patterns of brain activity and neuropsychiatric disorders? The focus within cognitive neuroscience on establishing associations between activation and specific hypothesized processes has hindered the ability to identify such latent structures. However, within the fields of machine learning and text mining, a number of powerful approaches have been developed to estimate the latent structure that generates observed data, assuming that large enough datasets are available. In the present work, we take advantage of one class of such generative models to develop a new approach to identifying the underlying latent structure of mental processing and the associated brain functions, which we refer to as “topic mapping”. We examine the latent conceptual structure of the fMRI literature by mining the full text from a large text corpus comprising more than 5,800 articles from the neuroimaging literature, and model the relation between these topics and associated brain activation using automated methods for extracting activation coordinates from published papers. This analysis uncovers conceptual structure and activation patterns consistent with those observed in previous neuroimaging meta-analyses, which provides confirmation of the approach, while also providing some novel suggestions regarding structure/function relationships. We then use this approach to identify the topical structure of terms related to neuropsychiatric diseases, and use multivariate methods to identify relations between the mental and disorder domains based on common brain activation patterns.
This approach provides an empirical means of discovering novel endophenotypes that may underlie mental disorders, as well as providing new insights into the relations between diagnostic categories.
Within the fields of information retrieval and computer science, research into document retrieval has led to the development of a set of techniques for estimating the latent structure underlying a set of documents. Early work in this area treated documents as vectors in a high-dimensional space, and used matrix decomposition techniques such as singular value decomposition to identify the latent semantic structure of the documents
We characterized the latent structure of the cognitive neuroscience literature by applying latent Dirichlet allocation to a corpus of 5,809 articles (using an expanded version of the corpus developed in
Initial application of LDA to the full-text corpus identified a number of topics that were related to mental function, but also many topics related to methodological or linguistic aspects of the documents. Because we were specifically interested in estimating the conceptual structure of mental processes, we examined each document in the corpus and identified each occurrence of any of the 605 terms (both single words and phrases) that are present as mental concepts in the Cognitive Atlas (
The left panel shows the top words associated with each topic, and the right panel shows a map of voxels that were significantly associated with loading on that topic across documents. The image intensity is proportional to the Pearson correlation between the activation vector and the topic loading vector at each voxel (with red-yellow depicting positive correlations and blue-white depicting negative correlations), thresholded using a whole brain false discovery rate of q
In order to further examine the effects of topic dimensionality, we compared the results obtained across several values for the number of topics (10, 50, 100, and 250). We chose the term “language” and identified all topics for each model in which that term occurred in the top five terms. We then examined the correlation in the loading vector across documents for each pair of adjacent levels, in order to identify the hierarchical graph relating topics across levels (see
All topics with “language” in their top 5 terms were first identified from the results for topic models fit to the data at 10, 50, 100, and 250 topics. At each level, each topic is linked to the topic at the previous level with which it had the highest correlation in its document loadings. The values on each edge reflect the correlation in the topic loading vector across documents between the two levels.
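The linking rule described in this caption can be illustrated with a minimal sketch. It assumes that each topic is represented by its loading vector over the same ordered set of documents; the function names, topic labels, and loading values are hypothetical and are not taken from the analysis code.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def link_to_previous_level(prev_level, curr_level):
    """For each topic in curr_level, find the best-correlated topic in prev_level.

    Both arguments map a topic label to its loading vector across the same
    ordered list of documents. Returns {curr_label: (prev_label, r)}.
    """
    links = {}
    for label, loadings in curr_level.items():
        best = max(prev_level, key=lambda p: pearson(prev_level[p], loadings))
        links[label] = (best, pearson(prev_level[best], loadings))
    return links

# Hypothetical loadings for four documents (illustrative values only).
level_10 = {"t10_a": [0.9, 0.8, 0.1, 0.0], "t10_b": [0.0, 0.1, 0.7, 0.9]}
level_50 = {"t50_a": [0.8, 0.9, 0.0, 0.1], "t50_b": [0.1, 0.0, 0.9, 0.6]}
edges = link_to_previous_level(level_10, level_50)
```

Each entry of `edges` corresponds to one edge of the hierarchical graph, with the stored correlation playing the role of the edge value.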
Using the topical structure of the literature discovered in the previous section, we developed a novel approach called
While concordance with the existing literature is reassuring, the true promise of this approach is in its ability to uncover novel associations between functions and activation, and the topic mapping analysis did in fact identify some unexpected associations, particularly when looking at negative associations. Two interesting examples are evident in
Based on the results from the foregoing analyses, we then examined whether it was possible to obtain new insights about the organization of brain disorders using the topic mapping approach developed above. We estimated a set of topics using only terms related to brain disorders, based on a lexicon of mental disorder terms derived from the NIFSTD Dysfunction ontology
Topics are ordered in terms of the number of documents loading on the topic; color maps reflect the correlation coefficient between topic loading and activation across documents. The images are presented in radiological convention (i.e., left-right reversed).
The results of this analysis are largely consistent with results from prior meta-analyses and known functional anatomy of the various disorders, but are novel in highlighting relations between some of the disorders. For example, Topic 7 demonstrates the relations between bipolar disorder, schizophrenia, and mood disorders, with activation centered on the medial prefrontal cortex, basal ganglia, and amygdala. Topic 8 highlights relations between obesity and eating disorders and drug abuse, with activation in the ventral striatum and ventromedial prefrontal cortex. Topic 14 demonstrates relations between a set of externalizing disorders (drug abuse, conduct disorder, alcoholism, antisocial personality disorder, and cannabis related disorder) with activation focused in the striatum, amygdala, orbitofrontal cortex, and dorsal prefrontal cortex. Conversely, Topic 25 demonstrates relations between a set of internalizing disorders (anxiety disorder, panic disorder, phobia, obsessive compulsive disorder, agoraphobia, and post traumatic stress disorder), with a very similar pattern of activation, though notably weaker in the striatum. One striking result of these analyses is the similarity of the patterns of brain activity associated with the mention of all of these different disorders. This could arise either from the fact that this particular set of limbic brain systems is the seat of all major psychiatric disorders, or the fact that these disorders are commonly mentioned in relation to tasks or cognitive domains that happen to preferentially engage these brain systems.
We further characterized the relations between different disorder concepts in their associated neural activations by clustering the disorder topics based on their associated brain activation patterns using hierarchical clustering. The results of this analysis are shown in
Euclidean distance was used as the distance metric for clustering, and hierarchical clustering was performed using Ward's method. The colored blocks show the four major groupings obtained by cutting the tree at a height of 2.0. Abbreviations: APH: aphasia, DLX: dyslexia, SLI: specific language impairment, DA: drug abuse, AD: Alzheimer's disease, DEP: depressive disorder, MDD: major depressive disorder, ANX: anxiety disorder, PAN: panic disorder, BPD: bipolar disorder, CD: conduct disorder, GAM: gambling, MD: mood disorder, PD: Parkinson's disease, OCD: obsessive compulsive disorder, PHO: phobia, EAT: eating disorder, SZ: schizophrenia, OBE: obesity, COC: cocaine related disorder, PSY: psychotic disorder, PAR: paranoid disorder, SZTY: schizotypal personality disorder, TIC: tic disorder, ALC: alcoholism, ALX: alexia, ADD: attention deficit disorder, AMN: amnesia, AUT: autism, ASP: Asperger syndrome.
It has commonly been proposed that cognitive functions and neural systems may serve as endophenotypes for neuropsychiatric disorders. We assessed whether it was possible to empirically discover candidate endophenotypes using a multivariate approach to identify sets of mental concepts and disorder terms that were closely associated via their respective activation patterns. We used an
CV # | Mental topics | Disorder topics |
0 | 77 (0.25): mood induction | 25 (0.40): anxiety_disorder panic_disorder |
 | 94 (0.25): reward decision | 13 (0.36): depressive_disorder major_depressive_disorder |
 | 15 (0.24): reward anticipation | 22 (0.35): gambling drug_abuse |
 | 40 (0.23): fear generalization | 8 (0.35): obesity cocaine_related_disorder |
 | 105 (0.23): emotion sadness | 14 (0.32): drug_abuse conduct_disorder |
1 | 93 (0.24): emotion valence | 10 (0.42): amnesia alzheimers_disease |
 | 105 (0.23): emotion sadness | 3 (0.39): schizophrenia paranoid_schizophrenia |
 | 39 (0.22): valence arousal | 9 (0.37): schizophrenia schizotypal_personality_disorder |
 | 33 (0.22): memory retrieval | 23 (0.32): autism specific_language_impairment |
 | 44 (0.21): risk decision | 18 (0.30): schizophrenia psychotic_disorder |
2 | 66 (0.27): language syntactic_processing | 24 (0.69): dyslexia specific_language_impairment |
 | 13 (0.26): language comprehension | 11 (0.68): aphasia |
 | 107 (0.25): language language_processing | 27 (0.20): autism asperger_syndrome |
 | 26 (0.25): comprehension language | |
 | 5 (0.25): word_frequency decision | |
3 | 15 (0.29): reward anticipation | 0 (0.54): mood_disorder parkinsons_disease |
 | 117 (0.27): anticipation feedback | 15 (0.41): attention_deficit_disorder |
 | 94 (0.27): reward decision | 20 (0.39): attention_deficit_disorder conduct_disorder |
 | 77 (0.24): mood induction | 12 (0.35): drug_abuse gambling |
 | 44 (0.23): risk decision | 5 (0.24): obsessive_compulsive_disorder drug_abuse |
4 | 113 (0.30): encoding memory | 10 (0.77): amnesia alzheimers_disease |
 | 101 (0.27): recognition memory | 17 (0.60): alcoholism alexia |
 | 36 (0.27): memory explicit_memory | |
 | 79 (0.25): familiarity recognition | |
 | 7 (0.25): encoding memory | |
5 | 129 (0.43): cognition social_cognition | 19 (0.66): autism asperger_syndrome |
 | 1 (0.34): belief theory_of_mind | 27 (0.59): autism asperger_syndrome |
 | 108 (0.30): empathy pain | 23 (0.36): autism specific_language_impairment |
 | 45 (0.28): intention prospective_memory | |
 | 71 (0.27): narrative discourse | |
11 | 58 (0.45): emotion facial_expression | 26 (0.57): phobia eating_disorder |
 | 49 (0.40): fear emotion | 0 (0.44): mood_disorder parkinsons_disease |
 | 99 (0.37): facial_expression emotional_expression | 17 (0.42): alcoholism alexia |
 | 40 (0.30): fear generalization | 16 (0.35): schizophrenia |
 | 123 (0.28): stress induction | 2 (0.30): schizophrenia psychotic_disorder |
22 | 59 (0.41): intelligence morphology | 21 (0.70): drug_abuse alzheimers_disease |
 | 88 (0.38): focus attention | 4 (0.46): psychotic_disorder paranoid_disorder |
 | 14 (0.31): association context | 7 (0.41): bipolar_disorder schizophrenia |
 | 3 (0.29): memory episodic_memory | 19 (0.30): autism asperger_syndrome |
 | 35 (0.26): hallucination auditory | |
The top five topics for each canonical variate exceeding a loading value of 0.2 are shown in the table.
The first canonical variate (#0) demonstrated associations between a number of both internalizing and externalizing disorders (anxiety, depression, obesity, gambling) which were centered around the involvement of emotional processes (such as mood and fear) and reward-related decision processes. Another canonical variate (#1) was focused on memory processes, and identified a cluster of disorders including classical memory disorders (amnesia and Alzheimer's disease) as well as schizophrenia. Another (#2) focused on language processes and was associated with activity in left prefrontal, temporal, and parietal regions.
The results of the CCA analysis provide a potential new window into the complex psychological and neural underpinnings of schizophrenia and its relation to other psychiatric disorders. Across different canonical variates, schizophrenia is related to mood and decision making processes (components 0 and 3), memory processes (component 5), and social perception (component 10). These could potentially relate to different aspects of schizophrenic symptomatology, such as the distinctions between positive versus negative symptoms or between cognitive versus affective impairments. Further, they provide novel potential targets for genetic association studies, which have struggled to identify meaningful and replicable associations between schizophrenic symptoms or endophenotypes and genetic polymorphisms (cf.
We also performed CCA directly using topic-document loading vectors, in order to determine whether the results differed from CCA computed on neural loading vectors; the results are presented in
CV # | Mental topics | Disorder topics |
0 | 93 (0.35): emotion valence | 22 (0.42): gambling drug_abuse |
 | 77 (0.32): mood induction | 13 (0.42): depressive_disorder major_depressive_disorder |
 | 44 (0.31): risk decision | 25 (0.41): anxiety_disorder panic_disorder |
 | 94 (0.31): reward decision | 26 (0.38): phobia eating_disorder |
 | 49 (0.29): fear emotion | 28 (0.23): borderline_personality_disorder drug_abuse |
1 | 62 (0.71): reading language | 24 (0.86): dyslexia specific_language_impairment |
 | 72 (0.35): reading language | 11 (0.48): aphasia |
 | 107 (0.31): language language_processing | |
 | 5 (0.23): word_frequency decision | |
2 | 129 (0.81): cognition social_cognition | 19 (0.66): autism asperger_syndrome |
 | 58 (0.28): emotion facial_expression | 27 (0.49): autism asperger_syndrome |
 | 84 (0.20): gaze attention | 23 (0.46): autism specific_language_impairment |
3 | 32 (0.47): naming retrieval | 11 (1.00): aphasia |
 | 107 (0.44): language language_processing | |
 | 26 (0.32): comprehension language | |
 | 60 (0.32): auditory speech_production | |
 | 66 (0.31): language syntactic_processing | |
4 | 90 (0.69): inhibition response_inhibition | 20 (0.69): attention_deficit_disorder conduct_disorder |
 | 11 (0.52): attention sustained_attention | 15 (0.66): attention_deficit_disorder |
 | 122 (0.27): attention selective_attention | |
 | 8 (0.21): cognitive_control monitoring | |
5 | 33 (0.76): memory retrieval | 10 (1.00): amnesia alzheimers_disease |
 | 3 (0.36): memory episodic_memory | |
 | 64 (0.24): retrieval memory | |
6 | 35 (0.62): hallucination auditory | 1 (0.61): schizophrenia drug_abuse |
 | 44 (0.36): risk decision | 4 (0.42): psychotic_disorder paranoid_disorder |
 | 17 (0.33): verbal_fluency word_generation | 6 (0.36): schizophrenia tic_disorder |
 | 70 (0.31): memory working_memory | 16 (0.29): schizophrenia |
 | 14 (0.23): association context | 18 (0.27): schizophrenia psychotic_disorder |
7 | 40 (0.71): fear generalization | 26 (0.82): phobia eating_disorder |
 | 49 (0.55): fear emotion | 25 (0.55): anxiety_disorder panic_disorder |
 | 73 (0.28): arousal attention | |
8 | 77 (0.92): mood induction | 13 (0.80): depressive_disorder major_depressive_disorder |
 | 93 (0.22): emotion valence | 7 (0.57): bipolar_disorder schizophrenia |
9 | 86 (0.56): decision decision_making | 22 (0.98): gambling drug_abuse |
 | 100 (0.41): choice decision | |
 | 94 (0.39): reward decision | |
 | 15 (0.33): reward anticipation | |
 | 44 (0.33): risk decision | |
10 | 98 (0.60): stress association | 28 (0.99): borderline_personality_disorder drug_abuse |
 | 67 (0.45): maintenance distraction | |
 | 93 (0.29): emotion valence | |
 | 105 (0.28): emotion sadness | |
 | 81 (0.23): hearing auditory | |
11 | 78 (0.38): movement motor_control | 5 (0.97): obsessive_compulsive_disorder drug_abuse |
 | 21 (0.37): interference interference_resolution | 0 (0.23): mood_disorder parkinsons_disease |
 | 76 (0.35): planning motor_planning | |
 | 124 (0.34): feedback learning | |
 | 8 (0.27): cognitive_control monitoring | |
12 | 75 (0.59): retention consolidation | 9 (1.00): schizophrenia schizotypal_personality_disorder |
 | 26 (0.34): comprehension language | |
 | 96 (0.29): context context_memory | |
 | 1 (0.26): belief theory_of_mind | |
 | 127 (0.23): memory encoding | |
13 | 54 (0.70): desire habit | 8 (0.80): obesity cocaine_related_disorder |
 | 15 (0.33): reward anticipation | 21 (0.43): drug_abuse alzheimers_disease |
 | 94 (0.31): reward decision | 12 (0.35): drug_abuse gambling |
 | 9 (0.28): executive_function attention | |
 | 65 (0.25): recall humor | |
14 | 3 (0.60): memory episodic_memory | 17 (1.00): alcoholism alexia |
 | 48 (0.44): metaphor meaning | |
 | 117 (0.25): anticipation feedback | |
 | 62 (0.23): reading language | |
 | 125 (0.22): skill learning | |
15 | 17 (0.50): verbal_fluency word_generation | 6 (1.00): schizophrenia tic_disorder |
 | 45 (0.36): intention prospective_memory | |
 | 111 (0.33): memory working_memory | |
 | 97 (0.30): awareness consciousness | |
 | 4 (0.29): cognition recognition | |
16 | 70 (0.57): memory working_memory | 16 (0.99): schizophrenia |
 | 96 (0.33): context context_memory | |
 | 58 (0.30): emotion facial_expression | |
 | 10 (0.27): rehearsal memory | |
 | 74 (0.23): auditory perception | |
17 | 4 (0.45): cognition recognition | 0 (1.00): mood_disorder parkinsons_disease |
 | 128 (0.39): movement focus | |
 | 51 (0.38): learning sequence_learning | |
 | 38 (0.35): categorization prototype | |
 | 124 (0.29): feedback learning | |
18 | 18 (0.63): lying deception | 14 (1.00): drug_abuse conduct_disorder |
 | 64 (0.34): retrieval memory | |
 | 98 (0.30): stress association | |
 | 99 (0.24): facial_expression emotional_expression | |
19 | 84 (0.50): gaze attention | 23 (1.00): autism specific_language_impairment |
 | 85 (0.32): inference knowledge | |
 | 108 (0.29): empathy pain | |
 | 45 (0.25): intention prospective_memory | |
 | 4 (0.25): cognition recognition | |
20 | 45 (0.46): intention prospective_memory | 3 (1.00): schizophrenia paranoid_schizophrenia |
 | 35 (0.44): hallucination auditory | |
 | 21 (0.41): interference interference_resolution | |
 | 81 (0.27): hearing auditory | |
 | 3 (0.25): memory episodic_memory | |
22 | 8 (0.75): cognitive_control monitoring | 12 (1.00): drug_abuse gambling |
 | 102 (0.25): action goal | |
 | 43 (0.25): attention focus | |
 | 16 (0.22): pain perception | |
 | 54 (0.21): desire habit | |
23 | 62 (0.83): reading language | 24 (1.00): dyslexia specific_language_impairment |
 | 72 (0.48): reading language | |
The top five topics for each canonical variate exceeding a loading value of 0.2 are shown in the table.
It is clear that neuroimaging can provide important evidence regarding the functional organization of the brain, but one of the most fundamental questions in cognitive neuroscience has been whether it can provide any new insights into psychological function
There is growing recognition that the diagnostic categories used in psychiatry are not reflective of sharp parallel biological distinctions; instead, a growing body of behavioral, genetic, and neuroimaging data suggest that these different disorders fall along a set of underlying continuous dimensions which likely relate to particular basic psychological processes
The present work is limited by several features of the data that were used in the analyses. The first limitation arises from the fact that we rely upon the presence of particular terms in the text, rather than on manual annotation of the relevance of those terms. Thus, obvious issues such as polysemy (e.g., the multiple senses of the term “working memory”) and negation can be problematic, though these issues could potentially be addressed using more powerful natural language processing. A second limitation arises from the meta-analytic nature of the activation data used in the analyses, which are reconstructed from a very sparse representation of the original data. A third limitation is that the activation maps are associated only with complete documents, not with specific terms within the document, and this coarseness undoubtedly adds a significant amount of noise to the modeling results. These limitations necessitate caution in drawing strong conclusions from the results reported here. At the same time, the concordance of many of the results with previous analyses using different datasets and analysis approaches suggests that these limitations have not greatly undermined the power of the technique. We propose that the approach outlined here is likely to be most useful for inspiring novel hypotheses rather than for confirming existing hypotheses, which means that any such results will be just the first step in a research program that must also include hypothesis-driven experimentation.
Another potential limitation of the present work is that a number of the parameters in the analyses were set arbitrarily. While the dimensionality of the topic models was determined using an automated method, there remain parameter settings (such as smoothness of the word and topic distributions) that must be chosen arbitrarily (in our case, we chose them based on previously published results). The results of the topic model are quite robust; for example, we saw very similar results when performing the topic models on the original set of 4,393 papers from the earlier paper by Yarkoni et al. compared to the results from the corpus of 5,809 papers. It is also evident from
The present work follows and extends other recent work that has aimed to mine the relations between mental function and brain function using coordinate-based meta-analyses. Smith et al.
In summary, we have shown how large neuroimaging and text databases can be used to identify novel relations between brain, mind, and mental disorders. The approach developed here has the potential to enable new discoveries about the neural and cognitive bases of neuropsychiatric disorders, and to provide empirically-driven functional characterizations of patterns of brain activation. The results also highlight the importance of the availability of large open datasets in cognitive neuroscience to enable discovery-based science as a complement to hypothesis-driven research.
Code to implement all of the analyses reported here, along with all of the auxiliary files, is available at
The full text from the Neurosynth corpus was used for the text mining analyses. The sources of these data as well as the process for automated extraction of activation coordinates are described in detail in
Synthetic activation peak images were created from the extracted activation coordinates by placing a sphere (10 mm radius) at each activation location, at 3 mm resolution using the MNI305 template. Activations detected to be in Talairach space were first converted to MNI305 coordinates using the Lancaster transform
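The construction of a synthetic peak image can be sketched as follows. The 10 mm radius and 3 mm voxel size come from the description above; the grid shape, the grid origin, and the function names are hypothetical placeholders chosen for illustration, and the image is represented simply as a set of active voxel indices rather than as a volume on the actual MNI305 template.

```python
import math

def sphere_voxels(peak_mm, radius_mm=10.0, voxel_mm=3.0,
                  shape=(61, 73, 61), origin_mm=(-90.0, -126.0, -72.0)):
    """Return the set of (i, j, k) voxels whose centers lie within radius_mm of a peak.

    shape and origin_mm describe a hypothetical 3 mm grid, not the real template.
    """
    # Index range that can possibly fall inside the sphere, clipped to the grid.
    ranges = []
    for c, o, n in zip(peak_mm, origin_mm, shape):
        lo = max(0, math.ceil((c - radius_mm - o) / voxel_mm))
        hi = min(n - 1, math.floor((c + radius_mm - o) / voxel_mm))
        ranges.append(range(lo, hi + 1))
    active = set()
    for i in ranges[0]:
        for j in ranges[1]:
            for k in ranges[2]:
                center = (origin_mm[0] + i * voxel_mm,
                          origin_mm[1] + j * voxel_mm,
                          origin_mm[2] + k * voxel_mm)
                if math.dist(center, peak_mm) <= radius_mm:
                    active.add((i, j, k))
    return active

def activation_image(peaks_mm, **kwargs):
    """Binary activation image for one document: union of spheres over all peaks."""
    active = set()
    for peak in peaks_mm:
        active |= sphere_voxels(peak, **kwargs)
    return active

# Two hypothetical peaks, 3 mm apart along x.
single = sphere_voxels((0.0, 0.0, 0.0))
image = activation_image([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)])
```

Because overlapping spheres are merged by set union, each voxel is either active or inactive for a given document, which is the binary feature used in the voxelwise analyses.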
We ran two topic modeling analyses using limited sets of terms to obtain focused topics in specific domains. In the first, we used 605 mental concept terms from the Cognitive Atlas database mentioned previously. In the second, we used a set of 55 terms describing mental disorders; these were obtained by taking the NIFSTD Dysfunction ontology and removing all terms not relevant to psychiatric disorders, and then adding a set of missing terms that described additional disorders listed in the DSM-IV. In each case, we processed the full text corpus and created restricted documents containing only terms that were present in the respective term list (along with synonyms, which were mapped back to the base term), and then performed topic modeling on those restricted documents. The median number of terms per document after filtering was 127 for cognitive terms and 3 for disease terms.
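The creation of restricted documents can be sketched as follows. This is a minimal illustration that assumes simple case-insensitive phrase matching; the function name, the example lexicon, and the example synonym are hypothetical, and the underscore convention for multi-word terms is adopted only because it matches how terms appear in the tables.

```python
import re

def restrict_document(text, lexicon, synonyms=None):
    """Return the lexicon terms found in text, in order of occurrence.

    lexicon: iterable of base terms (single words or phrases).
    synonyms: optional dict mapping a synonym to its base term.
    Multi-word terms are emitted with underscores, e.g. "working_memory".
    """
    synonyms = synonyms or {}
    surface_to_base = {t.lower(): t.lower() for t in lexicon}
    surface_to_base.update({s.lower(): b.lower() for s, b in synonyms.items()})
    # Try longer phrases first so "working memory" wins over "memory".
    alternatives = sorted(surface_to_base, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b")
    return [surface_to_base[m.group(1)].replace(" ", "_")
            for m in pattern.finditer(text.lower())]

# Hypothetical lexicon and synonym mapping, for illustration only.
lexicon = ["memory", "working memory", "attention"]
synonyms = {"wm": "working memory"}
text = "Working memory and attention interact; WM load also affects memory retrieval."
restricted = restrict_document(text, lexicon, synonyms)
```

The resulting token list is what would be passed to the topic model in place of the full text. Note that this kind of matching does not resolve polysemy or negation.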
Topic modeling was performed using latent Dirichlet allocation
For each dataset, the optimal number of topics was determined by performing a grid search across a range of dimensionality values (from 10 to 250 in steps of 10). Each document set was split into 8 random sets of documents, and 8 separate models were trained, in each case leaving out one subset of documents. The empirical likelihood of the left-out documents was then estimated using an importance sampling method as implemented in MALLET
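The cross-validation scaffold for this grid search can be sketched as follows. The sketch is agnostic to the topic modeling software: `fit` and `heldout_loglik` are hypothetical caller-supplied hooks, the toy hooks exist only so that the example runs, and selecting the dimensionality with the highest mean held-out likelihood is an assumption about the selection rule.

```python
import random

def kfold_indices(n_docs, k=8, seed=0):
    """Shuffle document indices and split them into k roughly equal folds."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_num_topics(docs, candidates, fit, heldout_loglik, k=8, seed=0):
    """Pick the number of topics with the highest mean held-out log-likelihood.

    fit(train_docs, n_topics) -> model and heldout_loglik(model, test_docs) -> float
    are hypothetical hooks wrapped around a topic-model implementation.
    """
    folds = kfold_indices(len(docs), k, seed)
    scores = {}
    for n_topics in candidates:
        total = 0.0
        for held_out in folds:
            held = set(held_out)
            train = [d for i, d in enumerate(docs) if i not in held]
            test = [docs[i] for i in held_out]
            total += heldout_loglik(fit(train, n_topics), test)
        scores[n_topics] = total / k
    best = max(scores, key=scores.get)
    return best, scores

# Stand-in hooks for demonstration only; a real run would wrap an LDA implementation.
def toy_fit(train_docs, n_topics):
    return n_topics

def toy_loglik(model, test_docs):
    return -abs(model - 30) * len(test_docs)  # peaks at 30 topics by construction

docs = [f"doc{i}" for i in range(40)]
best, scores = select_num_topics(docs, range(10, 260, 10), toy_fit, toy_loglik)
```

With real hooks, each candidate dimensionality requires 8 model fits, so the full grid from 10 to 250 in steps of 10 involves 200 fits per dataset.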
In order to identify the hierarchical relations between topics across different dimensionalities (as shown in
Topic maps were created separately for each topic by first computing a voxelwise chi-squared statistic for the association across documents between activation of the voxel (which is a binary feature due to the use of a spherical kernel) and the loading of that document on that topic (after thresholding the topic loading value p>0 and binarizing). This thresholding resulted in a mean number of documents per topic of 292 for the Cognitive Atlas analysis, and 177 for the neuropsychiatric disorders analysis. The voxelwise chi-squared p-value maps were then corrected for false discovery rate at q
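The per-voxel test and the false discovery rate correction can be sketched as follows. The sketch assumes a 2×2 chi-squared test without continuity correction and the Benjamini-Hochberg step-up procedure; the specific FDR procedure is an assumption, and the default `q=0.05` is a placeholder rather than the threshold used in the analyses.

```python
import math

def chi2_2x2(active, loaded):
    """Chi-squared statistic and p-value (1 df) for two binary vectors over documents."""
    n = len(active)
    a = sum(1 for x, y in zip(active, loaded) if x and y)
    b = sum(1 for x, y in zip(active, loaded) if x and not y)
    c = sum(1 for x, y in zip(active, loaded) if not x and y)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a constant vector carries no association
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-squared variable with one degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg: return a list of booleans marking rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = -1
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank  # largest rank that satisfies the step-up criterion
    keep = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            keep[i] = True
    return keep

# Hypothetical data: 20 documents, one binarized topic loading, two voxels.
loaded = [1] * 10 + [0] * 10
voxel_a = [1] * 10 + [0] * 10   # active exactly in the documents that load
voxel_b = [1, 0] * 10           # unrelated to the topic
stat_a, p_a = chi2_2x2(voxel_a, loaded)
stat_b, p_b = chi2_2x2(voxel_b, loaded)
significant = fdr_bh([p_a, p_b])
```

In a full analysis the test would be repeated at every voxel for a given topic, and the correction applied to the resulting whole-brain map of p-values.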
Disorders were clustered using hierarchical clustering (Ward's method) applied to the Euclidean distance matrix computed across voxels for the disorder-based topic maps (Pearson r values).
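Ward's criterion can be illustrated with a small standard-library sketch. At each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squared Euclidean distances. For simplicity this version stops at a requested number of clusters instead of cutting a dendrogram at a given height, and the function name and example vectors are hypothetical.

```python
def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's criterion on Euclidean distance.

    points: list of equal-length numeric vectors (e.g., one topic map each).
    Returns a sorted list of clusters, each a sorted list of indices into points.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a["members"]), len(b["members"])
        d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
        return na * nb / (na + nb) * d2

    while len(clusters) > n_clusters:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c["members"]) for c in clusters)

# Five hypothetical two-voxel "maps" forming two obvious groups.
maps = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, 0.05]]
groups = ward_cluster(maps, 2)
```

The exhaustive pair search is cubic in the number of items, which is adequate for a few dozen disorder topic maps but would be replaced by a library routine for larger problems.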
Canonical correlation analysis (CCA) was used to identify sets of mental function and disorder topics that were closely associated in neural activation space. In order to reduce the dimensionality of the data, the topic maps (Pearson r values) were first sampled from the original 3 mm space into 6 mm voxels. These datasets were then submitted to penalized canonical correlation analysis
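The idea behind penalized CCA can be illustrated with a simplified rank-one sketch: alternating power iterations on the cross-correlation matrix, with soft thresholding to encourage sparse weights. This is not the implementation used in the analyses. It assumes a fixed soft-threshold `lam` rather than a search for the threshold that satisfies an L1 bound, it treats the within-set covariance matrices as identity, and all names and data are hypothetical.

```python
import math

def standardize(cols):
    """Center each column and scale it to unit (population) standard deviation.

    Columns are assumed to be non-constant.
    """
    out = []
    for col in cols:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / sd for x in col])
    return out

def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam, setting small entries to zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def sparse_cca_rank1(x_cols, y_cols, lam=0.1, n_iter=50):
    """First pair of sparse canonical weight vectors (u for X, v for Y).

    x_cols, y_cols: lists of columns (one per topic), each a list over voxels.
    """
    xs, ys = standardize(x_cols), standardize(y_cols)
    n = len(xs[0])
    p, q = len(xs), len(ys)
    # Cross-correlation between every X column and every Y column.
    c = [[sum(a * b for a, b in zip(xc, yc)) / n for yc in ys] for xc in xs]
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        u = normalize(soft_threshold(
            [sum(c[i][j] * v[j] for j in range(q)) for i in range(p)], lam))
        v = normalize(soft_threshold(
            [sum(c[i][j] * u[i] for i in range(p)) for j in range(q)], lam))
    return u, v

def walsh(a):
    """Synthetic +/-1 pattern over 8 'voxels'; patterns with different a are orthogonal."""
    return [(-1) ** bin(i & a).count("1") for i in range(8)]

# Synthetic example: only the first X map and the second Y map are related.
mental_maps = [walsh(1), walsh(2), walsh(3)]
disorder_maps = [walsh(4), walsh(1), walsh(5)]
u, v = sparse_cca_rank1(mental_maps, disorder_maps)
```

In this synthetic case the weights concentrate entirely on the one related pair of maps. With real topic maps, each canonical variate would instead carry graded weights over several mental and disorder topics, as in the loadings reported in the tables.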
Complete list of topics identified through application of latent Dirichlet allocation to the text corpus filtered for Cognitive Atlas terms. The top 5 words shown for each topic are those which had the highest loading for that topic across documents. The number of documents that loaded on each topic is also listed.
(PDF)
Complete list of topics identified through application of latent Dirichlet allocation to the text corpus filtered for mental disorder terms. The top 5 words shown for each topic are those which had the highest loading for that topic across documents. The number of documents that loaded on each topic is also listed.
(PDF)
Thanks to Robert Bilder, Eliza Congdon, Steve Hanson, Oluwasanmi Koyejo, Jonathan Pillow, and Fred Sabb for helpful comments on a draft of this paper and to Daniela Witten for assistance with the R PMA package.