To generate the word clouds, the text of the posts and comments is extracted from the JSON files and analyzed using the natural language processing technique ''word2vec''.
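The following is a minimal sketch of this step in Python, assuming the posts and comments are stored as JSON files containing a list of objects with a <code>body</code> field (the directory layout and field name are assumptions) and using the gensim library (4.x API); the hyperparameters are illustrative, not the project's actual settings.
<syntaxhighlight lang="python">
# Sketch: load post/comment texts from JSON files and train a word2vec model.
import json
from pathlib import Path

from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Collect one token list per post/comment text.
sentences = []
for path in Path("data").glob("*.json"):          # assumed directory layout
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)                     # assumed: list of post/comment objects
    for entry in entries:
        text = entry.get("body", "")               # assumed field name
        if text:
            sentences.append(simple_preprocess(text))

# Train the embedding (gensim 4.x API); hyperparameters are illustrative only.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=2, epochs=50)
</syntaxhighlight>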
[[File:VRI-LEK-Epochs.PNG|none|1000px|Learning Process]]
The resulting vector space has a very high dimensionality and therefore cannot be visualized directly. To reduce it to three dimensions, the method ''t-distributed stochastic neighbor embedding'' (t-SNE) is used, which keeps words that are close in the high-dimensional space close together.
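A minimal sketch of the reduction, assuming scikit-learn's t-SNE implementation and continuing from the <code>model</code> trained above; perplexity and random state are illustrative choices.
<syntaxhighlight lang="python">
# Sketch: reduce the word2vec vectors from 100 to 3 dimensions with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

words = model.wv.index_to_key                      # vocabulary (gensim 4.x)
vectors = np.array([model.wv[w] for w in words])   # shape: (n_words, 100)

tsne = TSNE(n_components=3, perplexity=30, random_state=0)
coords_3d = tsne.fit_transform(vectors)            # shape: (n_words, 3)
</syntaxhighlight>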
[[File:VRI-LEK-Graph.png|none|500px|Resulting Graph]]
The resulting data is then imported into Unity via a ''csv'' file, and for every data point a billboard text of the word is generated. This process is repeated for every text.
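A sketch of writing that ''csv'' file from the reduced coordinates; the column layout (word, x, y, z) is an assumption about the format the Unity side expects. In Unity, each row would then be parsed into a position at which a billboarded text object for the word is placed.
<syntaxhighlight lang="python">
# Sketch: export one row per word with its 3D position for import into Unity.
import csv

with open("wordcloud.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "x", "y", "z"])       # assumed column layout
    for word, (x, y, z) in zip(words, coords_3d):
        writer.writerow([word, float(x), float(y), float(z)])
</syntaxhighlight>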