[[File:VRI-LEK-RawJSONData.png|none|1000px|Raw JSON data]]
For the texts
To generate the point word-clouds, the text of the posts and comments is extracted from the JSON files and analyzed using the natural language processing technique ''word2vec''.
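The extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the JSON layout is not specified in the text, so the keys <code>message</code> and <code>comments</code>, the sample data, and the helper names <code>extract_texts</code> and <code>tokenize</code> are all assumptions made for the example.

```python
import json
import re

# Hypothetical sample data. The real JSON layout is not given in the text,
# so the keys "message" and "comments" are assumptions for illustration.
SAMPLE = """
[
  {"message": "Great concert last night!",
   "comments": [{"message": "Loved the concert"},
                {"message": "Best night ever"}]},
  {"message": "New album out now",
   "comments": []}
]
"""


def extract_texts(posts):
    """Yield the text of every post, followed by the text of its comments."""
    for post in posts:
        if post.get("message"):
            yield post["message"]
        for comment in post.get("comments", []):
            if comment.get("message"):
                yield comment["message"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


posts = json.loads(SAMPLE)

# One token list per post or comment.
sentences = [tokenize(text) for text in extract_texts(posts)]
```

A list of token lists like <code>sentences</code> is the input shape that common word2vec implementations (for example the <code>Word2Vec</code> class in gensim) accept for training; which implementation was used here is not stated in the text.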
[[File:VRI-LEK-Epochs.PNG|none|1000px|Learning Process]]
The resulting vector space has very high dimensionality and therefore cannot be easily visualized. To reduce the high-dimensional space to three dimensions, the method ''t-distributed stochastic neighbor embedding'' (t-SNE) is used, which keeps words close together that are close in the high-dimensional space.
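The reduction to three dimensions might look like the sketch below, assuming scikit-learn's <code>TSNE</code> is used (the text does not name a library). The random vectors stand in for real word2vec output, and the vocabulary size, vector length, and perplexity are placeholder values chosen only for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder input: 50 "words" with 100-dimensional vectors.
# In the real pipeline these would be the trained word2vec vectors.
rng = np.random.default_rng(0)
words = [f"word{i}" for i in range(50)]
vectors = rng.normal(size=(50, 100))

# Project to three dimensions. The perplexity must be smaller than the
# number of samples; 10 is an arbitrary choice for this small example.
tsne = TSNE(n_components=3, perplexity=10, init="random", random_state=0)
coords = tsne.fit_transform(vectors)

# Map each word to its 3D position, e.g. for placing it in a point cloud.
points = dict(zip(words, coords))
```

Each entry of <code>points</code> then holds the x, y and z coordinates of one word. t-SNE results depend on the perplexity and the random seed, so the layout changes between runs unless <code>random_state</code> is fixed.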