Open-endedness topic classification

Textual data from open-ended systems can be analyzed to detect and measure open-endedness, as in the analysis of patent data to study technological evolution [1,2,3].

This project proposes applying the same type of analysis to the text of this forum, treating each topic as a document: a doc2vec model is trained on the topics, and the resulting topic vectors are then clustered.

The project has four components:

  • scraping the forum for topics
  • building the doc2vec model
  • querying the model for the existing topics nearest to a newly parsed topic
  • interfacing Discourse (Rails) with Python calls (gensim), possibly through pipes
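
One way the pipe idea could look on the Python side is a worker that reads one JSON request per line and writes one JSON reply per line. The sketch below uses only the standard library. The message fields (`id`, `text`, `nearest`), the function names, and the word-overlap ranking are all assumptions made for illustration; the ranking is a stand-in for a real doc2vec query (for example gensim's `infer_vector` followed by `most_similar`).

```python
import io
import json


def overlap_lookup(text, topics, topn=3):
    """Stand-in ranking by shared words; a doc2vec query would replace this."""
    words = set(text.lower().split())
    scored = [
        (len(words & set(body.lower().split())), topic_id)
        for topic_id, body in topics.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [topic_id for score, topic_id in ranked[:topn] if score > 0]


def serve(in_stream, out_stream, lookup):
    """Answer each JSON request line with exactly one JSON reply line."""
    for line in in_stream:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            reply = {"id": request["id"], "nearest": lookup(request["text"])}
        except (ValueError, KeyError, TypeError) as err:
            # Malformed input yields an error reply instead of a crash.
            reply = {"error": str(err)}
        out_stream.write(json.dumps(reply) + "\n")
        out_stream.flush()  # the caller may block until a full line arrives


# Demo with in-memory streams; a real worker would pass sys.stdin and sys.stdout.
DEMO_TOPICS = {
    101: "patent data and technological evolution",
    102: "doc2vec clustering of forum topics",
}
requests = io.StringIO(
    '{"id": 1, "text": "clustering forum topics with doc2vec"}\n'
    "not json\n"
)
replies = io.StringIO()
serve(requests, replies, lambda text: overlap_lookup(text, DEMO_TOPICS))
print(replies.getvalue(), end="")
```

On the Rails side, something like Ruby's `Open3.popen2` could start the worker once and exchange lines with it, which would avoid reloading the model for every request. Whether pipes are the right transport compared with, say, a small HTTP service remains an open design question.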

Discussion of this topic on the Discourse forum has started here.