Current unsupervised approaches, such as Latent Dirichlet Allocation (LDA), require substantial human review and require the number of classes to be defined in advance. In addition, a satisfactory incremental approach has not yet been achieved. Therefore, there is a need in the art for a methodology that is able to extract “Concepts” from various groups of texts, over a wide range of data points, with no need to predefine classes, no need to pre-label data, and no need to post-label data.
Such “unsupervised” Concept extraction lends itself to analyzing the corpus at the speed of a computer processor rather than at the speed of a human supervisor. Moreover, performing the extraction according to a predetermined algorithm, rather than the whims of a human supervisor, removes the supervisor's subjective bias from the extracted Concepts.
Moreover, such unsupervised Concept extraction lends itself to use with a dynamic corpus: as more data is collected and the corpus expands, new Concepts can be extracted and/or existing Concepts can be specified further.