#Text_mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality #information from #text.

@text_mining tasks include #text_categorization, #text_clustering, #concept_extraction, production of #granular_taxonomies, #sentiment_analysis, and #document_summarization.

@sentiment_analysis #wordnet #conceptnet

#text_mining #nlp

#lsa (latent semantic analysis) is a technique in #nlp.

#lsa assumes that #words that are close in meaning will occur in similar pieces of #text.

#lsa can use a #term_document_matrix which describes the occurrences of #terms in #documents; it is a #sparse_matrix whose rows correspond to #terms and whose columns correspond to #documents.

A typical example of the weighting of the elements of the #matrix in #lsa is #tf_idf (term #frequency–inverse document #frequency): the weight of an element is proportional to the number of times the #term appears in its #document, while #terms that are rare across the #corpus are upweighted to reflect their relative importance.
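
A minimal sketch of building such a weighted matrix, assuming scikit-learn and a toy corpus:

```python
# Build a tf-idf weighted term-document matrix (the starting point for LSA).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "text mining derives information from text",
    "topic models discover abstract topics in documents",
    "lsa analyzes word association in a text corpus",
]

vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(docs)  # sparse matrix: documents x terms
term_doc = doc_term.T                      # rows = terms, columns = documents
print(term_doc.shape)                      # (number of terms, number of documents)
```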

#lsa can be used to #analyze #word_association in a #text_corpus.

#lsa has been used to assist in performing #prior_art searches for #patents.

The use of #lsa has been prevalent in the study of human #memory, especially in the areas of #free_recall and #memory_search.

#lsi (latent semantic indexing) is an #indexing and #retrieval #method.

#lsi is #lsa applied to document #indexing and #retrieval.

explicit semantic analysis (@esa) is a #vectorial_representation of #text (individual words or entire #documents) that uses a #document_corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the #tf_idf matrix of the #text corpus, and a #document (string of words) is represented as the centroid of the vectors representing its words.
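
A toy sketch of that idea, assuming scikit-learn and a three-document stand-in for the knowledge base (real ESA uses a large corpus such as Wikipedia):

```python
# ESA sketch: a word is a column of the tf-idf matrix over a concept corpus;
# a document is the centroid of the vectors of its words.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

concept_corpus = [
    "cats are small domesticated animals",
    "stock markets trade shares of companies",
    "text mining extracts information from documents",
]
vec = TfidfVectorizer()
X = vec.fit_transform(concept_corpus).toarray()   # concepts x terms
vocab = {w: i for i, w in enumerate(vec.get_feature_names_out())}

def word_vector(word):
    return X[:, vocab[word]]                      # the word's weight per concept

def doc_vector(words):
    vecs = [word_vector(w) for w in words if w in vocab]
    return np.mean(vecs, axis=0)                  # centroid of the word vectors

print(doc_vector("mining text documents".split()).round(3))
```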

#document_corpus #document

#documents #document

@topic_model is a type of #statistical_model for discovering the abstract "#topics" that occur in a collection of #documents.

An early #topic_model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998.[2] Another one, called probabilistic latent semantic analysis (#plsa), was created by Thomas Hofmann in 1999.[3] Latent Dirichlet allocation (#lda), perhaps the most common #topic_model currently in use, is a generalization of #plsa.

#plsa #lsa

#lda introduces sparse #dirichlet prior distributions over #document_topic and #topic_word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words.
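
In scikit-learn's LatentDirichletAllocation, for example, those priors map to the doc_topic_prior and topic_word_prior parameters; the values and toy corpus below are illustrative, not canonical:

```python
# Fit LDA with explicitly sparse (< 1) Dirichlet priors on a toy corpus.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "topic models discover topics in documents",
    "lda assigns words to topics",
    "search engines rank documents for a query",
]
counts = CountVectorizer().fit_transform(docs)   # LDA works on raw term counts

lda = LatentDirichletAllocation(
    n_components=2,          # number of topics
    doc_topic_prior=0.1,     # sparse prior: a document covers few topics
    topic_word_prior=0.01,   # sparse prior: a topic uses few words
    random_state=0,
)
doc_topics = lda.fit_transform(counts)           # documents x topic mixtures
print(doc_topics.round(2))
```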

#topic_models are generally extensions of #lda, such as Pachinko allocation, which improves on #lda by modeling #correlations between #topics in addition to the #word #correlations which constitute #topics.

#tf_idf or #tfidf, short for term #frequency–inverse #document #frequency, is a numerical #statistic that is intended to reflect how important a #word is to a #document in a collection or #corpus.
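
One common form of the statistic (several variants exist) is:

    tfidf(t, d) = tf(t, d) × log(N / df(t))

where tf(t, d) is the number of times term t occurs in document d, N is the total number of documents in the corpus, and df(t) is the number of documents containing t.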

83% of text-based #recommender systems in digital libraries use #tf_idf.

latent Dirichlet allocation (#lda) is a generative statistical #model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.

In #lda, each #document may be viewed as a mixture of various #topics where each #document is considered to have a set of #topics that are assigned to it via #lda. This is identical to probabilistic latent semantic analysis (#plsa), except that in #lda the #topic distribution is assumed to have a sparse #dirichlet prior.

#lda is a generalization of the #plsa model, which is equivalent to #lda under a uniform #dirichlet prior #distribution.

#lda and #lsa study commonalities between different #words (how often they are used together) to identify #topics that go together. This data can then be used to extract the right set of #documents for a #search #query, or to see which #words are related to a #search #query.
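
A sketch of that retrieval step, assuming documents and the query have already been mapped into a shared topic space (e.g. by LSA): rank the documents by cosine similarity to the query vector.

```python
# Rank documents against a query in a shared (toy) topic space.
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

doc_vectors = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])  # toy topic vectors
query_vector = np.array([0.8, 0.2])

scores = [cosine(query_vector, d) for d in doc_vectors]
print(np.argsort(scores)[::-1])    # indices of the best-matching documents first
```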

pachinko allocation model (#pam) is a #topic model. Topic #models are a suite of algorithms to uncover the hidden thematic structure of a #collection of #documents.[1] The algorithm improves upon earlier #topic #models such as latent Dirichlet allocation (#lda) by modeling #correlations between #topics in addition to the #word #correlations which constitute #topics.

#dirichlet process is a #probability #distribution whose range is itself a set of #probability #distributions.
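
One common way to draw the weights of such a random distribution is the stick-breaking construction; a truncated sketch (the truncation level k is an arbitrary illustrative choice):

```python
# Truncated stick-breaking draw of Dirichlet-process weights.
import numpy as np

def stick_breaking(alpha, k, seed=0):
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=k)          # fraction broken off each remaining stick
    remainder = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remainder                      # first k weights, summing to (almost) 1

print(stick_breaking(alpha=1.0, k=10).round(3))
```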

#n_gram is a contiguous #sequence of n items from a given sample of #text or #speech.

An #n_gram #model is a type of probabilistic language #model for predicting the next item in such a #sequence, in the form of an (n − 1)-order #markov #model.
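
A minimal bigram (n = 2) sketch of that idea, on assumed toy data: predict the next word from counts of what followed it in the sample.

```python
# Bigram language model: a first-order Markov model over words.
from collections import Counter, defaultdict

sample = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for prev, nxt in zip(sample, sample[1:]):
    follows[prev][nxt] += 1                # count what follows each word

def predict_next(word):
    # most frequent continuation of `word` seen in the sample
    return follows[word].most_common(1)[0][0] if follows[word] else None

print(predict_next("the"))                 # -> 'cat'
```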

#rst is used in #text #summarization and other applications. #rst addresses #text organization by means of #relations that hold between parts of a #text. It explains coherence by postulating a hierarchical, connected #structure of #texts.

#rhetorical_structure_theory #rst

#semantic_folding theory describes a procedure for encoding the #semantics of natural #language #text in a semantically grounded #binary #representation.

#semantic_compression is a process of compacting a #lexicon used to build a textual #document.

#tf_idf with #k_means #clustering.

#k_means returns #cluster centroids as "#topics", while #lda assigns #words to the different #topics.
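
A sketch of the k-means side of that comparison (library and toy data are assumptions): cluster tf-idf document vectors and read each centroid's top-weighted terms as a rough "topic".

```python
# tf-idf + k-means: cluster centroids as rough "topics".
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors trade stocks",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for centroid in km.cluster_centers_:
    top = centroid.argsort()[::-1][:3]     # highest-weighted terms in the centroid
    print([terms[i] for i in top])
```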

#topic_modeling through 4 of the most popular techniques today: #lsa, #plsa, #lda, and the newer, #deep_learning based #lda2vec. https://medium.com/nanonets/topic-modeling-with-lsa-psla-lda-and-lda2vec-555ff65b0b05

for #lsa we need a #corpus of #documents to analyze which #words belong to which #documents, and also to downweight the ones that appear too frequently across all the #texts.

This #dimensionality_reduction can be performed using truncated #svd, which factors the #document #term #matrix into a #document #topic #matrix and a #term #topic #matrix.
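
A sketch of that factorization with scikit-learn's TruncatedSVD (toy corpus; two topics is an arbitrary choice):

```python
# LSA step: reduce a tf-idf document-term matrix to a small topic space.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
]
X = TfidfVectorizer().fit_transform(docs)     # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
doc_topic = svd.fit_transform(X)              # documents x topics
term_topic = svd.components_.T                # terms x topics
print(doc_topic.shape, term_topic.shape)
```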

#lda is a #bayesian version of #plsa.

At the #word level, we typically use something like #word2vec to obtain #vector representations.
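
For instance, with gensim (an assumed library choice) on a toy corpus:

```python
# Train word vectors on a tiny corpus with gensim's word2vec.
from gensim.models import Word2Vec

sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)
print(model.wv["cat"][:5])                    # first dimensions of the "cat" vector
print(model.wv.most_similar("cat", topn=2))   # nearest words in the toy space
```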

#lda2vec is an extension of #word2vec and #lda that jointly learns #word, #document, and #topic_vectors.
