graph view:
Graph Language Processing Settings:

 
Specify the settings for your text-to-network conversion algorithm for this graph.
Lemmatizer:
Every word will be converted to its lemma (e.g. bricks > brick, taken > take) and will be shown on the graph as a node. Set to your language for more precise results. Switch off to turn off lemmatization and add your custom stop words list below.
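To illustrate what the lemmatizer does, here is a minimal lookup-based sketch. The `LEMMAS` table and the `lemmatize` function are hypothetical. A real lemmatizer relies on a full language-specific dictionary and part-of-speech information, not on a handful of entries.

```python
# Hypothetical lemma table, for illustration only.
LEMMAS = {"bricks": "brick", "taken": "take", "took": "take", "mice": "mouse"}


def lemmatize(tokens, enabled=True):
    """Map each token to its lemma; unknown words pass through lowercased."""
    if not enabled:
        # Lemmatization switched off: keep the surface forms.
        return [t.lower() for t in tokens]
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


print(lemmatize(["Bricks", "were", "taken"]))  # ['brick', 'were', 'take']
```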
 
Show on Graph:   Double Brackets [[]]:  Categories and Tags:   
Stop Words:
List the words, comma-separated (no spaces), that should not appear in the graph, in addition to your default global stopwords list.
Example: is,the,as,to,in
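A stop word list in this format could be applied roughly as follows. This sketch assumes whitespace tokenization, and the hypothetical `global_stop_words` set stands in for the default global list.

```python
def parse_stop_words(spec):
    """Parse a comma-separated list such as 'is,the,as,to,in' into a set."""
    return {w.strip().lower() for w in spec.split(",") if w.strip()}


def remove_stop_words(tokens, custom_spec, global_stop_words=frozenset()):
    """Drop tokens that appear in the custom list or in the global list."""
    stop = parse_stop_words(custom_spec) | {w.lower() for w in global_stop_words}
    return [t for t in tokens if t.lower() not in stop]


tokens = "LSA is a technique in NLP".split()
print(remove_stop_words(tokens, "is,the,as,to,in", {"a"}))  # ['LSA', 'technique', 'NLP']
```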

 
Synonym Nodes:
To merge several words into one node on the graph, in addition to your default global synonyms list, list the synonyms one per line in the format word:node.
Example:
machine:machine learning
learning:machine learning
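Here is a minimal sketch of how `word:node` lines like the ones above might be parsed and applied. The function names are hypothetical. Collapsing immediate repeats (so that "machine learning" does not link to itself) is an assumption of this sketch, not documented behavior.

```python
def parse_synonyms(spec):
    """Parse 'word:node' lines into a {word: node} mapping."""
    mapping = {}
    for line in spec.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        word, node = line.split(":", 1)
        mapping[word.strip().lower()] = node.strip().lower()
    return mapping


def merge_synonyms(tokens, mapping):
    """Replace tokens with their target node and collapse immediate repeats."""
    merged = []
    for t in tokens:
        node = mapping.get(t.lower(), t.lower())
        if not merged or merged[-1] != node:
            merged.append(node)
    return merged


rules = parse_synonyms("machine:machine learning\nlearning:machine learning")
print(merge_synonyms(["Machine", "learning", "models"], rules))  # ['machine learning', 'models']
```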

 

Dynamic Graph Settings


See the dynamic evolution of this graph: scroll or "play" the text entries to see how the text propagated through the network graph over time.

the final graph

highlight propagation edge
show visible statements only



 
Play the Graph


current speed of the player:

one statement at a time


Export the Data


Network Graph Images:

The graph images for publishing on the web or in a journal. For embeds and URLs use the share menu.
PNG (Image)  SVG (Hi-Res)

Visible Statements (Tagged):

Export the currently filtered (visible) statements with all the meta-data tags (topics, sentiment).
CSV (Spreadsheet)   MD (e.g. Obsidian)

Network Graph Data:

The raw data with all the statistics, for further analysis in other software.
JSON  CSV  Gexf (Gephi)

All the Text:

Plain text used to create this graph without any meta-data.
Download Plain Text (All Statements)
Share Graph Image

 
Share a non-interactive image of the graph only, no text:
Download Image Tweet
 
Share Interactive Text Graph

 

 
Save This Graph View:

 

Delete This Graph:

 

Project Notes:
InfraNodus
Top keywords (global influence):
Top topics (local contexts):
Explore the main topics and terms outlined above or see them in the excerpts from this text below.
See the relevant data in context: click here to show the excerpts from this text that contain these topics below.
Tip: use the form below to save the most relevant keywords for this search query. Or start writing your content and see how it relates to the existing search queries and results.
Tip: here are the keyword queries that people search for but don't actually find in the search results.

#Text_mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality #information from #text.


 

@text_mining tasks include #text_categorization, #text_clustering, #concept_extraction, production of #granular_taxonomies, #sentiment_analysis, and #document_summarization.


 

@sentiment_analysis #wordnet #conceptnet


 

#text_mining #nlp


 

#lsa (latent semantic analysis) is a technique in #nlp.


 

#LSA assumes that #words that are close in meaning will occur in similar pieces of #text.


 

#lsa can use a #term_document_matrix which describes the occurrences of #terms in #documents; it is a #sparse_matrix whose rows correspond to #terms and whose columns correspond to #documents.
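To make that layout concrete, here is a small sketch that builds a term-document count matrix with one row per term and one column per document. It assumes lowercase whitespace tokenization and invented documents. A real corpus would be stored in a sparse format (for example `scipy.sparse`), since most cells are zero.

```python
from collections import Counter


def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [doc.lower().split() for doc in documents]
    terms = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    # Rows correspond to terms, columns to documents.
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


docs = ["lsa uses a matrix", "lda uses topics", "topics and terms"]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:>8} {row}")
```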


 

A typical example of the weighting of the elements of the #matrix in #lsa is #tf_idf (term #frequency–inverse document #frequency): the weight of an element of the #matrix is proportional to the number of times the #terms appear in each document, where rare #terms are upweighted to reflect their relative importance.
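The weighting can be sketched as follows, using raw counts as term frequency and `log(N / df)` as inverse document frequency. This is one common tf-idf variant among several (smoothed and normalized versions also exist), chosen here for readability. The counts are made up.

```python
import math


def tfidf_weight(matrix):
    """Weight a term-document count matrix (rows = terms, columns = documents).

    tf is the raw count and idf = log(N / df): a term found in every document
    gets weight 0, while rare terms are boosted.
    """
    n_docs = len(matrix[0]) if matrix else 0
    weighted = []
    for row in matrix:
        df = sum(1 for count in row if count > 0)  # documents containing the term
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([count * idf for count in row])
    return weighted


counts = [
    [2, 1, 1],  # a term that occurs in every document
    [0, 3, 0],  # a term that occurs in one document only
]
print(tfidf_weight(counts))
```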


 

#lsa can be used to #analyze #word_association in a #text_corpus.


 

#lsa has been used to assist in performing #prior_art searches for #patents.


 

The use of #lsa has been prevalent in the study of human #memory, especially in areas of #free_recall and #memory_search.


 

#lsi (latent semantic indexing) is an #indexing and #retrieval #method.


 

#lsi is #lsa


 

Explicit semantic analysis (@esa) is a #vectoral_representation of #text (individual words or entire #documents) that uses a #document_corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the #tf_idf matrix of the #text corpus, and a #document (string of words) is represented as the centroid of the vectors representing its words.
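The centroid construction can be sketched in a few lines. The vocabulary and the tf-idf weights are invented for illustration, with one row per knowledge-base article and one column per word, so a word's vector is its column.

```python
def word_vector(word, vocabulary, tfidf):
    """A word's vector is its column in the tf-idf matrix."""
    j = vocabulary.index(word)
    return [row[j] for row in tfidf]


def esa_document_vector(words, vocabulary, tfidf):
    """Represent a text as the centroid of the vectors of its known words."""
    vectors = [word_vector(w, vocabulary, tfidf) for w in words if w in vocabulary]
    if not vectors:
        return [0.0] * len(tfidf)
    return [sum(values) / len(vectors) for values in zip(*vectors)]


vocabulary = ["topic", "matrix", "memory"]
tfidf = [              # invented weights, one row per knowledge-base article
    [0.9, 0.1, 0.0],   # an article about topic models
    [0.2, 0.8, 0.0],   # an article about linear algebra
    [0.0, 0.0, 1.0],   # an article about human memory
]
print(esa_document_vector(["topic", "matrix"], vocabulary, tfidf))
```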


 

#document_corpus #document


 

#documents #document


 

@topic_model is a type of #statistical_model for discovering the abstract "#topics" that occur in a collection of #documents.


 

An early #topic_model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998.[2] Another one, called probabilistic latent semantic analysis (#plsa), was created by Thomas Hofmann in 1999.[3] Latent Dirichlet allocation (#lda), perhaps the most common #topic_model currently in use, is a generalization of #plsa.


 

#plsa #lsa


 

#lda introduces sparse #dirichlet prior distributions over #document_topic and #topic_word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words.


 

Other #topic_models are generally extensions of #lda, such as Pachinko allocation, which improves on #lda by modeling #correlations between #topics in addition to the #word #correlations which constitute #topics.


 

#tf_idf or #tfidf, short for term #frequency–inverse #document #frequency, is a numerical #statistic that is intended to reflect how important a #word is to a #document in a collection or #corpus.


 

According to a 2015 survey, 83% of text-based #recommender systems in digital libraries use #tf_idf.


 

Latent Dirichlet allocation (#lda) is a generative statistical #model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.


 

In #lda, each #document may be viewed as a mixture of various #topics, where each #document is considered to have a set of #topics that are assigned to it via #lda. This is similar to probabilistic latent semantic analysis (#plsa), except that in #lda the #topic distribution is assumed to have a sparse #dirichlet prior.
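LDA's generative story can be simulated to see how sparse topic mixtures arise. In this sketch the topic-word distributions are given directly instead of being drawn from their own Dirichlet prior (a simplification), the two topics are invented, and `alpha = 0.1` is an arbitrary small value that tends to concentrate each document on few topics.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(topics, n_docs, doc_len, alpha=0.1, seed=0):
    """Simulate LDA's generative process for a fixed set of topics.

    topics: list of {word: probability} dicts (one per topic).
    Returns (documents, topic mixture of each document).
    """
    rng = random.Random(seed)
    docs, mixtures = [], []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, len(topics), rng)  # document-topic mixture
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(len(topics)), weights=theta)[0]  # draw a topic
            vocab, probs = zip(*topics[z].items())
            words.append(rng.choices(vocab, weights=probs)[0])  # draw a word from it
        docs.append(words)
        mixtures.append(theta)
    return docs, mixtures


topics = [
    {"matrix": 0.5, "svd": 0.3, "term": 0.2},        # invented "linear algebra" topic
    {"dirichlet": 0.4, "prior": 0.4, "topic": 0.2},  # invented "bayesian" topic
]
docs, mixtures = generate_corpus(topics, n_docs=3, doc_len=8)
for words, theta in zip(docs, mixtures):
    print([round(p, 2) for p in theta], words)
```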


 

#lda is a generalization of the #plsa model, which is equivalent to #lda under a uniform #dirichlet prior #distribution.


 

#lda and #lsa study commonalities between different #words (how often they are used together) to identify #topics that go together. This data can then be used to extract the right set of #documents for a #search #query or to see which #words are related to a #search #query.


 

Pachinko allocation model (#pam) is a #topic model. Topic #models are a suite of algorithms to uncover the hidden thematic structure of a #collection of #documents.[1] The algorithm improves upon earlier #topic #models such as latent Dirichlet allocation (#lda) by modeling #correlations between #topics in addition to the #word #correlations which constitute #topics.


 

A #dirichlet process is a #probability #distribution whose range is itself a set of #probability #distributions.
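One standard way to draw from a Dirichlet process is the stick-breaking construction, sketched here with a finite truncation. The standard normal base distribution, the concentration of 2.0 and the truncation at 50 atoms are arbitrary choices for illustration.

```python
import random


def stick_breaking(concentration, base_sampler, truncation=50, seed=0):
    """Truncated stick-breaking draw of G ~ DP(concentration, H).

    Returns (weights, atoms): G places mass weights[k] on atoms[k].
    The mass left over after `truncation` breaks is simply dropped.
    """
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0  # length of the stick that is still unbroken
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, concentration)  # share broken off this step
        weights.append(remaining * fraction)
        atoms.append(base_sampler(rng))  # atom location drawn from the base H
        remaining *= 1.0 - fraction
    return weights, atoms


weights, atoms = stick_breaking(2.0, lambda rng: rng.gauss(0.0, 1.0))
print(f"total mass kept: {sum(weights):.6f}")
top = sorted(zip(weights, atoms), reverse=True)[:3]
print([(round(w, 3), round(a, 3)) for w, a in top])
```

Smaller concentration values put most of the mass on a few atoms, which is the same sparsity intuition that the Dirichlet prior encodes in LDA.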


 

An #n_gram is a contiguous #sequence of n items from a given sample of #text or #speech.
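Extracting n-grams amounts to sliding a window of width n over the sequence. The sketch works on any sequence, so the items can be words or characters.

```python
def ngrams(items, n):
    """All contiguous n-item sequences of `items`, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "lsa is a technique in nlp".split()
print(ngrams(words, 2))
# [('lsa', 'is'), ('is', 'a'), ('a', 'technique'), ('technique', 'in'), ('in', 'nlp')]
print(ngrams("text", 3))  # [('t', 'e', 'x'), ('e', 'x', 't')]
```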


 

An #n_gram #model is a type of probabilistic language #model for predicting the next item in such a #sequence in the form of a (n − 1)–order #markov #model.
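For n = 2 the model is first-order Markov: the next word depends only on the previous one. Here is a minimal maximum-likelihood sketch without smoothing, on a made-up token sequence.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count next-token frequencies: a first-order Markov model (n = 2)."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model


def next_token_probs(model, prev):
    """Maximum-likelihood P(next | prev); no smoothing for unseen pairs."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


tokens = "lsa uses svd and lda uses dirichlet priors".split()
model = train_bigram_model(tokens)
print(next_token_probs(model, "uses"))  # {'svd': 0.5, 'dirichlet': 0.5}
```

Real language models add smoothing (for example add-one or Kneser-Ney) so that unseen word pairs do not get zero probability.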


 

#rst is used in #text #summarization and other applications. #rst addresses #text organization by means of #relations that hold between parts of #text. It explains coherence by postulating a hierarchical, connected #structure of #texts.


 

#rhetorical_structure_theory #rst


 

#semantic_folding theory describes a procedure for encoding the #semantics of natural #language #text in a semantically grounded #binary #representation.


 

#semantic_compression is a process of compacting a #lexicon used to build a textual #document


 

#tf_idf with #k_means #clustering.


 

#k_means returns #cluster centroids as "#topics", and #lda assigns #words to the different #topics.
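Here is a small pure-Python sketch of k-means over tf-idf vectors, reading the heaviest centroid weights as "topic" terms. The vectors are invented. Initializing the centroids with the first k documents is a simplification (k-means++ is the usual choice). `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


terms = ["svd", "matrix", "dirichlet", "prior"]
tfidf_docs = [            # invented tf-idf vectors, one per document
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.9, 1.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
]
labels, centroids = kmeans(tfidf_docs, k=2)
print(labels)  # [0, 1, 0, 1]
for c, centroid in enumerate(centroids):
    top_terms = [t for _, t in sorted(zip(centroid, terms), reverse=True)[:2]]
    print(c, top_terms)  # the centroid's heaviest terms act as the "topic"
```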


 

An overview of #topic_modeling through 4 of the most popular techniques today: #lsa, #plsa, #lda, and the newer, #deep_learning based #lda2vec. https://medium.com/nanonets/topic-modeling-with-lsa-psla-lda-and-#lda2vec-555ff65b0b05


 

For #lsa, we need a #corpus of #documents to analyze which #words belong to which #documents and to down-weight the ones that appear too frequently across all the #texts.


 

This #dimensionality_reduction can be performed using truncated #svd, which factorizes the #term_document_matrix into a #term #topic #matrix and a #topic #document #matrix, weighted by the singular values.
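Truncated SVD can be sketched without dependencies using power iteration with deflation. In practice you would call a library routine such as `numpy.linalg.svd` or scikit-learn's `TruncatedSVD`. The matrix follows the term-document layout (rows are terms, columns are documents), and its values are made up.

```python
import math


def _matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _transpose(a):
    return [list(col) for col in zip(*a)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(a, k, iterations=200):
    """Top-k singular triplets by power iteration with deflation.

    With a term-document matrix (rows = terms, columns = documents),
    us[i] holds term loadings and vs[i] document loadings for topic i,
    so that a is approximately sum_i ss[i] * outer(us[i], vs[i]).
    """
    a = [list(row) for row in a]  # copy, because it is deflated below
    us, ss, vs = [], [], []
    for _ in range(k):
        at = _transpose(a)
        v = [1.0] * len(at)  # deterministic start vector
        for _ in range(iterations):
            w = _matvec(at, _matvec(a, v))  # w = A^T A v
            n = _norm(w)
            if n == 0:
                break
            v = [x / n for x in w]
        av = _matvec(a, v)
        sigma = _norm(av)
        if sigma == 0:
            break  # the remaining matrix is (numerically) zero
        u = [x / sigma for x in av]
        us.append(u)
        ss.append(sigma)
        vs.append(v)
        # Deflation: subtract the rank-1 component that was just found.
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return us, ss, vs


term_doc = [       # invented counts: 4 terms x 3 documents
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
us, ss, vs = truncated_svd(term_doc, k=2)
print([round(s, 4) for s in ss])  # [3.0, 1.4142]
```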


 

#lda is a #bayesian version of #plsa.


 

At the #word level, we typically use something like #word2vec to obtain #vector representations.


 

#lda2vec is an extension of #word2vec and #lda that jointly learns #word, #document, and #topic_vectors


 

        
Show Nodes with Degree > 0:


Total Nodes Shown:
 extend

Filter Graphs:


Filter Time Range
from: 0
to: 0


Recalculate Metrics Reset Filters
Show Labels for Nodes > 0 size:


Default Label Size: 0




Edges Type:



Layout Type:


 

Reset to Default
semantic variability:
Semantic Variability Score
Measures the diversity of the discourse network.
The score is calculated based on how modular the structure of the graph is (> 0.4 means the clusters are distinct and separate from one another = multiple perspectives). It also takes into account how the most influential nodes are dispersed among those clusters (higher % = lower concentration of power in a particular cluster).
Actionable Insight:

N/A

We distinguish 4 states of variability in your discourse. We recommend that a well-formed discourse should go through every stage during its evolution (in several iterations).

  1 - biased (bottom left quadrant): low variability, low diversity, one central idea (genesis and introduction stage).
  2 - focused (top right quadrant): medium variability and diversity, several concepts form a cluster (coherent communication stage).
  3 - diversified (bottom right quadrant): several distinct clusters of main ideas are present in the text, which interact on the global level but maintain their specificity (optimization and reflection stage).
  4 - dispersed (top left quadrant): very high variability, disjointed bits and pieces of unrelated ideas that can be used to construct new ideas (creative reformulation stage).

Read more in the cognitive variability help article.
Generate AI Suggestions
Your Workflow Variability:
 
Shows to what extent you explored all the different states of the graph, from uniform and regular to fractal and complex. Read more in the cognitive variability help article.

You can increase the score by adding content into the graph (your own and AI-generated), as well as removing the nodes from the graph to reveal latent topics and hidden patterns.
Phases to Explore:
AI Suggestions  
     
Main Topical Clusters:

please, add your data to display the stats...
+     full table   ?     Show Categories

The topical clusters consist of the nodes (words) that tend to co-occur in the same context (next to each other).

We use a combination of clustering and a graph community detection algorithm (the Louvain method, Blondel et al.) to identify the groups of nodes that are more densely connected to each other than to the rest of the network. They are placed closer to each other on the graph using the Force Atlas algorithm (Jacomy et al.) and are given a distinct color.
Most Influential Elements:
please, add your data to display the stats...
+     Reveal Non-obvious   ?

AI Paraphrase Graph

We use the Jenks elbow cutoff algorithm to select the top prominent nodes that have significantly higher influence than the rest.
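The exact cutoff procedure is not spelled out here, so the sketch below assumes a simple two-class natural-breaks split: it sorts the influence scores, tries every cut, and keeps the upper class of the cut with the smallest total within-class squared deviation. The function name and the scores are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def prominent_nodes(scores):
    """Keep the upper class of a two-class natural-breaks split of the scores.

    scores: {node: influence}. Tries every cut of the sorted scores and keeps
    the one with the smallest total within-class squared deviation.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    values = [score for _, score in ranked]
    best_cut, best_cost = len(values), float("inf")
    for cut in range(1, len(values)):  # both classes stay non-empty
        cost = _sse(values[:cut]) + _sse(values[cut:])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return [node for node, _ in ranked[:best_cut]]


scores = {"lda": 0.9, "lsa": 0.85, "topic": 0.3, "matrix": 0.25, "word": 0.2}
print(prominent_nodes(scores))  # ['lda', 'lsa']
```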

Click the Reveal Non-obvious button to remove the most influential words (or the ones you select) from the graph, to see what terms are hiding behind them.

The most influential nodes are either the ones with the highest betweenness centrality — appearing most often on the shortest path between any two randomly chosen nodes (i.e. linking the different distinct communities) — or the ones with the highest degree.
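Betweenness centrality is usually computed with Brandes' algorithm. Below is a sketch for an unweighted, undirected graph (weighted text graphs would need Dijkstra in place of breadth-first search, which this sketch omits). The two-cluster example graph is invented.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency: {node: set of neighbors}. Returns unnormalized scores where
    each unordered pair of endpoints contributes once.
    """
    bc = dict.fromkeys(adjacency, 0.0)
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        dist = dict.fromkeys(adjacency, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies, farthest nodes first.
        delta = dict.fromkeys(adjacency, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both of its endpoints.
    return {v: score / 2.0 for v, score in bc.items()}


edges = [("lsa", "svd"), ("svd", "matrix"), ("matrix", "lsa"),           # cluster 1
         ("lda", "dirichlet"), ("dirichlet", "topic"), ("topic", "lda"),  # cluster 2
         ("matrix", "nlp"), ("nlp", "topic")]                            # bridge
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
scores = betweenness_centrality(adjacency)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```

The bridge node "nlp" scores highest because every shortest path between the two clusters runs through it, which is exactly the "linking distinct communities" property described above.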
Network Structure:
N/A
The network structure indicates the level of its diversity. It is based on the modularity measure (>0.4 for medium, >0.65 for high modularity, measured with the Louvain community detection algorithm, Blondel et al. 2008) in combination with the measure of influence distribution (the entropy of the top nodes' distribution among the top clusters), as well as the percentage of nodes in the top community.
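For reference, here is how the modularity of a given partition can be computed. The sketch assumes an unweighted, undirected graph and takes the partition as input instead of running Louvain. The example (two triangles joined by one edge) is invented.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the
    number of edges, L_c the edges inside c and d_c the total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


edges = [("lsa", "svd"), ("svd", "matrix"), ("matrix", "lsa"),
         ("lda", "dirichlet"), ("dirichlet", "topic"), ("topic", "lda"),
         ("matrix", "topic")]
community = {"lsa": 0, "svd": 0, "matrix": 0, "lda": 1, "dirichlet": 1, "topic": 1}
print(round(modularity(edges, community), 3))  # 0.357
```

Here Q = 5/14 ≈ 0.357, just below the 0.4 threshold mentioned above, even though the two clusters are visually obvious; larger graphs with several clusters reach higher values more easily.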


Reset Graph   Export: Show Options
Action Advice:
N/A
Structural Gap
(ask a research question that would link these two topics):
N/A
Reveal the Gap   ?   Generate an AI Question
 
A structural gap shows the two distinct communities (clusters of words) in this graph that are important, but not yet connected. That's where the new potential and innovative ideas may reside.

This measure is based on a combination of the graph's connectivity and community structure, selecting the groups of nodes that would either make the graph more connected if it's too dispersed or that would help maintain diversity if it's too connected.

Latent Topical Brokers
(less visible terms that link important topics):
N/A

These are the latent brokers between the topics: nodes with an unusually high ratio of influence (betweenness centrality) to frequency. They may appear less often than the most influential nodes, but they are important narrative shifting points.
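Here is a sketch of the idea, assuming the score is simply betweenness divided by frequency (the actual formula may differ). The function name and the numbers are hypothetical.

```python
def latent_brokers(betweenness, frequency, exclude=(), top=3):
    """Rank nodes by betweenness per occurrence (hypothetical scoring rule).

    Nodes in `exclude`, e.g. the already prominent ones, are skipped.
    """
    ratios = {
        node: score / frequency[node]
        for node, score in betweenness.items()
        if frequency.get(node) and node not in exclude
    }
    return sorted(ratios, key=ratios.get, reverse=True)[:top]


betweenness = {"lda": 40.0, "lsa": 35.0, "svd": 12.0, "word": 6.0, "memory": 9.0}
frequency = {"lda": 20, "lsa": 18, "svd": 3, "word": 12, "memory": 2}
print(latent_brokers(betweenness, frequency, exclude={"lda", "lsa"}, top=2))  # ['memory', 'svd']
```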

These are usually brokers between different clusters / communities of nodes, playing an easily overlooked yet important role in this network, like "grey cardinals" of sorts.

Emerging Keywords
N/A

Evolution of Topics
(number of occurrences per text segment) ?
The chart shows how the main topics and the most influential keywords evolved over time. X-axis: time period (split into 10% blocks). Y-axis: cumulative number of occurrences.

Drag the slider to see how the narrative evolved over time. Select the checkbox to recalculate the metrics at every step (slower, but more precise).

 
Main Topics
(according to Latent Dirichlet Allocation):
loading...

LDA stands for Latent Dirichlet Allocation: a probabilistic topic modelling algorithm that infers, from how terms co-occur in a text or a corpus, a mixture of topics for each document and a distribution of terms for each topic.

We provide this data for you to be able to estimate the precision of the default InfraNodus topic modeling method based on text network analysis.
Most Influential Words
(main topics and words according to LDA):
loading...

We provide LDA stats for comparison purposes only. It works with English-language texts at the moment. More languages are coming soon; follow @noduslabs for updates.

Sentiment Analysis


positive: | negative: | neutral:
reset filter    ?  

We analyze the sentiment of each statement to see whether it's positive, negative, or neutral. You can filter the statements by sentiment (clicking above) and see what kind of topics correlate with every mood.

The approach is based on AFINN and Emoji Sentiment Ranking.

 
Use the BERT AI model for English, Dutch, German, French, Spanish and Italian to get more precise results (slower). The standard model is faster but less precise, works for English only, and is based on a fixed AFINN dictionary.
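The dictionary-based (standard) approach can be sketched as summing per-word scores. The handful of scores below are placeholders in the AFINN style (integers from -5 to +5), not values taken from the actual AFINN list. Emoji handling is left out.

```python
import re

# Placeholder scores in the AFINN style; not copied from the real list.
SCORES = {"good": 3, "great": 3, "important": 2, "bad": -3, "wrong": -2, "fail": -2}


def sentiment(statement, neutral_band=0):
    """Sum the word scores and map the total to positive / negative / neutral."""
    words = re.findall(r"[a-z']+", statement.lower())
    total = sum(SCORES.get(w, 0) for w in words)
    if total > neutral_band:
        return "positive", total
    if total < -neutral_band:
        return "negative", total
    return "neutral", total


print(sentiment("LDA gives great results"))    # ('positive', 3)
print(sentiment("This is a bad fit"))          # ('negative', -3)
print(sentiment("LSA is a technique in NLP"))  # ('neutral', 0)
```

A fixed word list like this ignores negation and context ("not bad" still scores negative), which is one reason a contextual model such as BERT tends to be more precise.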

Keyword Relations Analysis:

Please select the node(s) on the graph to see their connections...
+   ⤓ download CSV   ?

Use this feature to compare contextual word co-occurrences for a group of selected nodes in your discourse. Expand the list by clicking the + button to see all the nodes your selected nodes are connected to. The total influence score is based on the betweenness centrality measure. The higher the number, the more important the connections are in the context of the discourse.
Top Relations / Bigrams
(both directions):

⤓ Download   ⤓ Directed Bigrams CSV   ?

The most prominent relations between the nodes in this graph are shown above. We treat the graph as undirected by default. Occurrences shows the number of times a relationship appears within a 4-gram window. Weight shows the weight of that relation.
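Counting relations within a 4-gram window can be sketched as pairing each word with the next three. The distance-based weight (closer pairs count more) is an assumption for illustration, since the text above does not specify how the weight is computed.

```python
from collections import Counter


def window_relations(tokens, window=4):
    """Count undirected co-occurrences of token pairs within a sliding window.

    Each pair (i, j) with 0 < j - i < window is counted once. The weight
    gives closer pairs more credit (window - distance), an assumption of
    this sketch.
    """
    occurrences, weights = Counter(), Counter()
    for i, a in enumerate(tokens):
        for dist in range(1, window):
            if i + dist >= len(tokens):
                break
            b = tokens[i + dist]
            if a == b:
                continue  # ignore self-loops
            pair = tuple(sorted((a, b)))  # undirected: both directions merged
            occurrences[pair] += 1
            weights[pair] += window - dist
    return occurrences, weights


tokens = "lda extends plsa lda uses dirichlet".split()
occurrences, weights = window_relations(tokens)
for pair, count in occurrences.most_common(3):
    print(pair, count, weights[pair])
```

Dropping the `sorted` call would keep the direction of each pair, which corresponds to the directed bigrams option.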

Optionally, you can also download the directed bigrams above, in case the direction of the relations is important (for any application other than language).

Text Statistics:
Word Count: 0 | Unique Lemmas: 0 | Characters: 0 | Lemmas Density: 0
Text Network Statistics:
Show Overlapping Nodes Only

⤓ Download as CSV  ⤓ Download an Excel File
Network Structure Insights
 
mind-viral immunity:
N/A
structure:
N/A
The higher the network's structural diversity and the higher the alpha in the influence propagation score, the higher its mind-viral immunity: such a network will be more resilient and adaptive than a less diverse one.

In case of a discourse network, high mind-viral immunity means that the text proposes multiple points of view and propagates its influence using both highly influential concepts and smaller, secondary topics.
The higher the diversity and the more distinct communities (topics) there are in this network, the more likely it is to be pluralist.

Modularity: 0 | Influence Distribution: 0%
Topics: 0 | Nodes in Top Topic: 0% | Components: 0 | Nodes in Top Component: 0%
Nodes: 0 | Av Degree: 0 | Density: 0 | Weighted Betweenness: 0
 

Narrative Influence Propagation:
The chart above shows how influence propagates through the network. X-axis: lemma to lemma step (narrative chronology). Y-axis: change of influence.

The more even and rhythmical this propagation is, the stronger the central idea or agenda (an alpha exponent of ~0.5 or less, see below).

The more variability there is in the propagation profile, the less the narrative relies on the main concepts (agenda) and the stronger the role of secondary topical clusters.
propagation dynamics: | alpha exponent: (based on Detrended Fluctuation Analysis of influence)
We plot the narrative as a time series of influence (using the words' betweenness score). We then apply detrended fluctuation analysis to identify the fractality of this time series, plotting log2 of the window scales (x) against log2 of the accumulated fluctuations (y). If the resulting log-log relation can be approximated by a linear fit, there may be a power-law relation in how the influence propagates in this narrative over time (e.g. non-influential words most of the time, occasionally words with a high influence).

Using the alpha exponent of the fit (which is closely related to the Hurst exponent), we can better understand the nature of this relation: uniform (pulsating | alpha <= 0.65), variable (stationary, has long-term correlations | 0.65 < alpha <= 0.85), fractal (adaptive | 0.85 < alpha < 1.15), and complex (non-stationary | alpha >= 1.15).
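Here is a compact DFA sketch using the thresholds above. The fixed window scales, the first-order (linear) detrending and the synthetic test series are assumptions. In the setting described above, the input would be the betweenness score of each successive word. The series should be long enough (hundreds of points) for several scales to fit.

```python
import math
import random
from itertools import accumulate


def _linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha(series, scales=(4, 8, 16, 32, 64)):
    """Detrended fluctuation analysis: slope of log2 F(s) versus log2 s."""
    mean = sum(series) / len(series)
    profile = list(accumulate(x - mean for x in series))  # integrated series
    log_s, log_f = [], []
    for s in scales:
        n_windows = len(profile) // s
        if n_windows < 2:
            continue  # too few windows at this scale
        xs = list(range(s))
        squares = 0.0
        for w in range(n_windows):
            segment = profile[w * s:(w + 1) * s]
            slope, intercept = _linear_fit(xs, segment)  # local linear trend
            squares += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment))
        log_s.append(math.log2(s))
        log_f.append(math.log2(math.sqrt(squares / (n_windows * s))))
    alpha, _ = _linear_fit(log_s, log_f)
    return alpha


def classify(alpha):
    """Map alpha to the four states listed above."""
    if alpha <= 0.65:
        return "uniform"
    if alpha <= 0.85:
        return "variable"
    if alpha < 1.15:
        return "fractal"
    return "complex"


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
walk = list(accumulate(noise))
print(round(dfa_alpha(noise), 2), classify(dfa_alpha(noise)))  # white noise: near 0.5
print(round(dfa_alpha(walk), 2), classify(dfa_alpha(walk)))    # random walk: near 1.5
```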

For maximal diversity, adaptivity, and plurality, the narrative should be close to "fractal" (near-critical state). For fiction, essays, and some forms of poetry — "uniform". Informative texts will often have "variable + stationary" score. The "complex" state is an indicator that the text is always shifting its state.

Degree Distribution:
  calculate & show   ?
(based on the Kolmogorov-Smirnov test)
Using this information, you can identify whether the network has scale-free / small-world (long-tail power law distribution) or random (normal, bell-shaped distribution) network properties.

This may be important for understanding the level of resilience and the dynamics of propagation in this network. E.g. scale-free networks with long degree tails are more resilient against random attacks and will propagate information across the whole structure better.
If a power-law is identified, the nodes have preferential attachment (e.g. 20% of nodes tend to get 80% of connections), and the network may be scale-free, which may indicate that it's more resilient and adaptive. Absence of power law may indicate a more equalized distribution of influence.

The Kolmogorov-Smirnov test compares the distribution above to "ideal" power-law distributions (exponents 1, 1.5 and 2) and looks for the best fit. If the statistic d is below the critical value cr, the two distributions are likely similar.
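Here is a sketch of that comparison. It assumes a discrete power law truncated to the observed degree range and uses the common asymptotic 5% critical value 1.36 / sqrt(n). The sample degrees are invented, and nodes with degree 0 are ignored.

```python
import math


def ks_power_law(degrees, exponents=(1.0, 1.5, 2.0)):
    """Kolmogorov-Smirnov distance between a degree sample and power laws.

    Each model is P(k) proportional to k ** -gamma on the observed range
    [min degree, max degree]. Returns (best_gamma, d, critical_value).
    """
    degrees = [k for k in degrees if k > 0]  # k ** -gamma is undefined at 0
    n = len(degrees)
    ks = range(min(degrees), max(degrees) + 1)
    best_gamma, best_d = None, float("inf")
    for gamma in exponents:
        weights = [k ** -gamma for k in ks]
        z = sum(weights)
        model_cdf, d = 0.0, 0.0
        for k, w in zip(ks, weights):
            model_cdf += w / z
            empirical_cdf = sum(1 for x in degrees if x <= k) / n
            d = max(d, abs(empirical_cdf - model_cdf))
        if d < best_d:
            best_gamma, best_d = gamma, d
    return best_gamma, best_d, 1.36 / math.sqrt(n)  # asymptotic 5% level


degrees = [1] * 60 + [2] * 20 + [3] * 10 + [4] * 5 + [6] * 3 + [10] * 2
gamma, d, cr = ks_power_law(degrees)
print(f"best exponent {gamma}, d = {d:.3f}, cr = {cr:.3f}, similar: {d < cr}")
```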
Please enter a search query to visualize the difference between what people search for (related queries) and what they actually find (search results):

 
We will build two graphs:
1) Google search results for your query;
2) Related searches for your query (Google's SERP);
Click the Missing Content tab to see the graph that shows the difference between what people search for and what they actually find, indicating the content you could create to fulfil this gap.
Please enter a search query to discover what else people are searching for (from Google search or AdWords suggestions):

 
We will build a graph of the search phrases related to your query (Google's SERP suggestions).
Find a market niche for a certain product, category, idea or service: what people are looking for but cannot yet find

 
We will build two graphs:
1) the content that already exists when you make this search query (informational supply);
2) what else people are searching for when they make this query (informational demand);
You can then click the Niche tab to see the difference between the supply and the demand — what people need but do not yet find — the opportunity gap to fulfil.
Please enter your query to visualize Google search results as a graph, so you can learn more about this topic:

   advanced settings    add data manually
Discover the main topics, recurrent themes, and missing connections in any text or an article:  
Discover the main themes, sentiment, recurrent topics, and hidden connections in open survey responses:  
Discover the main themes, sentiment, recurrent topics, and hidden connections in customer product reviews:  
Enter a search query to analyze the Twitter discourse around this topic (last 7 days):

     advanced settings    add data manually

Enter a topic or a @user to analyze its social network on Twitter:

 advanced settings    add data manually
