Semantic Variability Score
Modulates the diversity of the discourse network. How does it work?
The score is calculated based on how modular the structure of the graph is (> 0.4 means the clusters are distinct and separate from one another = multiple perspectives). It also takes into account how the most influential nodes are dispersed among those clusters (higher % = lower concentration of power in a particular cluster).
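
As a rough illustration of the modularity part of this score, the sketch below computes the standard (Newman) modularity of a toy graph for a given partition into clusters. The graph, the word labels, and the function name are hypothetical, and the partition is supplied by hand rather than detected, so this is only a sketch of the measure and not the tool's actual implementation.

```python
from itertools import combinations

def modularity(edges, communities):
    """Newman modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # edges with both ends inside community i
    degree_sum = [0] * len(communities)  # total degree of the nodes in community i
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))

# Hypothetical toy graph: two 4-word cliques joined by a single bridge edge.
cluster_a = ["market", "price", "trade", "demand"]
cluster_b = ["climate", "carbon", "energy", "policy"]
edges = list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
edges.append(("demand", "energy"))

q = modularity(edges, [cluster_a, cluster_b])
print(round(q, 3))  # 0.423, just above the 0.4 threshold for distinct clusters
```

Adding more bridge edges between the two cliques lowers the value, which matches the idea that less separated clusters mean fewer distinct perspectives.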
Actionable Insight:


We distinguish 4 states of variability in your discourse. We recommend that a well-formed discourse should go through every stage during its evolution (in several iterations).

  1 - (bottom left) - biased: low variability, low diversity, one central idea (genesis and introduction stage).
  2 - (top right) - focused: medium variability and diversity, several concepts form a cluster (coherent communication stage).
  3 - (bottom right) - diversified: there are several distinct clusters of main ideas present in the text, which interact on the global level but maintain specificity (optimization and reflection stage).
  4 - (top left) - dispersed: very high variability, there are disjointed bits and pieces of unrelated ideas, which can be used to construct new ideas (creative reformulation stage).

Read more in the cognitive variability help article.
Your Workflow Variability:
Shows to what extent you explored all the different states of the graph, from uniform and regular to fractal and complex. Read more in the cognitive variability help article.

You can increase the score by adding content into the graph (your own and AI-generated), as well as removing the nodes from the graph to reveal latent topics and hidden patterns.
Phases to Explore:
The summary is generated based on the text analyzed as well as the underlying knowledge graph structure and topics to retain semantic relations.
We identify the topics that could be better connected and generate a question that bridges the gap between them, helping you get the maximum informational gain in this particular context.
The topical clusters consist of the nodes (words) that tend to co-occur in the same context (next to each other). Click Reveal High-Level Ideas to use GPT-4 to auto-generate names for them.
The percentage shows the relative influence of a topic in a discourse based on the sum of betweenness centrality of the nodes contained within. The number that follows is the total number of nodes in each cluster.
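
A minimal sketch of how such a percentage could be derived, assuming the betweenness centrality of every node has already been computed. The node names, centrality values, and cluster names below are made up for illustration.

```python
def topic_influence(betweenness, clusters):
    """Return {cluster: (influence share in %, number of nodes)}."""
    total = sum(betweenness.values())
    report = {}
    for name, nodes in clusters.items():
        share = sum(betweenness[n] for n in nodes) / total if total else 0.0
        report[name] = (round(100 * share), len(nodes))
    return report

# Hypothetical precomputed betweenness centrality values.
betweenness = {"market": 0.30, "price": 0.10, "trade": 0.05,
               "climate": 0.25, "carbon": 0.20, "energy": 0.10}
clusters = {"economy": ["market", "price", "trade"],
            "environment": ["climate", "carbon", "energy"]}

report = topic_influence(betweenness, clusters)
for name, (percent, size) in report.items():
    print(f"{name}: {percent}% ({size} nodes)")
```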

We use a combination of clustering and a graph community detection algorithm (the Louvain method, Blondel et al.) to identify the groups of nodes that are more densely connected to each other than to the rest of the network. These groups are placed closer together on the graph using the Force Atlas algorithm (Jacomy et al.) and are given a distinct color.
The most influential nodes are either the ones with the highest betweenness centrality — appearing most often on the shortest path between any two randomly chosen nodes (i.e. linking the different distinct communities) — or the ones with the highest degree.

We use the Jenks elbow cutoff algorithm to select the top prominent nodes that have significantly higher influence than the rest.
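
The idea behind such a cutoff can be sketched as a two-class natural breaks split: sort the scores and cut where the total squared deviation inside the two groups is smallest. This is a simplified stand-in written for illustration, with made-up scores, and it may differ from the exact procedure used by the tool.

```python
def two_class_break(values):
    """Split scores into (top, rest) by minimising the within-class squared deviation."""
    ordered = sorted(values, reverse=True)

    def sse(chunk):
        mean = sum(chunk) / len(chunk)
        return sum((x - mean) ** 2 for x in chunk)

    # Try every cut position and keep the one with the tightest two groups.
    best_k = min(range(1, len(ordered)),
                 key=lambda k: sse(ordered[:k]) + sse(ordered[k:]))
    return ordered[:best_k], ordered[best_k:]

# Hypothetical influence scores for seven nodes.
scores = [0.02, 0.9, 0.05, 0.85, 0.1, 0.8, 0.08]
top, rest = two_class_break(scores)
print(top)  # [0.9, 0.85, 0.8]
```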

Click the Reveal Underlying Ideas button to remove the most influential words (or the ones you select) from the graph, to see what terms are hiding behind them.
The topical diversity score indicates whether this text is biased or focused toward a few central concepts or has multiple topical clusters that are distinct from one another. There are 4 possible states: Biased (Low), Focused (Medium), Diverse (Optimal), and Dispersed (Very High). The desired state depends on your objective, but for most situations the optimal state is Diverse. You can read more about it in our discourse diversity help article.

This measure is based on modularity (>0.4 for medium, >0.65 for high modularity, measured with the Louvain community detection algorithm, Blondel et al 2008) in combination with a measure of influence distribution (the entropy of the top nodes' distribution among the top clusters), as well as the percentage of nodes in the top topic and the relative influence of the top 2 topical clusters.
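
The entropy component can be illustrated with a short sketch: given the top nodes and the cluster each belongs to, compute the Shannon entropy (in bits) of how they are spread across clusters. The node and cluster names are hypothetical, and how this value is combined with the other components is not shown here.

```python
import math
from collections import Counter

def influence_entropy(top_nodes, cluster_of):
    """Shannon entropy (bits) of the top nodes' distribution among clusters."""
    counts = Counter(cluster_of[n] for n in top_nodes)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical cluster assignments for the four most influential nodes.
spread_out = {"market": 0, "price": 0, "climate": 1, "carbon": 1}
concentrated = {"market": 0, "price": 0, "trade": 0, "demand": 0}

even = influence_entropy(list(spread_out), spread_out)
single = influence_entropy(list(concentrated), concentrated)
print(even, single)  # 1.0 0.0
```

Higher entropy means the influential nodes are dispersed among the clusters, and zero means they all sit in one cluster.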
A structural gap shows two distinct communities (clusters of words) in this graph that are important, but not very well connected. That's where new potential and innovative ideas may reside.

This measure is based on a combination of the graph's connectivity and community structure, selecting the groups of nodes that would either make the graph more connected if it's too dispersed or that would help maintain diversity if it's too connected.
These concepts can be effective entrance or connector points that can be used to embed ideas into this discourse or start a conversation on a related topic without touching upon the most obvious terms. They connect the discourse to both main ideas and peripheral topics. They also have high influence, but not too many connections, which makes them less congested.

Technically, these are the nodes with high "diversivity": they have an unusually high ratio of influence (betweenness centrality) to frequency. They may not appear as often as the most influential nodes, but they are important points where the narrative shifts.
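
Read literally, this ratio can be sketched as betweenness divided by frequency. The values below are invented, and the real score may be normalized or thresholded differently.

```python
def diversivity(betweenness, frequency):
    """Ratio of influence to frequency for every node that occurs at least once."""
    return {n: betweenness[n] / frequency[n]
            for n in betweenness if frequency.get(n, 0) > 0}

# Hypothetical values: "market" is influential but frequent, "policy" is rarer.
betweenness = {"market": 0.30, "policy": 0.20, "price": 0.05}
frequency = {"market": 30, "policy": 4, "price": 10}

scores = diversivity(betweenness, frequency)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['policy', 'market', 'price']
```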
The chart shows how the main topics and the most influential keywords evolved over time. X-axis: time period (split into 10% blocks). Y-axis: cumulative number of occurrences.

Drag the slider to see how the narrative evolved over time. Select the checkbox to recalculate the metrics at every step (slower, but more precise).

LDA stands for Latent Dirichlet Allocation — it is a topic modelling algorithm based on calculating the maximum probability of the terms' co-occurrence in a particular text or a corpus.

We provide this data for you to be able to estimate the precision of the default InfraNodus topic modeling method based on text network analysis.
We analyze the sentiment of each statement to see whether it's positive, negative, or neutral. You can filter the statements by sentiment (clicking above) and see what kind of topics correlate with every mood.

The approach is based on AFINN and Emoji Sentiment Ranking.

Use the BERT AI model for English, Dutch, German, French, Spanish and Italian to get more precise results (slower). The standard model is faster, works for English only, is less precise, and is based on a fixed AFINN dictionary.
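
A dictionary-based scorer of the AFINN kind can be sketched in a few lines: each known word carries a fixed valence, and the sum decides the label. The tiny lexicon below is a hypothetical stand-in with illustrative values, not the published AFINN list, and emoji handling is left out.

```python
import re

# Illustrative toy lexicon (assumed values, not the real AFINN dictionary).
TOY_LEXICON = {"good": 3, "great": 3, "love": 3,
               "bad": -3, "terrible": -3, "hate": -3}

def sentiment(statement, lexicon=TOY_LEXICON):
    """Label a statement by the sum of the valences of its known words."""
    words = re.findall(r"[a-z']+", statement.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [sentiment(s) for s in (
    "I love this great idea",
    "The results were terrible",
    "The graph has twelve nodes",
)]
print(labels)  # ['positive', 'negative', 'neutral']
```

Because the dictionary is fixed, a scorer like this ignores negation and context, which is one reason a model-based approach can be more precise.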

Use this feature to compare contextual word co-occurrences for a group of selected nodes in your discourse. Expand the list by clicking the + button to see all the nodes your selected nodes are connected to. The total influence score is based on the betweenness centrality measure. The higher the number, the more important the connections are in the context of the discourse.
The most prominent relations between the nodes that exist in this graph are shown above. We treat the graph as undirected by default. Occurrences shows the number of times a relationship appears in a 4-gram window. Weight shows the weight of that relation.
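
One way to picture the occurrence count is the sketch below: every word is paired with the next three words (a window of four tokens), and pairs are stored without direction. This assumes each pair of token positions is counted once and does not model the weight column, whose scheme is not described here.

```python
from collections import Counter

def window_cooccurrences(tokens, window=4):
    """Count undirected word pairs that appear within `window` consecutive tokens."""
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

tokens = "text network analysis reveals text structure".split()
counts = window_cooccurrences(tokens)
print(counts[("network", "text")])    # 2
print(counts[("structure", "text")])  # 1
```

Dropping the `sorted` call would keep the order of the two words, which gives directed pairs instead.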

As an option, you can also download directed bigrams above, in case the direction of the relations is important (for any application other than language).

The higher the diversity of the network's structure and the higher the alpha in the influence propagation score, the higher its mind-viral immunity. Such a network will be more resilient and adaptive than a less diverse one.

In the case of a discourse network, high mind-viral immunity means that the text proposes multiple points of view and propagates its influence using both highly influential concepts and smaller, secondary topics.

We recommend increasing mind-viral immunity for texts that have a low score and decreasing it for texts that have a high score. This ensures that your discourse will be open, but not dispersed.
The higher the diversity, the more distinct communities (topics) there are in this network and the more likely it is to be pluralist.
The network structure indicates the level of its diversity. It is based on modularity (>0.4 for medium, >0.65 for high modularity, measured with the Louvain community detection algorithm, Blondel et al 2008) in combination with a measure of influence distribution (the entropy of the top nodes' distribution among the top clusters), as well as the percentage of nodes in the top community and the relative influence of the top 2 topical clusters.

We recommend aiming for the Diversified structure if you're in the Biased or Focused score range and for the Focused structure if you're in the Dispersed score range.

The chart above shows how influence propagates through the network. X-axis: lemma to lemma step (narrative chronology). Y-axis: change of influence.

The more even and rhythmical this propagation is, the stronger the central idea or agenda (see the alpha exponent below: around 0.5 or less).

The more variability there is in the propagation profile, the less the narrative relies on the main concepts (agenda) and the stronger the role of secondary topical clusters.
We plot the narrative as a time series of influence (using the words' betweenness score). We then apply detrended fluctuation analysis to identify the fractality of this time series, plotting the log2 of the scales (x) against the log2 of the accumulated fluctuations (y). If the resulting log-log relation can be approximated by a linear fit, there may be a power-law relation in how the influence propagates in this narrative over time (e.g. non-influential words most of the time, with occasional words of high influence).

Using the alpha exponent of the fit (which is closely related to the Hurst exponent), we can better understand the nature of this relation: uniform (pulsating | alpha <= 0.65), variable (stationary, has long-term correlations | 0.65 < alpha <= 0.85), fractal (adaptive | 0.85 < alpha < 1.15), and complex (non-stationary | alpha >= 1.15).
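
The procedure can be sketched with a basic first-order detrended fluctuation analysis: build the cumulative profile of the series, remove a linear trend in windows of several sizes, and fit a line to log2(fluctuation) against log2(scale). The window sizes and the synthetic test series below are assumptions made for illustration, and only the four alpha ranges are taken from the text above.

```python
import math
import random

def linfit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dfa_alpha(series, scales=(8, 16, 32, 64)):
    """Alpha exponent: slope of log2(fluctuation) against log2(window size)."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:                      # cumulative sum of deviations
        total += value - mean
        profile.append(total)
    log_s, log_f = [], []
    for s in scales:
        n_win = len(profile) // s
        xs = list(range(s))
        squared = 0.0
        for k in range(n_win):                # detrend each window linearly
            seg = profile[k * s:(k + 1) * s]
            a, b = linfit(xs, seg)
            squared += sum((y - (a * x + b)) ** 2 for x, y in zip(xs, seg))
        log_s.append(math.log2(s))
        log_f.append(math.log2(math.sqrt(squared / (n_win * s))))
    return linfit(log_s, log_f)[0]

def classify(alpha):
    if alpha <= 0.65:
        return "uniform"
    if alpha <= 0.85:
        return "variable"
    if alpha < 1.15:
        return "fractal"
    return "complex"

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(4096)]   # uncorrelated series
walk, position = [], 0.0
for step in noise:                               # random walk built from the noise
    position += step
    walk.append(position)

alpha_noise, alpha_walk = dfa_alpha(noise), dfa_alpha(walk)
print(classify(alpha_noise), classify(alpha_walk))
```

Uncorrelated noise is expected to give an alpha near 0.5 and a random walk an alpha near 1.5, so the two synthetic series should fall at the opposite ends of the scale.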

For maximal diversity, adaptivity, and plurality, the narrative should be close to "fractal" (near-critical state). For fiction, essays, and some forms of poetry, the expected state is "uniform". Informative texts will often have a "variable + stationary" score. The "complex" state is an indicator that the text is always shifting its state.

Using this information, you can identify whether the network has scale-free / small-world (long-tail power law distribution) or random (normal, bell-shaped distribution) network properties.

This may be important for understanding the level of resilience and the dynamics of propagation in this network. E.g. scale-free networks with long degree tails are more resilient against random attacks and will propagate information across the whole structure better.
If a power law is identified, the nodes have preferential attachment (e.g. 20% of nodes tend to get 80% of connections), and the network may be scale-free, which may indicate that it's more resilient and adaptive. The absence of a power law may indicate a more equalized distribution of influence.

The Kolmogorov-Smirnov test compares the distribution above to the "ideal" power-law ones (^1, ^1.5, ^2) and looks for the best fit. If the value d is below the critical value cr, it is a sign that both distributions are similar.
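
As a sketch of this comparison, the code below measures the largest gap d between the empirical cumulative degree distribution and a discrete power law with each of the three exponents, then compares the best d to a critical value. The critical value 1.36 / sqrt(n) is the common large-sample approximation at the 5% level and is an assumption here, as are the truncation of the power law to the observed degree range and the synthetic degree list.

```python
import math
from collections import Counter

def ks_distance(degrees, gamma):
    """Largest gap between the empirical CDF and a power-law CDF with p(k) ~ k^-gamma.

    Assumes all degrees are >= 1; the power law is truncated to the observed range.
    """
    ks = range(min(degrees), max(degrees) + 1)
    weights = [k ** -gamma for k in ks]
    norm = sum(weights)
    counts = Counter(degrees)
    n = len(degrees)
    empirical = theoretical = d = 0.0
    for k, w in zip(ks, weights):
        empirical += counts.get(k, 0) / n
        theoretical += w / norm
        d = max(d, abs(empirical - theoretical))
    return d

def best_power_law(degrees, exponents=(1.0, 1.5, 2.0)):
    """Return (best exponent, d, critical value, whether d is below the critical value)."""
    critical = 1.36 / math.sqrt(len(degrees))   # assumed 5% level approximation
    gamma, d = min(((g, ks_distance(degrees, g)) for g in exponents),
                   key=lambda pair: pair[1])
    return gamma, d, critical, d < critical

# Synthetic degree list whose counts fall off roughly as k^-2.
degrees = [k for k in range(1, 7) for _ in range(round(360 / k ** 2))]
gamma, d, cr, similar = best_power_law(degrees)
print(gamma, similar)  # 2.0 True
```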

