graph view:
Graph Language Processing Settings:

 
Specify the settings for your text-to-network conversion algorithm for this graph.
Lemmatizer: ?
Every word will be converted to its lemma (e.g. bricks > brick, taken > take) and will be shown on the graph as a node. Set to your language for more precise results. Switch off to turn off lemmatization and add your custom stop words list below.
 
Show on Graph:   Double Brackets [[]]:  Categories and Tags:   
Stop Words: ?
List the words, comma-separated (no spaces), that should not appear in the graph, in addition to your default global stopwords list.
Example: is,the,as,to,in
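As a rough illustration of how such a list might be applied, the sketch below parses a comma-separated stop-word string and filters a token list. The function names `parse_stop_words` and `remove_stop_words` are hypothetical (they are not InfraNodus functions), and case-insensitive matching is an assumption.

```python
def parse_stop_words(raw: str) -> set[str]:
    """Split a comma-separated stop-word string into a lowercase set."""
    return {w.strip().lower() for w in raw.split(",") if w.strip()}


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t.lower() not in stop_words]


stop_words = parse_stop_words("is,the,as,to,in")
tokens = "the graph is shown as a network".split()
filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Words that are not on the list (here "a") stay in the graph, which is why the custom list is described as an addition to the global one.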

 
Synonym Nodes: ? unmerge all
If you'd like some words to appear as one node on the graph, in addition to your default global synonyms list, list the synonyms, one per line.
Example:
machine:machine learning
learning:machine learning
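A minimal sketch of how the `word:node` lines above could be interpreted, assuming each line maps one word to the merged node name. The helper names are hypothetical, and collapsing adjacent duplicates after merging is an assumption made here so that "machine learning" yields a single node.

```python
def parse_synonyms(raw: str) -> dict[str, str]:
    """Parse 'word:merged node' lines (one per line) into a lookup table."""
    mapping = {}
    for line in raw.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        word, target = line.split(":", 1)
        if word.strip() and target.strip():
            mapping[word.strip().lower()] = target.strip().lower()
    return mapping


def merge_nodes(tokens: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each token by its merged node and collapse adjacent repeats."""
    merged = []
    for token in tokens:
        node = mapping.get(token.lower(), token.lower())
        if not merged or merged[-1] != node:
            merged.append(node)
    return merged


synonyms = parse_synonyms("machine:machine learning\nlearning:machine learning")
nodes = merge_nodes("machine learning models".split(), synonyms)
print(nodes)
```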

 

Dynamic Graph Settings


See the dynamic evolution of this graph: scroll or "play" the text entries to see how the text propagated through the network graph over time.

the final graph

highlight propagation edge
show visible statements only



 
Play the Graph


current speed of the player:
0 2000

one statement at a time


Export the Data


Network Graph Images:

The graph images for publishing on the web or in a journal. For embeds and URLs use the share menu.
PNG (Image)  SVG (Hi-Res)

Visible Statements (Tagged):

Export the currently filtered (visible) statements with all the meta-data tags (topics, sentiment).
CSV (Spreadsheet)   MD (e.g.Obsidian)  

Network Graph Data:

The raw data with all the statistics for further analysis in other software.
JSON  CSV  Gexf (Gephi)

All the Text:

Plain text used to create this graph without any meta-data.
Download Plain Text (All Statements)
Share Graph Image

 
Share a non-interactive image of the graph only, no text:
Download Image Tweet
 
Share Interactive Text Graph

 

 
Project Notes:
InfraNodus
Top keywords (global influence):
Top topics (local contexts):
Explore the main topics and terms outlined above or see them in the excerpts from this text below.
See the relevant data in context: click here to show the excerpts from this text that contain these topics below.
Tip: use the form below to save the most relevant keywords for this search query. Or start writing your content and see how it relates to the existing search queries and results.
Tip: here are the keyword queries that people search for but don't actually find in the search results.

Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).


Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.


A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.


The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.[1] The term is roughly synonymous with text mining; indeed, Ronen Feldman modified a 2000 description of "text mining"[2] in 2004 to describe "text analytics".[3] The latter term is now used more frequently in business settings while "text mining" is used in some of the earliest application areas, dating to the 1980s,[4] notably life-sciences research and government intelligence.


The term text analytics also describes the application of text analytics to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data. It is a truism that 80 percent of business-relevant information originates in unstructured form, primarily text.[5] These techniques and processes discover and present knowledge (facts, business rules, and relationships) that is otherwise locked in textual form, impenetrable to automated processing.


Increasing interest is being paid to multilingual data mining: the ability to gain information across languages and cluster similar items from different linguistic sources according to their meaning.


The challenge of exploiting the large proportion of enterprise information that originates in "unstructured" form has been recognized for decades.[6] It is recognized in the earliest definition of business intelligence (BI), in an October 1958 IBM Journal article by H. P. Luhn, A Business Intelligence System, which describes a system that will:


"...utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points."


Yet as management information systems developed starting in the 1960s, and as BI emerged in the '80s and '90s as a software category and field of practice, the emphasis was on numerical data stored in relational databases. This is not surprising: text in "unstructured" documents is hard to process. The emergence of text analytics in its current form stems from a refocusing of research in the late 1990s from algorithm development to application, as described by Prof. Marti A. Hearst in the paper Untangling Text Data Mining:[7]


For almost a decade the computational linguistics community has viewed large text collections as a resource to be tapped in order to produce better text analysis algorithms. In this paper, I have attempted to suggest a new emphasis: the use of large online text collections to discover new facts and trends about the world itself. I suggest that to make progress we do not need fully artificial intelligent text analysis; rather, a mixture of computationally-driven and user-guided analysis may open the door to exciting new results.


Hearst's 1999 statement of need fairly well describes the state of text analytics technology and practice a decade later.


The technology is now broadly applied for a wide variety of government, research, and business needs. Applications can be sorted into a number of categories by analysis type or by business function. Using this approach to classifying solutions, application categories include:


Many text mining software packages are marketed for security applications, especially monitoring and analysis of online plain text sources such as Internet news, blogs, etc. for national security purposes.[10] It is also involved in the study of text encryption/decryption.


A range of text mining applications in the biomedical literature has been described,[11] for example for protein docking.[12]


One online text mining application in the biomedical literature is PubGene, which combines biomedical text mining with network visualization as an Internet service.[13][14]


Text mining methods and software are also being researched and developed by major firms, including IBM and Microsoft, to further automate the mining and analysis processes, and by different firms working in the area of search and indexing in general as a way to improve their results. Within the public sector, much effort has been concentrated on creating software for tracking and monitoring terrorist activities.[15]


Text mining is being used by large media companies, such as the Tribune Company, to clarify information and to provide readers with greater search experiences, which in turn increases site "stickiness" and revenue. Additionally, on the back end, editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content.


Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management.[16] Coussement and Van den Poel (2008)[17][18] apply it to improve predictive analytics models for customer churn (customer attrition).[17] Text mining is also being applied in stock returns prediction.[19]


Sentiment analysis may involve analysis of movie reviews for estimating how favorable a review is for a movie.[20]


Such an analysis may need a labeled data set or labeling of the affectivity of words. Resources for affectivity of words and concepts have been made for WordNet[21] and ConceptNet,[22] respectively.


Text has been used to detect emotions in the related area of affective computing.[23] Text-based approaches to affective computing have been used on multiple corpora such as student evaluations, children's stories and news stories.


The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.


Academic institutions have also become involved in the text mining initiative:


The automatic analysis of vast textual corpora has created the possibility for scholars to analyse millions of documents in multiple languages with very limited manual intervention. Key enabling technologies have been parsing, machine translation, topic categorization, and machine learning.


The automatic parsing of textual corpora has enabled the extraction of actors and their relational networks on a vast scale, turning textual data into network data. The resulting networks, which can contain thousands of nodes, are then analysed by using tools from network theory to identify the key actors, the key communities or parties, and general properties such as robustness or structural stability of the overall network, or centrality of certain nodes.[28] This automates the approach introduced by quantitative narrative analysis,[29] whereby subject-verb-object triplets are identified with pairs of actors linked by an action, or pairs formed by actor-object.[27]
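To make the step from triplets to network data concrete, here is a small sketch that turns already-extracted subject-verb-object triplets into a weighted actor-object edge list. The triplets are invented for illustration, and the extraction itself (parsing) is assumed to have happened elsewhere.

```python
from collections import Counter

# Hypothetical triplets, as a parser might emit them: (subject, verb, object).
triplets = [
    ("government", "signs", "treaty"),
    ("opposition", "criticises", "government"),
    ("government", "signs", "treaty"),
]


def triplets_to_edges(items):
    """Count how often each (subject, object) pair is linked by any action."""
    edges = Counter()
    for subject, _verb, obj in items:
        edges[(subject, obj)] += 1
    return edges


edges = triplets_to_edges(triplets)
for (source, target), weight in edges.items():
    print(source, "->", target, "weight", weight)
```

The resulting edge list can then be loaded into any network analysis tool to compute centrality or detect communities.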


Content analysis has been a traditional part of social sciences and media studies for a long time. The automation of content analysis has allowed a "big data" revolution to take place in that field, with studies in social media and newspaper content that include millions of news items. Gender bias, readability, content similarity, reader preferences, and even mood have been analyzed based on text mining methods over millions of documents.[30][31][32][33][34] The analysis of readability, gender bias and topic bias was demonstrated in Flaounas et al.[35] showing how different topics have different gender biases and levels of readability; the possibility to detect mood patterns in a vast population by analysing Twitter content was demonstrated as well.[36][37]


Text mining computer programs are available from many commercial and open source companies and sources. See List of text mining software.


Because of a lack of flexibilities in European copyright and database law, the mining of in-copyright works (such as web mining) without the permission of the copyright owner is illegal. In the UK in 2014, on the recommendation of the Hargreaves review the government amended copyright law[38] to allow text mining as a limitation and exception. It was only the second country in the world to do so, following Japan, which introduced a mining-specific exception in 2009. However, owing to the restriction of the Copyright Directive, the UK exception only allows content mining for non-commercial purposes. UK copyright law does not allow this provision to be overridden by contractual terms and conditions.


The European Commission facilitated stakeholder discussion on text and data mining in 2013, under the title of Licences for Europe.[39] The fact that the focus on the solution to this legal issue was licences, and not limitations and exceptions to copyright law, led representatives of universities, researchers, libraries, civil society groups and open access publishers to leave the stakeholder dialogue in May 2013.[40]


By contrast to Europe, the flexible nature of US copyright law, and in particular fair use, means that text mining in America, as well as other fair use countries such as Israel, Taiwan and South Korea, is viewed as being legal. As text mining is transformative, meaning that it does not supplant the original work, it is viewed as being lawful under fair use. For example, as part of the Google Book settlement the presiding judge on the case ruled that Google's digitisation project of in-copyright books was lawful, in part because of the transformative uses that the digitisation project displayed, one such use being text and data mining.[41]


Until recently, websites most often used text-based searches, which only found documents containing specific user-defined words or phrases. Now, through use of a semantic web, text mining can find content based on meaning and context (rather than just by a specific word). Additionally, text mining software can be used to build large dossiers of information about specific people and events. For example, large datasets based on data extracted from news reports can be built to facilitate social network analysis or counter-intelligence. In effect, the text mining software may act in a capacity similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis. Text mining is also used in some email spam filters as a way of determining the characteristics of messages that are likely to be advertisements or other unwanted material. Text mining plays an important role in determining financial market sentiment.

Source: https://en.wikipedia.org/wiki/Text_mining

        
semantic variability:
Semantic Variability Score
— modulates diversity of the discourse network  how it works?
The score is calculated based on how modular the structure of the graph is (> 0.4 means the clusters are distinct and separate from one another = multiple perspectives). It also takes into account how the most influential nodes are dispersed among those clusters (higher % = lower concentration of power in a particular cluster).
Actionable Insight:

N/A

We distinguish 4 states of variability in your discourse. We recommend that a well-formed discourse should go through every stage during its evolution (in several iterations).

  1 - (bottom left) - biased - low variability, low diversity, one central idea (genesis and introduction stage).
  2 - (top right) - focused - medium variability and diversity, several concepts form a cluster (coherent communication stage).
  3 - (bottom right) - diversified - there are several distinct clusters of main ideas present in the text, which interact on the global level but maintain specificity (optimization and reflection stage).
  4 - (top left) - dispersed - very high variability, there are disjointed bits and pieces of unrelated ideas, which can be used to construct new ideas (creative reformulation stage).

Read more in the cognitive variability help article.
Generate AI Suggestions
Your Workflow Variability:
 
Shows to what extent you explored all the different states of the graph, from uniform and regular to fractal and complex. Read more in the cognitive variability help article.

You can increase the score by adding content into the graph (your own and AI-generated), as well as removing the nodes from the graph to reveal latent topics and hidden patterns.
Phases to Explore:
AI Suggestions  
     
Main Topical Clusters:

please, add your data to display the stats...
+     full table   ?     Show Categories

The topical clusters consist of the nodes (words) that tend to co-occur in the same context (next to each other).

We use a combination of clustering and a graph community detection algorithm (the Louvain method, Blondel et al.) to identify the groups of nodes that are more densely connected to each other than to the rest of the network. They are aligned closer to each other on the graph using the Force Atlas algorithm (Jacomy et al.) and are given a distinct color.
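The Louvain method searches for the partition that maximises modularity. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition for an unweighted, undirected graph, which is the quantity the method optimises. The toy graph and the `modularity` helper are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted edge list."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    community_degree = defaultdict(int)
    for node, d in degree.items():
        community_degree[community[node]] += d
    return sum(
        internal[c] / m - (community_degree[c] / (2 * m)) ** 2
        for c in community_degree
    )


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, partition)
print(round(q, 3))
```

For this toy graph Q is 5/14 (about 0.357), so even two clearly visible triangles stay below the 0.4 threshold when a bridge connects them in such a small network.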
Most Influential Elements:
please, add your data to display the stats...
+     Reveal Non-obvious   ?

AI Paraphrase Graph

We use the Jenks elbow cutoff algorithm to select the top prominent nodes that have significantly higher influence than the rest.
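The exact cutoff procedure used by the tool is not described here, so the following is only a sketch of the underlying idea: a two-class natural-breaks split that separates the top scores from the rest by minimising the within-group squared deviation. The function name and the scores are hypothetical.

```python
def two_class_break(values):
    """Split scores into (top, rest) with minimal within-group variance."""
    ordered = sorted(values, reverse=True)

    def squared_deviation(group):
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)

    # Try every split point and keep the one with the tightest two groups.
    best_k = min(
        range(1, len(ordered)),
        key=lambda k: squared_deviation(ordered[:k]) + squared_deviation(ordered[k:]),
    )
    return ordered[:best_k], ordered[best_k:]


influence = [0.9, 0.85, 0.8, 0.2, 0.15, 0.1, 0.05]
top, rest = two_class_break(influence)
print(top)
```

The function expects at least two values; with a single score there is nothing to split.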

Click the Reveal Non-obvious button to remove the most influential words (or the ones you select) from the graph, to see what terms are hiding behind them.

The most influential nodes are either the ones with the highest betweenness centrality — appearing most often on the shortest path between any two randomly chosen nodes (i.e. linking the different distinct communities) — or the ones with the highest degree.
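For readers who want to see the measure itself, here is a compact sketch of Brandes' algorithm for betweenness centrality on an unweighted, undirected graph. The adjacency list is a made-up example, the scores are left unnormalised, and nothing here is taken from the InfraNodus codebase.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes) for an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}


adj = {
    "text": ["mining", "data"],
    "mining": ["text", "data"],
    "data": ["text", "mining", "network"],
    "network": ["data", "graph", "node"],
    "graph": ["network", "node"],
    "node": ["network", "graph"],
}
bc = betweenness(adj)
print(bc["data"], bc["network"], bc["text"])
```

In this example "data" and "network" link the two triangles, so they carry all the shortest paths between the groups while the other words score zero.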
Network Structure:
N/A
?
The network structure indicates the level of its diversity. It is based on the modularity measure (>0.4 for medium, >0.65 for high modularity, measured with the Louvain community detection algorithm (Blondel et al. 2008)) in combination with the measure of influence distribution (the entropy of the top nodes' distribution among the top clusters), as well as the percentage of nodes in the top community.
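The two ingredients named above can be sketched as follows. The modularity thresholds come from the text; the "low" label for values at or below 0.4, the normalisation of the entropy to the 0..1 range, and both function names are assumptions made for this illustration.

```python
import math
from collections import Counter


def modularity_level(q):
    """Map a modularity value to the levels mentioned in the text."""
    if q > 0.65:
        return "high"
    if q > 0.4:
        return "medium"
    return "low"  # assumed label; the text only names medium and high


def influence_entropy(top_node_clusters):
    """Normalised Shannon entropy of the top nodes' spread over clusters."""
    counts = Counter(top_node_clusters)
    if len(counts) < 2:
        return 0.0  # all top nodes sit in one cluster
    total = sum(counts.values())
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


# Cluster ids of the four most influential nodes (hypothetical).
spread = influence_entropy([0, 0, 1, 2])
print(modularity_level(0.5), round(spread, 3))
```

A value near 1 means the influential nodes are spread evenly among the clusters; a value of 0 means they all belong to the same one.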


Reset Graph   Export: Show Options
Action Advice:
N/A
Structural Gap
(ask a research question that would link these two topics):
N/A
Reveal the Gap   ?   Generate an AI Question
 
A structural gap shows the two distinct communities (clusters of words) in this graph that are important, but not yet connected. That's where the new potential and innovative ideas may reside.

This measure is based on a combination of the graph's connectivity and community structure, selecting the groups of nodes that would either make the graph more connected if it's too dispersed or that would help maintain diversity if it's too connected.

Latent Topical Brokers
(less visible terms that link important topics):
N/A
?

These are the latent brokers between the topics: the nodes that have an unusually high ratio of influence (betweenness centrality) to frequency. They may not appear as often as the most influential nodes, but they are important points where the narrative shifts.

These are usually brokers between different clusters / communities of nodes, playing a not easily noticed yet important role in this network, like "grey cardinals" of sorts.
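One way to read this definition is as a simple ratio ranking. The sketch below assumes betweenness and frequency values are already available; the numbers and the `latent_brokers` helper are invented for illustration and may differ from how the tool actually scores brokers.

```python
def latent_brokers(betweenness, frequency, top_n=3):
    """Rank words by betweenness per occurrence, highest ratio first."""
    ratio = {
        word: betweenness[word] / frequency[word]
        for word in betweenness
        if frequency.get(word, 0) > 0
    }
    return sorted(ratio, key=ratio.get, reverse=True)[:top_n]


betweenness = {"text": 0.40, "mining": 0.35, "copyright": 0.20, "data": 0.10}
frequency = {"text": 40, "mining": 35, "copyright": 4, "data": 20}
brokers = latent_brokers(betweenness, frequency, top_n=1)
print(brokers)
```

Here "copyright" is mentioned rarely but carries a lot of influence per mention, so it ranks first even though "text" and "mining" have higher absolute betweenness.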

Emerging Keywords
N/A

Evolution of Topics
(number of occurrences per text segment) ?
The chart shows how the main topics and the most influential keywords evolved over time. X-axis: time period (split into 10% blocks). Y-axis: cumulative number of occurrences.

Drag the slider to see how the narrative evolved over time. Select the checkbox to recalculate the metrics at every step (slower, but more precise).
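The chart's axes can be reproduced with a few lines of code. This sketch assumes the text is split into ten equal blocks of tokens (the tool may split by statements or time instead) and counts the cumulative occurrences of one keyword; the function name is hypothetical.

```python
def cumulative_occurrences(tokens, keyword, blocks=10):
    """Cumulative count of a keyword after each of `blocks` equal segments."""
    n = len(tokens)
    counts = []
    running = 0
    for i in range(blocks):
        start = i * n // blocks
        end = (i + 1) * n // blocks
        running += tokens[start:end].count(keyword)
        counts.append(running)
    return counts


tokens = ["text", "mining"] * 5 + ["graph", "network"] * 5
series = cumulative_occurrences(tokens, "text")
print(series)
```

A curve that flattens, as it does here after the fifth block, indicates that the topic stopped being mentioned in the later part of the text.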

 
Main Topics
(according to Latent Dirichlet Allocation):
loading...
 ?  

LDA stands for Latent Dirichlet Allocation — it is a topic modelling algorithm based on calculating the maximum probability of the terms' co-occurrence in a particular text or a corpus.

We provide this data for you to be able to estimate the precision of the default InfraNodus topic modeling method based on text network analysis.
Most Influential Words
(main topics and words according to LDA):
loading...

We provide LDA stats for comparison purposes only. It works with English-language texts at the moment. More languages are coming soon, subscribe @noduslabs to be informed.

Sentiment Analysis


positive: | negative: | neutral:
reset filter    ?  

We analyze the sentiment of each statement to see whether it's positive, negative, or neutral. You can filter the statements by sentiment (clicking above) and see what kind of topics correlate with every mood.

The approach is based on AFINN and Emoji Sentiment Ranking

 
Use the BERT AI model for English, Dutch, German, French, Spanish and Italian to get more precise results (slower). The standard model is faster and works for English only, but it is less precise and is based on a fixed AFINN dictionary.
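A dictionary-based approach of the AFINN kind can be sketched in a few lines: each known word carries an integer valence and the statement's label follows the sign of the sum. The tiny lexicon below is a hypothetical stand-in with made-up values, not the actual AFINN word list, and the function name is an assumption.

```python
import re

# Hypothetical valences for illustration only (not the real AFINN list).
TOY_LEXICON = {"good": 3, "great": 3, "useful": 2, "hard": -1, "bad": -3, "terrible": -3}


def sentiment(statement, lexicon=TOY_LEXICON):
    """Label a statement by the sign of its summed word valences."""
    words = re.findall(r"[a-z']+", statement.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Text mining is useful and great"))
print(sentiment("Unstructured text is hard to process"))
print(sentiment("The graph has nodes"))
```

Because the dictionary is fixed, negation and context ("not good") are missed, which is one reason a model-based option can be more precise.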

Keyword Relations Analysis:

please, select the node(s) on the graph to see their connections...
+   ⤓ download CSV   ?

Use this feature to compare contextual word co-occurrences for a group of selected nodes in your discourse. Expand the list by clicking the + button to see all the nodes your selected nodes are connected to. The total influence score is based on the betweenness centrality measure. The higher the number, the more important the connections are in the context of the discourse.
Top Relations / Bigrams
(both directions):

⤓ Download   ⤓ Directed Bigrams CSV   ?

The most prominent relations between the nodes that exist in this graph are shown above. We treat the graph as undirected by default. Occurrences shows the number of times a relationship appears in a 4-gram window. Weight shows the weight of that relation.

As an option, you can also download directed bigrams above, in case the direction of the relations is important (for any application other than language).
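Counting relations in a 4-gram window can be sketched like this: every lemma is paired with the next three lemmas, and the pair is stored in sorted order so that the graph is undirected. How the tool weights the pairs is not specified here, so the plain counts below are an assumption, as is the function name.

```python
from collections import Counter


def cooccurrences(lemmas, window=4):
    """Count undirected word pairs that fall within the same n-word window."""
    pairs = Counter()
    for i, left in enumerate(lemmas):
        for right in lemmas[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


lemmas = ["text", "mining", "derive", "information", "text"]
pairs = cooccurrences(lemmas)
print(pairs[("mining", "text")], pairs[("derive", "information")])
```

For directed bigrams, the `sorted` call would be dropped so that ("text", "mining") and ("mining", "text") are counted separately.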

Text Statistics:
Word Count: 0
Unique Lemmas: 0
Characters: 0
Lemmas Density: 0
Text Network Statistics:
Show Overlapping Nodes Only

⤓ Download as CSV  ⤓ Download an Excel File
Network Structure Insights
 
mind-viral immunity:
N/A
  ?
structure:
N/A
  ?
The higher the network's structural diversity and the higher the alpha in the influence propagation score, the higher its mind-viral immunity: such a network will be more resilient and adaptive than a less diverse one.

In case of a discourse network, high mind-viral immunity means that the text proposes multiple points of view and propagates its influence using both highly influential concepts and smaller, secondary topics.
The higher the diversity, the more distinct communities (topics) there are in this network and the more likely it is to be pluralist.
The network structure indicates the level of its diversity. It is based on the modularity measure (>0.4 for medium, >0.65 for high modularity, measured with the Louvain community detection algorithm (Blondel et al. 2008)) in combination with the measure of influence distribution (the entropy of the top nodes' distribution among the top clusters), as well as the percentage of nodes in the top community.

Modularity: 0
Influence Distribution: 0%
Topics: 0
Nodes in Top Topic: 0%
Components: 0
Nodes in Top Comp: 0%
Nodes: 0
Av Degree: 0
Density: 0
Weighed Betweenness: 0
 

Narrative Influence Propagation:
  ?
The chart above shows how influence propagates through the network. X-axis: lemma to lemma step (narrative chronology). Y-axis: change of influence.

The more even and rhythmical this propagation is, the stronger the central idea or agenda (see the alpha exponent below: about 0.5 or less).

The more variability there is in the propagation profile, the less the narrative relies on the main concepts (agenda) and the stronger the role of secondary topical clusters.
propagation dynamics: | alpha exponent: (based on Detrended Fluctuation Analysis of influence) ?   show the chart
We plot the narrative as a time series of influence (using the words' betweenness score). We then apply detrended fluctuation analysis to identify the fractality of this time series, plotting the log2 of the scales (x) against the log2 of the accumulated fluctuations (y). If the resulting log-log relation can be approximated by a linear fit, there may be a power-law relation in how the influence propagates in this narrative over time (e.g. most of the time non-influential words, occasionally words with a high influence).

Using the alpha exponent of the fit (which is closely related to the Hurst exponent), we can better understand the nature of this relation: uniform (pulsating | alpha <= 0.65), variable (stationary, has long-term correlations | 0.65 < alpha <= 0.85), fractal (adaptive | 0.85 < alpha < 1.15), and complex (non-stationary | alpha >= 1.15).

For maximal diversity, adaptivity, and plurality, the narrative should be close to "fractal" (near-critical state). For fiction, essays, and some forms of poetry — "uniform". Informative texts will often have "variable + stationary" score. The "complex" state is an indicator that the text is always shifting its state.
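The alpha ranges listed above translate directly into a small lookup. The sketch below covers only this last classification step and assumes alpha has already been estimated by detrended fluctuation analysis; the function name is hypothetical.

```python
def propagation_state(alpha):
    """Map a DFA alpha exponent to the four states named in the text."""
    if alpha <= 0.65:
        return "uniform"
    if alpha <= 0.85:
        return "variable"
    if alpha < 1.15:
        return "fractal"
    return "complex"


for alpha in (0.5, 0.7, 1.0, 1.2):
    print(alpha, propagation_state(alpha))
```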

Degree Distribution:
  calculate & show   ?
(based on Kolmogorov-Smirnov test) ?   switch to linear
Using this information, you can identify whether the network has scale-free / small-world (long-tail power law distribution) or random (normal, bell-shaped distribution) network properties.

This may be important for understanding the level of resilience and the dynamics of propagation in this network. E.g. scale-free networks with long degree tails are more resilient against random attacks and will propagate information across the whole structure better.
If a power-law is identified, the nodes have preferential attachment (e.g. 20% of nodes tend to get 80% of connections), and the network may be scale-free, which may indicate that it's more resilient and adaptive. Absence of power law may indicate a more equalized distribution of influence.

The Kolmogorov-Smirnov test compares the distribution above to the "ideal" power-law ones (with exponents 1, 1.5, and 2) and looks for the best fit. If the value d is below the critical value cr, it is a sign that the two distributions are similar.
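The comparison can be sketched as follows, under several assumptions: the candidate power laws are truncated at the largest observed degree, all degrees are at least 1, and the critical value uses the common large-sample 5% approximation 1.36 / sqrt(n), which is only indicative for discrete data. The function names and the degree sample are hypothetical.

```python
import math
from collections import Counter


def ks_distance(degrees, exponent):
    """Largest gap between the empirical CDF and a truncated power-law CDF."""
    n = len(degrees)
    k_max = max(degrees)
    weights = [k ** -exponent for k in range(1, k_max + 1)]
    total = sum(weights)
    counts = Counter(degrees)
    empirical = model = distance = 0.0
    for k in range(1, k_max + 1):
        empirical += counts.get(k, 0) / n
        model += weights[k - 1] / total
        distance = max(distance, abs(empirical - model))
    return distance


def best_power_law(degrees, exponents=(1.0, 1.5, 2.0)):
    """Return (best exponent, its distance d, approximate critical value)."""
    fits = {a: ks_distance(degrees, a) for a in exponents}
    best = min(fits, key=fits.get)
    critical = 1.36 / math.sqrt(len(degrees))  # assumed 5% approximation
    return best, fits[best], critical


# Hypothetical degree sample: many nodes with degree 1, few with degree 4.
degrees = [1] * 16 + [2] * 4 + [3] * 2 + [4]
exponent, d, cr = best_power_law(degrees)
print(exponent, d < cr)
```

When d is below cr, as in this sample, the degree distribution is compatible with the chosen power law under these assumptions.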
Please, enter a search query to visualize the difference between what people search for (related queries) and what they actually find (search results):

 
We will build two graphs:
1) Google search results for your query;
2) Related searches for your query (Google's SERP);
Click the Missing Content tab to see the graph that shows the difference between what people search for and what they actually find, indicating the content you could create to fill this gap.
Please, enter a search query to discover what else people are searching for (from Google search or AdWords suggestions):

 
We will build a graph of the search phrases related to your query (Google's SERP suggestions).
Find a market niche for a certain product, category, idea or service: what people are looking for but cannot yet find*

 
We will build two graphs:
1) the content that already exists when you make this search query (informational supply);
2) what else people are searching for when they make this query (informational demand);
You can then click the Niche tab to see the difference between the supply and the demand — what people need but do not yet find — the opportunity gap to fulfil.
Please, enter your query to visualize Google search results as a graph, so you can learn more about this topic:

   advanced settings    add data manually
Discover the main topics, recurrent themes, and missing connections in any text or an article:  
Discover the main themes, sentiment, recurrent topics, and hidden connections in open survey responses:  
Discover the main themes, sentiment, recurrent topics, and hidden connections in customer product reviews:  
Enter a search query to analyze the Twitter discourse around this topic (last 7 days):

     advanced settings    add data manually

Enter a topic or a @user to analyze its social network on Twitter:

 advanced settings    add data manually
