Text Network Analysis: Methodology and Applications


Text network analysis represents a text as a network graph: the words are the nodes and their co-occurrences are the relations. Once a text is encoded as a network, graph theory algorithms can be used to detect the most influential keywords, identify the main topics and the relations between them, and get insights into the structure of any discourse.

A text network allows you to see, in one image, how a text unfolds in time. You can also discover the patterns within any discourse that would not be visible using traditional text mining techniques.


 

The most basic approach would connect the words that are next to each other (bigrams). InfraNodus, however, uses 4-grams (a sliding window containing 4 words) to build the relations between the cleaned concepts (lemmas). This approach retains more of the context and mimics the natural way we hold what we read in short-term memory.
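
The difference between the two window sizes can be sketched in a few lines of Python. This is only an illustration, not the InfraNodus implementation: the helper names `windows` and `linked_words` are hypothetical, and the token list is assumed to be already cleaned and lemmatized.

```python
def windows(tokens, size):
    """Return every run of `size` consecutive tokens (a sliding window)."""
    return [tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def linked_words(word, grams):
    """Collect every other word that shares at least one window with `word`."""
    return {other for gram in grams if word in gram
            for other in gram if other != word}


# Assumed input: lemmas that have already been cleaned of stopwords.
lemmas = ["multiple", "fundamental", "research", "laboratory", "base", "usa"]

bigrams = windows(lemmas, 2)     # 5 windows of 2 words
four_grams = windows(lemmas, 4)  # 3 windows of 4 words

# With bigrams, "multiple" is linked only to its direct neighbour.
print(linked_words("multiple", bigrams))
# With 4-grams, it is also linked to the two words that follow the neighbour.
print(linked_words("multiple", four_grams))
```

The wider window links each word to more of its surroundings, which is how more of the context ends up in the graph.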

After several visual transformations are applied, such as increasing the size of the most common words and grouping them into clusters on a 2D / 3D plane, we get a neat visual representation of the discourse. It can be used to get an overview of any text and to detect the structural gaps within it.


What Can Text Network Analysis Be Used For?

Text network analysis can be used for the same purposes as any other text mining technique. Its main advantage is that it focuses on the relations between the words and retains contextual information and the narrative. Unlike bag-of-words, LDA-based, or Word2Vec models, which may lose information about the word sequence, a text network can be built in a way that retains the narrative and, therefore, provides more accurate information about the text and its topical structure.
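
As a small illustration of the point about word sequence, the sketch below compares a plain bag-of-words count with a set of co-occurrence edges for two made-up sentences that use the same words in a different order. The helper names are hypothetical, and only the simplest case is shown (bag-of-words versus edges between neighbouring words); it says nothing about LDA or Word2Vec specifically.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count the words and ignore their order."""
    return Counter(tokens)


def adjacency_edges(tokens):
    """Build undirected edges between neighbouring words."""
    return {frozenset(pair) for pair in zip(tokens, tokens[1:])}


first = "network analysis reveals text structure".split()
second = "text analysis reveals network structure".split()

# Both sentences contain exactly the same words, so the counts are equal.
same_bag = bag_of_words(first) == bag_of_words(second)
# The co-occurrence edges differ, because the words sit next to different neighbours.
same_edges = adjacency_edges(first) == adjacency_edges(second)

print(same_bag, same_edges)
```

The bag-of-words representation cannot tell the two sentences apart, while the edge sets share only the pair "analysis" and "reveals".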

Here are some interesting use cases for text network analysis:

  1. Scientific discourse analysis
  2. Customer sentiment and product review analysis
  3. Text classification
  4. Feature extraction for machine-learning models
  5. Topic detection
  6. Keyword extraction
  7. Literary analysis
  8. Creative writing
  9. Having fun with text


Step-by-Step Text Network Analysis Tutorial


To try the text mining app on your own data, log in to InfraNodus and choose the "Add a new text" app on the Apps screen. Follow the steps outlined below:

You will see a graph where the most prominent terms are bigger and the terms that belong to the same topic have the same color and are closer, so you can better understand the main topics of the text and its structure.

Visualize google search results as a text network graph

You can then select the most prominent keywords to see which parts of the text they relate to and to better understand the context they appear in. It works like an upgraded tag cloud.

Zoom into the topic and explore it further

You can also select a topic you're interested in and explore that direction.

Select the most relevant keywords from your search to see relevant results.

You can also generate new ideas in relation to the text if you look at the structural gaps in the graph using the Insight feature.


Video Tutorial:

The short video tutorial below explains this process in detail and shows the workflow that you can follow when you want to do text mining using network visualization.



 

How Does It Work? TNA Methodology

The basic approach is outlined in this peer-reviewed paper: InfraNodus: Generating Insight Using Network Analysis (Paranyushkin, 2019).

To demonstrate how TNA works, we will use the sentence below:

There are multiple fundamental research laboratories based in the USA.
 

This is the basic algorithm used in InfraNodus:

  1. Remove the auxiliary words (e.g. "there", "are", "the", etc.) using stopwords and / or tf-idf models. This helps us focus on the most important concepts.
  2. Transform the remaining words into lemmas (e.g. "laboratories" becomes "laboratory") to avoid redundancy. The example sentence becomes: multiple fundamental research laboratory base usa
  3. Build a network graph where the lemmas are the nodes in the graph
  4. Arrange the words into the windows of 4 words (4-grams): e.g. [multiple fundamental research laboratory] and [fundamental research laboratory base] and [research laboratory base usa]
  5. Add the edges to the network based on the co-occurrence of words in these windows. If two words / nodes are next to each other, the edge between them gets the maximum weight of 3. If there is 1 word between the two words in the window, the weight is 2. If there are 2 words between them, the weight is 1.
  6. Align the words / nodes on a 2D / 3D plane where the most connected nodes (the hubs) are pushed apart from each other, while the less connected nodes are pulled towards hubs (Force-Atlas layout algorithm). This helps build topical clusters that reflect the words' co-occurrence.
  7. Run a modularity-based community detection algorithm (the Louvain method, Blondel et al.) that will identify the communities in the network graph. The nodes (words) that appear together more frequently will belong to the same community and have a distinct color. These communities will correlate with the clusters created in step 6.
  8. Calculate betweenness centrality (BC) for each node and show the nodes with higher BC as bigger on the graph.
  9. Tag the statements with the topics and sentiment data.
  10. Use AI-based generative NLP models to reveal the structural gaps and propose the research questions that could bridge those gaps.
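
Steps 1 to 5 and step 8 can be sketched in plain Python for the example sentence. This is a minimal sketch under stated assumptions, not the InfraNodus code: the stopword list and lemma table are tiny hypothetical stand-ins for a real stopword list and lemmatizer, each pair of word positions is counted once (the text does not say whether weights accumulate across overlapping windows), and betweenness centrality is computed on the unweighted graph. Layout, community detection, tagging, and the generative steps are left out.

```python
from collections import defaultdict, deque

# Assumption: illustrative stand-ins for a stopword list and a lemmatizer.
STOPWORDS = {"there", "are", "in", "the"}
LEMMAS = {"laboratories": "laboratory", "based": "base"}
WINDOW = 4  # size of the sliding window (4-grams)


def to_lemmas(text):
    """Steps 1-2: lowercase, strip punctuation, drop stopwords, lemmatize."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [LEMMAS.get(w, w) for w in words if w and w not in STOPWORDS]


def build_edges(lemmas, window=WINDOW):
    """Steps 3-5: weight 3 for neighbours, 2 with one word between, 1 with two."""
    edges = defaultdict(int)
    for i, source in enumerate(lemmas):
        for gap in range(1, window):
            if i + gap < len(lemmas) and lemmas[i + gap] != source:
                pair = tuple(sorted((source, lemmas[i + gap])))
                edges[pair] += window - gap
    return dict(edges)


def betweenness(edges):
    """Step 8: unweighted betweenness centrality (Brandes' algorithm)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    score = dict.fromkeys(neighbors, 0.0)
    for start in neighbors:
        # Breadth-first search that counts shortest paths from `start`.
        dist = {start: 0}
        paths = defaultdict(int, {start: 1})
        preds = defaultdict(list)
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    preds[nxt].append(node)
        # Accumulate dependencies from the farthest nodes back to `start`.
        dependency = defaultdict(float)
        for node in reversed(order):
            for pred in preds[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != start:
                score[node] += dependency[node]
    # Every undirected pair was visited from both of its endpoints.
    return {node: value / 2 for node, value in score.items()}


sentence = "There are multiple fundamental research laboratories based in the USA."
lemmas = to_lemmas(sentence)
edges = build_edges(lemmas)
centrality = betweenness(edges)

print(lemmas)
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```

In this sketch "research" and "laboratory" come out with the highest betweenness centrality, because they sit on the shortest paths between the words at the two ends of the sentence. In a longer text, a pair that co-occurs in several places would accumulate weight under the counting assumption above.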


 

Try It Yourself


You can try this approach yourself using InfraNodus for any text, text corpus, research article, PDF document, book, or set of research notes.
 

 

Bonus Track: Diversity vs Bias in Text


You can use text network analysis to get insights about the text's structure, so you can see how connected (biased) or dispersed (diverse) it is.

Using this structural approach, you can detect the level of plurality in your text and identify whether it focuses on one perspective or reveals multiple points of view.





Custom Text Mining Using Network Analysis


The application above works best for single texts, writing work-in-progress, and articles. If you are interested in applying the same method to a large text corpus, there may be some scalability issues related to network graph saturation.

However, using our in-house tools, we can help you design a custom analysis of your own text corpora. Please let us know if you have an interesting proposition, use case, research project, or service request.


Contact Us