Thematic Analysis in Qualitative Research with AI: Step by Step Tutorial


Thematic analysis is a process of reviewing data to uncover recurring patterns, topics, or concepts that aid interpretation and comprehension. It goes beyond counting words or phrases to capture the underlying meaning and relevance of a dataset. It is commonly referred to as a basic method in qualitative research and is adaptable to a wide range of research issues and theoretical viewpoints. In this tutorial, we will demonstrate various frameworks and methods that can be used in thematic analysis to generate insights from data using AI.

Thematic analysis knowledge graph generated by InfraNodus

Thematic analysis is a reflective process in which the researcher actively analyzes and interacts with the data (sometimes using the AI or specialized thematic analysis software), realizing their influence in forming the study. A strong theme analysis provides in-depth insights into the subject being studied. Some methodologies suggest employing interview facilitators distinct from the coding team; this separation aims to promote an environment where participants' lived experiences are clearly articulated and equitable participation is encouraged.


Core Principles: Inductive vs. Deductive Coding


A fundamental aspect of thematic analysis involves coding the data – applying labels or tags to segments of text (or other data forms) that capture their essence. Two primary coding approaches guide this process:


1. Inductive Coding

Inductive Thematic Analysis is a "bottom-up" technique. Researchers immerse themselves in the data, with no prior preconceptions or theories. Codes and motifs originate from the data itself. This methodology shares concepts with grounded theory, which tries to produce new theory directly from data through iterative analysis, frequently adopting a constant comparative method in which data collection and analysis take place simultaneously. An introduction to grounded theory frequently emphasizes its usefulness when existing theories are inadequate. It's perfect for discovering a new place or seeking new perspectives.


2. Deductive Coding

Deductive Thematic Analysis employs a "top-down" method. Researchers start with pre-existing theories, notions, or a specific research question to guide the coding process. They search for evidence in the data that supports, contradicts, or modifies the existing framework. Before beginning analysis, deductive coding frequently requires the creation of an initial codebook based on theory or earlier research. It is commonly used to test a theory or compare findings to earlier research, such as in a systematic or literature review.


3. Hybrid Approach

Researchers frequently use a hybrid technique, starting with the inductive approach to study the data and then using deductive approach to connect discoveries to prior knowledge. Understanding the distinction between inductive and deductive procedures is critical when building your analysis process.

As is demonstrated in a recent review of the different coding approaches, depending on the nature of the data, codes might either fall into flat categories or be arranged hierarchically. Flat categories are most common when the data deal with topics on the same conceptual level. In other words, one topic is not a subset of another topic. By contrast, hierarchical codes are more appropriate for concepts that naturally fall above or below each other. Hierarchical coding can also be a useful form of data management and might be necessary when working with a large or complex dataset.


How InfraNodus can help: Before beginning formal coding, tools like InfraNodus can help with the first exploratory phase for both techniques. By visualizing the text data as a network graph, researchers can rapidly discover frequently occurring concepts and their connections, offering an early, data-driven snapshot. This visual map and built-in AI can suggest potential areas of focus for inductive coding or help locate segments relevant to predefined concepts in deductive coding. For example, the screenshot below demonstrates how you can explore the statements that belong to a topical cluster identified by InfraNodus AI:

Thematic analysis AI-identified topic: exploring the statements within
 

The Braun and Clarke Six-Step Framework


Virginia Braun and Victoria Clarke created a well-known and important guide to doing thematic analysis, including a six-step procedure. This Braun Clarke technique offers a clear path for researchers:


1. Familiarization:

Immerse yourself in the data. Read and re-read transcripts (interviews, focus groups, etc.), examine visual data, or go over survey replies several times. Take preliminary notes and write down early ideas. This level of engagement is critical for grasping the subtleties of the text.

How InfraNodus can help: Uploading transcripts or text data into InfraNodus at this stage provides an immediate visual summary. Exploring the graph helps quickly grasp the main topics and relationships, accelerating familiarization with large datasets.


2. Generating Initial Codes:

Work through the full dataset in a systematic manner, discovering noteworthy features and assigning concise labels (codes) to represent them. Code inclusively, capturing anything potentially related to the study question. This first coding is the foundation of the analysis.

How InfraNodus can help: InfraNodus can automatically identify key terms and their co-occurrences, suggesting potential initial codes. Its network visualization helps see how different concepts are linked, aiding in the generation of meaningful codes beyond simple frequency, focusing on the context where they appear.

Thematic analysis: coding the statements with InfraNodus

3. Searching for Themes:

Group the generated codes into prospective overarching themes. Examine the relationships between codes, looking for patterns and larger regions of significance. Consider how multiple codes might mix to create a single theme. This level includes active interpretation and pattern recognition.

How InfraNodus can help: The visual clustering of nodes (concepts) in InfraNodus can directly highlight potential themes. The tool's community detection algorithms group related concepts, visually representing how codes might aggregate into broader thematic areas.

Alternatively, you can upload the content you coded with other tools, visualize it as a graph, and let InfraNodus identify recurring themes. InfraNodus can also identify the gaps between the different themes.


4. Reviewing Themes:

Compare the proposed themes to the coded data extracts and the complete dataset. Refine, merge, separate, or remove themes. Ensure that they accurately represent the facts and answer the research question. Examine for internal coherence and external distinctiveness. This recurrent assessment is critical for a thorough theme development.


5. Defining and Naming Themes:

Determine the essence of each final topic. What story does each theme convey? Give each theme a succinct and informative title. Create extensive definitions of the breadth and purpose of each topic, frequently resulting in a final codebook or thematic map that clearly defines each subject with supporting data examples.


6. Writing Up:

Combine the analytic narrative with data extracts to create a scholarly report or output (presentation). Explain how the analysis was carried out, including the technique and reflexive process, as well as how the themes answer the research question, frequently by locating findings within previous literature. It can also be helpful to use the quotes from the text analyzed to better convey the meaning or underlying emotion for the themes you identified.

This step-by-step approach gives structure, although Braun and Clarke note that thematic analysis is frequently recursive and reflexive, rather than strictly linear.


 

Reflexive Thematic Analysis (RTA) Using AI


Building on their earlier work, Braun and Clarke advocate for Reflexive Thematic Analysis (RTA). RTA openly recognizes and emphasizes the researcher's active involvement in knowledge development. It underlines that themes do not just "emerge" from data; rather, they are formed by the researcher's interpretive involvement.

Reflexivity entails ongoing self-awareness of one's assumptions, prejudices, and theoretical positions, as well as how these influence the analytical process and outcomes. Transparency regarding these factors is a cornerstone of RTA and helps overcome objectivity issues.

Interestingly, AI and thematic analysis software that uses text analysis and NLP algorithms can be very helpful for overcoming personal bias and making the analysis more objective. For example, a certain topic may be missed by a researcher but revealed by a software analysis tool that is trained to detect latent ideas (as is shown in the InfraNodus visualization below):

Thematic analysis: uncovering latent ideas with InfraNodus


 

Creating Themes: From Codes to Meaningful Patterns

The development of themes is the heart of the analysis, requiring careful understanding of the data discourse. It's about moving from descriptive initial codes to interpretive patterns that illuminate the research question. This theme development process involves:

- Clustering Codes: Grouping related codes together.

- Identifying Relationships: Understanding how different coded segments connect.

- Abstraction: Moving from specific data points to broader concepts.

- Interpretation: Assigning meaning to the identified patterns, often iteratively comparing data within and between codes (similar to the constant comparative aspect of grounded theory)

- Defining Themes: Clearly expressing the scope and essence of each pattern, which is typically documented in a codebook or theme outline.

Think of it like creating a map (template map word): codes are the landmarks, and themes are the distinct territories or regions that emerge when you see how these landmarks relate to each other. The goal is to create a thematic map that accurately represents the "terrain" of your data. Distilling patterns reveals these clusters—emerging themes that are tightly interwoven yet distinct, offering insight into the main content of the data and highlighting differences and similarities.

How InfraNodus can help: InfraNodus in Action: Visualizing the network of codes and concepts in InfraNodus helps immensely in this theme creation stage. You can see which codes cluster together, identify central concepts that bridge different clusters (potential core themes), and explore the connections between potential themes, aiding in the review and definition phases.

You can do this within InfraNodus itself, or you can import the coded data from other tools, like MaxQDA, where the excerpts and highlights are tagged with the codes. Then you can visualize a graph of those codes and then use the InfraNodus highlight function to see how well they cover the discourse.


 

Tools and Software for Thematic Analysis


While theme analysis can be performed manually (by highlighting, cutting/pasting, and tables), software can substantially speed up the process, particularly with large datasets. There are several Qualitative Data Analysis Software (QDAS) packages available, each with functionality for coding, organizing, retrieving, and, in some cases, visualizing data.

Some of the most popular and widely used QDAS tools include:

- NVivo: A comprehensive platform supporting various data types and analysis approaches.

- MAXQDA: Known for its user-friendly interface and strong mixed-methods capabilities, including AI features.

- InfraNodus: Focused on data visualization and AI thematic analysis, InfraNodus builds a knowledge graph and helps identify themes through relationships.

- ATLAS.ti: Offers powerful tools for coding, visualizing relationships, and incorporating AI for analysis.

- Dedoose: A web-based option particularly suited for collaborative and mixed-methods research.

- QDA Miner Lite: A free version of QDA Miner, useful for basic coding and analysis tasks.

- Delve: An intuitive, web-based tool focusing specifically on streamlining the coding process.


 

Applications and Considerations


Thematic analysis is widely employed in many areas, including psychology, healthcare, education, customer feedback analysis, survey studies, and social sciences. Its applications include:

- Qualitative Insights in Healthcare: TA is essential for uncovering insights about patient experiences, attitudes toward treatments, and impediments to care in healthcare settings. It aids in understanding the 'why' underlying health-related behaviors. However, there are significant downsides to qualitative research, such as smaller sample sizes, which limit generalizability when compared to quantitative studies. Integrating TA into mixed methodologies research (combining qualitative and quantitative data) or doing secondary research (analyzing existing qualitative datasets) might provide a more thorough insight while addressing some limitations.

- Survey Analysis: Thematic analysis is vital in survey analysis and case studies, which focus on a specific instance, group, or event. TA allows researchers to delve deep into the particular case, identify key themes and patterns within that bounded context, and draw rich, detailed, and meaningful conclusions specific to that scenario.

- Grounded Theory Development: While distinct, TA can be a technique used within a grounded theory approach, particularly in the initial stages of coding and theme identification, helping to build theory directly from data, which is valuable in complex areas like healthcare.

- Synthesizing findings in systematic reviews and literature reviews. Finding recurrent themes, topics, and gaps between them.

- Understanding user experiences based on customer reviews and feedback, support requests, and surveys. Thematic analysis can help identify the main topics and better understand the pain points of customers as well as the positive sides of a product. These insights can then be integrated into product development strategies.


 

Challenges, Strengths, and Weaknesses


Common challenges and pitfalls involved in thematic analysis include managing large volumes of data, ensuring analytical rigor, and the potential for researcher bias or superficial analysis. A key challenge is balancing interpretation with objectivity; while pure objectivity is arguably impossible in qualitative research, researchers strive for trustworthiness and credibility through reflexivity and systematic procedures.

Strategies to ensure rigor and enhance reliability (or trustworthiness) include maintaining a clear audit trail of analytical decisions, using multiple coders (and checking inter-coder agreement if appropriate), peer debriefing, and respondent validation. It's crucial to avoid simply cherry-picking data that fits preconceived ideas. The benefit of carefully navigating these challenges is a rich, credible, and insightful analysis.

Like any method, thematic analysis, especially in fields like psychology and broader qualitative research, has strengths and weaknesses:

Common Strengths: High flexibility across different theoretical frameworks; relative accessibility for novice researchers; capability to handle large datasets (especially with software); excellent for summarizing key features of complex data; useful for identifying patterns across diverse data sources like interviews or surveys. Offers significant advantage in providing rich, detailed, and nuanced qualitative insights. The potential benefit of uncovering unexpected findings is high.

Common Weaknesses: The flexibility can sometimes lead to inconsistency or lack of clarity if procedures aren't well-documented; potential for researcher bias to influence theme development if reflexivity is poor (a common pitfall); analysis can sometimes remain descriptive rather than deeply interpretive if not pushed further; can be very time-consuming, particularly the familiarization and coding stages; findings may have limited generalizability compared to quantitative studies (though this is often not the primary goal, the challenge lies in appropriate claims); ensuring reliability requires deliberate effort.


 

Amplifying the Patients' Experience: Inviting Partners and Facilitators


The inclusion of patient partners and external facilitators represents a strategic enhancement to the thematic analysis process. In healthcare contexts, this ensures researchers gain a genuine understanding of patients' subjective experiences. This method extends effectively to organizational settings as well, providing a means to minimize researcher bias and navigate internal power structures, thereby validating the contributions of less represented members whose insights might otherwise be marginalized by organizational hierarchy.

A dedicated facilitator — often someone not involved in the primary coding — is key to navigating team dynamics and hierarchies. Their role is to ensure that the valuable insights from all members, including less vocal participants like patient partners or junior researchers, are drawn out and acknowledged alongside the perspectives of senior team members. This inclusive approach fosters an environment where diverse viewpoints contribute to a more nuanced understanding and interpretation of the data.

The systematic process of organizing data through iterative coding and theme development benefits immensely from this collaborative dynamic. Clear communication about the coding framework allows the team to collectively refine codes and themes based on ongoing discussion, enhancing the rigor of the analysis.

This careful intersection of robust research methodology, inclusive facilitation, and direct engagement with patient experience ensures that the themes identified through thematic analysis in healthcare are not only analytically sound but also deeply resonate with the lived realities of those receiving or providing care, leading to more meaningful and impactful findings.


 

The Art of Listening: Contextual Understanding


Achieving deep contextual understanding is a primary goal of thematic analysis, and paying close attention to the specific language used by participants plays a crucial role in this process. Researchers often identify keywords – recurring terms, significant phrases, or evocative words – that emerge from the data. As noted, the repetition of a particular word or concept, perhaps expressing the same meaning in different ways, can highlight its importance within the participants' lived experience and social context. These keywords serve as key signposts, helping to capture the essence of participant perspectives.

However, insight doesn't come from isolated terms alone; thematic analysis demands examining these keywords within their surrounding narrative context. Analyzing how a specific phrase is used, what precedes and follows it, and by which participant, helps unlock its nuanced meaning. Direct quotations or quotes featuring these key words are often used in reporting; they provide powerful evidence, grounding the analysis and offering readers a direct glimpse into the participant context. This meticulous attention to language allows thematic analysis to move beyond surface-level descriptions and provide a rich, deep understanding of the phenomenon being studied.


 

Data Saturation


In thematic analysis, as in much qualitative research, the principle of data saturation is crucial for establishing the credibility and comprehensiveness of the findings, referring conceptually to the point during data collection and/or analysis where gathering additional data seems unlikely to yield substantial new insights or significantly alter the developing thematic framework. Reaching saturation suggests the researcher has likely captured the necessary range and depth of information pertinent to the research question, providing confidence that the resulting themes are well-grounded in participant experiences.

However, "saturation" itself is nuanced, often distinguishing between at least two key forms: code saturation and meaning saturation.

Code saturation, perhaps the more commonly understood form, refers to the point where no new codes or potential new themes emerge from analyzing further data segments, indicating the breadth of conceptual issues has likely been identified – essentially, substantially different kinds of relevant information cease to appear. While there's generally no strict formula for determining code saturation, it's often reached when the researcher has a good understanding of the data and has identified a significant number of codes that are meaningful and useful for the research question.

In contrast, meaning saturation focuses on the depth of understanding, reached when the researcher feels they have developed a rich, complex, nuanced grasp of the themes already identified, such that further analysis adds examples but doesn't fundamentally change the interpretation of the core meaning within existing themes, signaling that the analyst has thoroughly explored their dimensions and significance. In practice, while code saturation might be reached relatively early, achieving deep meaning saturation often requires more extensive engagement, reflection, and interpretation of existing data; determining either type involves careful researcher judgment and iterative comparison rather than a strict formula, assessing whether further analysis genuinely enriches understanding for the research aims or merely adds redundant information.


 

Conclusion


Thematic analysis is a powerful and adaptable method for uncovering meaning within qualitative data. By systematically coding data and developing well-defined themes, researchers can gain deep insights into their research topics. Following established guidelines, like the Braun Clarke six-step framework, engaging in reflexive practice, and actively addressing conduct challenges enhances the credibility and depth of the findings.

Using specialized software, including popular QDAS packages like NVivo, MAXQDA, and ATLAS.ti, or innovative visual tools like InfraNodus, can further streamline the process involved in thematic data analysis, helping researchers navigate complex datasets, identify patterns efficiently, and ultimately create compelling and insightful analyses.

While thematic analysis is normally used in academic and therapeutic settings, it can be applied to any qualitative data, including customer feedback, survey responses, and more. It is a very powerful technique for understanding human experiences and uncovering common themes that can be addressed to study how people experience the world, interact with products, what their pain points are, and how those could be addressed to improve the quality of our lives.


 

References


[1] Kiger, M & Varpio L (2020). Thematic analysis of qualitative data: AMEE Guide No. 131

[2] Braun, V., & Clarke, V. (2013). Using thematic analysis in psychology. Qualitative Research in Psychology, 10(2), 177-198.

[3] Roberts, K., Dowell, A., Nie, J.B. (2018). Attempting rigour and replicability in thematic analysis of qualitative research data; a case study of codebook development


 

Try It Yourself


Try InfraNodus for thematic analysis. We offer a free 2-week trial and after signing up you can request an education discount.


Sign Up     Log In   
 

 

Consulting Services


Contact us if you are interested to hire certified InfraNodus experts to conduct thematic analysis of your data.


Contact Us