
#machine_learning #supervised #unsupervised

#features are used in #machine_learning for #differentiation

#features are used for #training in #machine_learning

the #relationship is called a #model in #machine_learning

#supervised #machine_learning finds #patterns between #data and #labels

#patterns are used to make #predictions

#supervised works with #labeled #data

the goal of #unsupervised is to identify meaningful #patterns in #data

sometimes a #model can find #patterns that represent #stereotypes or #bias

#clustering is a type of #unsupervised #learning

with #reinforcement_learning (RL) you set up a #model (called an #agent in RL) which receives a #reward each time it performs well (#reward_function)

a #shaped #reward increases in the states closer to the #goal #state

a #sparse #reward is given in the #goal #state only

#positive_reinforcement is an important element of #reinforcement_learning

if the #reward provides the #features to the #model, that could improve the performance

#machine_learning #problems: #classification, #regression, #clustering, #association learning, #structured_output, #ranking

#clustering is an #unsupervised #learning problem

#regression requires labeled data — #supervised learning problem

#classification requires a set of #labels - so it is #supervised

a #neural_network works through #representations

the #machine_learning process is an #experiment where we run #test after test to converge on a workable #model

A well-defined #problem has both #inputs and #outputs. #inputs are the #features. #outputs are the #labels to predict.

#training means creating or #learning the #model

#inference means applying the trained #model to #unlabeled #examples

A #regression #model #predicts continuous #values.

A #classification #model #predicts #discrete #values.

#machine_learning needs to provide #decisions rather than just #predictions

#labels are the #variables or #values for #predictions

#features are #inputs: #variables describing the #data

#model_training is done on #data which has #features and #labels, so that the model knows what #correlations to extract

#model maps #examples to predicted #labels

#loss_function shows us the degree of #deviation of the #model #prediction from the real #values

#loss_function can be a squared #difference between the #prediction and the #labels

#loss_function = #observation - #prediction

#mean_square_error averages the squared #deviation of the #prediction from the #labels over all elements

#model_training is usually based on reducing the #loss in the #loss_function, often via #mean_square_error, though not exclusively

the #gradient_descent approach is used to minimize the #loss_function

the #learning_rate determines the size of each #gradient_descent step

#epoch is a full pass over all the #batches used in #machine_learning

when the #training #loss stops decreasing we say that the model has #converged

the #goldilocks #learning rate for a curve is the one where #gradient descent reaches the minimum point in the fewest number of steps

In #supervised #learning, a machine #learning algorithm builds a model by examining many examples and attempting to find a model that minimizes #loss; this process is called empirical #risk_minimization.

#loss is the penalty for a bad #prediction. That is, loss is a number indicating how bad the model's #prediction was on a single example.

Mean square error (#mse) is the average squared #loss per example over the whole #dataset.

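The MSE definition above can be sketched in a few lines of Python (a minimal illustration; the array values are made up):

```python
# Mean squared error: average squared loss per example over the dataset.
def mse(predictions, labels):
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
```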
#iterative #learning is used to decrease #loss over time by making small steps and receiving #feedback on the #outputs

Usually, you #iterate until overall #loss stops changing or at least changes extremely slowly. When that happens, we say that the #model has #converged.

#epoch represents a full training pass over the entire #dataset such that each example has been seen once. Thus, an #epoch represents N / #batch size training iterations, where N is the total number of examples.

#learning_rate is a scalar used to #train a model via #gradient_descent. During each iteration, the #gradient_descent algorithm multiplies the #learning_rate by the #gradient. The resulting product is called the #gradient step. #learning_rate is a key #hyperparameter.

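The gradient step described above can be sketched for a toy 1-D linear model (an illustrative example with made-up data, not a general implementation):

```python
# One gradient-descent step for a 1-D linear model y = w * x,
# minimizing MSE loss; the learning rate scales the gradient.
def gradient_step(w, xs, ys, learning_rate):
    # d(MSE)/dw = mean of 2 * (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - learning_rate * grad

w = 0.0
for _ in range(200):  # iterate until the loss converges
    w = gradient_step(w, [1, 2, 3], [2, 4, 6], learning_rate=0.05)
print(round(w, 3))  # approaches the true weight 2.0
```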
#batch is the number of #examples used in an #iteration

#anomalies in #features may indicate a potential problem in a #dataset - one should be more careful using that sort of data.

#learning_rate specifies the size of the step, #batch specifies how many elements we take into the learning process, #epoch specifies how many passes over the data we are going to make

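The epoch / batch relationship above reduces to simple arithmetic (illustrative numbers):

```python
# epoch = one full pass over the dataset; with N examples and a batch
# size B, one epoch is N // B iterations (assuming N divisible by B).
N, batch_size, epochs = 1000, 50, 3
iterations_per_epoch = N // batch_size
total_iterations = iterations_per_epoch * epochs
print(iterations_per_epoch, total_iterations)  # 20 60
```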
a #synthetic_feature is made out of several #features and may help #prediction

a #correlation_matrix shows if there are any #correlations between the #features

we take a #training_set and a #test_set from our #data, then we train the model on the #training_set to see how well its #prediction holds on the #test_set

An #overfit model gets a low #loss during training but does a poor job #predicting new data.

#machine_learning 's goal is to predict well on new #data drawn from a (hidden) true #probability #distribution.

The less complex a #machine_learning model, the more likely that a good #empirical result is not just due to the #peculiarities of the #sample.

Partitioning a #data set into a #training_set and #test_set lets you judge whether a given #model will generalize well to new #data.

#partitioning a #data set into a #training_set and a #test_set

a #training_set can be split into a smaller #training_set, a #test_set and a #validation_set so that the model can be trained better and no #overfitting occurs

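The partitioning described above can be sketched as a shuffle-then-slice (a common illustrative 60/20/20 split; the fractions are an assumption, not prescribed by the notes):

```python
import random

# Partition a dataset into training / validation / test sets.
# Shuffle first so all partitions reflect the same distribution.
def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    data = data[:]
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```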
#features are very important for #machine_learning models

#features engineering should remove #outliers (which might lead to the problem that a #model is not capable of #predicting outstanding events)

#data_visualization is important for knowing #data and improving #machine_learning models

#one_hot_encoding allows us to incorporate categorical #data into our #model

for very large #values a #sparse #representation is used

#binning_values allows us to simplify the #data and bring it into the #feature_vector using #one_hot_encoding

#binning_values by #quantile ensures the number of #examples in each bucket is the same

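Quantile binning can be sketched like this — bucket boundaries follow the data so each bucket gets the same count (made-up values, and for simplicity the count is assumed to divide evenly):

```python
# Binning by quantile: boundaries chosen so each bucket holds
# the same number of examples, however skewed the values are.
def quantile_bins(values, n_bins):
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

buckets = quantile_bins([1, 2, 2, 3, 8, 9, 50, 100], 4)
print([len(b) for b in buckets])  # [2, 2, 2, 2]
```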
#feature_crossing can be a powerful way to improve #prediction by combining the #data #features of the #dataset

#feature_crossing is often done for #one_hot_encoding where multiple #features are crossed to produce an interesting #feature_vector

#feature_crossing is one learning strategy; #neural_network is another

minimizing #loss_complexity (loss plus complexity), which is called #structural_risk_minimization, allows us to avoid #overfitting the #model

the #loss_term measures how well the #model fits the #data, and the #regularization term measures model #complexity.

model #complexity as a function of the #weights of all the #features in the #model.

model #complexity as a function of the total number of #features with nonzero #weights. (A later module covers this approach.)

#model developers tune the overall impact of the #regularization term by multiplying its value by a #scalar known as #lambda (also called the #regularization rate).

#regularization is a technique used in an attempt to solve the #overfitting problem in statistical models.

#regularization penalizes the #loss_function in that it pushes the model to give lower #weights to each parameter in the model

a model is #learning the #weights for each of the #features as it is #training itself to minimize #loss and #complexity

#regularization penalizes the #loss_function for too much #complexity (a high number of #features with nonzero #weights)

#sigmoid_function maps the output of the #linear_layer of a model trained with #logistic_regression to a value between zero and one

#logistic_regression returns a #probability for a #classification

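The sigmoid mapping mentioned above is one line of math:

```python
import math

# Sigmoid squashes the linear layer's output into (0, 1),
# so it can be read as a probability in logistic regression.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))        # 0.5
print(sigmoid(4) > 0.9)  # large positive logits map near 1
```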
In order to map a #logistic_regression value to a binary category, you must define a #classification_threshold (also called the #decision_threshold or #threshold)

A #true_positive is when a model correctly #predicts the #positive_class (it made a positive prediction and it was true). A #true_negative is when a model correctly #predicts the #negative_class

A #false_positive is when a model incorrectly #predicts the #positive_class. A #false_negative is when a model incorrectly #predicts the #negative_class

#accuracy of a model #prediction is the ratio of the #correct_predictions to the total number of #predictions

#accuracy then is the ratio of #true_positive plus #true_negative to the sum of all predictions (#true_negative + #true_positive + #false_negative + #false_positive)

#accuracy alone doesn't tell the full story when you're working with a #class_imbalanced_data_set, where there is a significant #disparity between the number of #positive and #negative #labels.

#precision is the ratio of #true_positive to the #total_positives claimed (#true_positive + #false_positive)

#recall is the ratio of #true_positive to the #total_positives that really happened (#true_positive + #false_negative)

the #classification_threshold should strike a balance between #precision and #recall

#precision is based on a #claim and #recall is based on #reality

The #f1_score is the harmonic #mean of #precision and #recall

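The four metrics above follow directly from the confusion-matrix counts (the counts used here are made up for illustration):

```python
# Accuracy, precision, recall, and F1 from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of the positives claimed, how many are real
    recall = tp / (tp + fn)     # of the real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

print(metrics(tp=8, tn=80, fp=2, fn=10))
```

Note how on this imbalanced example the accuracy (0.88) looks much better than the recall (about 0.44) — exactly the class-imbalance caveat stated above.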
#roc_curve (receiver operating characteristic curve) is a graph showing the performance of a #classification model at all #classification thresholds: the #true_positive rate vs the #false_positive rate

#auc provides an aggregate measure of performance across all possible #classification #threshold values

the #true_positive rate is basically the #recall because it's the relation of #true_positive to the sum of #true_positive and #false_negative

the #false_positive rate is the reverse of that: the ratio of the positive #claim that is not true to #reality

#logistic_regression #predictions should be #unbiased. That is: "average of #predictions" should ≈ "average of observations"

#prediction_bias is a quantity that measures how far apart the #predictions are from the #observations

A #z_score is the number of #standard_deviations from the #mean for a particular raw value

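The z-score definition above, sketched with made-up values:

```python
# z-score: how many standard deviations a raw value is from the mean.
def z_score(x, values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return (x - mean) / variance ** 0.5

print(z_score(10, [2, 4, 6, 8, 10]))
```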
#sparse_vector often contain many #dimensions. Creating a #feature_cross results in even more #dimensions, which may lead to a higher use of #resources and #memory

in a high-dimensional #sparse_vector it is good to encourage as many #weights as possible to be zero, so that we reduce the #complexity of the #model and the toll on #resources

#l2_regularization encourages #weights to be small, but doesn't force them to exactly zero

#l2_regularization is a sum of the squared #weights and it encourages them to be smaller to reduce the #complexity of the model

#lambda is used together with #l2_regularization to tune how strongly the #weights are pushed toward zero

#l2_regularization penalizes the #weights squared, while #l1_regularization penalizes the absolute #weights

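The two penalty terms compared above can be written out directly (toy weight vector):

```python
# L2 penalizes the squared weights; L1 penalizes their absolute values.
def l2_penalty(weights):
    return sum(w ** 2 for w in weights)

def l1_penalty(weights):
    return sum(abs(w) for w in weights)

w = [0.5, -0.5, 0.0, 2.0]
print(l2_penalty(w))  # 4.5
print(l1_penalty(w))  # 3.0
```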
the #derivative of #l1_regularization is a constant, so it can push #weights all the way to zero #values

#neural_networks are a more sophisticated version of #feature_cross. In essence, neural networks do the appropriate #feature_crossing for you.

"#nonlinear" means that you can't accurately predict a #label with a model of the form b + w1x1 + w2x2. In other words, the "#decision_surface" is not a #line

#feature_cross is one possible approach to modeling #nonlinear problems.

a #linear model (#linear_layer) can be represented as a #graph: the #inputs are the #features and the #output is the weighted sum of the #inputs (using the #weights)

where a #linear model doesn't work we can use a #nonlinear one

a #hidden_layer is a weighted sum of the #Input #values

a #hidden_layer is a combination of #inputs

a #hidden_layer is still part of a #linear model

a #linear model cannot serve #nonlinear problems (e.g. it cannot fit the #predictions to a curve or identify certain areas or complex #patterns)

that's why we want to introduce a #nonlinear model - we do that by piping each #hidden_layer node through a #nonlinear function

the #nonlinear function is called the #activation_function - this lets us model very complicated #relations between the #inputs and #outputs

the #sigmoid #nonlinear #activation_function converts the #weights sum to a value between 0 and 1

the #rectified #linear unit #activation_function (or #relu, for short) often works a little better than a smooth function like the #sigmoid, as #relu helps add #nonlinear dynamics into the layers of the #inputs

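ReLU, mentioned above, is about the simplest activation function there is:

```python
# ReLU: zero for negative inputs, identity for positive ones —
# a simple nonlinearity applied to each hidden-layer node.
def relu(z):
    return max(0.0, z)

print([relu(z) for z in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]
```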
#redundancy can be important for a #neural_network as it increases the possibility of a #feature_cross that is useful

an extreme case of #overfitting is #memorizing, in which case rather than learning the general #ground_truth the model starts to adapt to the peculiarities and specificities of the #training_set, so it becomes less fit to detect the new #patterns in a new set of #data

Another form of #regularization, called #dropout, is useful for neural networks. It works by randomly "dropping out" unit #activations in a network for a single gradient step.

#multi_class #neural_networks help identify multiple #labels

Given a #classification problem with N possible solutions, a #one_vs_all solution consists of N separate #binary #classifiers — one #binary classifier for each possible #outcome.

#softmax extends the idea of #logistic_regression into a #multi_class world. That is, #softmax assigns decimal #probabilities to each class in a #multi_class problem.

instead of the #binary answer of a #one_vs_all layer, #softmax gives a #probability for each #outcome

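Softmax, as described above, can be sketched directly (the max-subtraction is a standard numerical-stability trick, not something the notes mention):

```python
import math

# Softmax turns a vector of scores into probabilities that sum to 1,
# one probability per class in a multi-class problem.
def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```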
for example, a #number #classification problem is a #multi_class #classification problem with 10 output classes, one for each digit.

#collaborative_filtering is the task of making #predictions about the #interests of a user based on #interests of many other #users.

an #embedding_space maps data by its #features so that the items that are more #similar (or are more likely to be used together) are closer to each other in the #space.

the #embedding_space can consist of many #dimensions and some of them might not have the exact semantic #meanings in which case they are called #latent_dimension representing a #feature that is not explicit in the #data but is rather inferred from it.

ultimately it's the distances between the data elements that are important in #embedding_space not the actual #values.

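The distances-over-values point can be made concrete with toy 2-D embeddings (the item names and coordinates are made up):

```python
# In an embedding space, similarity is read off the distances
# between vectors, not off the raw coordinate values.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

movies = {"action1": (0.9, 0.1), "action2": (0.8, 0.2), "romance": (0.1, 0.9)}
print(euclidean(movies["action1"], movies["action2"]))  # small: similar items
print(euclidean(movies["action1"], movies["romance"]))  # large: dissimilar
```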
#categorical_data refers to input #features that represent one or more discrete items from a finite set of choices. For example, it can be the set of movies a user has watched, the set of words in a document, or the occupation of a person. #categorical_data is most efficiently represented via #sparse_tensors which are tensors with very few non-zero elements (also see #sparse_vector)

in order to use #sparse_vector #representations within a machine learning system, we need a way to represent each #sparse_vector as a vector of #numbers so that semantically #similar items (movies or words) have #similar distances in the vector space. But how do you represent a word as a vector of #numbers?

for example, in #one_hot_encoding you would map one of the 500 000 words in a vocabulary as a #sparse_vector where item 2019 would be 1 and the rest are zero (the vector represents the word)

a "#bag_of_words" #representation contains chunks of words in a #sparse_vector so several values are 1 and most are zero

#embeddings translate large #sparse_vector into a lower-dimensional #space that preserves #semantic relationships.

An #embeddings is a #matrix in which each column is the #vector that corresponds to an item in your #vocabulary. To get the dense #vector for a single #vocabulary item, you retrieve the column corresponding to that item.

#principal_component_analysis (PCA) has been used to create #word #embeddings. Given a set of instances like #bag_of_words #vectors, PCA tries to find highly correlated #dimensions that can be collapsed into a single #dimension.

#word2vec is an example of turning a #sparse_vector representation of language into #embedding #vectors — mapping semantically #similar words to geometrically close #vectors

#word2vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of #words from randomly grouped #words. The #input layer takes a #sparse_vector representation of a target word together with one or more context words.

#static_model is trained #offline, a #dynamic_model is trained #online

#reporting_bias occurs when the #frequency of events, properties, and/or outcomes captured in a #data set does not accurately reflect their real-world #frequency.

#automation_bias is a tendency to favor results generated by #automated systems over those generated by #humans

#selection_bias occurs if a #data set's examples are chosen in a way that is not reflective of their real-world #distribution

#confirmation_bias, where model builders unconsciously process data in ways that affirm preexisting #beliefs and #hypotheses. In some cases, a model builder may actually keep training a model until it produces a result that aligns with their original hypothesis; this is called #experimenters_bias

#confirmation_bias #experimenters_bias #selection_bias #automation_bias #reporting_bias are all types of #bias

#confusion_matrix summarizes how successful #predictions are (it has a #precision / #recall #matrix)

in order to avoid #bias it is important to also test the model across the #categorical_data (e.g. only for men, only for women) with #recall, #precision or #accuracy in order to see if it's biased towards a certain category

#detrended_fluctuation_analysis or #dfa is a method for determining the statistical #self_affinity of a #signal. It is useful for analysing #time_series that appear to be long-memory processes (diverging correlation time, e.g. #power_law decaying autocorrelation function) or #1f_noise.

The obtained #exponent is similar to the #hurst_exponent, except that #dfa may also be applied to signals whose underlying statistics (such as #mean and #variance) or dynamics are #non_stationary (changing with time)

In #dfa the scaling exponent #alpha is calculated as the #slope of a straight line fit to the log-#log graph of F(n) using least #squares. An exponent of 0.5 would correspond to #uncorrelated #white_noise, an exponent of 1 is #pink_noise

Another way to detect #pink_noise is to build a graph where the x axis is the #events while the y axis records a #time_series estimation relative to the #standard_deviation from the #average (#mean) time interval.

At its essence #pink_noise is based on #self_affinity and #self_similarity, so that no matter what scale you look at, the pattern is #similar (#scale_free)

#power_spectral_analysis describes the distribution of #power across the #frequency components composing the #signal - for #pink_noise we have a 1/f relationship — few powerful signals with low frequency, a long tail of less powerful ones (of which there are many) (hence #1f_noise)

#envelope is a smooth #curve outlining the extremes of a #signal; it is also calculated in the #hilbert_transform, which, in turn, is used in calculating #dfa or #detrended_fluctuation_analysis

#detrended_fluctuation_analysis (#dfa) has proven particularly useful, revealing that genetic #variation, normal development, or #disease can lead to differences in the #scale_free #amplitude #modulation of oscillations https://www.frontiersin.org/articles/10.3389/fphys.2012.00450/full

The reason why #chaotic #variation (#pink_noise) is indicative of a #healthy state is because it reflects the #winnerless_competition behind the process. If there's a deviation in this dynamics (e.g. some #patterns), it could mean that one process is #dominating the rest.

#self_affinity is a property of #fractal #time_series where the small parts of the whole are #similar to the whole

#self_affinity processes and #self_similar structures have in common that the statistical #distribution of the measured quantity follows a #power_law function, which is the only mathematical function without a characteristic scale. Self-affine and #self_similar phenomena are therefore called "#scale_free".

In a #power_law #distribution the #mean would not necessarily be the same as the #median (they are closer to each other in a #normal #distribution)

A #power_law #distribution means that there is a big number of #small #variation and a small number of #big #variation (hence the line with a negative #slope when expressed as a #log)

In a #1f #signal the lower #frequency objects have larger #amplitude than the higher #frequency objects (#1f_noise) https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g001.jpg

the #frequency of a certain #size of flower is inversely #proportional to its #size.

a #time_series in which all #frequency components are represented with the same #amplitude will lack the rich variability of a #scale_free #time_series and is referred to as "#white_noise"

To estimate the #scale_free property we calculate the #standard_deviation (#signal in relation to #mean) over differently sized #time_windows. If as the #time_windows size increases the #standard_deviation also increases, we're dealing with a #scale_free process. If the #scaling_effect is not there, then it's not a scale free process.

a stationary #random #fluctuating process has a #signal profile which is #self_affine with a #scaling_exponent α = 0.5

when we add #memory — in the sense that the #probability of an action depends on the previous actions that the walker has made — we will get a process that will exhibit #self_affinity across scales (#scale_free)

Different classes of processes with #memory exist: #positive_correlation and those with #anti_correlation. Anti-correlations can be seen as a #stabilizing mechanism - a future action is more likely to be opposite to the ones made before. In this case on longer windows (time scales) we will have lower #fluctuating so the coefficient will be lower: α from 0 to 0.5 - has #memory, #anti_correlation; 0.5 - #random; 0.5 to 1 - has #memory and #positive_correlation (previous actions increase the likelihood of that action being taken again) https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g003.jpg

for #dfa the signal is transformed into the #cumulative_signal, which is then split into several #windows, equal in size on the #log scale. For each #window the data is #detrended and the #standard_deviation is calculated. The #fluctuating function is then the mean #standard_deviation across all the #windows, plotted against window size on #log scales. The #dfa exponent α is the #slope of that trend: α = 0.5 corresponds to an #uncorrelated #random process, while a steeper #slope means the fluctuation grows more than #proportional with the #window size — a #non_linear, #scale_free process

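The DFA procedure described above — cumulative signal, equal windows, linear detrending, mean fluctuation, slope on log-log scales — can be sketched as follows. This is a bare-bones illustration assuming numpy, without the log-spaced windows or 50% overlap refinements mentioned in the notes:

```python
import numpy as np

# Minimal DFA sketch: cumulative sum, split into windows, linearly
# detrend each window, average the residual fluctuation, then read
# the exponent alpha off the slope on log-log scales.
def dfa_alpha(signal, window_sizes):
    profile = np.cumsum(signal - np.mean(signal))  # cumulative signal
    fluctuations = []
    for n in window_sizes:
        f = []
        for i in range(len(profile) // n):
            seg = profile[i * n:(i + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # linear detrend
            f.append(np.sqrt(np.mean((seg - trend) ** 2)))
        fluctuations.append(np.mean(f))
    # alpha = slope of log F(n) vs log n
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return slope

rng = np.random.default_rng(0)
white = rng.standard_normal(4096)
print(dfa_alpha(white, [8, 16, 32, 64, 128]))  # theory: ~0.5 for white noise
```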
The lower end of the fitting range is at least four samples, because #linear #detrending will perform poorly with less points (Peng et al., 1994). For the high end of the fitting range, #dfa estimates for window sizes >10% of the #signal length are more noisy due to a low number of windows available for averaging (i.e., less than 10 windows). Finally, the 50% overlap between windows is commonly used to increase the number of windows, which can provide a more accurate estimate of the fluctuation function especially for the long-time-scale windows.

   edit   deselect   +add

 

A #brown_noise process can be obtained by successively summing data points in the #white_noise process. https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis/figures?lo=1

   edit   deselect   +add

 

Using the classical #dfa method, the #cumulative_sum of data are divided into segments, and the #variance of these sums is studied as a function of segment length after linearly detrending them in each segment. https://www.nature.com/articles/s41598-019-42732-7

   edit   deselect   +add

 

In #dfa, data are divided into segments of length L and are #linearly detrended. The #square_root of the #variance (called #fluctuation) of the detrended data is studied as a function of L. It can be shown that a #linear relationship between the #logarithm of the #fluctuation and the #logarithm of L is indicative of a #power_law behavior of the spectrum. https://www.nature.com/articles/s41598-019-42732-7

   edit   deselect   +add

 

If a #linear relationship between the length of a #segment or #time_windows and the strength of the #fluctuation (or the #square_root of the #variance of the #cumulative_signal) exists, the slope of the corresponding line is also referred to as #hurst_exponent.

   edit   deselect   +add

 

For #white_noise the #hurst_exponent or the relation between the #time_windows and the #fluctuation (square root of #variance) will be #linear: when we double the #time_windows the #fluctuation (or #variance of the #cumulative_sum) will also double.

   edit   deselect   +add

 

For #pink_noise #1f_noise the #hurst_exponent will equal #1, meaning that for #time_windows twice as long the #fluctuation will double (2^1). In other words, the longer the #time_windows, the more #fluctuation occurs (#positive_correlation).

the #hurst_exponent in this context is called the #alpha_exponent, because the #alpha_exponent is used for #non_stationary processes

if the #alpha_exponent is more than 1, it means that for every increase of scale (#time_windows) the #cumulative_sum of the #fluctuation increases disproportionately. That means the longer we look at the process, the more likely it is to have big #fluctuation: there is a tendency in the #short_term to be #small and in the #long_term to be #big.

the #cumulative_sum of the differences from the #average of a #white_noise #time_series will be #brown_noise (a #random_walk)

In contrast, #0.5 < #hurst_exponent < #1 indicates a #correlated process for #f_gn, or what is termed a #persistent process for #f_bm. In this case, #increases in the signal (for #f_gn) or in the increments of the signal (for #f_bm) are likely to be followed by further #increases, and #decreases are likely to be followed by further #decreases (i.e., a #positive #long_term #correlation). Anti-#persistent and #persistent processes contain #structure that distinguishes them from truly #random sequences of data. (2) (PDF) A tutorial introduction to adaptive fractal analysis. Available from: https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis [accessed Apr 21 2021].

The difference between #exponential_decay and #power_law #decay is that #power_law #decay is slower: high-#amplitude values remain more probable (a heavier tail) than under #exponential_decay. https://math.stackexchange.com/questions/164436/difference-between-power-law-distribution-and-exponential-decay

#downsampling (in this context) means #training on a disproportionately low subset of the #majority_class examples.

#upweighting means adding an example #weight to the #downsampled class equal to the factor by which you performed #downsampling.
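A toy sketch of #downsampling plus #upweighting together (the class labels and counts are hypothetical):

```python
import random

random.seed(0)
# an imbalanced toy dataset: "neg" is the #majority_class
examples = [("neg", i) for i in range(1000)] + [("pos", i) for i in range(50)]

factor = 10  # keep 1 in 10 majority examples
majority = [e for e in examples if e[0] == "neg"]
minority = [e for e in examples if e[0] == "pos"]
downsampled = random.sample(majority, len(majority) // factor)

# upweighting: each kept majority example carries a #weight equal to the factor
weighted = ([(label, x, factor) for label, x in downsampled]
            + [(label, x, 1) for label, x in minority])

# the total effective weight of the majority class is preserved
effective_majority_weight = sum(w for label, _, w in weighted if label == "neg")
```

Training on `weighted` sees far fewer majority examples, yet the class still contributes the same total weight to the loss.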

#normalizing - transforming #numeric data to the same #scale as other #numeric data.

#bucketing - transforming #numeric (usually #continuous) #data to #categorical_data.

#scaling means converting #floating_point #feature #values from their #natural #range (for example, 100 to 900) into a #standard #range (usually 0 to 1)

If your data set contains extreme #outliers, you might try #feature_clipping, which caps all feature #values above (or below) a certain threshold to a fixed value. https://developers.google.com/machine-learning/data-prep/transform/normalization
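A minimal sketch of #feature_clipping followed by min-max #scaling into the standard 0..1 range (the bounds and values are illustrative):

```python
def clip(value, low, high):
    """#feature_clipping: cap extreme #outliers at fixed bounds."""
    return max(low, min(high, value))

def min_max_scale(values):
    """#scaling: map a natural range (e.g. 100..900) into 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [100, 300, 500, 900, 5000]            # 5000 is an extreme outlier
clipped = [clip(v, 100, 900) for v in raw]  # -> [100, 300, 500, 900, 900]
scaled = min_max_scale(clipped)
```

Clipping first keeps the outlier from stretching the scaled range and squashing all the normal values toward 0.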

#log #scaling computes the #log of your values to compress a wide #range to a narrow #range. #log_scaling is helpful when a handful of your values have many points, while most other values have few points. This data #distribution is known as the #power_law #distribution. Movie ratings are a good example. In the chart below, most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). #log_scaling changes the #distribution, helping to improve linear model performance.
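For example (the rating counts below are made up):

```python
import math

def log_scale(values):
    """#log_scaling: compress a wide #power_law-like #range into a narrow one."""
    return [math.log(v) for v in values]

# movie rating counts: a long tail of rarely rated movies, a short head of hits
rating_counts = [1, 3, 10, 100, 10000]
compressed = log_scale(rating_counts)

raw_range = max(rating_counts) - min(rating_counts)  # spans 4 orders of magnitude
log_range = max(compressed) - min(compressed)        # compressed to about 9.2
```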

#z_score is a #variation of #scaling that represents the number of #standard_deviations away from the #mean. You would use z-score to ensure your #feature distributions have #mean = 0 and std = 1. It’s useful when there are a few #outliers, but not so extreme that you need #clipping.
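A small self-contained example (the sample values are arbitrary):

```python
def z_score(values):
    """#z_score #scaling: number of #standard_deviations away from the #mean."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

scores = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# after scaling, the feature has mean 0 and standard deviation 1
```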

#transformation of #numeric #features into #categorical #features, using a set of #thresholds, is called #bucketing (or #binning) - creating #buckets

creating #buckets that each contain the same number of points is called #quantile_bucketing
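A sketch of #quantile_bucketing (it assumes the number of points divides evenly into the buckets; the data is illustrative):

```python
def quantile_bucketing(values, n_buckets):
    """#quantile_bucketing: boundaries chosen so each #bucket
    holds the same number of points."""
    ordered = sorted(values)
    size = len(ordered) // n_buckets
    boundaries = [ordered[i * size] for i in range(1, n_buckets)]

    def bucket_of(v):
        for i, b in enumerate(boundaries):
            if v < b:
                return i
        return n_buckets - 1

    return [bucket_of(v) for v in values]

data = [1, 2, 2, 3, 50, 60, 70, 80, 1000, 2000, 3000, 9000]
buckets = quantile_bucketing(data, 3)
counts = [buckets.count(i) for i in range(3)]  # -> [4, 4, 4]
```

Note the bucket widths are very unequal (1..3 vs 1000..9000) while the counts are equal; that is exactly the point for skewed data.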

when we represent each #categorical #value with a #number (a mapping from values to indices), that mapping is called a #vocabulary

#one_hot_encoding represents #categorical #values as #vectors with a single 1 at the value's #vocabulary index — which can then be stored more compactly as a #sparse_vector
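For instance (a hypothetical three-word #vocabulary):

```python
def one_hot(value, vocabulary):
    """#one_hot_encoding: a #categorical value becomes a vector with a
    single 1 at its #vocabulary index."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

def to_sparse(vec):
    """#sparse_vector representation: store only the indices of nonzero entries."""
    return [i for i, x in enumerate(vec) if x != 0]

vocab = ["red", "green", "blue"]
dense = one_hot("green", vocab)   # -> [0, 1, 0]
sparse = to_sparse(dense)         # -> [1]
```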

#grouping #unlabeled #examples is called #clustering. As the examples are #unlabeled, #clustering relies on #unsupervised #machine_learning. If the examples are #labeled, then #clustering becomes #classification.

#hierarchical_clustering creates a tree of #clusters. #hierarchical #clustering, not surprisingly, is well suited to #hierarchical #data, such as #taxonomies.

#distribution_based_clustering assumes the #data is composed of #distributions, such as #gaussian #distributions, and #clusters the examples accordingly.

#density_based_clustering connects areas of high example #density into #clusters. This #clustering allows for arbitrary-shaped #distributions as long as dense areas can be connected. These algorithms have difficulty with data of varying #densities and high #dimensions and also with #outliers.

#centroid_based_clustering organizes the data into #non_hierarchical_clusters, in contrast to #hierarchical #clustering defined above. #k_means is the most widely used centroid-based #clustering #algorithm.

In order to perform #clustering we need to quantify the #similarity between examples by creating the #similarity_metrics for our #dataset

for #data #processing we need to create #quantiles (#quantile_bucketing) when the #distribution is #poisson — i.e. neither #gaussian nor #power_law.

when the #distribution is #gaussian we can use #normalizing for our #data

when the #distribution is #power_law we might want to use #log_scaling for #normalizing our data

we can do either #manual #similarity or #supervised #similarity. you switch to a #supervised_similarity_measure when you have trouble creating a #manual_similarity_measure.

#mean_square_error shows the #average squared #loss for an #example

we can calculate #similarity by computing the root #mean_square_error across the #features (e.g. size and price): the lower the value, the higher the similarity.
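A minimal sketch (the feature values are hypothetical and assumed pre-scaled to comparable ranges):

```python
def rmse_distance(a, b):
    """Root #mean_square_error across numeric #features (e.g. size, price):
    the lower the value, the more similar the two examples."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

house_a = (120.0, 300.0)   # hypothetical (size, price) features
house_b = (118.0, 305.0)   # close to house_a
house_c = (60.0, 150.0)    # far from house_a

d_ab = rmse_distance(house_a, house_b)
d_ac = rmse_distance(house_a, house_c)
```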

For #categorical_data we can calculate #similarity using #jaccard_similarity, which is the size of the intersection divided by the size of the union of the #sets
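For example, over sets of genre tags (made-up data):

```python
def jaccard_similarity(a, b):
    """#jaccard_similarity for #categorical_data:
    |intersection| / |union| of the two #sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# 2 shared tags out of 4 distinct tags -> similarity 0.5
sim = jaccard_similarity({"comedy", "drama", "romance"},
                         {"comedy", "drama", "horror"})
```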

#k_means groups points into #clusters by minimizing the #distances between points and their #cluster’s #centroid (as seen in Figure 1 below). The #centroid of a #cluster is the #mean of all the points in the #cluster.
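A minimal 1-D #k_means sketch (toy points and starting centroids, chosen for illustration):

```python
def k_means_1d(points, centroids, iterations=10):
    """Minimal 1-D #k_means: assign each point to the nearest #centroid,
    then move each centroid to the #mean of its #cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # empty clusters keep their old centroid
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
centroids, clusters = k_means_1d(points, [0.0, 5.0])
# centroids converge to roughly 1.0 and 10.0
```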

Instead of comparing manually-combined #feature #data, you can reduce the #feature #data to #representations called #embeddings, and then compare the #embeddings

#embeddings are generated by training a #supervised deep neural network (#dnn) on the #feature data itself. The #embeddings map the #feature data to a #vector in an #embedding_space. Typically, the #embedding_space has fewer dimensions than the #feature data and captures some #latent #structure of the #feature data set.

A #dnn that learns #embeddings of #input data by predicting the #input data itself is called an #autoencoder. An #autoencoder is the simplest choice to generate #embeddings. However, an #autoencoder isn't the optimal choice when certain features could be more important than others in determining #similarity.

Since this #dnn predicts a specific input #feature instead of predicting all input #features, it is called a predictor #dnn

To train the #dnn, you need to create a #loss_function by following these steps: 1) calculate the #loss for every #output of the #dnn. For #numeric outputs use #mean_square_error, for #categorical use #log_loss, for #multivalent #categorical use #softmax_cross_entropy (#entropy) loss.

in a #poisson distribution the #decay happens much faster than in a #power_law #distribution — if in a #power_law you have a significant number of nodes in the #tail, in a #poisson distribution you only have a few.

A #similarity measure takes the #embeddings generated by our neural network (#dense_features) and returns a number measuring their #similarity.

To calculate #similarity we have 3 measures to choose from: #euclidian_distance (the length of the difference of the vectors), #cosine_distance (the cosine of the angle between the vectors) and the #dot_product (the cosine multiplied by the lengths of both vectors)

In contrast to the #cosine_distance, the #dot_product is proportional to the #vector #length. This is important because examples that appear very frequently in the training set (for example, popular YouTube videos) tend to have embedding #vectors with large #lengths. If you want to capture #popularity, then choose #dot_product.
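To illustrate the difference (hypothetical 2-D embeddings; the "popular" video simply has a 10x longer vector in the same direction):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle: ignores #vector #length, so #popularity is discarded."""
    norm = lambda v: math.sqrt(dot(v, v))
    return dot(a, b) / (norm(a) * norm(b))

niche_video = (1.0, 2.0)       # short embedding vector
popular_video = (10.0, 20.0)   # same direction, 10x the length
query = (2.0, 4.0)

# cosine treats both videos identically...
cos_niche = cosine_similarity(query, niche_video)
cos_popular = cosine_similarity(query, popular_video)
# ...while the dot product rewards the longer (more popular) embedding
dot_niche = dot(query, niche_video)       # -> 10.0
dot_popular = dot(query, popular_video)   # -> 100.0
```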

#cluster #cardinality is the number of examples per #cluster. We are looking for #outliers and if we do find them, this may indicate some interesting #patterns

#cluster #magnitude is the sum of #distances from all examples to the #centroid of the #cluster. Similar to #cardinality, check how the #magnitude varies across the #clusters, and investigate #anomalies and #outliers.

Notice that a higher #cluster #cardinality tends to result in a higher #cluster #magnitude, which intuitively makes sense. Clusters are #anomalous when #cardinality doesn't correlate with #magnitude relative to the other #clusters.
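A toy check of this heuristic in 1-D (the clusters are made up; cluster B has fewer points but far more spread, so its magnitude per example stands out):

```python
def cardinality(cluster):
    """#cardinality: number of examples in the #cluster."""
    return len(cluster)

def magnitude(cluster, centroid):
    """#magnitude: sum of #distances from every example to the #centroid."""
    return sum(abs(p - centroid) for p in cluster)

cluster_a = [1.0, 1.1, 0.9, 1.05, 0.95]  # many points, tightly packed
cluster_b = [5.0, 9.0, 1.0]              # few points, widely spread
centroid_a = sum(cluster_a) / len(cluster_a)
centroid_b = sum(cluster_b) / len(cluster_b)

# when cardinality doesn't track magnitude, the cluster is #anomalous
ratio_a = magnitude(cluster_a, centroid_a) / cardinality(cluster_a)
ratio_b = magnitude(cluster_b, centroid_b) / cardinality(cluster_b)
```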

#content_based_filtering uses #similarity between items to #recommend items similar to what the user likes.

#collaborative_filtering uses #similarity between #queries and #items simultaneously to provide #recommendations.

Both #content_based_filtering and #collaborative_filtering map each #item and each #query (or #context) to an #embedding_vector

#recommendations - We again place our #users in the same #embedding_space to best explain the #feedback_matrix: for each (#user, #item) pair, we would like the #dot_product of the #user #embedding and the #item #embedding to be close to 1 when the #user watched the movie, and to 0 otherwise.

The #dot_product of the #user #matrix and the #item #matrix yields a #recommendation #matrix that contains not only the original user ratings but also #predictions for the movies each user hasn't seen

#matrix_factorization is, in math, a mechanism for finding the matrices whose #dot_product approximates a #target_matrix.

#generative_adversarial_networks (#gans) are an exciting recent innovation in #machine_learning. #gans are #generative models: they create new #data_instances that resemble your #training_data. For example, #gans can create images that look like photographs of human faces, even though the faces don't belong to any real person.

#gans achieve this level of realism by pairing a #generator, which learns to produce the target output, with a #discriminator, which learns to distinguish true data from the output of the #generator. The #generator tries to fool the #discriminator, and the #discriminator tries to keep from being fooled.

#generative describes a class of statistical models that contrasts with #discriminative models. #generative models can generate new #data #instances. #discriminative models #discriminate between different kinds of #data #instances.

More formally, given a set of #data #instances X and a set of #labels Y: #generative models capture the joint #probability p(X, Y), or just p(X) if there are no #labels. #discriminative models capture the #conditional #probability p(Y | X).

The #generator learns to generate #plausible #data. The generated instances become negative #training #examples for the #discriminator.

The #discriminator learns to distinguish the #generator's #fake #data from #real #data. The #discriminator #penalizes the #generator for producing implausible results.

Through #backpropagation, the #discriminator's #classification provides a signal that the #generator uses to update its #weights.

The #discriminator in a #gan is simply a #classifier. It tries to distinguish real #data from the #fake #data created by the #generator.

The #discriminator connects to two #loss functions. During #discriminator training, the #discriminator ignores the #generator #loss and just uses the #discriminator #loss.

The #generator part of a #gan learns to create #fake data by incorporating #feedback from the #discriminator. It learns to make the #discriminator #classify its output as real.

The #generator feeds into the #discriminator net, and the #discriminator produces the output we're trying to affect. The #generator #loss penalizes the #generator for producing a sample that the #discriminator network classifies as #fake.

#research has suggested that if your #discriminator is too good, then #generator training can fail due to #vanishing_gradients. In effect, an optimal #discriminator doesn't provide enough #information for the #generator to make #progress.

#wasserstein_loss: The #wasserstein_loss is designed to prevent #vanishing_gradients even when you train the #discriminator to #optimality.

#convolutional_neural_network (#cnn) could be used to progressively extract higher- and higher-level #representations of the image #content.

#backpropagation is the process of calculating the #gradient of the #loss_function for the #neural_network: it shows in which direction the #weights should be adjusted to bring the #loss_function to its minimum.
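The idea can be reduced to a one-weight sketch, assuming the toy loss L(w) = (w - 3)^2 whose gradient is 2(w - 3); in a real #neural_network, #backpropagation is what computes this gradient for every #weight, layer by layer:

```python
def gradient(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)  # step against the gradient
# w converges toward the loss minimum at w = 3
```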

in the process of #learning each #layer of a #neural_network will have #neurons with #weights ascribed to them, which enable #differentiation of various #features

The #weights of the neurons combine from #layer to #layer through the #activation_function (e.g. #sigmoid or #relu), which leads to only a certain neuron at the last layer being activated. We might say that the first #layer detects some general #features, the next one more specific ones, and so on, but in practice this is not strictly the case.


        