
#machine_learning #supervised #unsupervised

#features are used in #machine_learning for #differentiation

#features are used for #training in #machine_learning

the #relationship is called a #model in #machine_learning

#supervised #machine_learning finds #patterns between #data and #labels

#patterns are used to make #predictions

#supervised works with #labeled #data

the goal of #unsupervised is to identify meaningful #patterns in #data

sometimes a #model can find #patterns that represent #stereotypes or #bias

#clustering is a type of #unsupervised #learning

with #reinforcement_learning (RL) you set up a #model (called an #agent in RL) that receives a #reward each time it performs well (#reward_function)

a #shaped #reward increases in the states closer to the #goal #state

a #sparse #reward is given in the #goal #state only

#positive_reinforcement is an important element of #reinforcement_learning

if the #reward provides useful #features to the #model, that could improve its performance

#machine_learning #problems: #classification, #regression, #clustering, #association learning, #structured_output, #ranking

#clustering is an #unsupervised #learning problem

#regression requires labeled data — #supervised learning problem

#classification requires a set of #labels - so it is #supervised

a #neural_network works through #representations

the #machine_learning process is an #experiment where we run #test after test after test to converge on a workable #model

A well-defined #problem has both #inputs and #outputs. #inputs are the #features. #outputs are the #labels to predict.

#training means creating or #learning the #model

#inference means applying the trained #model to #unlabeled #examples

A #regression #model #predicts continuous #values.

A #classification #model #predicts #discrete #values.

#machine_learning needs to provide #decisions rather than just #predictions

#labels are the #variables or #values for #predictions

#features are #inputs: #variables describing the #data

#model_training is done on #data which has #features and #labels, so that the model knows what #correlations to extract

#model maps #examples to predicted #labels

#loss_function shows us the degree of #deviation of the #model #prediction from the real #values

#loss_function can be a squared #difference between the #prediction and the #labels

the simplest #loss_function = #observation - #prediction

#mean_square_error estimates the #deviation of the #prediction from the #labels, averaged over all elements

#model_training is usually based on reducing the #loss in the #loss_function, often via #mean_square_error but not only

the #gradient_descent approach is used to minimize the #loss_function

the #learning_rate determines the size of the #gradient_descent step
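
The two notes above can be sketched with a toy one-dimensional squared loss (the function and numbers are illustrative, not from the source):

```python
# Minimal gradient descent on the toy loss L(w) = (w - 3)^2, whose
# minimum is at w = 3 and whose gradient is 2 * (w - 3).

def gradient_descent(w_start, learning_rate, steps):
    """Repeatedly move against the gradient; the step size is
    learning_rate * gradient."""
    w = w_start
    for _ in range(steps):
        gradient = 2 * (w - 3)            # derivative of (w - 3)^2
        w = w - learning_rate * gradient  # the gradient step
    return w

# A moderate learning rate converges close to the minimum at w = 3;
# a too-large one (e.g. 1.1) overshoots the minimum and diverges.
w_final = gradient_descent(w_start=0.0, learning_rate=0.1, steps=100)
```

The divergence with a too-large rate is the same trade-off the #goldilocks #learning_rate note describes.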

#epoch is a full pass over the #dataset: N / batch size #batches in #machine_learning

if the #training #loss stops decreasing we say that it's #converged

the #goldilocks #learning rate for a curve is the one where a #gradient descent reaches the minimum point in the fewest number of steps

In #supervised #learning, a machine #learning algorithm builds a model by examining many examples and attempting to find a model that minimizes #loss; this process is called empirical #risk_minimization.

#loss is the penalty for a bad #prediction. That is, loss is a number indicating how bad the model's #prediction was on a single example.

Mean square error (#mse) is the average squared #loss per example over the whole #dataset.
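
The #mse definition above as a minimal sketch (the numbers are illustrative):

```python
def mean_squared_error(predictions, labels):
    """Average squared loss per example over the whole dataset."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)

# errors are 1, -1, 2 -> squared: 1, 1, 4 -> mean = 2.0
mse = mean_squared_error([2, 3, 7], [1, 4, 5])
```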

#iterative #learning is used to decrease #loss over time by making small steps and receiving #feedback on the #outputs

Usually, you #iterate until overall #loss stops changing or at least changes extremely slowly. When that happens, we say that the #model has #converged.

#epoch represents a full training pass over the entire #dataset such that each example has been seen once. Thus, an #epoch represents N / #batch size training iterations, where N is the total number of examples.

#learning_rate is a scalar used to #train a model via #gradient_descent. During each iteration, the #gradient_descent algorithm multiplies the #learning_rate by the #gradient. The resulting product is called the #gradient step. #learning_rate is a key #hyperparameter.

#batch is the number of #examples used in an #iteration

#anomalies in #features may indicate a potential problem in a #dataset - one should be more careful using that sort of data.

#learning_rate specifies the size of the step, #batch specifies how many elements we take into the learning process, #epoch specifies how many full passes over the data we make

a #synthetic_feature is made out of several #features and may help #prediction

a #correlation_matrix shows if there are any #correlations between the #features

we take a #training_set and a #test_set from our #data, then we train the model on the #training_set to see how well its #prediction works on the #test_set

An #overfit model gets a low #loss during training but does a poor job #predicting new data.

#machine_learning 's goal is to predict well on new #data drawn from a (hidden) true #probability #distribution.

The less complex a #machine_learning model, the more likely that a good #empirical result is not just due to the #peculiarities of the #sample.

Partitioning a #data set into a #training_set and #test_set lets you judge whether a given #model will generalize well to new #data.

#partitioning a #data set into a #training_set and a #test_set

a #training_set can be split into a smaller #training_set, a #test_set and a #validation_set so that the model can be trained better and no #overfitting occurs

#features are very important for #machine_learning models

#features engineering should remove #outliers (which might lead to the problem of a #model not #predicting outstanding events)

#data_visualization is important for knowing #data and improving #machine_learning models

#one_hot_encoding allows us to incorporate categorical #data into our #model
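
A minimal sketch of #one_hot_encoding (the vocabulary and helper name are made up for illustration):

```python
def one_hot_encode(category, vocabulary):
    """Represent a categorical value as a vector with a single 1."""
    return [1 if item == category else 0 for item in vocabulary]

colors = ["red", "green", "blue"]
encoded = one_hot_encode("green", colors)  # [0, 1, 0]
```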

for very large #values a #sparse #representation is used

#binning_values allows us to simplify the #data and bring it to the #feature_vector using #one_hot_encoding

#binning_values by #quantile ensures the number of #examples in each bucket is the same

#feature_crossing can be a powerful way to improve #prediction by combining the #data #features of the #dataset

#feature_crossing is often done for #one_hot_encoding where multiple #features are crossed to produce an interesting #feature_vector

#feature_crossing is one learning strategy; #neural_network is another

minimizing #loss_complexity (loss plus complexity), which is called #structural_risk_minimization, allows us to avoid #overfitting the #model

the #loss_term measures how well the #model fits the #data, and the #regularization term measures model #complexity.

model #complexity as a function of the #weights of all the #features in the #model.

model #complexity as a function of the total number of #features with nonzero #weights.

#model developers tune the overall impact of the #regularization term by multiplying its value by a #scalar known as #lambda (also called the #regularization rate).

the #regularization term measures model #complexity.

#regularization is a technique used in an attempt to solve the #overfitting problem in statistical models.

#regularization penalizes the #loss_function in that it pushes the model to give lower #weights to each parameter in the model

a model is #learning the #weights for each of the #features as it is #training itself to minimize #loss and #complexity

#regularization penalizes the #loss_function for too much #complexity (a high number of #features with nonzero #weights)

#sigmoid_function maps the output of the #linear_layer of a model trained with #logistic_regression to a value between zero and one
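
The #sigmoid_function as a one-line sketch:

```python
import math

def sigmoid(z):
    """Squash the linear layer's output z into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

# sigmoid(0) is exactly 0.5; large negative z approaches 0,
# large positive z approaches 1.
```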

#logistic_regression returns a #probability for a #classification

In order to map a #logistic_regression value to a binary category, you must define a #classification_threshold (also called the #decision_threshold or simply the #threshold)

A #true_positive is when a model correctly #predicts the #positive_class (it made a positive prediction and it was true). A #true_negative is when a model correctly #predicts the #negative_class

A #false_positive is when a model incorrectly #predicts the #positive_class. A #false_negative is when a model incorrectly #predicts the #negative_class

#accuracy of a model #prediction is the ratio of the #correct_predictions to the total number of #predictions

#prediction #predicts

#predicting the #prediction

#accuracy then is a ratio of #true_positive plus #true_negative to the sum of all predictions (#true_negative + #true_positive + #false_negative + #false_positive)

#accuracy alone doesn't tell the full story when you're working with a #class_imbalanced_data_set, where there is a significant #disparity between the number of #positive and #negative #labels.

#precision is a ratio of #true_positive to #total_positives claimed (#true_positive + #false_positive)

#recall is the ratio of #true_positive to the #total_positives that really happened (#true_positive + #false_negative)

#classification_threshold should strike a balance between #precision and #recall so that both values are as high as possible

#precision is based on a #claim and #recall is based on #reality

The #f1_score is the harmonic #mean of the #precision and #recall
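
The metric notes above (#accuracy, #precision, #recall, #f1_score) as one sketch; the counts are illustrative:

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Of all positive claims, how many were right."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all real positives, how many were found."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# e.g. tp=8, tn=80, fp=2, fn=10: accuracy looks high (0.88)
# while recall is low (8/18) - the class-imbalance caveat above.
p = precision(8, 2)
r = recall(8, 10)
```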

#roc_curve (receiver operating characteristic curve) is a graph showing the performance of a #classification model at all #classification thresholds: the #true_positive rate vs the #false_positive rate

#auc provides an aggregate measure of performance across all possible #classification #threshold values

#true_positive rate is basically the #recall because it's the relation of #true_positive to the sum of #true_positive and #false_negative

#false_positive rate is the reverse of that: the ratio of #false_positive to the sum of #false_positive and #true_negative (the #claim of the positive that is not true, compared to #reality)

#logistic_regression #predictions should be #unbiased. That is: "average of #predictions" should ≈ "average of observations"

#prediction_bias is a quantity that measures how far apart the #predictions are from the #observations

A #z_score is the number of #standard_deviations from the #mean for a particular raw value
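
The #z_score definition above as a sketch (the values are illustrative):

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations a raw value lies from the mean."""
    return (value - mean) / standard_deviation

# a raw value of 75 with mean 60 and std 5 is 3 standard deviations up
z = z_score(value=75, mean=60, standard_deviation=5)  # 3.0
```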

#sparse_vector often contain many #dimensions. Creating a #feature_cross results in even more #dimensions, which may lead to a higher use of #resources and #memory

in a high-dimensional #sparse_vector it is good to encourage as many #weights as possible to be zero, so that we reduce the #complexity of the #model and the toll on #resources

#l2_regularization encourages #weights to be small, but doesn't force them to exactly zero

#l2_regularization is a sum of squared #weights and it encourages them to be smaller to reduce the #complexity of the model

#lambda is used together with #l2_regularization to tune how strongly #complexity is penalized, without pushing the #weights too low

#regularization and #l2_regularization

#l2_regularization penalizes the squared #weights, while #l1_regularization penalizes the absolute #weights
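
A sketch contrasting the #l1_regularization and #l2_regularization penalty terms (the helper names are made up; in a real loss each penalty is scaled by #lambda and added to the data loss):

```python
def l1_penalty(weights):
    """Sum of absolute weights - its constant gradient can drive
    weights to exactly zero."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights - shrinks weights but rarely zeroes them."""
    return sum(w ** 2 for w in weights)

weights = [0.5, -2.0, 0.0]
# total loss would be: data_loss + lambda * penalty
```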

the #derivative of #l1_regularization is a constant, which can push #weights all the way to zero and thus remove those #values from the #model

#neural_networks are a more sophisticated version of #feature_cross. In essence, neural networks do the appropriate #feature_crossing for you.

"#nonlinear" means that you can't accurately predict a #label with a #linear model of the form b + w1x1 + w2x2. In other words, the "#decision_surface" is not a #line

#feature_cross is one possible approach to modeling #nonlinear problems.

a #linear model (#linear_layer) can be represented as a #graph: the #inputs are the #features and the #output is the weighted sum of the #inputs (#weights multiplied by inputs)

where a #linear model doesn't work we can use a #nonlinear one

a #hidden_layer is a weighted sum of the #input #values

a #hidden_layer is a combination of #inputs

a #hidden_layer is still part of a #linear model

a #linear model cannot serve #nonlinear problems (e.g. it cannot fit the #predictions to a curve or identify certain areas or complex #patterns)

that's why we want to introduce a #nonlinear model - we do that by piping each #hidden_layer node through a #nonlinear function

the #nonlinear function is called the #activation_function - this lets us model very complicated #relations between the #inputs and #outputs

the #sigmoid #nonlinear #activation_function converts the weighted sum (#weights) to a value between 0 and 1

the #rectified #linear unit #activation_function (or #relu, for short) often works a little better than a smooth function like the #sigmoid, as #relu helps add #nonlinear dynamics into the layers of the #inputs

#redundancy can be important for a #neural_network as it increases the possibility of a #feature_cross that is useful

an extreme case of #overfitting is #memorizing, in which case rather than learning the general #ground_truth the model starts to adapt to the peculiarities and specificities of the #training_set, so it becomes less fit to detect new #patterns in a new set of #data

Another form of #regularization, called #dropout, is useful for neural networks. It works by randomly "dropping out" unit #activations in a network for a single gradient step.

#multi_class #neural_networks help identify multiple #labels

Given a #classification problem with N possible solutions, a #one_vs_all solution consists of N separate #binary #classifiers—one #binary classifier for each possible #outcome.

#softmax extends the idea of #logistic_regression into a #multi_class world. That is, #softmax assigns decimal #probabilities to each class in a #multi_class problem.

instead of the #binary answer of a #one_vs_all layer, #softmax gives a #probability for each #outcome
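
The #softmax notes above as a sketch (the max-subtraction trick for numerical stability is my addition, not stated in the notes):

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    exps = [math.exp(z - max(logits)) for z in logits]  # stable exponentials
    total = sum(exps)
    return [e / total for e in exps]

# one probability per class, highest score gets the highest probability
probs = softmax([2.0, 1.0, 0.1])
```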

for example, a #number #classification problem is a #multi_class #classification problem with 10 output classes, one for each digit.

#collaborative_filtering is the task of making #predictions about the #interests of a user based on the #interests of many other #users.

an #embedding_space maps data by its #features so that the items that are more #similar (or are more likely to be used together) are closer to each other in the #space.

the #embedding_space can consist of many #dimensions and some of them might not have exact semantic #meanings, in which case they are called a #latent_dimension representing a #feature that is not explicit in the #data but is rather inferred from it.

ultimately it's the distances between the data elements that are important in the #embedding_space, not the actual #values.

#categorical_data refers to input #features that represent one or more discrete items from a finite set of choices. For example, it can be the set of movies a user has watched, the set of words in a document, or the occupation of a person. #categorical_data is most efficiently represented via #sparse_tensors, which are tensors with very few non-zero elements (also see #sparse_vector)

in order to use #sparse_vector #representations within a machine learning system, we need a way to represent each #sparse_vector as a vector of #numbers so that semantically #similar items (movies or words) have #similar distances in the vector space. But how do you represent a word as a vector of #numbers?

for example, in #one_hot_encoding you would map one of the 500 000 words in a vocabulary as a #sparse_vector where item 2019 would be 1 and the rest are zero (the vector represents the word)

a "#bag_of_words" #representation contains chunks of words in a #sparse_vector, so several values are 1 and most are zero

#embeddings translate a large #sparse_vector into a lower-dimensional #space that preserves #semantic relationships.

#embeddings form a #matrix in which each column is the #vector that corresponds to an item in your #vocabulary. To get the dense #vector for a single #vocabulary item, you retrieve the column corresponding to that item.

#principal_component_analysis (PCA) has been used to create #word #embeddings. Given a set of instances like #bag_of_word #vectors, PCA tries to find highly correlated #dimensions that can be collapsed into a single #dimension.

#word2vec is an example of translating the #sparse_vector representation of language into dense #embedding #vectors — mapping semantically #similar words to geometrically close #vectors

#word2vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of #words from randomly grouped #words. The #input layer takes a #sparse_vector representation of a target word together with one or more context words.

a #static_model is trained #offline, a #dynamic_model is trained #online

#bias arises when we don't include what we consider to be #typical in a set of #features

#reporting_bias occurs when the #frequency of events, properties, and/or outcomes captured in a #data set does not accurately reflect their real-world #frequency.

#automation_bias is a tendency to favor results generated by #automated systems over those generated by #humans

#selection_bias occurs if a #data set's examples are chosen in a way that is not reflective of their real-world #distribution

#confirmation_bias is when model builders unconsciously process data in ways that affirm preexisting #beliefs and #hypotheses. In some cases, a model builder may actually keep training a model until it produces a result that aligns with their original hypothesis; this is called #experimenters_bias

#confirmation_bias #experimenters_bias #selection_bias #automation_bias #reporting_bias are all types of #bias

#confusion_matrix summarizes how successful #predictions are (it has the #precision / #recall #matrix)

in order to avoid #bias it is important to also test the model across the #categorical_data (e.g. only for men, only for women) with #recall, #precision or #accuracy in order to see if it's biased towards a certain category

#detrended_fluctuation_analysis or #dfa is a method for determining the statistical #self_affinity of a #signal. It is useful for analysing #time_series that appear to be long-memory processes (diverging correlation time, e.g. #power_law decaying autocorrelation function) or #1f_noise.

The obtained #exponent is similar to the #hurst_exponent, except that #dfa may also be applied to signals whose underlying statistics (such as #mean and #variance) or dynamics are #non_stationary (changing with time)

In #dfa the scaling exponent #alpha is calculated as the #slope of a straight line fit to the log-log graph of F(n) using least #squares. An exponent of 0.5 would correspond to #uncorrelated #white_noise, an exponent of 1 is #pink_noise

Another way to detect #pink_noise is to build a graph where the x axis is the #events while the y axis records a #time_series estimation relative to the #standard_deviation from the #average (#mean) time interval.

At its essence #pink_noise is based on #self_affinity and #self_similarity, so that no matter what scale you look at, the pattern is #similar (#scale_free)

#power_spectral_analysis describes the distribution of #power across the #frequency components composing the #signal - for #pink_noise we have a 1/f relationship: few powerful signals with low frequency and a long tail of less powerful ones (of which there are many), hence #1f_noise

the #envelope is a smooth #curve outlining the extremes of a #signal; it is also calculated in the #hilbert_transform, which, in turn, is used in calculating #dfa or #detrended_fluctuation_analysis

#detrended_fluctuation_analysis (#dfa) has proven particularly useful, revealing that genetic #variation, normal development, or #disease can lead to differences in the #scale_free #amplitude #modulation of oscillations https://www.frontiersin.org/articles/10.3389/fphys.2012.00450/full

The reason why #chaotic #variation (#pink_noise) is indicative of a #healthy state is because it reflects the #winnerless_competition behind the process. If there's a deviation in this dynamics (e.g. some #patterns), it could mean that one process is #dominating the rest.

#self_affinity is a property of #fractal #time_series where the small parts of the whole are #similar to the whole

#self_affinity processes and #self_similar structures have in common that the statistical #distribution of the measured quantity follows a #power_law function, which is the only mathematical function without a characteristic scale. Self-affine and #self_similar phenomena are therefore called "#scale_free".

In a #power_law #distribution the #mean would not necessarily be the same as the #median (the two are closer to each other in a #normal #distribution)

A #power_law #distribution means that there is a big number of #small #variation and a small number of #big #variation (hence the line with a negative #slope when expressed as a #log)

In a #1f #signal the lower #frequency objects have a larger #amplitude than the higher #frequency objects (#1f_noise) https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g001.jpg

for example, the #frequency of a certain #size of flower is inversely #proportional to its #size.

a #time_series in which all #frequency components are represented with the same #amplitude lacks the rich variability of the #scale_free #time_series and is referred to as "#white_noise"

To estimate the #scale_free property we calculate the #standard_deviation (#signal in relation to #mean) over differently sized #time_windows. If, as the #time_windows size increases, the #standard_deviation also increases, we're dealing with a #scale_free process. If the #scaling_effect is not there, then it's not a scale-free process.

a stationary #random #fluctuating process has a #signal profile which is #self_affine with a #scaling_exponent α = 0.5

when we add #memory in the sense that the #probability of an action depends on the previous actions that the walker has made — we will get a process that will exhibit #self_affinity across scales (#scale_free)

Different classes of processes with #memory exist: those with #positive_correlation and those with #anti_correlation. Anti-correlations can be seen as a #stabilizing mechanism: a future action is more likely to be the opposite of the ones made before, so on longer windows (time scales) we will have lower #fluctuating and the coefficient will be lower. α from 0 to 0.5: has #memory, #anti_correlation; α = 0.5: #random; α from 0.5 to 1: has #memory and #positive_correlation (previous actions increase the likelihood of that action being taken again) https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g003.jpg

for #dfa the signal is transformed into the #cumulative_signal, which is then split into several #windows equal in size on the #log scale. The data in each #window is #detrended and the #standard_deviation is calculated for each #window; the #fluctuating function is then the mean #standard_deviation over all #windows. We plot that against window size on #log scales, and the #dfa exponent α is the #slope of the trend: α around 0.5 means the fluctuations grow with #window size only as much as a memoryless #random process would give, while a steeper #slope (towards a straight 45° line, α = 1) means the fluctuations grow more than that and the process is in fact #scale_free

The lower end of the fitting range is at least four samples, because #linear #detrending will perform poorly with fewer points (Peng et al., 1994). For the high end of the fitting range, #dfa estimates for window sizes >10% of the #signal length are more noisy due to a low number of windows available for averaging (i.e., less than 10 windows). Finally, a 50% overlap between windows is commonly used to increase the number of windows, which can provide a more accurate estimate of the fluctuation function, especially for the long-time-scale windows.

A #brown_noise process can be obtained by successively summing data points in the #white_noise process. https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis/figures?lo=1
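
The #brown_noise note as a numpy sketch (the seed and length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
white_noise = rng.normal(size=1000)   # uncorrelated samples
brown_noise = np.cumsum(white_noise)  # successive summing -> random walk
```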

Using the classical #dfa method, the #cumulative_sum of data are divided into segments, and the #variance of these sums is studied as a function of segment length after linearly detrending them in each segment. https://www.nature.com/articles/s41598-019-42732-7

In #dfa, data are divided into segments of length L and are #linearly detrended. The #square_root of the #variance (called #fluctuation) of the detrended data is studied as a function of L. It can be shown that a #linear relationship between the #logarithm of the #fluctuation and the #logarithm of L is indicative of a #power_law behavior of the spectrum. https://www.nature.com/articles/s41598-019-42732-7
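
A minimal, unoptimized sketch of the #dfa procedure described in the notes above (the window sizes, the seed, and the use of numpy's least-squares polyfit are my assumptions):

```python
import numpy as np

def dfa_exponent(signal, window_sizes):
    """Estimate the DFA scaling exponent alpha of a 1-D signal."""
    # cumulative signal: running sum of deviations from the mean
    profile = np.cumsum(signal - np.mean(signal))
    fluctuations = []
    for L in window_sizes:
        n_windows = len(profile) // L
        rms = []
        for i in range(n_windows):
            segment = profile[i * L:(i + 1) * L]
            x = np.arange(L)
            # linear detrending within the window
            trend = np.polyval(np.polyfit(x, segment, 1), x)
            rms.append(np.sqrt(np.mean((segment - trend) ** 2)))
        fluctuations.append(np.mean(rms))
    # alpha is the slope of log F(L) vs log L
    alpha, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return alpha
```

For white noise the estimated alpha should come out near 0.5, matching the note that 0.5 corresponds to uncorrelated #white_noise.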

   edit   unpin & show all

 

If a #linear relationship between the length of a #segment or #time_windows and the strength of the #fluctuation (or the #square_root of the #variance of the #cumulative_signal) exists, the slope of the corresponding line is also referred to as #hurst_exponent.

   edit   unpin & show all

 

For #white_noise the #hurst_exponent or the relation between the #time_windows and the #fluctuation (square root of #variance) will be #linear: when we double the #time_windows the #fluctuation (or #variance of the #cumulative_sum) will also double.

   edit   unpin & show all

 

For #pink_noise #1f_noise the #hurst_exponent will equal #1 and will mean that for #time_windows twice longer the #fluctuation will increase about 4 times. In other words, the the longer is the #time_windows the more #fluctuation occurs (#positive_correlation).

   edit   unpin & show all

 

#hurst_exponent in this context is #alpha_exponent, because we use #alpha_exponent for #non_stationary processes


if #alpha_exponent is more than 1, it means that with every increase of scale (#time_windows) the #fluctuation of the #cumulative_sum grows faster than linearly. That is, the longer we look at the process, the more likely it is to show big #fluctuation: there is a tendency for it to be #small in the #short_term and #big in the #long_term (a sign of a #non_stationary process).


for #white_noise, the #cumulative_sum of the differences from the #average of the #time_series will be #brown_noise (a #random_walk)


In contrast, #0.5 < #hurst_exponent < #1 indicates a #correlated process for #f_gn or what is termed a #persistent process for #f_bm. In this case, #increases in the signal (for #f_gn) or increments of the signal (for #f_bm) are likely to be followed by further #increases, and #decreases are likely to be followed by #decreases (i.e., a #positive #long_term #correlation). Anti-#persistent and #persistent processes contain #structure that distinguishes them from truly #random sequences of data. (2) (PDF) A tutorial introduction to adaptive fractal analysis. Available from: https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis [accessed Apr 21 2021].


The difference between #exponential_decay and #power_law #decay is that #power_law #decay is slower: high-#amplitude values remain far more likely (a heavy tail) than under #exponential_decay, while most values are still low. https://math.stackexchange.com/questions/164436/difference-between-power-law-distribution-and-exponential-decay


#downsampling (in this context) means #training on a disproportionately low subset of the #majority_class examples.


#upweighting means adding an example #weight to the downsampled class equal to the factor by which you performed #downsampling.
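a sketch of #downsampling plus #upweighting together (the helper name and the tuple layout are mine, for illustration only):

```python
import random

def downsample_with_upweight(examples, labels, majority_label, factor, seed=0):
    """Keep roughly 1/factor of the majority class; give each kept majority
    example a weight equal to the downsampling factor, so the total class
    weight is preserved in expectation. Returns (example, label, weight)."""
    rng = random.Random(seed)
    out = []
    for x, y in zip(examples, labels):
        if y == majority_label:
            if rng.random() < 1.0 / factor:
                out.append((x, y, float(factor)))  # downsampled -> upweighted
        else:
            out.append((x, y, 1.0))  # minority class kept as-is
    return out
```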


#normalizing - transforming #numeric data to the same #scale as other #numeric data.


#bucketing - transforming #numeric (usually #continuous) #data to #categorical_data.


#scaling means converting #floating_point #feature #values from their #natural #range (for example, 100 to 900) into a #standard #range, usually 0 to 1.


If your data set contains extreme #outliers, you might try #feature_clipping, which caps all feature #values above (or below) a certain threshold to a fixed value. https://developers.google.com/machine-learning/data-prep/transform/normalization


#log #scaling computes the #log of your values to compress a wide #range to a narrow #range. #log_scaling is helpful when a handful of your values have many points, while most other values have few points. This data #distribution is known as the #power_law #distribution. Movie ratings are a good example: most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). #log_scaling changes the #distribution, helping to improve linear model performance.


#z_score is a #variation of #scaling that represents the number of #standard_deviations away from the #mean. You would use z-score to ensure your #feature distributions have #mean = 0 and std = 1. It’s useful when there are a few #outliers, but not so extreme that you need #clipping.
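the four transforms from the notes above (#scaling, #feature_clipping, #log_scaling, #z_score) as minimal pure-Python sketches (function names are mine):

```python
import math

def min_max_scale(xs, lo=0.0, hi=1.0):
    """Scaling: linearly map values from their natural range into [lo, hi]."""
    mn, mx = min(xs), max(xs)
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def clip(xs, lower, upper):
    """Feature clipping: cap extreme outliers to fixed bounds."""
    return [max(lower, min(upper, x)) for x in xs]

def log_scale(xs):
    """Compress a wide (power-law-like) range with a log transform."""
    return [math.log1p(x) for x in xs]  # log(1 + x) keeps 0 at 0

def z_score(xs):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]
```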


#transformation of #numeric #features into #categorical #features, using a set of #thresholds, is called #bucketing (or #binning) - creating #buckets


creating #buckets that each have the same number of points. This technique is called #quantile_bucketing.
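a sketch of #quantile_bucketing (names are mine): compute boundaries so each #bucket gets the same number of points, then map a #numeric value to its #categorical bucket index.

```python
def quantile_buckets(xs, n_buckets):
    """Boundaries that split the sorted values into equal-count buckets."""
    s = sorted(xs)
    size = len(s) // n_buckets
    return [s[i * size] for i in range(1, n_buckets)]

def bucketize(x, boundaries):
    """Bucketing: map a numeric value to a categorical bucket index."""
    b = 0
    for bound in boundaries:
        if x >= bound:
            b += 1
    return b
```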


when we represent each #categorical #value with a #number, the mapping is called a #vocabulary


#one_hot_encoding represents #categorical #values as binary #vectors, which can then be stored compactly as a #sparse_vector
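a sketch combining a #vocabulary, #one_hot_encoding, and a #sparse_vector representation (function names are mine):

```python
def build_vocabulary(values):
    """Vocabulary: map each distinct categorical value to an integer index."""
    return {v: i for i, v in enumerate(dict.fromkeys(values))}

def one_hot(value, vocab):
    """Dense one-hot vector: 1 at the value's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[value]] = 1
    return vec

def to_sparse(vec):
    """Sparse representation: store only the non-zero indices."""
    return [i for i, x in enumerate(vec) if x != 0]
```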


#grouping #unlabeled #examples is called #clustering. As the examples are #unlabeled, #clustering relies on #unsupervised #machine_learning. If the examples are #labeled, then #clustering becomes #classification.


#hierarchical_clustering creates a tree of #clusters. #hierarchical #clustering, not surprisingly, is well suited to #hierarchical #data, such as #taxonomies.


#distribution_based_clustering assumes #data is composed of #distributions, such as #gaussian #distributions, and then #clusters them accordingly.


#density_based_clustering connects areas of high example #density into #clusters. This #clustering allows for arbitrary-shaped #distributions as long as dense areas can be connected. These algorithms have difficulty with data of varying #densities and high #dimensions and also with #outliers.


#centroid_based_clustering organizes the data into #non_hierarchical_clusters, in contrast to the #hierarchical #clustering defined above. #k_means is the most widely used centroid-based #clustering #algorithm.


In order to perform #clustering we need to quantify the #similarity between examples by creating the #similarity_metrics for our #dataset


for #data #processing we need to create #quantiles or use #quantile_bucketing when the #distribution is #poisson (neither #gaussian nor #power_law).


when the #distribution is #gaussian we can #normalize our #data


when the #distribution is #power_law we might want to use #log_scaling for #normalizing our data


we can do either #manual #similarity or #supervised #similarity. you switch to a #supervised_similarity_measure when you have trouble creating a #manual_similarity_measure.


#mean_square_error shows the #average squared #loss for an #example


we can calculate #similarity by calculating a root #mean_square_error over the differences of the #features (e.g. size and price): the lower the value, the higher the similarity.


For #categorical_data we can calculate #similarity using #jaccard_similarity which shows the proportion of intersection between the #sets
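minimal sketches of the two manual #similarity measures above, RMSE for #numeric features and #jaccard_similarity for #categorical_data (function names are mine):

```python
import math

def rmse_distance(a, b):
    """Root mean square error between two numeric feature vectors.
    Lower value -> more similar examples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def jaccard_similarity(a, b):
    """|intersection| / |union| of two sets of categorical values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```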


#k_means groups points into #clusters by minimizing the #distances between points and their #cluster's #centroid. The #centroid of a #cluster is the #mean of all the points in the #cluster.
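a bare-bones #k_means (Lloyd's algorithm) sketch in pure Python, not an optimized implementation: assign each point to its nearest #centroid, then move each #centroid to the #mean of its #cluster, and repeat.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k random points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign p to the centroid with the smallest squared distance
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```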


Instead of comparing manually-combined #feature #data, you can reduce the #feature #data to #representations called #embeddings, and then compare the #embeddings


#embeddings are generated by training a #supervised deep neural network (#dnn) on the #feature data itself. The #embeddings map the #feature data to a #vector in an #embedding_space. Typically, the #embedding_space has fewer dimensions than the #feature data, in a way that captures some #latent #structure of the #feature data set.


A #dnn that learns #embeddings of #input data by predicting the #input data itself is called an #autoencoder. An #autoencoder is the simplest choice to generate #embeddings. However, an #autoencoder isn't the optimal choice when certain features could be more important than others in determining #similarity.


Since this #dnn predicts a specific input #feature instead of predicting all input #features, it is called a predictor #dnn


To train the #dnn, you need to create a #loss_function by following these steps: 1) calculate the #loss for every #output of the #dnn. For #numeric outputs use #mean_square_error, for #categorical use #log_loss, for #multivalent #categorical use #softmax_cross_entropy (#entropy) loss.


in the #poisson distribution the #decay happens much faster than in the #power_law #distribution: if in the #power_law you have a significant number of nodes in the #tail, in the #poisson you only have a few.


A #similarity measure takes the #embeddings generated by our neural network (#dense_features) and returns a number measuring their #similarity.


To calculate #similarity we have 3 measures to choose from: #euclidian_distance (the length of the difference of the vectors), #cosine_distance (cosine of the angle between the vectors) and the #dot_product (cosine multiplied by the lengths of both vectors)
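the three measures as short functions (names are mine); note how the #dot_product grows with #vector #length while the cosine does not:

```python
import math

def dot(a, b):
    """Dot product: cosine of the angle times the lengths of both vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Length of the difference of the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; ignores their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```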


In contrast to the #cosine_distance, the #dot_product is proportional to the #vector #length. This is important because examples that appear very frequently in the training set (for example, popular YouTube videos) tend to have embedding #vectors with large #lengths. If you want to capture #popularity, then choose #dot_product.


#cluster #cardinality is the number of examples per #cluster. We are looking for #outliers and if we do find them, this may indicate some interesting #patterns


Cluster #magnitude is the sum of #distances from all examples to the #centroid of the #cluster. Similar to #cardinality, check how the #magnitude varies across the #clusters, and investigate #anomalies and #outliers.


Notice that a higher #cluster #cardinality tends to result in a higher #cluster #magnitude, which intuitively makes sense. Clusters are #anomalous when #cardinality doesn't correlate with #magnitude relative to the other #clusters.


#content_based_filtering uses #similarity between items to #recommend items similar to what the user likes.


#collaborative_filtering uses #similarity between #queries and #items simultaneously to provide #recommendations.


Both #content_based_filtering and #collaborative_filtering map each #item and each #query (or #context) to an #embedding_vector


#recommendations - We again place our #users in the same #embedding_space to best explain the #feedback_matrix: for each (#user, #item) pair, we would like the #dot_product of the #user #embedding and the #item #embedding to be close to 1 when the #user watched the movie, and to 0 otherwise.


The #dot_product of the #user_matrix and the #item_matrix yields a #recommendation #matrix that contains not only the original user ratings but also #predictions for the movies that each user hasn't seen


#matrix_factorization In math, a mechanism for finding the matrices whose #dot_product approximates a #target_matrix.
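a toy #matrix_factorization via SGD on the observed entries only, a sketch under simplifying assumptions (no regularization, full passes over a tiny #feedback_matrix; function names are mine):

```python
import random

def matrix_factorize(ratings, n_users, n_items, dim=2, steps=3000, lr=0.05, seed=0):
    """Learn user/item embeddings U, V so that dot(U[u], V[i]) approximates
    each observed rating; `ratings` is a list of (user, item, rating)."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for d in range(dim):
                u_d, v_d = U[u][d], V[i][d]
                U[u][d] += lr * err * v_d  # gradient step on squared error
                V[i][d] += lr * err * u_d
    return U, V

def predict(U, V, u, i):
    """Predicted rating: dot product of the user and item embeddings."""
    return sum(a * b for a, b in zip(U[u], V[i]))
```

the same reconstruction also fills in the cells with no observed rating, which is where the #predictions for unseen items come from.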


#generative_adversarial_networks (#gans) are an exciting recent innovation in #machine_learning. #gans are #generative models: they create new #data_instances that resemble your #training_data. For example, #gans can create images that look like photographs of human faces, even though the faces don't belong to any real person.


#gans achieve this level of realism by pairing a #generator, which learns to produce the target output, with a #discriminator, which learns to distinguish true data from the output of the #generator. The #generator tries to fool the #discriminator, and the #discriminator tries to keep from being fooled.


#generative describes a class of statistical models that contrasts with #discriminative models. #generative models can generate new #data #instances. #discriminative models #discriminate between different kinds of #data #instances.


More formally, given a set of #data #instances X and a set of #labels Y: #generative models capture the joint #probability p(X, Y), or just p(X) if there are no #labels. #discriminative models capture the #conditional #probability p(Y | X).


The #generator learns to generate #plausible #data. The generated instances become negative #training #examples for the #discriminator.


The #discriminator learns to distinguish the #generator's #fake #data from #real #data. The #discriminator #penalizes the #generator for producing implausible results.


Through #backpropagation, the #discriminator's #classification provides a signal that the #generator uses to update its #weights.


The #discriminator in a #gan is simply a #classifier. It tries to distinguish real #data from the #fake #data created by the #generator.


The #discriminator connects to two #loss functions. During #discriminator training, the #discriminator ignores the #generator #loss and just uses the #discriminator #loss.


The #generator part of a #gan learns to create #fake data by incorporating #feedback from the #discriminator. It learns to make the #discriminator #classify its output as real.


The #generator feeds into the #discriminator net, and the #discriminator produces the output we're trying to affect. The #generator #loss penalizes the #generator for producing a sample that the #discriminator network classifies as #fake.


#research has suggested that if your #discriminator is too good, then #generator training can fail due to #vanishing_gradients. In effect, an optimal #discriminator doesn't provide enough #information for the #generator to make #progress.


#wasserstein_loss: The #wasserstein_loss is designed to prevent #vanishing_gradients even when you train the #discriminator to #optimality.


#convolutional_neural_network (#cnn) could be used to progressively extract higher- and higher-level #representations of the image #content.


#backpropagation is the process of calculating the #gradient for the #neural_network: it shows in which direction the #weights should change to bring the #loss_function toward its minimum.
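#backpropagation itself computes these gradients analytically with the chain rule, layer by layer; as an illustration only, here is a finite-difference stand-in (function names are mine) that shows what the #gradient is used for: stepping the #weights downhill on the #loss_function.

```python
def numerical_gradient(loss, w, eps=1e-6):
    """Central-difference estimate of d(loss)/dw_i for each weight."""
    grads = []
    for i in range(len(w)):
        wp, wm = w[:], w[:]
        wp[i] += eps
        wm[i] -= eps
        grads.append((loss(wp) - loss(wm)) / (2 * eps))
    return grads

def gradient_step(loss, w, lr=0.1):
    """Move each weight against its gradient: the loss-decreasing direction."""
    return [wi - lr * gi for wi, gi in zip(w, numerical_gradient(loss, w))]

# toy loss with its minimum at w = [3]
loss = lambda w: (w[0] - 3.0) ** 2
w = [0.0]
for _ in range(50):
    w = gradient_step(loss, w)
```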


in the process of #learning each #layer of a #neural_network will have #neurons with #weights ascribed to them, which enable #differentiation of various #features


The #weights of the neurons combine through the #layers using an #activation_function (e.g. #sigmoid or #relu), which then leads to only a certain neuron at the last layer getting activated. We might expect the first #layer to detect some general #features, the next one more specific ones, and so on, but in practice this is not quite the case.
