#machine_learning #supervised #unsupervised

16010218212080000

#features are used in #machine_learning for #differentiation

16010218953090000

#features are used for #training in #machine_learning

16010219238940000

the #relationship is called a #model in #machine_learning

16010219691280000

#supervised #machine_learning finds #patterns between #data and #labels

16010220174640000

#patterns are used to make #predictions

16010220245360000

#supervised works with #labeled #data

16010220623890000

the goal of #unsupervised is to identify meaningful #patterns in #data

16010221788100000

sometimes a #model can find #patterns that represent #stereotypes or #bias

16010222426340000

#clustering is a type of #unsupervised #learning

16010222687280000

with #reinforcement_learning (RL) you set up a #model (called an #agent in RL) where it receives a #reward each time it performs well (#reward_function)

16010223918970000

a #shaped #reward increases in the states closer to the #goal #state

16010228960240000

a #sparse #reward is in the #goal #state only

16010229287470000

#positive_reinforcement is an important element of #reinforcement_learning

16010233024510000

if the #reward is providing the #features to the #model that could improve the performance

16010233702040000

#machine_learning #problems: #classification, #regression, #clustering, #association learning, #structured_output, #ranking

16010234640220000

#clustering is an #unsupervised #learning problem

16010235064030000

#regression requires labeled data — #supervised learning problem

16010235427260000

#classification requires a set of #labels - so it is #supervised

16010235646080000

a #neural_network works through #representations

16010236987060000

#machine_learning process as an #experiment where we run #test after test after test to converge on a workable #model

16010242276470000

A well-defined #problem has both #inputs and #outputs. #inputs are the #features. #outputs are the #labels to predict.

16011504314150000

#training means creating or #learning the #model

16011570047940000

#inference means applying the trained #model to #unlabeled #examples

16011570159530000

A #regression #model #predicts continuous #values.

16011570334140000

A #classification #model #predicts #discrete #values.

16011570479360000

#machine_learning needs to provide #decisions rather than just #predictions

16011573674160000

#labels are the #variables or #values for #predictions

16012012516280000

#features are #inputs #variables describing the #data

16012012699700000

#model_training is done on #data that has both #features and #labels, so that the model knows what #correlations to extract

16012013208190000

#model maps #examples to predicted #labels

16012013745350000

#loss_function is showing us the degree of #deviation of the #model #prediction from the real #values

16012017050010000

#loss_function can be a square #difference between the #prediction and the #labels

16012017433880000

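in its simplest form the #loss_function starts from the error #observation - #prediction, which is then usually squared so that the sign does not matter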

16012017865560000

#mean_square_error averages the #loss_function over all the examples: it is the mean of the squared #deviation of each #prediction from its #labels

16012020980450000

#model_training is usually based on reducing the #loss in the #loss_function via #mean_square_error but not only

16012022160730000

the #gradient_descent approach is used to minimize the #loss_function

16012045160540000

the #learning_rate determines the size of each #gradient_descent step

16012047119080000

#epoch is one full pass over the training data, made up of several #batches, in #machine_learning

16012051551930000

if the #training #loss stops decreasing (or changes only very slowly) we say that the model has #converged

16012058974840000

the #goldilocks #learning rate for a loss curve is the one at which #gradient descent reaches the minimum point in the fewest number of steps

16025817064320000

In #supervised #learning, a machine #learning algorithm builds a model by examining many examples and attempting to find a model that minimizes #loss; this process is called empirical #risk_minimization.

16025818188660000

#loss is the penalty for a bad #prediction. That is, loss is a number indicating how bad the model's #prediction was on a single example.

16025818385760000

Mean square error (#mse) is the average squared #loss per example over the whole #dataset.

16025818612440000
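
A minimal sketch of squared loss and #mse in plain Python. The function names and the toy numbers are illustrative assumptions, not part of the notes.

```python
def squared_loss(label, prediction):
    """Squared (L2) loss for a single example."""
    return (label - prediction) ** 2


def mean_squared_error(labels, predictions):
    """Average squared loss per example over the whole dataset."""
    losses = [squared_loss(y, p) for y, p in zip(labels, predictions)]
    return sum(losses) / len(losses)


# Toy data (made up for illustration).
labels = [3.0, 5.0, 7.0]
predictions = [2.0, 5.0, 9.0]

mse = mean_squared_error(labels, predictions)
print(mse)  # per-example losses are 1, 0 and 4, so the mean is 5/3
```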

#iterative #learning is used to decrease #loss over time by making small steps and receiving #feedback on the #outputs

16025819625270000

Usually, you #iterate until overall #loss stops changing or at least changes extremely slowly. When that happens, we say that the #model has #converged.

16025822535080000

#epoch represents a full training pass over the entire #dataset such that each example has been seen once. Thus, an #epoch represents N / #batch size training iterations, where N is the total number of examples.

16025843762660000

#learning_rate is a scalar used to #train a model via #gradient_descent. During each iteration, the #gradient_descent algorithm multiplies the #learning_rate by the #gradient. The resulting product is called the #gradient step. #learning_rate is a key #hyperparameter.

16025844637600000
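
A small sketch of the gradient step, assuming a made-up one-parameter loss (w - 3)^2 so that the gradient can be written by hand. It also shows how the #learning_rate changes the number of steps needed; the function name and the constants are hypothetical.

```python
def gradient_descent(learning_rate, start=0.0, tolerance=1e-6, max_steps=2000):
    """Minimise loss(w) = (w - 3)**2; return (final w, steps taken)."""
    w = start
    for step in range(max_steps):
        gradient = 2 * (w - 3)            # derivative of (w - 3)**2
        if abs(gradient) < tolerance:     # loss has stopped changing: converged
            return w, step
        w -= learning_rate * gradient     # gradient step = learning rate * gradient
    return w, max_steps


w_slow, steps_slow = gradient_descent(learning_rate=0.01)   # tiny steps
w_good, steps_good = gradient_descent(learning_rate=0.5)    # "goldilocks" for this loss
w_big, steps_big = gradient_descent(learning_rate=1.1)      # overshoots and diverges

print(steps_slow, steps_good, steps_big)
```

For this particular quadratic a learning rate of 0.5 lands on the minimum in a single step, a much smaller rate needs hundreds of steps, and a rate above 1.0 moves further from the minimum on every step.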

#batch is a number of #examples used in an #iteration

16025845233370000

#anomalies in #features may indicate a potential problem in a #dataset - one should be more careful using that sort of data.

16025888817740000

#learning_rate specifies the size of the step, #batch specifies how many examples we take into each iteration, #epoch specifies how many full passes over the dataset we're going to have

16025892348900000
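
A sketch of how #learning_rate, #batch and #epoch fit together in mini-batch gradient descent. It assumes a one-feature linear model and noise-free toy data (label = 2 * feature + 1); all names and hyperparameter values are illustrative.

```python
def train_linear(features, labels, learning_rate, batch_size, epochs):
    """Fit label ~ w * feature + b with mini-batch gradient descent on MSE."""
    w, b = 0.0, 0.0
    iterations = 0
    for _ in range(epochs):                               # one epoch = one full pass
        for start in range(0, len(features), batch_size):
            xs = features[start:start + batch_size]       # one batch of examples
            ys = labels[start:start + batch_size]
            errors = [(w * x + b) - y for x, y in zip(xs, ys)]
            grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
            grad_b = 2 * sum(errors) / len(xs)
            w -= learning_rate * grad_w                   # step size set by learning rate
            b -= learning_rate * grad_b
            iterations += 1
    return w, b, iterations


features = [i / 8 for i in range(8)]          # 8 examples
labels = [2 * x + 1 for x in features]        # noise-free toy labels

w, b, iterations = train_linear(features, labels,
                                learning_rate=0.1, batch_size=4, epochs=1000)
print(round(w, 3), round(b, 3), iterations)
```

With 8 examples and a batch size of 4, each epoch is 8 / 4 = 2 iterations, so 1000 epochs give 2000 iterations.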

a #synthetic_feature is made out of several #features and may help #prediction

16025896151100000

a #correlation_matrix shows if there are any #correlations between the #features

16025898848730000

we take a #training_set and a #test_set from our #data, then we train the model on the #training_set and check how good its #prediction is on the #test_set

16025905238200000

An #overfit model gets a low #loss during training but does a poor job #predicting new data.

16026035829210000

#machine_learning 's goal is to predict well on new #data drawn from a (hidden) true #probability #distribution.

16026036487610000

The less complex a #machine_learning model, the more likely that a good #empirical result is not just due to the #peculiarities of the #sample.

16026036899920000

Partitioning a #data set into a #training_set and #test_set lets you judge whether a given #model will generalize well to new #data.

16026042888650000
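
A minimal sketch of partitioning a data set, assuming a random shuffle with a fixed seed and a 20% test fraction (both are arbitrary choices for illustration).

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (training, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)     # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


examples = list(range(10))                    # stand-in for real examples
training_set, test_set = train_test_split(examples)
print(len(training_set), len(test_set))
```

The model is then trained only on `training_set` and evaluated only on `test_set`, so the test loss says something about data the model has not seen.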

#partitioning a #data set into a #training_set and a #test_set

16026044741390000

the #data can be split three ways, into a #training_set, a #validation_set and a #test_set: the model is tuned against the #validation_set, so that repeated tuning does not cause #overfitting to the #test_set

16026049024930000

#features are very important for #machine_learning models

16026135583550000

#features engineering often removes #outliers (which might lead to the problem that a #model is not capable of #predicting exceptional events)

16026138051560000

#data_visualization is important for knowing #data and improving #machine_learning models

16026138804870000

#one_hot_encoding allows us to incorporate categorical #data into our #model

16026141480860000
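
A minimal sketch of #one_hot_encoding, assuming a tiny made-up vocabulary of colours. The second helper shows the sparse form, which stores only the positions of the non-zero entries.

```python
def one_hot(value, vocabulary):
    """Vector with 1 at the position of `value` and 0 everywhere else."""
    return [1 if item == value else 0 for item in vocabulary]


def to_sparse(vector):
    """Sparse representation: indices of the non-zero entries only."""
    return [i for i, v in enumerate(vector) if v != 0]


vocabulary = ["red", "green", "blue"]         # hypothetical categories
encoded = one_hot("green", vocabulary)
print(encoded, to_sparse(encoded))
```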

when a vector would hold very many #values that are mostly zero, a #sparse #representation is used (only the non-zero entries are stored)

16026141948620000

#binning_values allows us to simplify the #data and bring it into the #feature_vector using #one_hot_encoding

16026145199660000

#binning_values by #quantile ensures the number of #examples in each bucket is equal

16026145845550000

#feature_crossing can be a powerful way to improve #prediction by combining the #data #features of the #dataset

16026156955840000

#feature_crossing is often done for #one_hot_encoding where multiple #features are crossed to produce interesting #feature_vector

16026185154000000
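
A sketch of crossing two one-hot features, assuming the cross is taken as every pairwise product of their entries (the flattened outer product). The feature names in the comments are invented for illustration.

```python
def feature_cross(a, b):
    """All pairwise products of two one-hot vectors, flattened into one vector."""
    return [x * y for x in a for y in b]


is_weekend = [0, 1]            # hypothetical one-hot: [weekday, weekend]
time_of_day = [1, 0, 0]        # hypothetical one-hot: [morning, afternoon, evening]

crossed = feature_cross(is_weekend, time_of_day)
print(crossed)                 # single 1 in the slot for "weekend AND morning"
```

The crossed vector has len(a) * len(b) entries, which is why crosses of large sparse features grow so quickly.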

#feature_crossing is one learning strategy, #neural_network is another

16026188038040000

minimizing #loss_complexity (loss + complexity), which is called #structural_risk_minimization, helps to avoid #overfitting the #model

16026258796350000

the training objective then has two parts: the #loss_term, which measures how well the #model fits the #data, and the #regularization term, which measures model #complexity.

16026275046190000

model #complexity as a function of the #weights of all the #features in the #model.

16026276476070000

model #complexity as a function of the total number of #features with nonzero #weights. (A later module covers this approach.)

16026276476070002

#model developers tune the overall impact of the #regularization term by multiplying its value by a #scalar known as #lambda (also called the #regularization rate).

16026692468040000

#regularization term, which measures model #complexity.

16026692712020000

#regularization is a technique used in an attempt to solve the #overfitting problem in statistical models.

16026697381550000

#regularization penalizes the #loss_function in that it pushes the model to give lower #weights to each parameter in the model

16026698207270000

a model is #learning the #weights for each of the #features as it is #training itself to minimize #loss and #complexity

16026713966020000

#regularization penalizes the #loss_function for too much #complexity (a high number of #features with nonzero #weights )

16026716028860000

#sigmoid_function maps the output of the #linear_layer of a model trained with #logistic_regression to a value between zero and one

16026720637740000

#logistic_regression returns a #probability for a #classification

16026750262610000

In order to map a #logistic_regression value to a binary category, you must define a #classification_threshold (also called the #decision_threshold) #threshold

16026750446240000
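
A minimal sketch of turning a #logistic_regression output into a binary category. The threshold values are arbitrary examples, and `z` stands for the output of the linear layer.

```python
import math


def sigmoid(z):
    """Map the linear-layer output z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z, threshold=0.5):
    """Return 1 (positive class) if the probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))                    # 0.5
print(classify(2.0))                   # probability about 0.88 -> 1
print(classify(2.0, threshold=0.9))    # same probability, stricter threshold -> 0
```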

A #true_positive is when a model correctly #predicts the #positive_class (it made a positive prediction and it was true). A #true_negative is when a model correctly #predicts the #negative_class.

16026753335470000

A #false_positive is when a model incorrectly #predicts the #positive_class. A #false_negative is when a model incorrectly #predicts the #negative_class

16026753792080000

#accuracy of a model #prediction is the ratio of the #correct_predictions to the total number of #predictions

16026754772150000

#prediction #predicts

16026754871930000

#predicting the #prediction

16026755028280000

#accuracy then is a ratio of #true_positive plus #true_negative to the sum of all predictions (#true_negative + #true_positive + #false_negative + #false_positive)

16026760078380000

#accuracy alone doesn't tell the full story when you're working with a #class_imbalanced_data_set, where there is a significant #disparity between the number of #positive and #negative #labels.

16026761662840000

#precision is a ratio of #true_positive to #total_positives claimed (#true_positive + #false_positive)

16026763646580000

#recall is the ratio of #true_positive to the #total_positives that really happened (#true_positive + #false_negative)

16026764590140000

#classification_threshold should strike a balance between #precision and #recall: raising the threshold tends to raise #precision and lower #recall, so both cannot be pushed to their maximum at once

16026766242010000

#precision is based on a #claim and #recall is based on #reality

16026767157700000

The #f1_score is the harmonic #mean of the #precision and #recall

16026767962880000
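
A sketch that computes #accuracy, #precision, #recall and the #f1_score from the four confusion counts. The labels and predictions are invented toy values, and 1 is assumed to be the positive class.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, FN, TN) with 1 as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(labels, predictions)
accuracy = (tp + tn) / (tp + tn + fp + fn)    # correct / all predictions
precision = tp / (tp + fp)                    # correct / claimed positives
recall = tp / (tp + fn)                       # correct / real positives
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(accuracy, precision, recall, f1)
```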

#roc_curve (receiver operating characteristic curve) is a graph showing the performance of a #classification model at all #classification thresholds: the #true_positive rate vs the #false_positive rate

16026890928770000

#auc provides an aggregate measure of performance across all possible #classification #threshold

16026891309410000

#true_positive rate is basically the #recall because it's the relation of #true_positive to the sum of #true_positive and #false_negative

16026894154870000
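
A sketch of the points that make up a #roc_curve: for each threshold, the false positive rate FP / (FP + TN) is paired with the true positive rate TP / (TP + FN). The scores and thresholds are made-up values.

```python
def roc_points(labels, scores, thresholds):
    """Return a list of (false positive rate, true positive rate) per threshold."""
    points = []
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
        fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
        tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


labels = [1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2]          # model probabilities (toy values)

points = roc_points(labels, scores, thresholds=[0.0, 0.5, 0.8, 1.0])
print(points)
```

Lowering the threshold moves along the curve towards (1, 1), where everything is classified as positive.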

the #false_positive rate is the counterpart for the negatives: the ratio of #false_positive (a #claim of the positive that is not true) to all the negatives in #reality (#false_positive + #true_negative)

16026894845590000

#logistic_regression #predictions should be #unbiased. That is: "average of #predictions" should ≈ "average of observations"

16026907148940000

#prediction_bias is a quantity that measures how far apart the average of the #predictions is from the average of the #observations

16026907521480000

A #z_score is the number of #standard_deviations from the #mean for a particular raw value

16027931564000000
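
A minimal sketch of the #z_score, assuming the population standard deviation is used. The sample values are invented.

```python
import statistics


def z_scores(values):
    """Number of standard deviations each raw value lies from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)       # population standard deviation
    return [(v - mean) / std for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]         # mean 5, standard deviation 2
scores = z_scores(values)
print(scores)
```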

#sparse_vector representations often contain many #dimensions. Creating a #feature_cross results in even more #dimensions, which may lead to a higher use of #resources and #memory

16028434025100000

in a high-dimensional #sparse_vector it is good to encourage as many #weights as possible to be zero, so that we reduce the #complexity of the #model and the toll on #resources

16028435460400000

#l2_regularization encourages #weights to be small, but doesn't force them to exactly zero

16028436036350000

#l2_regularization uses the sum of the squared #weights as a penalty and encourages them to be smaller to reduce the #complexity of the model

16028437436670000

#lambda is used together with #l2_regularization to set how strongly #complexity is penalized: too high a value makes the model too simple, too low a value lets the #weights grow and risks overfitting

16028437996030000

#regularization and #l2_regularization

16028438172070000

#l2_regularization penalizes the square of the #weights, while #l1_regularization penalizes the absolute value of the #weights

16028439835370000
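
A sketch of the two penalties and of how #lambda scales them inside the training objective. The weights, the data loss and the lambda value are arbitrary illustrative numbers.

```python
def l1_penalty(weights):
    """Sum of absolute weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, penalty):
    """Loss term plus lambda times the regularization term."""
    return data_loss + lam * penalty(weights)


weights = [0.5, -2.0, 0.0]
print(l1_penalty(weights))     # 0.5 + 2.0 + 0.0 = 2.5
print(l2_penalty(weights))     # 0.25 + 4.0 + 0.0 = 4.25
print(regularized_loss(1.0, weights, lam=0.1, penalty=l2_penalty))
```

Note how the large weight (-2.0) dominates the L2 penalty much more than the L1 penalty, because it is squared.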

the #derivative of #l1_regularization is a constant, so every step subtracts a fixed amount from a weight, which can push small weights to exactly zero #values

16028440065710000

#neural_networks are a more sophisticated version of #feature_cross. In essence, neural networks do the appropriate #feature_crossing for you.

16028443803840000

"#nonlinear" means that you can't accurately predict a #label with a model of the form $b + w_1x_1 + w_2x_2$. In other words, the "#decision_surface" is not a #line

16028447796330000

#feature_cross is one possible approach to modeling #nonlinear problems.

16028448134610000

a #linear model (#linear_layer) can be represented as a #graph: the #inputs are the #features and the #output is the weighted sum of the #inputs (each input multiplied by its #weights)

16028449165490000

where a #linear model doesn't work we can use the #nonlinear

16028449336760000

a #hidden_layer is a weighted sum of the #Input #values

16028450817350000

a #hidden_layer is a combination of #inputs

16028451024520000

without a #nonlinear function, a #hidden_layer is still part of a #linear model

16028451244010000

a #linear model cannot serve #nonlinear problems (e.g. it cannot fit the #predictions to a curve or identify certain areas or complex #patterns)

16028452474610000

that's why we want to introduce #nonlinear model - we do that by piping each #hidden_layer node through a #nonlinear function

16028452926540000

the #nonlinear function is called the #activation_function - this lets us model very complicated #relations between the #inputs and #outputs

16028453710930000

#sigmoid #nonlinear #activation_function converts the sum of the inputs multiplied by their #weights to a value between 0 and 1

16028454555630000

#rectified #linear unit #activation_function (or #relu, for short) often works a little better than a smooth function like the #sigmoid as #relu helps add #nonlinear dynamics into the layers of the #inputs

16028455552410000
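
A sketch of one hidden layer: each node takes a weighted sum of the inputs and pipes it through an #activation_function. The weights, biases and inputs are invented numbers, and `dense` is a hypothetical helper name.

```python
import math


def relu(z):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, z)


def sigmoid(z):
    """Smooth activation that squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: activation(weighted sum of inputs + bias) for each node."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


inputs = [1.0, -2.0]
weights = [[1.0, 1.0],     # node 1: 1*1 + 1*(-2) = -1
           [1.0, -1.0]]    # node 2: 1*1 - 1*(-2) = 3
biases = [0.0, 0.0]

hidden_relu = dense(inputs, weights, biases, relu)
hidden_sigmoid = dense(inputs, weights, biases, sigmoid)
print(hidden_relu, hidden_sigmoid)
```

Without the activation call the layer would reduce to a plain weighted sum, so stacking such layers would still give a linear model.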

#redundancy can be important for a #neural_network as it increases the possibility of #feature_cross that is useful

16028461876130000

an extreme case of #overfitting is #memorizing in which case rather than learning the general #ground_truth the model starts to adapt to the peculiarities and specificities in the #training_set so it becomes less fit to detect the new #patterns in a new set of #data

16028462928250000

Another form of #regularization, called #dropout, is useful for neural networks. It works by randomly "dropping out" unit #activations in a network for a single gradient step.

16028687263950000

#multi_class #neural_networks help identify multiple #labels

16028691765700000

Given a #classification problem with N possible solutions, a #one_vs_all solution consists of N separate #binary #classifiers—one #binary classifier for each possible #outcome.

16028693945170000

#softmax extends the idea of #logistic_regression into a #multi_class world. That is, #softmax assigns decimal #probabilities to each class in a #multi_class problem.

16028695237940000
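
A minimal sketch of #softmax. Subtracting the largest logit before exponentiating is an added numerical-stability detail that does not change the result; the logits are toy values.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```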

instead of the #binary answer in #one_vs_all layer, #softmax gives a #probability for each #outcome

16028696067740000

for example, a #number #classification problem is a #multi_class #classification problem with 10 output classes, one for each digit.

16028708923930000

#collaborative_filtering is the task of making #predictions about the #interests of a user based on #interests of many other #users.

16028734015720000

an #embedding_space maps data by its #features so that the items that are more #similar (or are more likely to be used together) are closer to each other in the #space.

16028752996440000

the #embedding_space can consist of many #dimensions and some of them might not have the exact semantic #meanings in which case they are called #latent_dimension representing a #feature that is not explicit in the #data but is rather inferred from it.

16028754420110000

ultimately it's the distances between the data elements that are important in #embedding_space not the actual #values.

16028754881660000

#categorical_data refers to input #features that represent one or more discrete items from a finite set of choices. For example, it can be the set of movies a user has watched, the set of words in a document, or the occupation of a person. #categorical_data is most efficiently represented via #sparse_tensors which are tensors with very few non-zero elements (also see #sparse_vector)

16028756228060000

in order to use #sparse_vector #representations within a machine learning system, we need a way to represent each #sparse_vector as a vector of #numbers so that semantically #similar items (movies or words) have #similar distances in the vector space. But how do you represent a word as a vector of #numbers?

16028757457890000

for example, in #one_hot_encoding you would map one of the 500 000 words in a vocabulary as a #sparse_vector where item 2019 would be 1 and the rest are zero (the vector represents the word)

16028758163240000

a "#bag_of_words" #representation contains chunks of words in a #sparse_vector so several values are 1 and most are zero

16028758779180000

#embeddings translate large #sparse_vector into a lower-dimensional #space that preserves #semantic relationships.

16028760623450000

An #embeddings is a #matrix in which each column is the #vector that corresponds to an item in your #vocabulary. To get the dense #vector for a single #vocabulary item, you retrieve the column corresponding to that item.

16028762171380000
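
A sketch of the column lookup described above, assuming a tiny 2-dimensional embedding for a 3-item vocabulary (the numbers are made up). It also shows that picking a column gives the same result as multiplying the matrix by the item's one-hot vector.

```python
# Rows are embedding dimensions, columns are vocabulary items.
embeddings = [
    [0.1, 0.4, 0.7],
    [0.2, 0.5, 0.8],
]


def lookup(item_index):
    """Dense vector for one vocabulary item: the matching column."""
    return [row[item_index] for row in embeddings]


def lookup_via_one_hot(one_hot):
    """Same vector, computed as matrix times one-hot vector."""
    return [sum(e * o for e, o in zip(row, one_hot)) for row in embeddings]


print(lookup(1))
print(lookup_via_one_hot([0, 1, 0]))
```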

#principal_component_analysis (PCA) has been used to create #word #embeddings. Given a set of instances like #bag_of_word #vectors, PCA tries to find highly correlated #dimensions that can be collapsed into a single #dimension.

16028763634990000

#word2vec is an example of representing the language not as a #sparse_vector but as dense #embedding #vectors, mapping semantically #similar words to geometrically close #vectors

16028765656910000

#word2_vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of #words from randomly grouped #words. The #input layer takes a #sparse_vector representation of a target word together with one or more context words.

16028766449270000

a #static_model is trained #offline, a #dynamic_model is trained #online

16028768925380000

#bias arises when we don't include what we consider to be #typical in a set of #features

16028777895340000

#reporting_bias occurs when the #frequency of events, properties, and/or outcomes captured in a #data set does not accurately reflect their real-world #frequency.

16028782396710000

#automation_bias is a tendency to favor results generated by #automated systems over those generated by #humans

16028782659070000

#selection_bias occurs if a #data set's examples are chosen in a way that is not reflective of their real-world #distribution

16028782916400000

#confirmation_bias, where model builders unconsciously process data in ways that affirm preexisting #beliefs and #hypotheses. In some cases, a model builder may actually keep training a model until it produces a result that aligns with their original hypothesis; this is called #experimenters_bias

16028784244300000

#confirmation_bias #experimenters_bias #selection_bias #automation_bias #reporting_bias are all types of #bias

16028784542210000

#confusion_matrix summarizes how successful #predictions are (a #matrix of true/false positives and negatives from which #precision and #recall are computed)

16028786792510000

in order to avoid #bias it is important to also test the model across subsets of the #categorical_data (e.g. only for men, only for women) with #recall, #precision or #accuracy, in order to see if it's biased towards a certain category

16028787321430000

#detrended_fluctuation_analysis or #dfa is a method for determining the statistical #self_affinity of a #signal. It is useful for analysing #time_series that appear to be long-memory processes (diverging correlation time, e.g. #power_law decaying autocorrelation function) or #1f_noise.

16029688797410000

The obtained #exponent is similar to the #hurst_exponent, except that #dfa may also be applied to signals whose underlying statistics (such as #mean and #variance) or dynamics are #non_stationary (changing with time)

16029689080450000

In #dfa the scaling exponent #alpha is calculated as the #slope of a straight line fit to the log-#log graph of $F(n)$ against $n$ using least #squares. An exponent of 0.5 would correspond to #uncorrelated #white_noise, an exponent of 1 is #pink_noise

16029690587510000

Another way to detect #pink_noise is to build a graph where the x axis are the #events while the y axis records a #time_series estimation relative to the #standard_deviation from the #average (#mean) time interval.

16029693998120000

At its essence #pink_noise is based on #self_affinity and #self_similarity, so that no matter what scale you look at, the pattern is #similar (#scale_free)

16029694711270000

#power_spectral_analysis describes distribution of #power across #frequency components composing the #signal - for #pink_noise we have a 1/f relationship — few powerful signals with low frequency, a long tail of less powerful ones (of which there are many) (hence #1f_noise)

16029700864780000

#envelope is a smooth #curve outlining the extremes of a #signal; it can be calculated with the #hilbert_transform, and the resulting amplitude envelope is, in turn, what #dfa or #detrended_fluctuation_analysis is applied to when studying oscillations

16048776016720000

#detrended_fluctuation_analysis (#dfa) has proven particularly useful, revealing that genetic #variation, normal development, or #disease can lead to differences in the #scale_free #amplitude #modulation of oscillations https://www.frontiersin.org/articles/10.3389/fphys.2012.00450/full

16049192883670000

The reason why #chaotic #variation (#pink_noise) is indicative of a #healthy state is because it reflects #winnerless_competition behind the process. If there's a deviation in this dynamics (eg some #patterns), it could mean that one process is #dominating the rest.

16049193607560000

#self_affinity is a property of #fractal #time_series where the small parts of the whole are #similar to the whole

16049202652340000

#self_affinity processes and #self_similar structures have in common that the statistical #distribution of the measured quantity follows a #power_law function, which is the only mathematical function without a characteristic scale. Self-affine and #self_similar phenomena are therefore called "#scale_free.”

16049204012290000

In a #power_law #distribution the #mean would not necessarily be the same as the #median (the two are closer to each other in a #normal #distribution)

16049204959560000

A #power_law #distribution means that there is a big number of #small #variation and a small number of #big #variation (hence the line with a negative #slope when expressed on a #log scale)

16049207836470000

In a #1f #signal the lower #frequency objects have larger #amplitude than the higher #frequency objects (#1f_noise) https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g001.jpg

16049214353530000

for example, the #frequency of a certain #size of flower is inversely #proportional to its #size.

16049215215930000

#time_series in which all #frequency are represented with the same #amplitude will lack the rich variability of the #scale_free #time_series and is referred to as "#white_noise”

16049216054650000

To estimate the #scale_free property we calculate the #standard_deviation (#signal in relation to #mean) over the differently sized #time_windows. If as the #time_windows size increases the #standard_deviation also increases, we're dealing with a #scale_free process. If the #scaling_effect is not there, then it's not a scale free process.

16049220905120000

a stationary #random #fluctuating process has a #signal profile, which is #self_affine with a #scaling_exponent α = 0.5

16049249650470000

when we add #memory in the sense that the #probability of an action depends on the previous actions that the walker has made — we will get a process that will exhibit #self_affinity across scales (#scale_free)

16049250441650000

Different classes of processes with #memory exist: those with #positive_correlation and those with #anti_correlation. Anti-correlations can be seen as a #stabilizing mechanism: a future action is more likely to be opposite to the ones made before, so on longer windows (time scales) we will have lower #fluctuating and the coefficient will be lower. α from 0 to 0.5: has #memory, #anti_correlation. α = 0.5: #random. α from 0.5 to 1: has #memory and #positive_correlation (previous actions increase the likelihood of that action being taken again). https://www.frontiersin.org/files/Articles/23105/fphys-03-00450-HTML/image_m/fphys-03-00450-g003.jpg

16049257143270000

for #dfa the signal is transformed into the #cumulative_signal, which is then split into #windows, for a set of window sizes equally spaced on the #log scale. For each #window the data is #detrended and the #standard_deviation is calculated. The #fluctuating function for a given window size is the mean #standard_deviation over all the #windows of that size. Then we plot it against the window size on #log scales. The #dfa exponent α is the #slope of the trend line. If the points follow a straight line, the fluctuation grows as a power of the #window size, which shows that the process is in fact #scale_free. A slope of 0.5 corresponds to an uncorrelated signal; a steeper slope means that the fluctuation grows faster with the #window size than it would for a #random signal

16049268449930000
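
The #dfa steps can be sketched in plain Python as below. This is a simplified illustration, not a reference implementation: it assumes non-overlapping windows, linear detrending, and window sizes that double (so they are equally spaced on a log scale). All function names are hypothetical.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares (slope, intercept) of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(signal, window_sizes):
    """Slope of log F(n) against log n, where F(n) is the mean detrended std."""
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for value in signal:                    # cumulative sum of deviations from mean
        total += value - mean
        profile.append(total)

    log_sizes, log_fluctuations = [], []
    for n in window_sizes:
        xs = list(range(n))
        stds = []
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(xs, window)     # linear trend in window
            residuals = [y - (slope * x + intercept) for x, y in zip(xs, window)]
            stds.append(math.sqrt(sum(r * r for r in residuals) / n))
        log_sizes.append(math.log(n))
        log_fluctuations.append(math.log(sum(stds) / len(stds)))

    alpha, _ = linear_fit(log_sizes, log_fluctuations)
    return alpha


rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
brown_noise, level = [], 0.0
for value in white_noise:                   # brown noise = summed white noise
    level += value
    brown_noise.append(level)

sizes = [16, 32, 64, 128, 256]
alpha_white = dfa_exponent(white_noise, sizes)
alpha_brown = dfa_exponent(brown_noise, sizes)
print(round(alpha_white, 2), round(alpha_brown, 2))
```

On these synthetic signals the exponent should come out close to 0.5 for white noise and close to 1.5 for brown noise, with some scatter because the series is finite.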

The lower end of the fitting range is at least four samples, because #linear #detrending will perform poorly with fewer points (Peng et al., 1994). For the high end of the fitting range, #dfa estimates for window sizes >10% of the #signal length are noisier due to a low number of windows available for averaging (i.e., fewer than 10 windows). Finally, a 50% overlap between windows is commonly used to increase the number of windows, which can provide a more accurate estimate of the fluctuation function especially for the long-time-scale windows.

16049270009570000

A #brown_noise process can be obtained by successively summing data points in the #white_noise process. https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis/figures?lo=1

16190243455750000

Using the classical #dfa method, the #cumulative_sum of data are divided into segments, and the #variance of these sums is studied as a function of segment length after linearly detrending them in each segment. https://www.nature.com/articles/s41598-019-42732-7

16190250690820000

In #dfa, data are divided into segments of length L and are #linearly detrended. The #square_root of the #variance (called #fluctuation) of the detrended data is studied as a function of L. It can be shown that a #linear relationship between the #logarithm of the #fluctuation and the #logarithm of L is indicative of a #power_law behavior of the spectrum. https://www.nature.com/articles/s41598-019-42732-7

16190255845340000

If a #linear relationship between the length of a #segment or #time_windows and the strength of the #fluctuation (or the #square_root of the #variance of the #cumulative_signal) exists, the slope of the corresponding line is also referred to as #hurst_exponent.

16190283937010000

For #white_noise the #hurst_exponent is 0.5, so the relation between the #time_windows and the #variance of the #cumulative_sum will be #linear: when we double the #time_windows the #variance will also double, while the #fluctuation (square root of #variance) grows by a factor of about 1.4.

16190320821100000

For #pink_noise #1f_noise the #hurst_exponent will equal #1 and will mean that for #time_windows twice longer the #fluctuation will double (and the #variance will increase about 4 times). In other words, the longer the #time_windows, the more #fluctuation occurs (#positive_correlation).

16190325515920000

#hurst_exponent in this context is #alpha_exponent, because we use #alpha_exponent for #non_stationary processes

16190326351140000

if #alpha_exponent is more than 1, it means that for every increase of scale (#time_windows) the cumulative_sum of #fluctuation increases a lot. That means, the longer we look at the process, the more likely it is to have big #fluctuation — there is a tendency in the #short_term to be #small and in the #long_term there's a tendency to be #big.

16190328082400000

the #cumulative_sum of the difference from the #average of a #time_series will be #brown_noise (#random_walk) for the #white_noise

16190339073410000

In contrast, #0.5 < #hurst_exponent < #1 indicates a #correlated process for #f_gn or what is termed a #persistent process for #f_bm. In this case, #increases in the signal (for #f_gn) or increments of the signal (for #f_bm) are likely to be followed by further #increase, and #decrease are likely to be followed by #decreases (i.e., a #positive #long_term #correlation). Anti-#persistent and #persistent processes contain #structure that distinguishes them from truly #random sequences of data. Source: A tutorial introduction to adaptive fractal analysis, https://www.researchgate.net/publication/232236967_A_tutorial_introduction_to_adaptive_fractal_analysis [accessed Apr 21 2021].

16190379897810000

The difference between the #exponent or #exponential_decay and the #power_law #decay is that #power_law #decay is slower: there are more values with a low #amplitude in the case of the #power_law https://math.stackexchange.com/questions/164436/difference-between-power-law-distribution-and-exponential-decay

16195411418390000

#downsampling (in this context) means #training on a disproportionately low subset of the #majority_class examples.

16209001190240000

#upweighting means adding an example #weight to the downsampled class equal to the factor by which you performed #downsampling.

16209001569040000
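
A sketch of downsampling followed by upweighting, assuming examples are (features, label) pairs and that a random sample of the majority class is kept. The data, the factor of 10 and the function name are illustrative.

```python
import random


def downsample_and_upweight(examples, majority_label, factor, seed=0):
    """Keep 1/factor of the majority class and give each kept example weight=factor."""
    majority = [e for e in examples if e[1] == majority_label]
    minority = [e for e in examples if e[1] != majority_label]
    kept = random.Random(seed).sample(majority, len(majority) // factor)
    return ([(x, y, float(factor)) for x, y in kept] +
            [(x, y, 1.0) for x, y in minority])


# 100 majority examples (label 0) and 5 minority examples (label 1).
examples = [(i, 0) for i in range(100)] + [(i, 1) for i in range(100, 105)]
weighted = downsample_and_upweight(examples, majority_label=0, factor=10)

majority_weight = sum(w for _, y, w in weighted if y == 0)
print(len(weighted), majority_weight)
```

The kept majority examples together carry the same total weight as the original majority class, so the class proportions seen by the loss stay calibrated.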

#normalizing - transforming #numeric data to the same #scale as other #numeric data.

16209003411860000

#bucketing - transforming #numeric (usually #continuous) #data to #categorical_data.

16209003591780000

#scaling means converting #floating_point #feature #values from their #natural #range (for example, 100 to 900) into a #standard #range, usually 0 to 1.

16209005211930000
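
A minimal sketch of this kind of scaling, assuming the common min-max formula (value - min) / (max - min); the function name is illustrative.

```python
def min_max_scale(values):
    """Map values linearly so that the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([100, 300, 900])
```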

If your data set contains extreme #outliers, you might try #feature_clipping, which caps all feature #values above (or below) a certain value to a fixed value. https://developers.google.com/machine-learning/data-prep/transform/normalization

16209005996260000
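
Clipping can be sketched in one line. The bounds of 0 and 40 below are arbitrary assumptions; in practice they would come from inspecting the feature.

```python
def clip(values, lower, upper):
    """Cap every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


clipped = clip([-5, 3, 40, 1000], lower=0, upper=40)
```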

#log #scaling computes the #log of your values to compress a wide #range to a narrow #range. #log_scaling is helpful when a handful of your values have many points, while most other values have few points. This data #distribution is known as the #power_law #distribution. Movie ratings are a good example: most movies have very few ratings (the data in the tail), while a few have lots of ratings (the data in the head). #log_scaling changes the #distribution, helping to improve linear model performance.

16209006981810000
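
A minimal sketch of log scaling, assuming strictly positive values and the natural logarithm; the rating counts are made-up numbers.

```python
import math


def log_scale(values):
    """Natural log of each value; values must be strictly positive."""
    return [math.log(v) for v in values]


rating_counts = [1, 10, 100, 1000, 100000]  # hypothetical ratings per movie
compressed = log_scale(rating_counts)
```

The raw counts span five orders of magnitude, while the logged values span roughly 0 to 11.5.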

#z_score is a #variation of #scaling that represents the number of #standard_deviations away from the #mean. You would use z-score to ensure your #feature distributions have #mean = 0 and std = 1. It’s useful when there are a few #outliers, but not so extreme that you need #clipping.

16209007620050000
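
The z-score can be sketched with the `statistics` module. Using the population standard deviation (`pstdev`) is an assumption; a sample standard deviation would work similarly.

```python
import statistics


def z_scores(values):
    """Number of standard deviations each value lies from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]


z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
```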

#transformation of #numeric #features into #categorical #features, using a set of #thresholds, is called #bucketing (or #binning) - creating #buckets

16209010885400000

Creating #buckets that each have the same number of points is called #quantile_bucketing.

16209023432250000
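
A minimal sketch of quantile bucketing: the thresholds are read off the sorted data so that each bucket receives the same number of points. The helper names and the toy values are assumptions for illustration.

```python
from bisect import bisect_right


def quantile_boundaries(values, n_buckets):
    """Thresholds that split the sorted values into n_buckets equal-sized groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def bucketize(value, boundaries):
    """Index of the bucket a value falls into, given sorted thresholds."""
    return bisect_right(boundaries, value)


values = [1, 2, 3, 4, 5, 6, 7, 8, 100, 1000, 10000, 100000]
boundaries = quantile_boundaries(values, n_buckets=4)
buckets = [bucketize(v, boundaries) for v in values]
```

With equally spaced thresholds most of these values would land in the first bucket; with quantile thresholds each of the four buckets holds three values.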

when we represent each #categorical #value with a #number (an index), the mapping is called a #vocabulary

16209035004460000

#one_hot_encoding represents the #numeric index #values of the categories as binary #vectors (a single 1 among 0s), which can then be further compressed with a #sparse_vector

16209035992210000
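
The vocabulary, the one-hot vector and its sparse form can be sketched as follows. The helper names and the color categories are hypothetical, and storing only the non-zero positions in a dictionary is just one possible sparse representation.

```python
def build_vocabulary(categories):
    """Map each distinct category to an integer index."""
    return {cat: idx for idx, cat in enumerate(sorted(set(categories)))}


def one_hot(category, vocabulary):
    """Vector of zeros with a single 1 at the category's index."""
    vec = [0] * len(vocabulary)
    vec[vocabulary[category]] = 1
    return vec


def to_sparse(vector):
    """Keep only the non-zero positions as {index: value}."""
    return {i: v for i, v in enumerate(vector) if v != 0}


vocab = build_vocabulary(["red", "green", "blue", "green"])
green = one_hot("green", vocab)
green_sparse = to_sparse(green)
```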

#grouping #un#labeled #examples is called #clustering. As the examples are un#labeled, #clustering relies on #unsupervised #machine_learning. If the examples are #labeled, then #clustering becomes #classification.

16209055025390000

#hierarchical_clustering creates a tree of #clusters. #hierarchical #clustering, not surprisingly, is well suited to #hierarchical #data, such as #taxonomies.

16209067637480000

#distribution_based_#clustering: this #clustering approach assumes #data is composed of #distributions, such as #gaussian_#distributions. Then it #clusters them accordingly.

16209068329910000

#density_based_clustering connects areas of high example #density into #clusters. This #clustering allows for arbitrary-shaped #distributions as long as dense areas can be connected. These algorithms have difficulty with data of varying #densities and high #dimensions and also with #outliers.

16209068870730000

#centroid_based_#clustering organizes the data into #non_#hierarchical_clusters, in contrast to #hierarchical #clustering. #k_means is the most widely-used centroid-based #clustering #algorithm.

16209069363990000

In order to perform #clustering we need to quantify the #similarity between examples by creating the #similarity_metrics for our #dataset

16209070409100000

for #data #processing we need to create #quantile buckets (use #quantile_bucketing) when the #distribution is neither #gaussian nor #power_law, for example #poisson.

16209071758680000

when the #distribution is #gaussian we can apply #normalizing to our #data

16209072190990000

when the #distribution is #power_law we might want to use #log_scaling for #normalizing our data

16209072519520000

we can do either #manual #similarity or #supervised #similarity. you switch to a #supervised_similarity_measure when you have trouble creating a #manual_similarity_measure.

16209074216290000

#mean_square_error shows the #average squared #loss per #example

16209075921300000

we can calculate #similarity by calculating the root #mean_square_error of the differences between the #features of two examples (e.g. size and price), or the #sums of those differences. The lower the value, the higher the similarity.

16209117142860000
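
A minimal sketch of such a manual measure, assuming each feature has already been scaled to the 0 to 1 range so that size and price contribute comparably. The function name and the example values are hypothetical.

```python
import math


def rmse_dissimilarity(a, b):
    """Root mean square of the per-feature differences between two examples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# (scaled size, scaled price) for three hypothetical houses
house_a = (0.20, 0.30)
house_b = (0.25, 0.35)
house_c = (0.90, 0.80)

near = rmse_dissimilarity(house_a, house_b)  # small value: similar houses
far = rmse_dissimilarity(house_a, house_c)   # large value: dissimilar houses
```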

For #categorical_data we can calculate #similarity using #jaccard_similarity, which is the size of the intersection of the #sets divided by the size of their union

16209118552620000
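
Jaccard similarity is short enough to sketch directly. Returning 1.0 for two empty sets is a convention assumed here, and the genre tags are made up.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention (assumption): two empty sets are identical
    return len(a & b) / len(a | b)


sim = jaccard_similarity({"comedy", "drama", "family"}, {"drama", "family", "crime"})
```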

#k_means groups points into #clusters by minimizing the #distances between points and their #cluster’s #centroid. The #centroid of a #cluster is the #mean of all the points in the #cluster.

16209172946260000
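
The description above can be sketched as the usual two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a bare-bones illustration; the random initialization, the fixed iteration count and the toy points are assumptions.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of coordinates; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
```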

Instead of comparing manually-combined #feature #data, you can reduce the #feature #data to #representations called #embeddings, and then compare the #embeddings

16209181461320000

#embeddings are generated by training a #supervised deep neural network (#dnn) on the #feature data itself. The #embeddings map the #feature data to a #vector in an #embedding_space. Typically, the #embedding_space has fewer dimensions than the #feature data in a way that captures some #latent #structure of the #feature data set.

16209184218960000

A #dnn that learns #embeddings of #input data by predicting the #input data itself is called an #autoencoder. An #autoencoder is the simplest choice to generate #embeddings. However, an #autoencoder isn't the optimal choice when certain features could be more important than others in determining #similarity.

16209185274000000

Since this #dnn predicts a specific input #feature instead of predicting all input #features, it is called a predictor #dnn

16209185559560000

To train the #dnn, you need to create a #loss_function by following these steps: 1) calculate the #loss for every #output of the #dnn. For #numeric outputs use #mean_square_error, for #categorical use #log_loss, for #multivalent #categorical use #softmax_cross_entropy (#entropy) loss.

16209187003980000

in a #poisson distribution the #decay happens much faster than in the #power_law #distribution: if in #power_law you have a significant number of nodes in the #tail, then in #poisson you only have a few.

16217118970240000

A #similarity measure takes the #embeddings generated by our neural network (#dense_features) and returns a number measuring their #similarity.

16219366986400002

To calculate #similarity we have 3 measures to choose from: #euclidian_distance (the length of the difference between the vectors), #cosine_distance (cosine of the angle between the vectors) and the #dot_product (cosine multiplied by the lengths of both vectors).

16219369496420000

In contrast to the #cosine_distance, the #dot_product is proportional to the #vector #length. This is important because examples that appear very frequently in the training set (for example, popular YouTube videos) tend to have embedding #vectors with large #lengths. If you want to capture #popularity, then choose #dot_product.

16219370597570000
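
The three measures can be sketched side by side. The vectors are toy values; the second pair points in the same direction but differs in length, which shows how the dot product rewards length while the cosine ignores it.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Length of the difference vector a - b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between a and b (independent of vector length)."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


query = (1.0, 0.0)
niche_item = (1.0, 0.0)    # same direction, short vector
popular_item = (2.0, 0.0)  # same direction, longer vector

print(cosine(query, niche_item), cosine(query, popular_item))            # equal
print(dot_product(query, niche_item), dot_product(query, popular_item))  # larger for the longer vector
```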

#cluster #cardinality is the number of examples per #cluster. We are looking for #outliers and if we do find them, this may indicate some interesting #patterns

16219382869270000

Cluster #magnitude is the sum of #distances from all examples to the #centroid of the #cluster. Similar to #cardinality, check how the #magnitude varies across the #clusters, and investigate #anomalies and #outliers.

16219383315200000

Notice that a higher #cluster #cardinality tends to result in a higher #cluster #magnitude, which intuitively makes sense. Clusters are #anomalous when #cardinality doesn't correlate with #magnitude relative to the other #clusters.

16219388739810000
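
Cardinality and magnitude can be computed in a single pass over the assignments. The function name and the toy data are assumptions made for this sketch.

```python
import math
from collections import defaultdict


def cluster_stats(points, assignments, centroids):
    """Return ({cluster: cardinality}, {cluster: magnitude})."""
    cardinality = defaultdict(int)
    magnitude = defaultdict(float)
    for point, cluster in zip(points, assignments):
        cardinality[cluster] += 1
        magnitude[cluster] += math.dist(point, centroids[cluster])
    return dict(cardinality), dict(magnitude)


points = [(0, 0), (2, 0), (10, 0)]
assignments = [0, 0, 1]
centroids = [(1, 0), (10, 0)]
cardinality, magnitude = cluster_stats(points, assignments, centroids)
```

Plotting magnitude against cardinality per cluster is one way to spot the clusters that break the usual correlation between the two.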

#content_based_filtering uses #similarity between items to #recommend items similar to what the user likes.

16219394205890000

#collaborative_filtering uses #similarity between #queries and #items simultaneously to provide #recommendations.

16219394437700000

Both #content_based_filtering and #collaborative_filtering map each #item and each #query (or #context) to an #embedding_vector

16219395221480000

#recommendations - We again place our #users in the same #embedding_space to best explain the #feedback_matrix: for each (#user, #item) pair, we would like the #dot_product of the #user #embedding and the #item #embedding to be close to 1 when the #user watched the movie, and to 0 otherwise.

16219406780270000

The #dot_product of the #user_#matrix and #item_#matrix yields a #recommendation #matrix that contains not only the original user ratings but also #predictions for the movies that each user hasn't seen

16219414656450000

#matrix_factorization: in math, a mechanism for finding the matrices whose #dot_product approximates a #target_matrix.

16219414963240000
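
A rough sketch of the idea, assuming a tiny fully observed feedback matrix (1 = watched, 0 = not watched) and plain stochastic gradient descent on the squared error. The function name, the embedding dimension, the learning rate and the number of epochs are all assumptions; production systems typically use other solvers and treat unobserved entries differently.

```python
import random


def factorize(feedback, dim=2, epochs=3000, lr=0.05, seed=0):
    """Find U (users x dim) and V (items x dim) so that U . V^T approximates feedback."""
    rng = random.Random(seed)
    n_users, n_items = len(feedback), len(feedback[0])
    U = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for i in range(n_users):
            for j in range(n_items):
                pred = sum(u * v for u, v in zip(U[i], V[j]))
                err = feedback[i][j] - pred
                for d in range(dim):
                    u_d, v_d = U[i][d], V[j][d]
                    U[i][d] += lr * err * v_d  # gradient step on the user embedding
                    V[j][d] += lr * err * u_d  # gradient step on the item embedding
    return U, V


feedback = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
U, V = factorize(feedback)
approx = [[sum(u * v for u, v in zip(U[i], V[j])) for j in range(4)] for i in range(3)]
```

Each entry of `approx` is the dot product of one user embedding and one item embedding, which is the quantity the notes above want close to 1 for watched movies and close to 0 otherwise.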

#generative_adversarial_networks (#gans) are an exciting recent innovation in #machine_learning. #gans are #generative models: they create new #data_instances that resemble your #training_data. For example, #gans can create images that look like photographs of human faces, even though the faces don't belong to any real person.

16219422544780000

#gans achieve this level of realism by pairing a #generator, which learns to produce the target output, with a #discriminator, which learns to distinguish true data from the output of the #generator. The #generator tries to fool the #discriminator, and the #discriminator tries to keep from being fooled.

16219425157900000

"#generative" describes a class of statistical models that contrasts with #discriminative models. #generative models can generate new #data #instances. #discriminative models #discriminate between different kinds of #data #instances.

16219446831380000

More formally, given a set of #data #instances X and a set of #labels Y: #generative models capture the joint #probability p(X, Y), or just p(X) if there are no #labels. #discriminative models capture the #conditional #probability p(Y | X).

16219447853780000

The #generator learns to generate #plausible #data. The generated instances become negative #training #examples for the #discriminator.

16219452949520000

The #discriminator learns to distinguish the #generator's #fake #data from #real #data. The #discriminator #penalizes the #generator for producing implausible results.

16219453284630000

Through #backpropagation, the #discriminator's #classification provides a signal that the #generator uses to update its #weights.

16219453926750000

The #discriminator in a #gan is simply a #classifier. It tries to distinguish real #data from the #fake #data created by the #generator.

16220231031990000

The #discriminator connects to two #loss functions. During #discriminator training, the #discriminator ignores the #generator #loss and just uses the #discriminator #loss.

16220237699500000

The #generator part of a #gan learns to create #fake data by incorporating #feedback from the #discriminator. It learns to make the #discriminator #classify its output as real.

16220238449500000

The #generator feeds into the #discriminator net, and the #discriminator produces the output we're trying to affect. The #generator #loss penalizes the #generator for producing a sample that the #discriminator network classifies as #fake.

16220239748070000
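
The two losses can be illustrated on single discriminator outputs. This sketch assumes binary cross-entropy and the non-saturating form of the generator loss, which is one common choice and not necessarily the one used in the source these notes summarize; `d_real` and `d_fake` stand for the discriminator's probability that a real or a generated sample is real.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Low when the discriminator scores generated samples as real (near 1)."""
    return -math.log(d_fake)


# The generator is penalized more when its sample is classified as fake
caught = generator_loss(0.1)  # discriminator is fairly sure the sample is fake
fooled = generator_loss(0.9)  # discriminator believes the sample is real
```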

#research has suggested that if your #discriminator is too good, then #generator training can fail due to #vanishing_gradients. In effect, an optimal #discriminator doesn't provide enough #information for the #generator to make #progress.

16220244125990000

#wasserstein_loss: The #wasserstein_loss is designed to prevent #vanishing_gradients even when you train the #discriminator to #optimality.

16220244428670000

#convolutional_neural_network (#cnn) could be used to progressively extract higher- and higher-level #representations of the image #content.

16220254474620000

#backpropagation is the process of calculating the #gradient for the #neural_network. It is used to see in which direction the learning should proceed in order to bring the #loss_function to its minimum.

16223812737800000
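
For a single sigmoid neuron the whole procedure fits in a few lines: a forward pass, the chain rule applied backwards, and a step against the gradient. The squared loss, the learning rate of 0.5 and the single training example are assumptions chosen to keep the sketch small.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_gradient(w, b, x, y):
    """Forward pass, then the chain rule backwards, for one neuron with squared loss."""
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2
    dloss_da = 2.0 * (a - y)  # derivative of the loss w.r.t. the activation
    da_dz = a * (1.0 - a)     # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return loss, dloss_dz * x, dloss_dz  # loss, dloss/dw, dloss/db


w, b = 0.5, 0.0
initial_loss, _, _ = loss_and_gradient(w, b, x=1.0, y=1.0)
for _ in range(1000):
    _, grad_w, grad_b = loss_and_gradient(w, b, x=1.0, y=1.0)
    w -= 0.5 * grad_w  # move against the gradient
    b -= 0.5 * grad_b
final_loss, _, _ = loss_and_gradient(w, b, x=1.0, y=1.0)
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back to the input.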

in the process of #learning each #layer of a #neural_network will have #neurons with #weights ascribed to them, which enable #differentiation of various #features

16223813895450000

The #weights of the neurons combine through each #layer using the #activation_function (e.g. #sigmoid or #relu), which then leads to only a certain neuron at the last layer getting activated. We could say that the first #layer detects some general #features, the next one more specific ones, and so on, but this is not the case.

16223816616590000
