Welcome to this new episode of The Context. Today, I want to talk to you about Jolting AI: the increasing rate of acceleration in artificial intelligence applications. I often talk about accelerating change, which measures the rate of change in a given period of time and then, by comparing the rates of change, tries to interpolate and to understand the phenomena underlying it.
The visual representation of this rate of change in various kinds of charts of course depends on what you want to highlight. When the rate of change is very small and the variation in the rate of change is also small, then perhaps a linear chart is going to be fine. On the y-axis, you will have units of whatever you want to represent: 1, 2, 3, 4, 5. However, when we have an accelerating rate of change, we typically use a logarithmic chart, where the y-axis represents orders of magnitude. Every unit is going to be an increasing order of magnitude: 1, 10, 100, 1,000 and so on. So when we are talking about accelerating change, the mathematical function that embodies this is the exponential function, for example, two to the power of x. The exponential function will be a rapidly increasing curve when represented on a linear chart, but when represented on a logarithmic chart, the exponential function will be a straight line.
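To make the point about the two kinds of charts concrete, here is a minimal sketch in Python. It samples two to the power of x and compares the step from one point to the next on a linear axis and on a logarithmic axis. The variable names are just illustrative.

```python
import math

# Sample the exponential function 2**x at integer steps.
xs = list(range(0, 11))
ys = [2 ** x for x in xs]

# On a linear axis, each step is bigger than the one before:
# the curve bends upward ever more steeply.
linear_steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]

# On a logarithmic axis we effectively plot log10(y).
# Each step is then the same size, which is what a straight line looks like.
log_ys = [math.log10(y) for y in ys]
log_steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]

print(linear_steps)
print([round(s, 4) for s in log_steps])
```

The first list keeps growing, while every entry in the second list is the same constant, log10 of 2.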
Now, when we are talking about the technologies that enable society to take advantage of various types of innovation, which is in turn analyzed and expressed in terms of accelerating technological change, we must of course reference the law of accelerating returns, which was formulated by Ray Kurzweil in his 1999 book, The Age of Spiritual Machines, and then further analyzed and represented in various ways in, for example, The Singularity Is Near.
What Ray formulates is that the traditional understanding that we have, of a diminishing return from increasing investment in a given technology in a given industry, is true on a small scale, but if we look at a larger scale we can actually observe the opposite: an equal investment will generate a higher than expected return. So how can both of these be true?
What happens is that we have, for any given technology, the traditional S-curve: a technology being experimented with, then an ever increasing understanding of how the technology can be exploited and applied. When we have squeezed out every possible advantage of the technology, we will have a plateau, where further investments are effectively not going to provide the kind of returns that we expect. This S-curve holds for any given single technology.
But the law of accelerating returns, the exponential curve that we talk about when we talk about accelerating technologies, looks at successive technologies substituting each other and together drawing the curve that we are finally looking at. One of the most famous examples of using this kind of paradigm is Moore's law, which was formulated over 50 years ago and said that electronic circuits would double the density of their components every 18 months, later adjusted to every two years. And this is not a natural law. It is a self-fulfilling prophecy, a projection of our desires and of our expectations of our abilities, which leads many competing teams around the world to strive to be the first to arrive at a given breakthrough.
Ray actually generalized this and looked at many technologies that preceded the 1960s, when Gordon Moore formulated his observation. He looked at electromechanical relays, he looked at the old vacuum tubes, he looked at many things that went through their individual S-curves and then smoothly, one after another, fulfilled the same kind of expectation of doubling the computational power in a given period of time, more or less what Moore's law also formulated.
So Moore's law applies to transistors, which are then put together in integrated circuits, which in turn form the CPUs, the central processing units, that we have in our computers, in our mobile phones, and in the servers that we connect to over the internet when we browse the web, and so on.
An important additional component in understanding what is going on in the world of technology is what is called the innovator's dilemma, formulated by Clayton Christensen, who recently passed away. It is the pretty dramatic decision that an industry leader has to make to stop serving its current set of customers and invest instead in the development of the technologies that, when put into a given set of products, will serve its future customers. The dilemma lies in the fact that any investment in that future subtracts resources from investing in the already successful present.
Some leaders are too short-sighted to understand the necessity of this, which stems from the fact that if they don't do it, somebody else is going to do it, so they will be disrupted and they will stop being the leader in that future generation of products. And it is indeed the case that it is very difficult to disrupt oneself. It is extremely difficult to say: I will take away resources from improving my current generation of products, because I realize that I have to embrace a new, maybe unproven technology, because I understand and believe that it is going to be an essential technology of the leading solutions of tomorrow.
So even though Intel is the undisputed leader of the era of CPU computing, of personal computers and of servers, Intel is not the leader in the next generation of computing that we are already seeing around us, which is based on GPUs, graphics processing units. The leader of that is Nvidia.
GPUs are made of transistors just like CPUs, but their architecture is massively parallel. Rather than being optimized for executing an arbitrary kind of program that is run sequentially, their architecture is optimized for executing those kinds of programs that can be broken down into hundreds or thousands or millions of similar parts that are executed simultaneously across the architecture of the chip. The simplest example of these kinds of programs are the video games that we play, where the scene has to be calculated, designed, and represented, whether we are fighting against aliens or driving a car in a racing simulator, or any other kind of graphically intensive task.
The kinds of calculations that the computer has to make are basically the same pixel by pixel: not the result of the calculations, but the kind of calculations. Nvidia and some other companies recognized this early, and they created chips specialized for these graphically intensive tasks. They became important in that field, and they became leaders in it.
Now, it is frequently reported today in mainstream media articles, or even in some specialized articles, that Moore's law is ending, and that is mistakenly equated with innovation in computers ending as well. The era of traditional CPUs becoming ever more powerful may be progressing towards an end, for many reasons, but the age of innovation in computers is definitely not.
So let's get back to what I started with: the increasing rate of acceleration in artificial intelligence. About ten years ago, it was observed and then fully embraced that the then-leading type of AI architecture could be run efficiently on GPUs. That kind of approach is still the leading approach today.
It is a subset of machine learning called artificial neural networks, and especially deep learning, which is the type of artificial neural network where there are many, many layers, hundreds or thousands or tens of thousands of layers, connecting the inputs with the outputs. Each of these layers makes some calculation on the data and passes the result on to the next layer. The calculation, the optimization, and then the execution of the optimized, or as we say, trained, neural network is extremely efficient if it is run on GPUs rather than on traditional CPUs.
So what is happening is that there is a progressive learning curve that is applied not only within a given technology but across generations of technologies, and this learning curve enables the acceleration of change. Except that we are not talking about the mere acceleration of change anymore. We are talking about an increasing rate of acceleration that derives from the learning curve being applied across many generations of technologies.
It also derives from the rapid coming together and deployment of that learning by not only the specialists in hardware, but also the specialists in software, in infrastructure architectures, and so on. Actually, in terms of technology generations, we are now talking about specialized AI chips that go beyond optimizing the architecture of the chip and the integrated circuit using the same transistors. It is not only about recognizing that the parallel nature of GPUs is great, but that we can go beyond it, implementing in hardware the kind of calculations that artificial neural networks in deep learning need, and achieving even greater results. So, for example, Google designed such a chip, and they are calling it the TPU, the tensor processing unit, from the name of the mathematical calculations that these specialized chips have to execute. So we went from CPUs to GPUs, and now to specialized AI chips such as, for example, TPUs.
Now, a few months ago, Stanford University published their hundred-plus-page report on the state of the AI industry, and on page, I think, 65 or something of that report, they published a chart that represented how the rate of doubling in the performance of computing infrastructure, if we take into account the infrastructure available for artificial intelligence applications, changed from following Moore's law for the past 50 years to following a different curve over the course of the past 10 years.
And they calculated the amount of computation, the compute, that a given set of problems requires in terms of the global infrastructure, expressed in petaflop/s-days. This unit of measure is similar to what you would look at in the energy consumption of your home, where the power available is expressed in kilowatts, and kilowatt-hours measure the amount of energy that your house consumes and needs in order to function.
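As a small sketch of the analogy, assuming the usual reading of a petaflop/s-day as 10^15 operations per second sustained for one day, the two units can be computed side by side in Python. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETA = 10 ** 15


def petaflop_s_days_to_operations(pfs_days):
    """Total operations: a rate (petaflop/s) sustained over time (days)."""
    return pfs_days * PETA * SECONDS_PER_DAY


def kilowatt_hours(power_kw, hours):
    """Total energy: a rate (kilowatts) sustained over time (hours)."""
    return power_kw * hours


# One petaflop/s-day is 8.64 * 10**19 operations in total.
print(petaflop_s_days_to_operations(1))
# A 3 kW supply used at full power for 2 hours delivers 6 kWh.
print(kilowatt_hours(3, 2))
```

In both cases the unit is a rate multiplied by a duration, so it measures a total amount of work, not how fast the work is done.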
So a typical Western European house has three kilowatts of power available, and if you turn on your washer and dryer and your hair dryer and your dishwasher at the same time, you often end up exceeding that available power. A circuit breaker will trip, and you will realize: oh my god, I have to turn some appliance off.
So, similarly to that, we can look at which applications we can practically and usefully attack given the availability of a certain power in computation, that is, how many petaflops we can deploy. If we need 10,000 years to complete a task, we will just not do it.
If we can train a neural network to solve a given challenge fast enough, and that neural network can then be usefully applied to the task after being trained, all of this within budget and within a given amount of time, then we can actually solve problems that were unsolvable before.
So if we were on the curve of Moore’s law, exponential on a linear chart, linear on a logarithmic chart, then over the course of the past eight to ten years we would have seen a 7-fold improvement in this availability of compute and in the type of problems that, as a consequence, we are able to attack.
Instead, as Stanford University mapped the availability of computing power and the application of that computing power, how much we were able to dedicate to a given problem set, they saw that over the course of the past eight years, which is the period they looked at, we had a three-hundred-thousand-fold improvement.
So they took a given route. Together with Stanford University, OpenAI, another organization dedicated to the analysis and implementation of advanced artificial intelligence applications, also looked at this dataset, and they said: okay, let's do a linear interpolation of the first set and then another linear interpolation of the second set. They concluded that the doubling time used to be two years, according to Moore’s law, and now the doubling time is between three and four months.
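The relationship between a fold improvement, a time window, and a doubling time can be sketched in a few lines of Python. This is only the generic arithmetic of steady doubling, not the method used in the report, and the function names are hypothetical. The result depends strongly on the window you assume, so the example below uses round illustrative numbers.

```python
import math


def fold_improvement(elapsed_months, doubling_months):
    """Growth factor after elapsed_months if the quantity doubles every doubling_months."""
    return 2 ** (elapsed_months / doubling_months)


def doubling_time_months(fold, elapsed_months):
    """Doubling time implied by a given fold improvement over elapsed_months."""
    return elapsed_months / math.log2(fold)


# Doubling every 24 months for 96 months means 4 doublings, a 16-fold improvement.
print(fold_improvement(96, 24))
# Going the other way: a 16-fold improvement over 96 months implies a 24-month doubling time.
print(doubling_time_months(16, 96))
```

Plugging a fold figure and a window into `doubling_time_months` gives the doubling time that a straight-line fit on a logarithmic chart would imply for that window.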
Now, why not? That kind of simplistic approach is possible. But I propose a slightly more sophisticated approach: rather than doing a linear interpolation, we can draw an exponential curve on the logarithmic chart and say that what we are looking at is the increase in the rate of acceleration of our computing infrastructure when we take into account the latest software and hardware architectures.
That is why I say that AI is jolting. Jolt is the first derivative of acceleration with respect to time. Jolt represents an increasing rate of acceleration, and what we are seeing today is that AI is jolting. So what are the consequences? What does this imply? First of all, that we can expect the rate to potentially increase further even within the current set of applications. But if my paradigm is correct, what we have to watch out for is that there will be an even more important increase in the availability of compute for next-generation applications. Will that doubling time be one month instead of three? Will it be one week? What will that mean, and when will that be available?
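One rough way to see the difference between a straight line and a curve on the logarithmic chart is to look at differences between yearly values of the logarithm. The sketch below uses made-up numbers, not the Stanford or OpenAI data, and the series names are hypothetical.

```python
def differences(values):
    """Change from each value to the next."""
    return [b - a for a, b in zip(values, values[1:])]


# log2 of available compute, sampled once per period (illustrative numbers only).
steady = [0, 1, 2, 3, 4, 5]      # one doubling per period: a straight line on a log chart
jolting = [1, 2, 4, 8, 16, 32]   # the number of doublings itself keeps growing

# First differences: how many doublings happened in each period.
print(differences(steady))
print(differences(jolting))

# Second differences: how fast the doubling rate itself is changing.
# All zeros for the straight line, positive and growing for the curve.
print(differences(differences(steady)))
print(differences(differences(jolting)))
```

With a fixed doubling time the second differences are zero. When the doubling time keeps shrinking they are positive, which is the situation I am calling jolting.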
We have to study the numbers better, and we have to try to forecast what kind of software and hardware components are going to be available, but I expect that we will have this new disruption. Somebody like Stanford or OpenAI will once again simplistically represent it through another linear interpolation, but it is more appropriately represented by an exponential curve on the logarithmic chart.
Quantum computers are going to be applied to AI problems, or maybe the reverse, where we will use AI systems to design better quantum computers. There are already teams that are studying what each of these could be: what does it mean to design a neural network that runs natively on quantum computers, rather than on GPUs, TPUs, or other AI chips?
Quantum computers are so massively parallel as to require an entirely new understanding of how the universe works. Or rather multiple universes since one of the interpretations of quantum phenomena is the multiverse view of the universe. What does it mean to structure an AI application such that the output of that AI application is a better quantum computer? Most likely that AI application will already be running on a quantum computer.
The technological singularity is the hypothetical moment in the future when the rate of change in the world is such that unaided humans are unable to comprehend it. Whether it comes from self-modifying artificial intelligence leading to the so-called intelligence explosion, or whether it comes from other factors, it feels a little bit like we may be starting to get there.
Because we haven't been designing microchips with pencil and paper for a long time. We haven't been programming line by line for the past 10 years. First we gave up designing hardware ourselves, we used computer aided design, we used computers to design hardware.
Now, for the past 10 years, we have also given up designing software ourselves, willingly so, in order to be more effective and more efficient. We use neural networks that design the software instead. So when this comes together, and software designing hardware designing software is applied to a rapidly increasing set of problems, that is in many ways what we can call the singularity.