
Innovation in LLMs

Our article on ChatGPT, published in the previous newsletter, was so well received that we decided to write a follow-up. After all, more and more news is being published, and some of it is getting close to something that could be useful for businesses.

To understand how we can use this tool in our businesses, it helps to first understand what ChatGPT is, along with its advantages and limitations.

The core of ChatGPT is a Large Language Model (LLM), trained on an enormous amount of text from the web, with the main objective of predicting (or generating) the next words given a certain context (i.e., the words it has already seen). We can think of an LLM as a large table where millions of rules are annotated, obtained through statistical analysis of millions of texts written by humans. It is these rules that help the model predict the most appropriate next word to complete a sentence.

Creating simple, smaller language models without Deep Learning is straightforward: analyze a set of texts and calculate the probability of a specific word appearing after the words that precede it. However, these models have scalability limitations and do not generalize well to sequences that are not in the table. They also take up a lot of space and require long training times when many texts are involved.
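To make the "table of rules" idea concrete, here is a minimal, hypothetical sketch of such a count-based model: a bigram counter over a made-up toy corpus. This is not how ChatGPT works, just an illustration of why unseen sequences are a problem for a pure lookup table.

```python
from collections import defaultdict, Counter

# Minimal sketch of a count-based bigram "rule table" (illustrative only).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows a given previous word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word, or None if the context was never seen."""
    if word not in counts:
        return None  # no rule in the table -> the model cannot generalize
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))   # a word that followed "the" in the corpus
print(predict_next("sofa"))  # None: unseen context, the table has no rule
```

With millions of texts and longer contexts, a table like this explodes in size, which is exactly the scalability problem described above.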

ChatGPT and other modern language models use a different technique to store these rules: multilayer neural networks. Neural networks store many probabilistic rules in relatively little space, and algorithms have been developed to create these tables more efficiently in terms of processing time. This allows the model to be trained with more data and use a larger context to predict the next word.
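As a rough sketch of the same idea with a neural network (illustrative only, not ChatGPT's actual architecture; the vocabulary size, embedding dimension and context length below are arbitrary), the "rules" become the learned weights of an embedding layer and a couple of linear layers:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 10,000-word vocabulary, 64-dimensional embeddings, 4-word context.
VOCAB_SIZE, EMB_DIM, CONTEXT = 10_000, 64, 4

class TinyNeuralLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)         # words -> vectors
        self.hidden = nn.Linear(CONTEXT * EMB_DIM, 256)        # compress the whole context
        self.out = nn.Linear(256, VOCAB_SIZE)                  # a score for every possible next word

    def forward(self, context_ids):                            # (batch, CONTEXT) word indices
        vectors = self.embed(context_ids).flatten(start_dim=1) # (batch, CONTEXT * EMB_DIM)
        return self.out(torch.relu(self.hidden(vectors)))      # (batch, VOCAB_SIZE) logits

model = TinyNeuralLM()
logits = model(torch.randint(0, VOCAB_SIZE, (1, CONTEXT)))     # a fake 4-word context
print(logits.argmax(dim=-1))                                   # index of the predicted next word
```

The "rules" are no longer listed explicitly: they are compressed into a fixed number of parameters, which is why the same network can handle contexts it has never seen verbatim.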

One of the most significant advancements in this field was the creation of a highly scalable architecture called “Transformer,” which includes an “attention” mechanism to select the appropriate context for each rule.
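The central ingredient of the Transformer is easier to see in code. Below is a simplified, single-head version of scaled dot-product attention (the real architecture adds learned projections, multiple heads and many stacked layers): for each position it computes a weighted mix of the context, where the weights reflect how relevant each context word is.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Simplified single-head attention: every query attends to all keys."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # relevance of every context word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax -> attention weights
    return weights @ V                                 # weighted mix of the values

# Toy example: 3 context words represented by 4-dimensional vectors.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```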

Before Transformers (which emerged in 2017), it was not feasible to train models like GPT-3, both in terms of processing time and in the ability to retain the relevant rules. In addition to adopting this architecture, a model’s reliability can be improved through a series of “tricks” that are part of the closely guarded secrets of researchers.

However, neural networks can only store numerical rules, which means we need a way to represent words using numbers. Embeddings, which preceded Transformers by several years, revolutionized the field by allowing the representation of words with sets of numbers, where “similar” words receive “similar” numbers.

The similarity between words in embeddings goes beyond phonetics or lexicon. It is also related to the probability of word X appearing in the same context as word Y, meaning surrounded by “similar” words.
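A small sketch of what “similar numbers” means in practice. The vectors below are made up for illustration; real embeddings have hundreds of dimensions and are learned from co-occurrence statistics.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two word vectors: close to 1.0 = same direction, close to 0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings, invented for this example.
embeddings = {
    "cat":   np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.9, 0.2, 0.1]),
    "piano": np.array([0.1, 0.0, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high: words appear in similar contexts
print(cosine_similarity(embeddings["cat"], embeddings["piano"]))  # low: words appear in different contexts
```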

This representation helps reduce the number of rules we need to store in the network, as we can have rules for word classes instead of individual words. On the other hand, the trade-off is that the rules are less precise, which can lead to unexpected results (“errors”).

With these foundational technologies (Transformers + Tricks + Embeddings), it has finally become possible to “compress” an enormous number of rules, created based on a colossal number of embeddings, with a tremendous context, and utilizing an unimaginable number of texts (in fact, almost the entire web until 2021, in the case of GPT-3).

ChatGPT (as well as other similar language models) is the result of a brute force approach, which is only viable for companies with considerable financial resources.

However, as mentioned earlier, this “compression” has its limitations. GPT-3 cannot accurately reproduce the texts it was trained on (the ones it analyzed to create the rule table), meaning it cannot provide reliable factual information. This makes it unsuitable for applications that rely on such information. It’s not a good idea to ask ChatGPT what medication to take for a certain set of symptoms without validating the result with a doctor…

However, this weakness is also a strength when it comes to tasks where creativity is valued by humans.

In fact, creativity is the ability to do something unexpected, to generate things that haven’t been seen before, to create new things based on what we already know. From this perspective, ChatGPT is quite “human,” and that’s why it has been the success it is.

We are not accustomed to seeing a computer generate new things that make sense to us humans. Because of this, we attribute human characteristics to ChatGPT, such as will or emotions, which it does not yet possess…
