What happens to a large language model (LLM) after training


Large language models (LLMs), systems for text comprehension and generation, have recently emerged as a hot topic in AI. LLM releases by tech giants like OpenAI, Google, Amazon, Microsoft and Nvidia, as well as by open-source communities, demonstrate the field's high potential and represent a huge step forward in its development. However, not all language models are created equal.

In this article, we’ll look at the key differences between the ways LLMs are used once they’ve been built: open-source models, models for internal use, and products on the platform. We’ll also explore the intricacies of each approach and discuss how each might evolve in the years to come. But first, the bigger picture.

What are large language models used for?

Common applications of LLMs range from simple tasks such as question answering, text recognition, and text classification to more creative ones such as text or code generation, which push the limits of current AI capabilities, and human-like conversational agents. The creative generation is certainly impressive, but more advanced products based on these models are yet to come.

What is the big deal about LLM technology?

The use of LLMs has increased significantly in recent years as newer and larger systems are developed. One reason is that a single model can be used for a variety of tasks, such as text generation, sentence completion, classification, and translation. In addition, they appear to be able to make reasonable predictions given only a handful of labeled examples, so-called “few-shot learning.”
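To illustrate, few-shot prompting amounts to prepending a handful of labeled examples to the query so the model can continue the pattern. A minimal sketch of building such a prompt; the sentiment task, examples, and labels here are invented for illustration:

```python
# Minimal sketch of few-shot prompting: the model is shown a few labeled
# examples inside the prompt and asked to complete the pattern for a new input.
# The task, examples, and labels are invented for illustration.

def build_few_shot_prompt(examples, query):
    """Format labeled examples plus a new query as a single prompt string."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A charming, well-acted film.")
print(prompt)
```

The resulting string would then be sent to whatever completion endpoint the model exposes; the model's continuation after the final "Sentiment:" is the prediction.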



Let’s take a closer look at the three development paths available for LLMs. We will assess the potential drawbacks each may face in the future and offer potential solutions.

Open source

Open-source LLMs are created as open collaboration software, with the original code and models freely available for redistribution and modification. This allows AI scientists to work on and use the models’ high-quality capabilities (for free) in their own projects, rather than limiting model development to a select group of technology companies.

Some examples are BLOOM, YaLM and even Salesforce’s models, which provide fast and scalable environments for AI/ML development. Although open-source development is by definition open for use by contributors, it incurs high development costs. Storing, training, and even fine-tuning these models is highly labor-intensive, as it requires investment, expertise, and large volumes of specially interconnected GPUs.

Tech companies’ continued investment in and open-sourcing of these technologies may be driven by brand-related goals, such as demonstrating the company’s leadership in the field, or by more practical goals, such as discovering alternative value-adds that the broader community can come up with.

In other words, human investment and guidance are required for these technologies to be useful for business applications. Often, adaptation of models can be achieved through fine-tuning on a certain amount of human-labeled data or through developers’ constant interaction with the models and the results they generate.


Products on the platform

The obvious leader here is OpenAI, which has created the most useful models and made some of them available through an API. However, many smaller startups, such as CopyAI, JasperAI, and Contenda, are starting to develop their own LLM-enabled applications based on the “model as a service” that industry leaders in this area offer.

As these smaller businesses compete for their respective market share, they leverage the power of supercomputer-scale models, fine-tuning them for the task at hand while using a much smaller amount of data. Their apps are typically trained to tackle a single task and focus on a much narrower and more specific market segment.

Other companies develop their own models that compete with OpenAI’s, contributing to the advancement of AI as a science. Examples include AI21, Cohere, and EleutherAI’s GPT-J-6B, whose models generate or classify text.

Another application of language models is code generation. Companies like OpenAI and GitHub (with the GitHub Copilot plugin based on OpenAI Codex), Tabnine, and Kite produce tools for automated code generation.

For internal use only

Tech giants like Google, DeepMind, and Amazon keep their own versions of LLMs, some based on open-source data, in-house. They research and develop their models to push the field of linguistic AI further; to use them as classifiers for business functions such as moderation and social media classification; or to assist in generating long-tail content for large collections of written requirements, such as ads and product descriptions.

What are the limitations of LLMs?

We have discussed some of the disadvantages, such as high development and maintenance costs. Let’s dive a little deeper into more technical problems and potential ways to fix them.

Based on research, larger models generate incorrect answers, conspiracy theories, and unreliable information more often than smaller models do. For example, the 6B-parameter GPT-J model was 17% less accurate than a similar 125M-parameter model.

Since LLMs are trained on internet data, they can pick up unwanted social biases related to race, gender, ideology, and religion. In this context, aligning with diverse human values remains a particular challenge.

Providing open access to such models, as in the case of Galactica, can be risky as well. Without preliminary human verification, models could inadvertently produce racist remarks or inaccurate scientific statements.

Are there any solutions to improve LLMs?

Merely scaling up the models seems less promising for improving truthfulness and avoiding toxic content than fine-tuning them with training objectives other than text imitation.

A bias or truth detection system, a supervised classifier that analyzes content to find pieces matching the definition of “bias” for a given case, can be one way to correct these types of errors. But that still leaves problems in model training.
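A minimal sketch of what such a supervised detector could look like, using scikit-learn with a TF-IDF bag-of-words and logistic regression. The tiny training set here is invented for illustration; a real system would need far more labeled data and a more capable model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy human-labeled data: 1 = flagged as biased/toxic, 0 = acceptable.
# Invented for illustration; real training sets are far larger.
texts = [
    "people from that group are all lazy",
    "that group should not be allowed to vote",
    "the weather is lovely today",
    "this model answers questions about history",
]
labels = [1, 1, 0, 0]

# Bag-of-words features feeding a linear classifier.
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)

def is_flagged(text):
    """Return True if the classifier marks the text as problematic."""
    return bool(detector.predict([text])[0])
```

In a real pipeline, `is_flagged` would run over candidate training documents or model outputs, and flagged pieces would be routed to human annotators for confirmation.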

The solution is data, or more specifically, large amounts of data labeled by humans. After the system is provided with enough data samples and corresponding annotations to locate toxic content, the portions of the dataset determined to be harmful or false can be removed or masked to prevent their use in model outputs.
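Once harmful spans have been identified, masking them before the data is reused can be as simple as a substitution pass. A toy filter; the blocklist here is an invented placeholder, and a production pipeline would instead use spans flagged by the trained classifier and confirmed by annotators:

```python
import re

# Invented placeholder blocklist for illustration; in practice the terms
# would come from classifier output plus human review.
BLOCKLIST = {"slur1", "slur2"}

def mask_flagged(text, blocklist=BLOCKLIST, mask="[REDACTED]"):
    """Replace blocklisted words with a mask token, case-insensitively."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, blocklist)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, text)

print(mask_flagged("this contains slur1 and Slur2 here"))
# prints "this contains [REDACTED] and [REDACTED] here"
```

Masking rather than deleting keeps surrounding context intact, which matters when the rest of a document is still useful training material.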

In addition to bias detection, human judgment can be used to evaluate texts based on their fluency, readability, naturalness of language, grammatical errors, coherence, logic, and relevance.
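Such multi-criteria judgments are typically collected from several annotators and averaged per criterion. A minimal sketch; the criteria names, the 1-to-5 scale, and the scores are invented for illustration:

```python
from statistics import mean

# Invented scores from three annotators rating one generated text
# on a 1-5 scale for each criterion.
ratings = [
    {"fluency": 5, "coherence": 4, "relevance": 4},
    {"fluency": 4, "coherence": 4, "relevance": 3},
    {"fluency": 5, "coherence": 3, "relevance": 4},
]

def aggregate(ratings):
    """Average each criterion's score across annotators."""
    criteria = ratings[0].keys()
    return {c: mean(r[c] for r in ratings) for c in criteria}

print(aggregate(ratings))
```

Real evaluation setups also track inter-annotator agreement so that noisy or inconsistent raters can be detected, but simple per-criterion averages are the usual starting point.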

Not quite AGI

Undoubtedly, recent years have seen some really impressive progress in AI language models, and scientists have been able to make headway on some of the hardest problems in the field. However, despite their advancement, LLMs still lack some of the most important aspects of intelligence, such as common sense, causality detection, explicit language understanding, and intuitive physics.

As a result, some researchers question whether language-only training is the best way to build truly intelligent systems, no matter how much data is used. Language works well as a compression system for conveying the essence of a message, but it is hard to learn the specifics and contexts of human experience through language alone.

A system trained on both form and meaning, for example on video, image, audio, and text simultaneously, could help advance the science of natural language understanding. In any case, it will be interesting to see where the development of robust LLM systems takes the science. One thing, however, is hard to doubt: the potential value of LLMs is still significantly greater than what has been achieved so far.

Fedor Zhdanov is the head of ML at Toloka.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is a place where professionals, including technical people who work with data, can share data-related insights and innovations.

If you want to read about cutting-edge ideas and updates, best practices, and the future of data and data technology, join us at DataDecisionMakers.

You can even consider contributing an article of your own!



