The evolution of "Consultation 2.0", standing in front of the large model layout of SenseTime

We are experiencing a massive wave of new AI infrastructure.

Within half a year, large models have rapidly spread beyond a small circle of early consensus. According to a report released by CITIC, nearly 80 large models with more than 1 billion parameters have been released so far, roughly half from enterprises and half from research institutions.

As the domestic large model ecosystem gradually takes shape, it has also begun to move beyond simply chasing OpenAI and to find its own path. The yardstick for a successful large model has shifted from head-on parameter races to solving real problems.

SenseTime first announced the "SenseNova" large model system in April this year, releasing a number of large AI models and applications, including the self-developed Chinese large language model "SenseChat". Recently, at the World Artificial Intelligence Conference, SenseTime announced the first major iteration of the SenseNova system, upgrading the large language model SenseChat to version 2.0.

It is stronger, and its role in SenseTime's overall large model layout is becoming increasingly clear.

Stronger "Negotiation 2.0"

How can the capability improvements of "SenseChat 2.0" be shown intuitively? Xu Li, chairman and CEO of SenseTime, demonstrated a fictional dialogue between Lao Tzu and Confucius.

The answer to "Consultation 2.0" revolves around "Tao". Confucius asked Lao Tzu. Although Lao Tzu had enlightened, he could not talk to Confucius, so he just walked away. The dialogue performed in this scene is smooth and flowing. "Discussion 2.0" even added a joke to the text:

Confucius said: "I have long heard the Master's name; meeting you today is truly the fortune of three lifetimes!"

Lao Tzu said with a smile: "Not at all. You and I walk the same path; where would the 'three lifetimes' come in?"

As the question requested, the entire dialogue is written in classical Chinese. To avoid misunderstanding, SenseChat 2.0 also opened its answer with the caveat that "this is merely fiction and should not be taken as a true historical record."

When "Consultation 1.0" was first launched, the on-site demonstration has demonstrated its excellent multi-round dialogue and human-machine co-creation capabilities. Three months later, "Consultation 2.0" has made more improvements in the accuracy of knowledge information, logical judgment ability, context understanding ability, and creativity.

For example, use "Consultation 2.0" to make travel planning, and tell it to make a table:

Or test it on the classic "my girlfriend is always right" scenario:

Not only does it understand girlfriends, SenseChat 2.0 can also pick up on irony and passive-aggressive undertones:

What happened to "Consultation 2.0" in the past three months, in fact, just look at the results of a few exams. In the evaluation results of three authoritative large language model evaluation benchmarks (MMLU, AGI, C-) worldwide, the performance of "Consultation 2.0" has exceeded ChatGPT.

Some may also have noticed, in the demonstration photos of the Lao Tzu-Confucius dialogue, that SenseChat 2.0 was shown in a split-screen comparison of its XL and S versions. The model comes in several sizes with different parameter counts for customers to choose from, and the smallest version can even run on mobile devices.

On the language side, SenseChat 2.0 has added Arabic, Cantonese, and other languages, and supports interaction across Simplified Chinese, Traditional Chinese, English, and more. Its support for long text has also grown from a 2K to a 32K context window, enabling a better grasp of context.

For B2B-oriented large model vendors such as SenseTime, the quality of the model itself is only the starting point. How enterprise customers can shape the model around their own needs, and how the model can then iterate stably toward those needs step by step, are the real pain points where the winners will be decided.

Open Knowledge Base Fusion Capabilities

Now that SenseTime has trained SenseChat 2.0 with strong understanding, dialogue, and reasoning abilities, enterprise customers can bring in their accumulated corporate knowledge and turn the large model into a "professional" that serves their own business well.

Solving these engineering problems efficiently is therefore critical.

The "Consultation 2.0" launched by SenseTime has added a knowledge base integration interface, allowing enterprises to quickly acquire professional knowledge and capabilities without waiting for iterative upgrades of the basic large model. After the knowledge base is integrated, the ability of the model to update and understand knowledge can be enhanced, and the rapid understanding and acquisition of knowledge can be strengthened. At the same time, the cost of customer training models will be greatly reduced.

Wang Xiaogang, co-founder and chief scientist of SenseTime, said: "With the knowledge base, it is relatively simple and convenient to summarize the relevant domain knowledge without folding it into the model itself." And because the information is more accurate, it also addresses the problem of hallucinations.

Digital Human as a Productivity Tool

Alongside the comprehensive upgrade of SenseChat 2.0, the AIGC platforms in the SenseNova system keep breaking new ground, and have taken a leap forward after integrating the capabilities of the large language model.

For example, the text-to-image creation platform "Miaohua" has been upgraded to version 3.0: its model parameters have grown to roughly 7 billion, and the detail of its generated images has reached the level of professional photography. As for the perennial headache of writing prompts, SenseChat 2.0 gives Miaohua 3.0 the ability to expand prompts automatically, so users need only a few simple prompt words to get a richly detailed image.
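SenseTime has not described how this expansion is implemented; conceptually, though, it amounts to one extra language model call that rewrites a terse prompt before it reaches the image model. A rough sketch under that assumption, with `chat` and `generate_image` as hypothetical client functions:

```python
from typing import Callable

def expand_prompt(short_prompt: str, chat: Callable) -> str:
    """Ask the language model to enrich a terse image prompt with scene, lighting, and style details."""
    instruction = (
        "Expand the following image prompt into one detailed description covering "
        "subject, setting, lighting, and artistic style.\nPrompt: "
    )
    return chat(instruction + short_prompt)

def text_to_image(short_prompt: str, chat: Callable, generate_image: Callable):
    """Feed the expanded prompt into the image generator."""
    detailed_prompt = expand_prompt(short_prompt, chat)
    return generate_image(detailed_prompt)

# Example (assuming chat/generate_image clients exist):
# text_to_image("a cat by the window at dusk", chat, generate_image)
```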

In the digital human field, SenseTime's digital human video generation platform "Ruying" has also been upgraded to version 2.0. The fluency of Ruying 2.0's speech and lip movement has improved by more than 30%, and it can now produce 4K video. At the press conference, digital human versions of economist Ren Zeping, Master Yancan, and Xu Li appeared on stage, and the results were convincingly realistic.

Among the scenarios where large models are landing, the digital human is an important vehicle. The recently popular digital-human live streaming is a typical example, and live streaming, along with short video, was one of the scenarios customers focused on most during Ruying 2.0's three months of internal and public testing.

Luan Qing, general manager of SenseTime's Digital Entertainment Department, said that within the AIGC framework, SenseChat 2.0 can take on copywriting and script creation for short videos and live broadcasts. Whether Ruying 2.0 can keep up with trends in how people communicate also depends on SenseChat 2.0's ability, as a large language model, to learn from the latest short-video corpus.

Beyond short video and live streaming, Ruying 2.0 is accelerating its entry into all walks of life.

In the insurance industry, for example, every agent needs to promote new products or deliver other personalized, service-oriented content to customers. Ruying 2.0 can stand in for the agent, sending personalized content and services on a customer's birthday or when a new wealth management product is released. In education, Ruying 2.0 has begun helping teachers on leading domestic vocational education platforms produce teaching materials, meeting their internal demand for video production.

"Digital Human is a typical efficiency tool within an enterprise." Luan Qing said.

As an AIGC creation platform, Ruying will continue to go deeper into video generation. Luan Qing believes this is because content creation is shifting in dimension, from text and pictures to video.

Towards Multimodality

Since image and video information make up a far larger share of the real world than language does, the need to understand the real world will push base large models toward multimodality, and SenseChat 2.0 already offers a first glimpse of that direction.

Beyond text, SenseChat 2.0 can analyze image and video content.

For example, as shown in the figure above, SenseChat 2.0 can identify specific objects in a photo of a messy desk and combine the characteristics of each object to answer an open, process-design-style question such as "What can you do here when you feel hot?"; or, after seeing a photo of a menu, it can help the user pick dishes within a limited budget.
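SenseTime has not published a public interface for these multimodal queries; as a rough illustration of how such a request is usually structured, the sketch below pairs an image with a text question in a single call, with `multimodal_chat` as a hypothetical client:

```python
import base64
from typing import Callable

def ask_about_image(image_path: str, question: str, multimodal_chat: Callable) -> str:
    """Send an image plus a text question to a multimodal chat model in one request."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    message = {
        "image": image_b64,  # the photo, e.g. a menu or a cluttered desk
        "text": question,    # the open question about the photo
    }
    return multimodal_chat(message)

# Example (hypothetical call): budget-constrained ordering from a menu photo
# ask_about_image("menu.jpg", "Pick dishes for two people under 150 yuan.", multimodal_chat)
```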

SenseTime, which entered AI through computer vision research and has already ridden one AI wave, is all the more convinced that this wave of large models is a real opportunity.

Current large model research is built on the Transformer architecture. "SenseTime has been working on large models since 2019; back then, the route was vision," said Wang Xiaogang. In his view, the paradigms of vision and of natural language are gradually converging today: "As we move in a multimodal direction, language and vision begin to integrate more deeply, and that is where our accumulation and capability are relatively strong."

Many application scenarios we encounter in real life, in fields such as autonomous driving and robotics, have to rely on multimodality. "However, multimodal data and tasks are often not easy to obtain and require deep industry accumulation. This is also SenseTime's advantage," Wang Xiaogang noted.

Three months after its first public appearance, SenseTime's SenseNova large model system has been comprehensively upgraded at this year's World Artificial Intelligence Conference and opened to enterprise users. Meanwhile, a detail many have overlooked: SenseTime has also released the "Shusheng" multimodal large model jointly with the Shanghai Artificial Intelligence Laboratory. Whether SenseTime can be among the first to find the key to the multimodal road is something worth looking forward to.
