Betting on multimodal large models: Tsinghua University team completes angel round of nearly 100 million yuan, led by Ant Group
Author: The Paper
Reporter Shao Wen
Shengshu Technology was founded in March 2023. Its core members come mainly from the Institute of Artificial Intelligence of Tsinghua University, and it is one of the earliest teams in China to work on general-purpose multimodal large models. This round was led by Ant Group, with Baidu Ventures and Zhuoyuan Capital participating. The company is currently valued at US$100 million.
There is new movement in China's multimodal large-model sector. On June 19, a new team led by Zhu Jun, a professor of computer science at Tsinghua University and vice president of its Institute of Artificial Intelligence, completed an angel round of nearly 100 million yuan.
Pengpai Technology has learned that Beijing Shengshu Technology Co., Ltd. (hereinafter "Shengshu Technology"), a multimodal large-model startup, has announced the completion of an angel round of nearly 100 million yuan. Ant Group led the round, with Baidu Ventures and Zhuoyuan Capital participating, and the company is currently valued at US$100 million. The funds will be used mainly to build the core R&D team and to accelerate the development of multimodal large models and application products.
A multimodal large model is a model trained on a combination of modalities such as text, images, video, and audio. OpenAI co-founder Ilya Sutskever has said, "The long-term goal of artificial intelligence is to build a multimodal neural network, that is, AI can learn concepts between different modalities, so as to better understand the world."
Shengshu Technology was founded in March 2023 and was jointly incubated by Beijing Ruilai Smart Technology Co., Ltd., Ant Group, and Baidu Ventures. Tang Jiayu, a former vice president of Ruilai Smart and a graduate of Tsinghua University's Department of Computer Science, serves as CEO. The company aims to build a controllable, general-purpose multimodal large model. This is reportedly Ant Group's first investment in a large-model company since ChatGPT became popular, and it is Zhu Jun's second venture after Ruilai Smart, a provider of artificial intelligence infrastructure and solutions.
The core members of the Shengshu Technology team come from the Institute of Artificial Intelligence of Tsinghua University, mainly from the research group led by Zhu Jun. The group works on the basic theory and efficient algorithms of Bayesian machine learning, and is one of the earliest teams in the world to study deep probabilistic generative models. In January 2022, the team proposed Analytic-DPM, a training-free inference framework, which OpenAI applied in the processing strategy of its DALL·E 2 model. The team later proposed the sampling algorithm DPM-Solver, currently the world's fastest image generation algorithm, which has been adopted by Stable Diffusion and many other open-source projects.
According to reports, Shengshu Technology is one of the earliest teams in China to work on general-purpose multimodal large models. In early 2023 it open-sourced UniDiffuser, the world's first Transformer-based multimodal diffusion large model, which can perform a range of generation tasks such as image-to-text generation, joint image-text generation, and image-text rewriting.
The Transformer model was introduced by a team at Google in 2017. It is a deep learning model that assigns different weights to each part of the input data according to its importance. It is used mainly in natural language processing (NLP) and computer vision (CV). Today, major large models such as GPT are built on the Transformer.
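The weighting idea described above can be illustrated with a minimal sketch of scaled dot-product attention, the core operation of the Transformer. This is not Shengshu Technology's code or any particular model's implementation. The function names `softmax` and `attention` and the toy vectors are hypothetical, chosen only for illustration, and the sketch uses plain Python so that it stays self-contained.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key is scored against the query; the scores become weights,
    and the output is the weighted average of the value vectors.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Toy input (illustrative values only): three input positions, two dimensions each.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print(weights)  # positions whose keys align with the query get larger weights
print(output)
```

In this toy input, the first and third keys align with the query equally well, so they receive equal weights, and both outweigh the second key, which is orthogonal to the query.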
"Broadly speaking, the industry's current approach to building large image generation models is the same: they are all based on the diffusion model. Our innovation lies in modifying the underlying backbone network. We were the first to use the Transformer within diffusion model technology to achieve multimodality," Tang Jiayu said in a recent media interview.
Tang Jiayu believes that the models and products on the market at this stage have only solved the initial problem of being able to generate content at all, and that the generated results remain highly uncertain and hard to control. Significant shortcomings remain. For example, it is difficult to control precisely the position and details of elements in a generated image, and generated 3D models are still at a relatively low level in surface fineness and in the accuracy of color, light, and shadow.
Shengshu Technology told Pengpai Technology that, in 3D content generation, it has developed the industry's first technology for automatically generating 3D content from three views, as well as text-to-3D technology that requires no 3D training data. The results are finely detailed and come close to industrial-grade applications. "The large model we have trained has surpassed the latest version of the Stable Diffusion base model in image generation, and is expected to catch up with the latest version of Midjourney within this year."
Stable Diffusion is a text-to-image generation model developed by Stability AI, the CompVis research group, and Runway. It was released in 2022 and is open source. Midjourney is a text-to-image generation tool launched in March 2022. It has gone through multiple iterations and is in public beta, and its realistic results have sparked heated discussion on the Chinese internet. Both Stable Diffusion and Midjourney are regarded worldwide as industry-leading, highly rated AI tools.