Chinese technology company ByteDance has launched a new multimodal artificial intelligence (AI) model called Bagel. It is a vision-language model (VLM) that can not only understand images but also generate and edit them. Notably, the company has made it open-source, and the model can now be downloaded from popular AI platforms such as GitHub and Hugging Face.
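Since the weights are published on Hugging Face, they can be fetched with the standard `huggingface_hub` client. This is a minimal sketch; the repository id below is an assumption, so check ByteDance's official Hugging Face page for the exact name before running it.

```python
# Minimal sketch of downloading the released Bagel weights from Hugging Face.
# The repo_id is an assumption -- verify the exact repository name on
# ByteDance's Hugging Face page before use.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",  # assumed repo id
)
print("Model files downloaded to:", local_dir)
```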
Features of Bagel:
- Multimodal input: Capable of understanding and processing both text and images simultaneously.
- 14 billion parameters: Only 7 billion of them are active at a time (see the sketch after this list).
- Interleaved training data: Text and images were trained together, helping Bagel learn stronger relationships between the two modalities.
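To illustrate how a 14-billion-parameter model can have only about 7 billion parameters active at once, here is a small conceptual sketch of expert-style routing, where only one branch's weights are used for a given input. This is not Bagel's actual architecture or code, just an illustration of the general idea.

```python
# Conceptual sketch: two modality-specific expert branches, only one of which
# is used per input. This is NOT Bagel's real implementation -- it only shows
# why total parameter count can be roughly double the active parameter count.
import torch
import torch.nn as nn

class TwoExpertBlock(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.text_expert = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.image_expert = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Route to one expert; the other expert's parameters stay unused (inactive).
        expert = self.text_expert if modality == "text" else self.image_expert
        return x + expert(x)

block = TwoExpertBlock()
tokens = torch.randn(1, 8, 256)           # a batch of 8 token embeddings
out = block(tokens, modality="image")     # only image_expert's weights are active here
print(out.shape)                          # torch.Size([1, 8, 256])
```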
Advanced image editing capability: ByteDance claims that Bagel performs image editing better than existing open-source VLMs. It can add emotion to an image; remove, change, or add elements; transfer styles; and carry out free-form editing, i.e. make changes without a fixed framework.
Also capable of world modeling: Bagel has been trained to understand the world in visual form, including the relationships between objects and the effects of natural factors such as light and gravity. ByteDance says that in its internal tests, Bagel surpassed Qwen2.5-VL-7B in image understanding, Janus-Pro-7B and Flux-1-dev in image generation, and Gemini-2-exp in image editing on the GEdit-Bench benchmark.