Ignesso Lesias

Dustyang111

AI & ML interests

None yet

Recent Activity

replied to merve's post about 19 hours ago
Don't sleep on new AI at Meta Vision-Language release! 🔥
https://huggingface.co/collections/facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
https://huggingface.co/collections/facebook/perception-lm-67f9783f171948c383ee7498

Meta dropped swiss army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc.) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets

The authors attempt to come up with a single versatile vision encoder that can be aligned on a diverse set of tasks.

They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👏

> Among the fine-tuned ones, the first is PE-Spatial. It's a model for bounding-box detection, segmentation, and depth estimation, and it outperforms all other models 😮
> The second is PLM, Perception Language Model, where they combine PE-Core with Qwen2.5 LM 7B. It outperforms all other models (including InternVL3, which was trained with Qwen2.5LM too!)

The authors release the following checkpoints in sizes base, large and giant:
> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets

The authors release the following datasets 📑
> PE Video: Gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: Human and auto-annotated image and video datasets on region-based tasks
> PLM-VideoBench: New video benchmark on MCQA
replied to fdaudens's post about 2 months ago
Ever wanted 45 min with one of AI’s most fascinating minds? Was with @thomwolf at HumanX Vegas. Sharing my notes of his Q&A with the press. It completely changed how I think about AI’s future:

1️⃣ The next wave of successful AI companies won’t be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but that’s rarely the only reason we buy one. We expect it to work well, and that’s enough. LLMs will be the same."

2️⃣ Big players are pivoting: "Closed-source companies, OpenAI being the first, have largely shifted from LLM announcements to product announcements."

3️⃣ Open source is changing everything: "DeepSeek was open source AI’s ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for free, and it’s just as good as the paid ones."

4️⃣ Product innovation is being democratized: Take Manus, for example: they built a product on top of Anthropic’s models that’s "actually better than Anthropic’s own product for now, in terms of agents." This proves that anyone can build great products with existing models.

We’re entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily available. Just look at the flurry of daily new releases on Hugging Face.

Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."

Love to hear your thoughts on this shift!

Organizations

None yet