UnifoLM-VLA


UnifoLM-VLA-0: A Vision-Language-Action (VLA) large model in the UnifoLM series, designed for general-purpose humanoid robot manipulation. It goes beyond conventional Vision-Language Models (VLMs) in physical interaction through continued pre-training on robot manipulation data.


README

UnifoLM-VLA-0 is a Vision-Language-Action (VLA) large model in the UnifoLM series, designed for general-purpose humanoid robot manipulation. It goes beyond the limitations of conventional Vision-Language Models (VLMs) in physical interaction. Through continued pre-training on robot manipulation data, the model evolves from "vision-language understanding" to an "embodied brain" equipped with physical common sense. It features spatial semantic enhancement and generalizes across 12 categories of complex manipulation tasks.
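To make the vision-language-action mapping concrete, here is a minimal sketch of the interface such a model exposes: a camera frame and a natural-language instruction go in, and a robot action vector comes out. All names and dimensions below (the `VLAPolicy` class, a 7-dimensional action) are illustrative assumptions, not the actual UnifoLM-VLA-0 API.

```python
import numpy as np


class VLAPolicy:
    """Hypothetical sketch of a vision-language-action policy interface.

    A VLA model maps an RGB observation plus a language instruction to a
    robot action. A real implementation would encode both inputs with a
    pretrained VLM backbone and decode actions; this stub only shows the
    shape of the contract.
    """

    def __init__(self, action_dim: int = 7):
        # e.g. 6-DoF end-effector delta + 1 gripper command (assumed)
        self.action_dim = action_dim

    def predict(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # Validate the observation; a real model would tokenize the
        # instruction and run inference here.
        assert image.ndim == 3, "expected an HxWxC RGB frame"
        assert isinstance(instruction, str) and instruction
        return np.zeros(self.action_dim)  # placeholder action


policy = VLAPolicy()
frame = np.zeros((224, 224, 3), dtype=np.uint8)
action = policy.predict(frame, "pick up the red cup")
print(action.shape)  # (7,)
```

Control loops built on such a policy typically call `predict` at a fixed rate, feeding each new camera frame and the standing instruction, and stream the resulting actions to the robot.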

Timeline

2/23/2026: Discovered UnifoLM-VLA during February 2026 content audit

Project Info
Language: Python
Updated: 1/29/2026