IBM has confirmed the arrival of Granite 4.0 Nano, its smallest family of AI models yet.
The release is IBM’s latest effort to demonstrate that greater model size does not necessarily equate to greater intelligence, and that sheer scale alone need not be the deciding factor. The Granite 4.0 Nano models range from roughly 350 million to 1.5 billion parameters, dwarfed by offerings from the likes of OpenAI and Google.
Confirming the launch of Granite 4.0 Nano, IBM’s Kate Soule and Rameswar Panda said on Hugging Face that the models had been designed for edge and on-device applications, adding that they represent “IBM’s continued commitment to develop powerful, useful models that don’t require hundreds of billions of parameters to get the job done.”
The family comprises models in two sizes, roughly 350 million and 1.5 billion parameters, each available with a hybrid state-space model (SSM) architecture as well as a conventional transformer variant.
IBM’s line-up on Hugging Face includes four instruct models and their base-model counterparts:
- Granite 4.0 H 1B (about 1.5 billion parameters): a dense large language model (LLM) featuring a hybrid-SSM-based architecture.
- Granite 4.0 H 350M (about 350 million parameters): a dense LLM featuring a hybrid-SSM-based architecture.
- Granite 4.0 1B: a transformer-based variant, designed to enable workloads where hybrid architectures may not yet have optimized support, such as llama.cpp.
- Granite 4.0 350M: a transformer-based variant, as with the 1B model.
IBM subsequently noted on Reddit that the non-hybrid 1B variant is actually closer to 2 billion parameters, but said it "opted to keep the naming aligned to the hybrid variant to make the connection easily visible."
The models, which are natively compatible with vLLM, llama.cpp and MLX, are usable by independent developers and enterprises alike, and are released under the Apache 2.0 license, making them suitable for commercial deployment.
The models can be run locally, with the smallest able to run on a laptop, and none of them rely on cloud infrastructure.
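For developers who want to try one of the Nano models locally, the Hugging Face transformers library is the most direct route. The snippet below is a minimal sketch: the model identifier is an assumption based on IBM’s ibm-granite naming convention on Hugging Face rather than a confirmed id, and the hybrid-SSM variants may require a recent transformers release.

```python
# Minimal sketch: load a Granite 4.0 Nano model locally and generate a reply.
# The model id is an assumption based on IBM's ibm-granite naming on Hugging Face;
# check the organisation page for the exact identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-350m"  # assumed id for the 350M transformer variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-style prompt with the model's chat template and run generation on CPU.
messages = [{"role": "user", "content": "In one sentence, what is an on-device language model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the smallest variant is only around 350 million parameters, a sketch like this should run comfortably on a laptop CPU; for llama.cpp or MLX, the equivalent step would be pointing the runtime at the corresponding converted weights.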
IBM added on Reddit: “We developed the Nano models specifically for the edge, on-device applications, and latency-sensitive use cases. Within that bucket, the models will perform well for tasks like document summarization/extraction, classification, lightweight RAG, and function/tool calling.
“While they aren’t intended for highly complex tasks, they can comfortably handle real-time, moderate-complexity workloads in production environments.”
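As an illustration of the moderate-complexity workloads IBM describes, the sketch below shows a lightweight classification call using the transformers pipeline API. The model id is again an assumption, and the prompt and labels are purely illustrative.

```python
# Sketch of a latency-sensitive use case IBM mentions: classifying a short document
# on-device. The model id and labels are illustrative assumptions, not confirmed values.
from transformers import pipeline

generator = pipeline("text-generation", model="ibm-granite/granite-4.0-350m")

ticket = "My October invoice was charged twice; please refund the duplicate."
messages = [{
    "role": "user",
    "content": (
        "Classify the following support ticket as billing, technical or other. "
        f"Reply with a single word.\n\nTicket: {ticket}"
    ),
}]
result = generator(messages, max_new_tokens=5)
# With chat-style input, the pipeline returns the conversation with the model's
# reply appended as the final message.
print(result[0]["generated_text"][-1]["content"])
```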
They also come with IBM’s ISO 42001 certification for responsible AI development.
IBM said initial performance testing showed the models compare favorably against competitors including Alibaba (Qwen), Liquid AI (LFM) and Google (Gemma).
The company claimed "a significant increase in capabilities that can be achieved with a minimal parameter footprint" across benchmarks encompassing general knowledge, math, code and safety.
And there appears to be more to come, with IBM confirming that a larger model is being trained as part of the Granite 4.0 family.