IBM releases large-scale language model 'Bamba' as open source



IBM Research, in collaboration with Carnegie Mellon University, Princeton University, and the University of Illinois, has built a large-scale language model called 'Bamba' and released version 2 as open source.

Meet Bamba, IBM's new attention-state space model - IBM Research

https://research.ibm.com/blog/bamba-ssm-transformer-model



Bamba is a large-scale language model with 9.78 billion parameters, and its underlying architecture is slightly different from that of conventional large-scale language models.

According to IBM Research, large-scale language models typically use an architecture called the Transformer, but because the entire running sequence must be kept in memory while a response is generated, the cost of producing that response grows quadratically as the prompt gets longer. For example, if the size of the context window is doubled, the cost of processing it and generating a response is not doubled but quadrupled.

This problem is known as the 'quadratic bottleneck' and is said to be one of the causes of the lag between when a user asks an AI a question and when they receive an answer.
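To make that scaling concrete, here is a rough back-of-the-envelope sketch (not IBM's code, and with purely illustrative figures): attention compares every token with every other token, so its compute grows with the square of the context length, while the KV cache grows linearly with it.

```python
def attention_cost(context_len: int) -> int:
    """Pairwise token comparisons performed by full self-attention."""
    return context_len * context_len

def kv_cache_entries(context_len: int, num_layers: int = 32) -> int:
    """Key/value pairs that must stay in memory while generating (illustrative layer count)."""
    return context_len * num_layers * 2  # one key and one value per layer

for ctx in (4_096, 8_192, 16_384):
    print(f"context {ctx:>6}: attention cost {attention_cost(ctx):>12,}, "
          f"KV entries {kv_cache_entries(ctx):>9,}")
# Doubling the context doubles the KV cache but quadruples the attention cost.
```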

The newly introduced Bamba-9B combines the Transformer architecture with a state space model (SSM), fundamentally changing how the KV cache, the memory used during generation, is managed compared with a pure Transformer. Whereas a Transformer attends to every word in the context window when producing a response, an SSM maintains a 'hidden state' that summarizes past information. By selectively retaining information in this way, the model is said to reduce memory overhead and speed up inference, as sketched below.
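The following is a minimal, simplified sketch of that contrast (an assumption for illustration, not Bamba's actual layer): a linear state space recurrence folds each new token into a fixed-size hidden state, so memory does not grow with sequence length the way a KV cache does.

```python
import numpy as np

d = 16                      # hidden state size (illustrative)
A = np.eye(d) * 0.9         # state transition (decay of past information)
B = np.random.randn(d)      # how a new input is written into the state

def ssm_step(hidden_state: np.ndarray, token_embedding: float) -> np.ndarray:
    """Update the fixed-size summary of everything seen so far."""
    return A @ hidden_state + B * token_embedding

hidden = np.zeros(d)
for token in np.random.randn(10_000):   # 10,000 tokens; memory stays constant
    hidden = ssm_step(hidden, token)

print(hidden.shape)  # (16,) -- does not grow with sequence length,
                     # unlike a KV cache that stores every past token
```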

For more details, please see the following website.

Bamba-9B-v2 - Fast and powerful!

https://huggingface.co/blog/ibm-ai-platform/bamba-9b-v2



According to IBM Research, Bamba-9B can run at least twice as fast as a Transformer-based model of the same size while maintaining the same accuracy, thanks to the significantly reduced memory requirements of the KV cache. By combining the expressive power of the Transformer with the execution speed of the SSM, the model addresses the bottleneck while preserving response accuracy.

Bamba is released as open source under the Apache 2.0 license.
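For readers who want to try it, the sketch below shows one way the released checkpoint might be loaded with the Hugging Face transformers library. The model id "ibm-ai-platform/Bamba-9B-v2" is taken from the blog post linked above; a recent transformers version with Bamba support is assumed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-ai-platform/Bamba-9B-v2"  # assumed from the Hugging Face blog post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is a state space model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```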

GitHub - foundation-model-stack/bamba: Train, tune, and infer Bamba model

https://github.com/foundation-model-stack/bamba


