
Introducing Gemma models in Keras



Posted by Martin Görner – Product Manager, Keras

The Keras team is happy to announce that Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology that we used to create the Gemini models, is now available in the KerasNLP collection. Thanks to Keras 3, Gemma runs on JAX, PyTorch and TensorFlow. With this release, Keras is also introducing several new features specifically designed for large language models: a new LoRA API (Low Rank Adaptation) and large-scale model-parallel training capabilities.

If you want to dive straight into code samples, head here:

Get started

Gemma models come in portable 2B and 7B parameter sizes, and deliver significant advances against similar open models, and even some larger ones. For example:

  • Gemma 7B scores a new best-in-class 64.3% of correct answers in the MMLU language understanding benchmark (vs. 62.5% for Mistral-7B and 54.8% for Llama2-13B)
  • Gemma adds +11 percentage points to the GSM8K benchmark score for grade-school math problems (46.4% for Gemma 7B vs. Mistral-7B 35.4%, Llama2-13B 28.7%)
  • and +6.1 percentage points of correct answers in HumanEval, a coding challenge (32.3% for Gemma 7B, vs. Mistral 7B 26.2%, Llama2 13B 18.3%).

Gemma models are offered with a familiar KerasNLP API and a super-readable Keras implementation. You can instantiate the model with a single line of code:

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

And run it directly on a text prompt – yes, tokenization is built-in, although you can easily split it out if needed – read the Keras NLP guide to see how.

gemma_lm.generate("Keras is a", max_length=32)
> "Keras is a popular deep learning framework for neural networks..."

Try it out here: Get started with Gemma models

Fine-tuning Gemma Models with LoRA

Thanks to Keras 3, you can choose the backend on which you run the model. Here is how to switch:

import os
os.environ["KERAS_BACKEND"] = "jax"  # Or "tensorflow" or "torch".
import keras  # import keras after the backend has been chosen
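
If in doubt, you can verify which backend is actually active; this small check simply asks Keras for the name of the configured backend:

print(keras.backend.backend())  # expected to print "jax" in this example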

Keras 3 comes with several new features specifically for large language models. Chief among them is a new LoRA API (Low Rank Adaptation) for parameter-efficient fine-tuning. Here is how to activate it:

gemma_lm.backbone.enable_lora(rank=4)
# Note: rank=4 replaces the weights matrix of relevant layers with the
# product AxB of two matrices of rank 4, which reduces the number of
# trainable parameters.

This single line drops the number of trainable parameters from 2.5 billion to 1.3 million!
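
Once LoRA is enabled, fine-tuning uses the standard Keras compile/fit workflow. The sketch below is illustrative only: train_data stands in for your own dataset of text examples, and the sequence length and hyperparameters are placeholder values, not prescriptions.

# Illustrative sketch of the fine-tuning loop after enable_lora().
# `train_data` is a placeholder for your own dataset of text examples.
gemma_lm.preprocessor.sequence_length = 256  # shorter sequences save memory
gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(train_data, epochs=1, batch_size=1)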

Try it out here: Fine-tune Gemma models with LoRA.

Fine-tuning Gemma models on multiple GPUs/TPUs

Keras 3 also supports large-scale model training, and Gemma is the perfect model to try it out. The new Keras distribution API offers data-parallel and model-parallel distributed training options. The new API is meant to be multi-backend but, for the time being, it is implemented for the JAX backend only, because of its proven scalability (Gemma models were trained with JAX).
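
For the data-parallel case, a minimal sketch (assuming you simply want to replicate the model on all local accelerators and shard the input batches across them) looks like this:

# Sketch: replicate the model on every available accelerator and split
# input batches across them (data parallelism).
data_parallel = keras.distribution.DataParallel(
    devices=keras.distribution.list_devices())
keras.distribution.set_distribution(data_parallel)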

To fine-tune the larger Gemma 7B, a distributed setup is useful, for example a TPUv3 with 8 TPU cores that you can get for free on Kaggle, or an 8-GPU machine from Google Cloud. Here is how to configure the model for distributed training, using model parallelism:

device_mesh = keras.distribution.DeviceMesh(
   (1, 8), # Mesh topology
   ["batch", "model"], # named mesh axes
   devices=keras.distribution.list_devices() # actual accelerators
)


# Model config
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = (None, "model")
layout_map["decoder_block.*attention.*(query|key|value).*kernel"] = (
   None, "model", None)
layout_map["decoder_block.*attention_output.*kernel"] = (
   None, None, "model")
layout_map["decoder_block.*ffw_gating.*kernel"] = ("model", None)
layout_map["decoder_block.*ffw_linear.*kernel"] = (None, "model")


# Set the model config and load the model
model_parallel = keras.distribution.ModelParallel(
   device_mesh, layout_map, batch_dim_name="batch")
keras.distribution.set_distribution(model_parallel)
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_7b_en")
# Ready: you can now train with model.fit() or generate text with generate()

What this code snippet does is set up the 8 accelerators as a 1 x 8 matrix whose two dimensions are called “batch” and “model”. Model weights are sharded on the “model” dimension, here split across the 8 accelerators, while data batches are not partitioned since the “batch” dimension is 1.

Try it out here: Fine-tune Gemma models on multiple GPUs/TPUs.

What’s Next

We will soon be publishing a guide showing you how to correctly partition a Transformer model and write the 6 lines of partitioning setup above. It is not very long but it would not fit in this post.

You will have noticed that layer partitionings are defined through regexes on layer names. You can check layer names with this code snippet. We ran it to construct the LayoutMap above.

# This is for the first Transformer block only,
# but they all have the same structure
tlayer = gemma_lm.backbone.get_layer('decoder_block_0')
for variable in tlayer.weights:
 print(f'{variable.path:<58}  {str(variable.shape):<16}')

Full GSPMD model parallelism works here with just a few partitioning hints because Keras passes these settings to the powerful XLA compiler, which figures out all the other details of the distributed computation.

We hope you will enjoy playing with Gemma models. Here is also an instruction-tuning tutorial that you might find useful. And by the way, if you want to share your fine-tuned weights with the community, the Kaggle Model Hub now supports user-tuned weights uploads. Head to the model page for Gemma models on Kaggle and see what others have already created!




