utter-project/EuroLLM-1.7B-Instruct

Model Card for EuroLLM-1.7B-Instruct

This is the model card for the first instruction-tuned model of the EuroLLM series: EuroLLM-1.7B-Instruct. You can also check the pre-trained version: EuroLLM-1.7B.

  • Developed by: Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
  • Funded by: European Union.
  • Model type: A 1.7B-parameter instruction-tuned multilingual transformer LLM.
  • Language(s) (NLP): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
  • License: Apache License 2.0.

Model Details

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages, as well as some additional relevant languages. EuroLLM-1.7B is a 1.7B-parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction-tuning dataset with a focus on general instruction-following and machine translation.

Model Description

EuroLLM uses a standard, dense Transformer architecture:

  • We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
  • We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
  • We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
  • We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.

For pre-training, we use 256 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences, which corresponds to approximately 12 million tokens (3,072 sequences × 4,096 tokens per sequence), using the Adam optimizer and BF16 precision. Here is a summary of the model hyper-parameters:

| Hyper-parameter | Value |
|---|---|
| Sequence Length | 4,096 |
| Number of Layers | 24 |
| Embedding Size | 2,048 |
| FFN Hidden Size | 5,632 |
| Number of Heads | 16 |
| Number of KV Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE (θ = 10,000) |
| Layer Norm | RMSNorm |
| Tied Embeddings | No |
| Embedding Parameters | 0.262B |
| LM Head Parameters | 0.262B |
| Non-embedding Parameters | 1.133B |
| Total Parameters | 1.657B |
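
The checkpoint is distributed in a Llama-style transformers format, so the table above maps roughly onto a standard config. The sketch below is a minimal illustration under stated assumptions, not the shipped config.json: the vocabulary size is inferred from the embedding parameter count (0.262B / 2,048 ≈ 128,000) and rms_norm_eps is an assumed default.

from transformers import LlamaConfig

# Illustrative mapping of the hyper-parameter table onto a Llama-style config.
# vocab_size is inferred (0.262B embedding parameters / 2,048 hidden size);
# rms_norm_eps is an assumed default, not stated in this card.
config = LlamaConfig(
    vocab_size=128_000,            # inferred from embedding parameters
    hidden_size=2048,              # Embedding Size
    intermediate_size=5632,        # FFN Hidden Size
    num_hidden_layers=24,          # Number of Layers
    num_attention_heads=16,        # Number of Heads
    num_key_value_heads=8,         # Number of KV Heads (GQA)
    hidden_act="silu",             # SwiGLU uses a SiLU gate in Llama-style blocks
    max_position_embeddings=4096,  # Sequence Length
    rope_theta=10_000.0,           # RoPE θ
    rms_norm_eps=1e-5,             # assumed default
    tie_word_embeddings=False,     # Tied Embeddings: No
)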

Run the model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat-formatted prompt built with the model's <|im_start|>/<|im_end|> special tokens
text = '<|im_start|>system\n<|im_end|>\n<|im_start|>user\nTranslate the following English source text to Portuguese:\nEnglish: I am a language model for european languages. \nPortuguese: <|im_end|>\n<|im_start|>assistant\n'

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
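
If the tokenizer bundles a chat template (an assumption, check tokenizer_config.json before relying on it), the same prompt can be built without hand-writing the special tokens:

# Assumes a chat template is shipped with the checkpoint; verify before relying on it.
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Translate the following English source text to Portuguese:\nEnglish: I am a language model for european languages. \nPortuguese: "},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))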

Results

Machine Translation

We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks: FLORES-200, WMT-23, and WMT-24, comparing it with Gemma-2B and Gemma-7B (both also instruction tuned on EuroBlocks). The results show that EuroLLM-1.7B-Instruct is substantially better than Gemma-2B at machine translation and competitive with Gemma-7B.
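
The card does not name the scoring metric for these tables. As a hedged sketch, assuming a neural MT metric such as COMET-22 from Unbabel's comet package (an assumption, not confirmed by this card), system outputs could be scored like this:

# Hypothetical scoring sketch; the metric choice (COMET-22) is an assumption.
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

ckpt_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(ckpt_path)

data = [{
    "src": "I am a language model for european languages.",         # source
    "mt": "Sou um modelo de linguagem para línguas europeias.",     # system output
    "ref": "Sou um modelo de linguagem para as línguas europeias.", # reference
}]
print(comet_model.predict(data, batch_size=8, gpus=0).system_score)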

Flores-200

ModelAVGAVG en-xxAVG xx-enen-aren-bgen-caen-csen-daen-deen-elen-es-latamen-eten-fien-fren-gaen-glen-hien-hren-huen-iten-jaen-koen-lten-lven-mten-nlen-noen-plen-pt-bren-roen-ruen-sken-slen-sven-tren-uken-zh-cnar-enbg-enca-encs-enda-ende-enel-enes-latam-enet-enfi-enfr-enga-engl-enhi-enhr-enhu-enit-enja-enko-enlt-enlv-enmt-ennl-enno-enpl-enpt-br-enro-enru-ensk-ensl-ensv-entr-enuk-enzh-cn-en
EuroLLM-1.7B-Instruct86.8986.5387.2585.1789.4284.7289.1389.4786.9087.6086.2988.9589.4087.6974.8986.4176.9284.7986.7888.1789.7687.7087.2787.6267.8487.1090.0088.1889.2989.4988.3288.1886.8590.0087.3187.8986.6086.3487.4587.5787.9589.7288.8087.0086.7788.3489.0988.9582.6987.8088.3786.7187.2087.8186.7986.7985.6286.4881.1086.9790.2585.7589.2088.8886.0087.3886.7689.6187.94
Gemma-2B-EuroBlocks81.5978.9784.2176.6882.7383.1481.6384.6383.1579.4284.0572.5879.7384.9740.5082.1367.7980.5378.3684.9087.4382.9872.2968.6858.5583.1386.1582.7886.7983.1484.6178.1875.3780.8978.3884.3884.3583.8885.7786.8586.3188.2488.1284.7984.9082.5186.3288.2954.7886.5385.8385.4185.1886.7785.7884.9981.6581.7867.2785.9289.0784.1488.0787.1785.2385.0983.9587.5784.77
Gemma-7B-EuroBlocks85.2783.9086.6486.3887.8785.7484.2585.6981.4985.5286.9362.8384.9675.3484.9383.9186.9288.1986.1181.7380.5566.8585.3189.3685.8788.6288.0686.6784.7982.7186.4585.1986.6785.7786.3687.2188.0987.1789.4088.2686.7486.7387.2588.8788.8172.4587.6287.8687.0887.0187.5886.9286.7085.1085.7477.8186.8390.4085.4189.0488.7786.1386.6786.3289.2787.92

WMT-23

| Model | AVG | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B-Instruct | 82.91 | 83.20 | 81.77 | 86.82 | 81.56 | 85.23 | 81.30 | 82.47 | 83.61 | 85.03 | 84.06 | 85.25 | 81.31 | 78.83 |
| Gemma-2B-EuroBlocks | 79.96 | 79.01 | 80.86 | 81.15 | 76.82 | 76.05 | 77.92 | 78.98 | 81.58 | 82.73 | 82.71 | 83.99 | 80.35 | 78.27 |
| Gemma-7B-EuroBlocks | 82.76 | 82.26 | 82.70 | 85.98 | 81.37 | 82.42 | 81.54 | 82.18 | 82.90 | 83.17 | 84.29 | 85.70 | 82.46 | 79.73 |

WMT-24

| Model | AVG | AVG en-xx | AVG xx-xx | en-de | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B-Instruct | 79.32 | 79.32 | 79.34 | 79.42 | 80.67 | 80.55 | 78.65 | 80.12 | 82.96 | 80.60 | 71.59 | 83.48 | 75.20 |
| Gemma-2B-EuroBlocks | 74.72 | 74.41 | 75.97 | 74.93 | 78.81 | 70.54 | 74.90 | 75.84 | 79.48 | 78.06 | 62.70 | 79.87 | 72.07 |
| Gemma-7B-EuroBlocks | 78.67 | 78.34 | 80.00 | 78.88 | 80.47 | 78.55 | 78.55 | 80.12 | 80.55 | 78.90 | 70.71 | 84.33 | 75.66 |

General Benchmarks

We also compare EuroLLM-1.7B with TinyLlama-v1.1 and Gemma-2B on two general benchmarks: Arc Challenge and Hellaswag. For the non-English languages we use the Okapi datasets. The results show that EuroLLM-1.7B is superior to TinyLlama-v1.1 and similar to Gemma-2B on Hellaswag, but worse on Arc Challenge. This may be due to EuroLLM-1.7B's lower number of non-embedding parameters (1.133B versus 1.981B for Gemma-2B).
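
The card does not say which harness produced these numbers. As an illustrative sketch, the English portions of both tasks could be run with EleutherAI's lm-evaluation-harness (an assumed tool choice; the non-English Okapi variants require their own task configs):

# Illustrative only: the evaluation harness is an assumption, not stated in this card.
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=utter-project/EuroLLM-1.7B",
    tasks=["arc_challenge", "hellaswag"],  # English splits only
    batch_size=8,
)
print(results["results"])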

Arc Challenge

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B | 0.3496 | 0.4061 | 0.3464 | 0.3684 | 0.3627 | 0.3738 | 0.3855 | 0.3521 | 0.3208 | 0.3507 | 0.3045 | 0.3605 | 0.2928 | 0.3271 | 0.3488 | 0.3516 | 0.3513 | 0.3396 |
| TinyLlama-v1.1 | 0.2650 | 0.3712 | 0.2524 | 0.2795 | 0.2883 | 0.2652 | 0.2906 | 0.2410 | 0.2669 | 0.2404 | 0.2310 | 0.2687 | 0.2354 | 0.2449 | 0.2476 | 0.2524 | 0.2494 | 0.2796 |
| Gemma-2B | 0.3617 | 0.4846 | 0.3755 | 0.3940 | 0.4080 | 0.3687 | 0.3872 | 0.3726 | 0.3456 | 0.3328 | 0.3122 | 0.3519 | 0.2851 | 0.3039 | 0.3590 | 0.3601 | 0.3565 | 0.3516 |

Hellaswag

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B | 0.4744 | 0.4760 | 0.6057 | 0.4793 | 0.5337 | 0.5298 | 0.5085 | 0.5224 | 0.4654 | 0.4949 | 0.4104 | 0.4800 | 0.3655 | 0.4097 | 0.4606 | 0.436 | 0.4702 |
| TinyLlama-v1.1 | 0.3674 | 0.6248 | 0.3650 | 0.4137 | 0.4010 | 0.3780 | 0.3892 | 0.3494 | 0.3588 | 0.2880 | 0.3561 | 0.2841 | 0.3073 | 0.3267 | 0.3349 | 0.3408 | 0.3613 |
| Gemma-2B | 0.4666 | 0.7165 | 0.4756 | 0.5414 | 0.5180 | 0.4841 | 0.5081 | 0.4664 | 0.4655 | 0.3868 | 0.4383 | 0.3413 | 0.3710 | 0.4316 | 0.4291 | 0.4471 | 0.4448 |

Bias, Risks, and Limitations

EuroLLM-1.7B-Instruct has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).

Paper

Paper: EuroLLM: Multilingual Language Models for Europe
