| Benchmark | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-4bit (this model) |
|---|---|---|
| ARC-c (25-shot) | 63.48 | 63.40 |
| MMLU (5-shot) | 61.13 | 60.89 |
| HellaSwag (10-shot) | 84.49 | 84.04 |
| WinoGrande (5-shot) | 79.16 | 79.08 |
| GSM8K (5-shot) | 43.37 | 45.41 |
| TruthfulQA (0-shot) | 59.65 | 57.48 |
| Average accuracy | 65.21 | 65.05 |
| Recovery | 100.00% | 99.75% |
This model is ready for optimized inference with the Marlin mixed-precision kernels in [vLLM](https://github.com/vllm-project/vllm). Start an OpenAI-compatible inference server with:

```shell
python -m vllm.entrypoints.openai.api_server --model neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit
```
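Once the server is up, it can be queried like any OpenAI-compatible endpoint. The sketch below uses only the Python standard library and assumes the server's default address (`http://localhost:8000`); the endpoint path and response shape follow the OpenAI chat-completions format that vLLM's server implements. Adjust `base_url` if you passed `--host`/`--port` when launching.

```python
# Minimal client sketch for the vLLM OpenAI-compatible server started above.
# Assumes the default address http://localhost:8000; adjust base_url otherwise.
import json
import urllib.request


def build_chat_request(prompt: str, max_tokens: int = 128) -> bytes:
    """Build the JSON body for a /v1/chat/completions request."""
    body = {
        "model": "neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body).encode("utf-8")


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the request to the running server and return the assistant reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    return out["choices"][0]["message"]["content"]


# Usage (with the server running):
#   reply = chat("Summarize 4-bit GPTQ quantization in one sentence.")
```

The `openai` Python client works just as well here (point its `base_url` at the server); the raw-`urllib` version is shown only to keep the example dependency-free.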
