🔥Mistral NeMo, a 12B LLM launched a few hours ago, has the community abuzz!
👇Here are nuances about the model's capabilities that weren't covered in the release blog, plus the community's initial reactions.
- Trained jointly by Mistral AI and NVIDIA, Mistral NeMo has 50% more parameters (12B) than Llama 3 (8B).
- Open-source model with Apache 2.0 license 🎉
- 12B parameter count puts it between smaller models (7-8B) and larger ones (30-70B), potentially offering a good balance of performance and resource requirements.
- Trained with "quantization awareness," allowing FP8 inference without performance loss. This is forward-thinking: the model should degrade less when quantized than models trained without quantization awareness.
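Mistral hasn't published its training code, but quantization-aware training is commonly simulated with "fake quantization": values are round-tripped through a low-precision grid during the forward pass so the model learns to tolerate the rounding error. A toy pure-Python sketch of that idea (illustrative only, not Mistral's actual method):

```python
def fake_quantize(x, bits=8):
    """Round-trip a list of floats through a `bits`-wide uniform grid,
    simulating the quantization error the model will see at inference."""
    levels = 2 ** bits - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / levels or 1.0  # avoid div-by-zero on constant input
    return [round((v - lo) / scale) * scale + lo for v in x]

# values come back slightly perturbed, as they would after quantization
q = fake_quantize([0.0, 0.5, 1.0])
```

During quantization-aware training, a step like this is applied to weights (and often activations) in the forward pass while gradients flow through as if it were the identity.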
- VRAM required to run: about 12GB at 8-bit precision, or 6GB at 4-bit (weights only, not counting context).
- Can potentially run on consumer GPUs with 16GB VRAM, and possibly on 12GB cards with quantization.
The model seems designed to fit on a single NVIDIA L40S, GeForce RTX 4090, or RTX 4500 GPU.🤔
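Back-of-the-envelope for those VRAM numbers: weight-only memory is just parameter count times bytes per parameter (a rough sketch; real usage adds KV cache, activations, and framework overhead):

```python
def weight_vram_gb(n_params, bits_per_param):
    """Weight-only memory estimate in GB; ignores KV cache,
    activations, and runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_vram_gb(12e9, 8))  # → 12.0 (GB at 8-bit)
print(weight_vram_gb(12e9, 4))  # → 6.0 (GB at 4-bit)
```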
- 128k context window available. However, using the full context significantly increases memory requirements and may not be practical for all use cases.
- A context window this large (128k) is rare among models of this size, making Mistral NeMo potentially valuable for tasks requiring long-range understanding.
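To see why the full 128k context is memory-hungry, here's a rough KV-cache estimate. The layer/head numbers below are illustrative assumptions, not confirmed specs from the release:

```python
def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size: keys and values (the leading 2x) stored per layer,
    per KV head, per token position, at `bytes_per_elem` precision."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Illustrative (assumed) shape: 40 layers, 8 KV heads (GQA), head_dim 128,
# FP16 cache, taking "128k" as 128,000 tokens: roughly 21 GB on top of weights
print(kv_cache_gb(128_000, 40, 8, 128))
```

Even with grouped-query attention shrinking the cache, a full-length context can rival the weights themselves in memory.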
- Joint release with NVIDIA: the model was trained on 3,072 H100 80GB GPUs, so significant computational resources went into it.
- Multilingual: Trained on multiple languages. Benchmarks for non-English languages look particularly strong.
- Tokenizer: new Tekken tokenizer, based on OpenAI's tiktoken, which uses byte-pair encoding.
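Tekken's internals aren't published beyond being tiktoken-style BPE, but the core byte-pair-encoding idea is easy to sketch: repeatedly merge the best-ranked adjacent token pair. A toy illustration with a hypothetical merge table (not Tekken's actual vocabulary):

```python
def bpe_merge(tokens, merges):
    """Greedily apply byte-pair merges.
    `merges` maps an adjacent pair to its rank (lower rank merges first)."""
    while True:
        best = None  # (position, pair) of the best-ranked pair found
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best[1]]):
                best = (i, pair)
        if best is None:
            return tokens
        i, (a, b) = best
        tokens = tokens[:i] + [a + b] + tokens[i + 2:]

# hypothetical merge table: "l"+"l" merges first, then "ll"+"o"
merges = {("l", "l"): 0, ("ll", "o"): 1}
print(bpe_merge(list("hello"), merges))  # → ['h', 'e', 'llo']
```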
- llama.cpp compatibility: not yet out of the box; a PR is in motion and might take a couple of days, which could delay widespread adoption.
- Released the same day as GPT-4o mini 😉 -- we're excited to see how the two compete on the LMSYS leaderboard (a @Gradio-built leaderboard and Arena)!
- Fine-tuning: community reports suggest Mistral NeMo is more amenable to fine-tuning than Llama 3.
- Temperature: per HN/Reddit comments, the model requires lower temperature settings (around 0.3) than previous Mistral models, which may change its behavior in various applications. Worth knowing if you planned a drop-in replacement for earlier Mistral models.
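Why temperature matters: sampling divides the logits by the temperature before the softmax, so a low value like 0.3 sharpens the distribution toward the top token. A quick self-contained demo:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature before normalizing;
    lower temperature sharpens the output distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up logits for three candidate tokens
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.3)
# at T=0.3 the top token takes nearly all the probability mass
print(round(p_default[0], 3), round(p_low[0], 3))
```

So a model tuned to run at 0.3 samples much closer to greedy decoding than one expecting the usual 0.7-1.0 range.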
- Potential: Could be particularly useful for tasks like coding assistance, creative writing, and role-playing.
- Base and Instruct Models on Hugging Face:
1. Mistral-Nemo-Instruct-2407: https://lnkd.in/ek6DHuZD
2. Mistral-Nemo-Base-2407: https://lnkd.in/gRdzezbr
Gradio chatbot demo on 🤗Spaces: https://lnkd.in/g_fUTTF6