Base Model to Chat Model: Post-training with SFT & LoRA
by Andrew Melbourne • 2/23/2025
Introduction
Large language models (LLMs) begin as base models, trained on vast amounts of unstructured text. However, the models that power conversational AI, like ChatGPT, undergo additional training on structured datasets containing dialogues and instruction-following examples. This teaches the model to output text that feels more like a response to user input and not a completion of what a user has written.
SmolLM2-1.7B-UltraChat_200k is a base model that has undergone additional training (fine-tuning) on the ultrachat_200k dataset to become a chat model, also referred to as an Instruct Model. See the Training Notebook for a detailed breakdown of this process.
Base Models
Base models, like GPT-3 from OpenAI, are trained using Self-Supervised Learning. Sequences of text are fed in, without formatting, and the model learns to generalize and predict the next token in a sequence. This is referred to as Pre-Training.
These base models are laborious to interact with because their training gives them no inherent structure for an interaction. This blog post shows what trying to have a conversation with GPT-3 was like.
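To see that behavior first-hand, here is a minimal sketch using the transformers pipeline with the SmolLM2 base model discussed later in this post (the hub id is an assumption, and the commented output is illustrative):

```python
from transformers import pipeline

# A base model continues the text rather than answering it
# (model id and prompt are illustrative)
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B")
prompt = "How far away is the sun?"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
# Typical base-model behavior: the output reads like a continuation of the
# prompt, e.g. more questions or surrounding prose, not a direct answer.
```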
In Post-Training, models can be taught to behave as assistants that answer user questions, rather than simply completing sequences.
Supervised Fine-Tuning (SFT)
Supervised Fine-Tuning, also called Instruction Fine-Tuning, is a type of post-training designed to refine the model’s behavior and structure its output with specific characteristics.
GPT-3.5, which powered the revolutionary ChatGPT program when it premiered, was a post-trained version of GPT-3. OpenAI applied supervised fine-tuning, among other things, to align the model for user-assistant interaction.
If you have ever used an API for an LLM, you may have seen data structured like this:
[
  {"role": "user", "content": "How far away is the sun?"},
  {"role": "assistant", "content": "The sun is 93 million miles from Earth."}
]
This structured message list is rendered into a format called ChatML, which looks like this:
<|im_start|>user
How far away is the sun?<|im_end|>
<|im_start|>assistant
The sun is 93 million miles from Earth.<|im_end|>
ChatML, introduced by OpenAI, defines special tokens that mark the boundaries between user and assistant messages.
ChatML-structured data is used in supervised fine-tuning to provide the model with well-defined conversational patterns. This enables multi-turn interactions where the model follows instructions more effectively and generates responses more closely aligned with the user's desired output.
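As a concrete illustration, Hugging Face tokenizers can render a role/content message list into the model's chat format with apply_chat_template. A minimal sketch, assuming the instruct model's hub id:

```python
from transformers import AutoTokenizer

# A tokenizer for a chat-tuned model ships with a chat template
# (the model id here is an assumption for illustration)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [
    {"role": "user", "content": "How far away is the sun?"},
    {"role": "assistant", "content": "The sun is 93 million miles from Earth."},
]

# Render the role/content list into the model's ChatML-style format
print(tokenizer.apply_chat_template(messages, tokenize=False))
```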
Low-Rank Adaptation (LoRA)
Adjusting all of a model's weights during training is computationally expensive, and it isn't strictly necessary for basic instruction tuning. Full-weight updates are more useful when teaching the model new information, like how to do accounting.
Training a low-rank adaptation means attaching small adapter matrices to the model and adjusting only the weights in those adapters, while the original weights stay frozen. When tuning just to change the structure of the outputs, only the model's attention layers need adapters.
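A minimal sketch of this setup with the peft library, assuming a Llama-style architecture whose attention projections are named q_proj, k_proj, v_proj, and o_proj:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")

# Rank-8 adapters on the attention projections only;
# module names assume a Llama-style architecture
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```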
LoRA, along with quantization (QLoRA), makes for very efficient training. QLoRA is used in the training of SmolLM2-1.7B-UltraChat_200k to demonstrate a very low-cost way to tune a model on a given dataset.
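For the QLoRA variant, the frozen base model can be loaded in 4-bit with bitsandbytes before the adapters are attached. A rough sketch, not the notebook's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the frozen base weights to 4-bit; the LoRA adapters
# themselves remain in higher precision during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B",
    quantization_config=bnb_config,
    device_map="auto",
)
```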
Read more: What is LoRA (low-rank adaptation)?
Training
SmolLM2-1.7B-UltraChat_200k is a low-rank adapter for the SmolLM2-1.7B base model released by Hugging Face. The adapter is trained on the ultrachat_200k dataset, which contains 200k conversation examples generated with ChatGPT.
Here is the Training Notebook
The adapter was trained with one pass (epoch) through the UltraChat samples and has a rank of 8. This low rank was chosen to affect only the structure of the outputs while having minimal impact on their content.
The training run was done on an A100 GPU from Google Colab, using the transformers, trl, and peft libraries from Hugging Face. The model trained for 2 hours and 37 minutes to a final loss of 1.6965, consuming 22.15 compute units at a cost of $2.22.
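Putting the pieces together, an SFT run with trl's SFTTrainer might look like the sketch below. The hyperparameters are illustrative rather than the notebook's exact values, and model and lora_config refer to the earlier sketches:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# ultrachat_200k ships chat-formatted splits; "train_sft" is the SFT split
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# One epoch over the chat data; batch size and learning rate are illustrative
training_args = SFTConfig(
    output_dir="smollm2-ultrachat-adapter",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
)

trainer = SFTTrainer(
    model=model,              # quantized base model from the QLoRA sketch
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,  # rank-8 config from the LoRA sketch
)
trainer.train()
```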
Final Thoughts
SmolLM2-1.7B-UltraChat_200k is a demonstration of using LoRA and SFT efficiently. The training script can serve as a starting point for tuning similar models on any dataset with minimal compute.
Hugging Face has done an incredible job making language model training and usage this accessible, providing weights and tools that enable anybody to build like they work at a SOTA lab.