Model imitation in NLP: applications and industry implications
Ben Burtenshaw
NLP Engineer
At its core, model imitation is about transferring the capabilities of a target model into a base language model. It achieves this by finetuning the base model on outputs from a more advanced, often proprietary, language model like OpenAI’s GPT-4. The ultimate aim is to incorporate the superior model’s abilities into the imitation model.
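To make this concrete, here is a minimal sketch of the data-collection step, assuming the official OpenAI Python client; the prompts, model name, and output path are illustrative placeholders rather than the setup used in any particular paper.

```python
# Minimal sketch: collect outputs from a stronger "target" model to build
# an imitation dataset. Assumes the openai package (v1+) and an
# OPENAI_API_KEY in the environment; prompts and paths are placeholders.
import json

from openai import OpenAI

client = OpenAI()

prompts = [
    "Explain overfitting to a beginner.",
    "Summarize the plot of Hamlet in two sentences.",
]

records = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Each prompt/response pair becomes one imitation training example.
    records.append({
        "prompt": prompt,
        "completion": response.choices[0].message.content,
    })

with open("imitation_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```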
“The False Promise of Imitating Proprietary LLMs” (Gudibande et al., 2023) offers a critical assessment of this approach. The research demonstrates that while imitation models appear to follow instructions well and earn high ratings from human evaluators (owing to their mimicry of GPT-4’s style), they fail to improve on tasks that are not heavily represented in the imitation data. These findings indicate that model imitation is not a catch-all solution: because it excels at copying style but often struggles with reasoning and factual recall, it can mask the capability gap rather than close it. Below, I break down the paper’s key observations on this theme.
Finetuning as a Knowledge Extractor
Finetuning a model on generated instructions, as Gudibande et al. point out, offers only marginal improvements to an LM’s capabilities. To illustrate, consider enhancing an open-source LM like GPT-2 by imitating a proprietary LM like GPT-4 on a small dataset. The finetuning would not expand the GPT-2 model’s base knowledge beyond the information directly included in the finetuning dataset. The model would pick up the stylistic properties of the superior model while remaining far short of GPT-4’s capabilities. Hence, developing better base language models could prove more beneficial than simply imitating proprietary systems. A sketch of this finetuning step follows below.
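As a rough illustration, the sketch below finetunes GPT-2 on the `imitation_data.jsonl` file produced in the earlier sketch, using Hugging Face transformers; the hyper-parameters are illustrative and untuned.

```python
# Hedged sketch of imitation finetuning: standard causal-LM training of
# GPT-2 on prompt/completion pairs sampled from a stronger model.
import json

from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")


class ImitationDataset(Dataset):
    """Prompt/completion pairs collected from the target model."""

    def __init__(self, path):
        with open(path) as f:
            self.rows = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        text = row["prompt"] + "\n" + row["completion"] + tokenizer.eos_token
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length", return_tensors="pt")
        input_ids = enc["input_ids"].squeeze(0)
        attention_mask = enc["attention_mask"].squeeze(0)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # ignore padding in the loss
        return {"input_ids": input_ids, "attention_mask": attention_mask,
                "labels": labels}


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-imitation",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=ImitationDataset("imitation_data.jsonl"),
)
trainer.train()
trainer.save_model()  # writes the finetuned model to "gpt2-imitation"
tokenizer.save_pretrained("gpt2-imitation")
```

Note that nothing in this loop adds knowledge beyond what the completions themselves contain, which is exactly the limitation the paper describes.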
Model Imitation in Vision-Language Models
The paper also touches on recent work where model imitation is used more indirectly. Vision-language models, such as those used for image captioning or visual question answering, have included outputs from models like GPT-4 in their training data. These models may inherit the limitations and biases of the models they imitate, which poses practical challenges and raises questions about the validity and usefulness of the approach.
Model Distillation versus Imitation
Model distillation is another process in which one model learns from another, but it differs significantly from model imitation in practice. In model distillation, the teacher model’s training data, model architecture, and hyper-parameters are known and used to guide the student’s training. In model imitation, these details remain unknown, which makes the process far more challenging. Gudibande et al. propose that distillation be used instead of imitation whenever the target model’s details are openly available.
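The difference is easiest to see in the training objective. Below is a sketch of the classic soft-target distillation loss (Hinton et al., 2015), which assumes white-box access to the teacher’s logits, precisely the access that imitation lacks; the function name and temperature value are illustrative.

```python
# Sketch of the soft-target distillation objective. Requires the teacher's
# full output distribution (logits), not just its sampled text.
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in Hinton et al. (2015)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

Imitation, by contrast, only ever sees the teacher’s sampled tokens, so it is restricted to an ordinary cross-entropy objective over generated text.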
Legal and Ethical Implications
Model imitation stirs up legal and ethical concerns, most notably regarding the open-source community’s practice of appropriating techniques and knowledge from companies like OpenAI. This mirrors ongoing debates around patenting AI models and algorithms, where claims of unique solutions collide with efforts to democratize these innovations for broader use.
The applications of model imitation extend across diverse fields and industries. From broad-coverage imitation that addresses a variety of behaviors, domains, and tasks, to task-specific imitation that requires deep domain knowledge, each approach presents unique challenges. Regardless of the approach, ensuring quality is paramount: the imitation model should aim to mimic not only the responses of the target model but also its capacity to handle a broad range of tasks effectively. A sketch of such a quality check follows below.
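As one illustration of such a quality check, the hedged sketch below scores the imitation model from the earlier sketches on prompts inside and outside its imitation data’s coverage; the model path, example prompts, and exact-match metric are all illustrative assumptions.

```python
# Sketch: compare an imitation model's accuracy on in-domain prompts
# (similar to its imitation data) versus out-of-domain prompts.
# Assumes the "gpt2-imitation" checkpoint saved by the earlier sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-imitation")
model = AutoModelForCausalLM.from_pretrained("gpt2-imitation")


def exact_match_accuracy(examples):
    """Fraction of prompts whose greedy completion contains the answer."""
    hits = 0
    for prompt, answer in examples:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=32,
                                    pad_token_id=tokenizer.eos_token_id)
        completion = tokenizer.decode(output[0], skip_special_tokens=True)
        hits += int(answer.lower() in completion.lower())
    return hits / len(examples)


# Placeholder task splits; in practice these would be held-out benchmarks.
in_domain = [("Q: What is the capital of France?\nA:", "Paris")]
out_of_domain = [("Q: What is 17 * 24?\nA:", "408")]

print("in-domain accuracy:", exact_match_accuracy(in_domain))
print("out-of-domain accuracy:", exact_match_accuracy(out_of_domain))
```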
Looking Ahead
Despite the challenges and uncertainties, model imitation continues to generate curiosity and investment, even with many questions to answer and problems to solve. From technical considerations such as how to improve open-source LMs, to ethical dilemmas about “stealing” from proprietary models, to legal countermeasures that companies could adopt to protect their intellectual property, the path ahead for model imitation is complex. Continued research and dialogue are vital to navigate these issues and to develop better methods for the ethical and responsible development of language models.
While model imitation offers some promising advantages, it also reveals significant limitations. Its failure to improve performance on tasks outside the imitation data, and the challenges of imitating closed proprietary systems, suggest that enhancing open-source LMs requires a more nuanced approach. Future work should investigate these issues and explore alternatives, ensuring that progress in NLP aligns with ethical standards and respects intellectual property rights.