Microsoft’s new small language model, with far fewer parameters, outperforms ChatGPT

Microsoft’s newest language model, Phi-1, has been introduced with 1.3 billion parameters and a surprising improvement in performance. The common perception is that larger models perform better, but the Redmond tech giant’s approach focuses on the quality of the training data. Phi-1, trained on a high-quality, textbook-level dataset, outperforms the GPT-3.5 model, which has 175 billion parameters.

Microsoft’s Phi-1 language model, which is based on the Transformer architecture, has attracted a lot of attention due to its impressive performance. The team building this model emphasizes the quality of educational data. The training process of this model was completed with the help of 8 Nvidia A100 graphics processors in just four days.

Written by Gizmochina

By focusing on the quality of the training data instead of increasing the number of parameters, Microsoft has achieved promising results. Phi-1’s accuracy in benchmark tests reached 50.6%, which is better than the 47% achieved by GPT-3.5 with its 175 billion parameters.

Microsoft has decided to release this language model as open source to improve accessibility and encourage community participation in the development of Phi-1. This is not the first time the Redmond company has developed a small language model: we have already seen the unveiling of Orca, a model with 13 billion parameters trained on synthetic data generated using GPT-4. Orca, too, was shown to outperform ChatGPT. The Phi-1 research paper published on arXiv provides detailed insight into the architecture and training method of this AI model.

Microsoft’s Phi-1 language model challenges the idea that increasing model size is necessary to improve performance. By focusing on high-quality training data, the model has shown remarkable accuracy and even outperformed larger models. The open-source nature of Microsoft’s new language model further demonstrates the company’s commitment to advancing natural language processing.

