Less than a month after Google released its AI model Gemini, research has shown that it falls behind OpenAI’s GPT-3.5 Turbo in performance. According to a paper titled “An In-depth Look at Gemini’s Language Abilities” by researchers from Carnegie Mellon University and BerriAI, Gemini Pro achieved slightly lower accuracy than GPT-3.5 Turbo.
The researchers tested several language models, including Google Gemini Pro, OpenAI GPT-3.5 Turbo, GPT-4 Turbo, and Mistral’s Mixtral 8x7B. On a knowledge-based QA test, Gemini Pro scored lower than both GPT-3.5 Turbo and GPT-4 Turbo. Its answers also showed a skewed label distribution: on multiple-choice questions, it disproportionately selected “D” regardless of whether that answer was correct.
The study found that Gemini struggled with questions in certain subjects, particularly human sexuality, formal logic, elementary math, and professional medicine. It did outperform GPT-3.5 Turbo in the security and high school microeconomics categories, although only by small margins.
On general-purpose reasoning and mathematical tasks, Gemini Pro achieved slightly lower accuracy than GPT-3.5 Turbo and GPT-4 Turbo. It also performed worse than both models when completing Python code and when acting as a web agent.
One area where Gemini Pro showed promise was translation, where it outperformed GPT-3.5 Turbo and GPT-4 Turbo in several languages. However, it tended to block responses for certain language pairs, which suggests an overly restrictive content moderation system.
These results suggest that Google’s ambitions to compete with OpenAI in the generative AI race have fallen short, at least for now. Mistral’s Mixtral 8x7B also trailed GPT-3.5 Turbo, but Gemini Pro outperformed Mixtral on every task examined.
Overall, OpenAI’s GPT-4 remains the top performer in generative AI, and Google’s Gemini Pro struggles to match its capabilities. Commentators such as Professor Ethan Mollick of the Wharton School at the University of Pennsylvania agree that GPT-4 is currently the best choice. Google, however, hopes to close the gap in the new year with the upcoming release of Gemini Ultra.