
Last night, OpenAI finally announced the launch of GPT-4o, an iteration of the GPT-4 model that powers its flagship product, ChatGPT. GPT-4o, where the “o” stands for “omni,” represents a significant step forward for the company and for AI enthusiasts, making human-computer interaction feel markedly more natural.
Lately, rumors about an upcoming launch of GPT-5 have been circulating among technology enthusiasts. Sam Altman, CEO of OpenAI, therefore spoke ahead of the OpenAI event to deny categorically that this was the announcement users should expect.
The launch of a ChatGPT search engine is not on OpenAI’s agenda, at least not for the foreseeable future. Neither is the GPT-5 model, which is still in development and training. However, many other innovations, including GPT-4o, are on the way.
This latest iteration of OpenAI’s model accepts any combination of text, audio, and image inputs and generates outputs in those same modalities. According to The Verge, the update makes the model “much faster” and improves its “text, vision, and audio capabilities,” as OpenAI’s CTO, Mira Murati, said during the live-streamed announcement.
Functionally, GPT-4o responds to audio inputs in as little as 232 milliseconds, and 320 milliseconds on average, which is similar to human response times in conversation. CEO Sam Altman highlights its native multimodal capabilities: because the model is trained end-to-end across text, vision, and audio, a single network can understand commands and generate content via voice, text, or images.
Security is a paramount concern for OpenAI, and GPT-4o addresses this with safety features built into every modality. These include filtering of the training data and refinement of the model’s behavior through post-training, providing enhanced protection, especially for voice outputs.
Furthermore, GPT-4o represents a significant leap forward in performance compared to its predecessor, GPT-4 Turbo. Not only is GPT-4o faster, it is also cheaper and comes with higher rate limits. Specifically, in the API it is twice as fast as GPT-4 Turbo, half the price, and has rate limits five times higher.
Overall, GPT-4o maintains the same high intelligence as its predecessor while offering significant enhancements in speed, affordability, and performance, making it a substantial advancement in AI technology.
GPT-4o will initially be accessible in ChatGPT and its API, offering text and vision capabilities. Voice support will continue through the existing Voice Mode feature. Paid users will enjoy higher capacity limits across the platform.
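To give a concrete sense of what that API access looks like, here is a minimal sketch of a text request to GPT-4o using OpenAI’s Python SDK. The model name “gpt-4o” matches the announcement, but treat the exact client interface as an assumption that may vary by SDK version.

```python
# Minimal sketch: a text request to GPT-4o through the OpenAI Python SDK.
# Assumes the `openai` package is installed and the OPENAI_API_KEY
# environment variable is set; details may vary by SDK version.
from openai import OpenAI

client = OpenAI()  # picks up the API key from OPENAI_API_KEY

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."},
    ],
)

print(response.choices[0].message.content)
```

Since GPT-4o is served through the same chat completions endpoint as earlier GPT-4 models, switching an existing integration over is essentially a one-line model-name change.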
OpenAI is also prioritizing safety, engaging with over 70 external experts to identify and mitigate risks associated with the new functionalities. In the coming weeks and months, the company plans to refine technical infrastructure and usability through post-training and security measures to release additional modes.
As for GPT-4o’s impact on ChatGPT, users now have access to a more responsive and versatile AI assistant. With GPT-4o, interacting with ChatGPT feels more natural, akin to conversing with a real person: users can ask follow-up questions or interrupt the AI mid-response, enhancing the conversational experience.
Furthermore, GPT-4o enables ChatGPT to recognize user emotions in speech, allowing for more appropriate responses. However, advanced audio generation capabilities will initially be restricted to select commercial partners to prevent misuse.
Additionally, GPT-4o introduces the ability to upload videos to ChatGPT for AI-driven description and summarization of content. Murati envisions future applications where ChatGPT can “watch” live events and provide real-time explanations, demonstrating the rapid advancement of AI’s visual capabilities.
GPT-4o surpasses its predecessors particularly in understanding and discussing images. Users can now point GPT-4o at a menu in another language to translate it, learn about the dishes’ culinary history, and receive personalized recommendations.
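The menu scenario is easy to approximate through the API’s vision input, where an image is sent alongside a text prompt. The sketch below uses a placeholder image URL; the multimodal content-part structure follows the SDK’s vision input format.

```python
# Sketch of the menu-translation use case: an image URL is sent together
# with a text instruction in a single multimodal message.
# The menu URL below is a placeholder, not a real image.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Translate this menu into English and suggest one dish."},
                {"type": "image_url", "image_url": {"url": "https://example.com/menu.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```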
Future enhancements promise dynamic interactions, including real-time voice conversations and live video engagement. OpenAI plans to introduce a new Voice Mode alpha version, initially available to Plus users, with a broader rollout after that. GPT-4o also enhances language capabilities, supporting over 50 languages, thus making advanced AI more globally accessible.
The rollout to ChatGPT Plus and Team users begins now, with Enterprise availability to follow. Plus users will enjoy significantly higher message limits than free users, while Team and Enterprise users will benefit from even higher limits.
In a post published on his blog after the live-streaming event, Altman reflected on OpenAI’s journey:
“Today marks a significant milestone in our journey. Our mission has always been to make advanced AI accessible to everyone, and with GPT-4o, we’re one step closer to realizing that vision. As we introduce this groundbreaking technology, I’m reminded of our initial aspirations at OpenAI—to harness AI for the greater good. Now, it’s not just about what we create but how others use it to shape a better world. The new Voice Mode is a game-changer, bringing us closer to human-like interaction with machines. It’s fast, intuitive, and a glimpse into the future of computing.”
As GPT-4o debuts, OpenAI is opening new doors in AI. What wonders lie ahead with this upgrade? Let’s wait and see if this new version lives up to the hype! Feel free to share your thoughts in the comments.