OpenAI is preparing a significant update for ChatGPT, enabling the chatbot to engage in voice conversations with users and interact using images. This update will bring ChatGPT closer to popular AI assistants like Apple’s Siri, Amazon’s Alexa, and Samsung’s Bixby.
In a blog post published on Monday (September 25), OpenAI emphasized that the voice feature will open doors to various creative and accessibility-focused applications.
OpenAI stated, “We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”
The company believes that voice and image capabilities will provide users with more versatile ways to integrate ChatGPT into their daily lives. For example, users can take a picture of a landmark while traveling and engage in a live conversation with the chatbot about what makes it interesting.
OpenAI also provided examples of how the advanced chatbot can assist users, such as helping a child with a math problem by taking a photo of the problem set and sharing hints based on the image.
Currently, similar AI assistants such as Siri and Alexa are built into the devices they run on and are used primarily for setting alarms and reminders and answering questions with information from the internet.
OpenAI plans to roll out voice and image capabilities in ChatGPT to Plus and Enterprise users over the next two weeks. Voice will be available on iOS and Android (opt-in through settings), and images will be supported on all platforms.
To use the new voice feature, users can head to Settings → New Features on the mobile app, opt into voice conversations, and select their preferred voice from five different options.
OpenAI explained that the new voice capability is powered by a text-to-speech model capable of generating human-like audio from text and a short sample of speech. The company worked with professional voice actors to create each voice, and its open-source Whisper speech recognition system transcribes spoken words into text.
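For readers who want a feel for the speech-recognition side, the Whisper model OpenAI mentions is open source and can be run locally. The snippet below is only an illustrative sketch, not part of OpenAI's announcement; the model size and the audio file name are assumptions.

```python
import whisper  # open-source Whisper package: pip install openai-whisper

# Load a small Whisper model and transcribe a spoken prompt to text,
# the same role Whisper plays in ChatGPT's voice conversations.
model = whisper.load_model("base")
result = model.transcribe("spoken_question.mp3")  # hypothetical audio file
print(result["text"])
```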
Image understanding is powered by multimodal GPT-3.5 and GPT-4 models, which apply their language reasoning skills to a wide range of images. Users can show ChatGPT one or more images and ask questions about them, whether that means troubleshooting a problem, exploring the contents of a refrigerator, or analyzing a complex graph for work-related data.
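At the time of the announcement, image input was available only inside the ChatGPT apps rather than through the public API, so the following is purely a sketch of what a multimodal request could look like with the OpenAI Python client; the model name and image URL are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Ask a vision-capable model about a photo, similar to showing ChatGPT an image.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model; image input was app-only at launch
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What landmark is this, and what makes it interesting?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/landmark.jpg"},  # hypothetical image
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```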