ChatGPT Can Now See, Hear, and Speak


ChatGPT image recognition capability

Today, OpenAI began rolling out new voice and image features to ChatGPT Plus and ChatGPT Enterprise users.

Here is a list of the new ChatGPT features:

  • Take a picture from the ChatGPT app on your phone and then ask questions about it. Submit follow-up pictures and text and get follow-up responses.
  • Submit a picture to ChatGPT on your desktop or laptop computer and ask questions about the picture.
  • Speak directly to ChatGPT via your phone app and have ChatGPT recognize and respond to your words.
  • Interact with ChatGPT via voice. You speak, and ChatGPT will speak back.

If you subscribe to ChatGPT Plus or Enterprise, you can check whether you have access to the new features by opening the ChatGPT mobile app, going to Settings, and looking for a section called “New Features”. If it is there, tap to opt into voice conversations. Then tap the headphone button in the top-right corner of the home screen and choose the voice you want ChatGPT to use.

Next, check whether the New Features section also lists an option for images. If it does, opt in the same way you did for voice.

If you have a ChatGPT Plus or Enterprise subscription but don’t see the New Features section under Settings in the mobile app, don’t worry. You should get access to both the voice and image features within two weeks.

How do the new ChatGPT voice features work?

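The new ChatGPT voice capability combines two models. Whisper, OpenAI’s open-source speech recognition system, transcribes your spoken words into text. A new text-to-speech model built by OpenAI then turns ChatGPT’s text reply into spoken audio.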
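
To make that flow concrete, here is a minimal conceptual sketch of a single voice turn: speech goes in, gets transcribed to text, the chat model answers in text, and the answer is converted back to audio. This is not OpenAI's code. The names `run_voice_turn`, `VoiceTurn`, and the three `fake_*` functions are hypothetical stand-ins invented for illustration, and the stand-ins do no real speech processing.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. None of these are OpenAI APIs.
Transcriber = Callable[[bytes], str]   # speech recognition (the role Whisper plays)
Responder = Callable[[str], str]       # the chat model's role
Synthesizer = Callable[[str], bytes]   # the text-to-speech model's role


@dataclass
class VoiceTurn:
    heard_text: str     # what the speech recognizer thinks you said
    reply_text: str     # the chat model's text answer
    reply_audio: bytes  # the answer rendered as audio


def run_voice_turn(
    audio_in: bytes,
    transcribe: Transcriber,
    respond: Responder,
    synthesize: Synthesizer,
) -> VoiceTurn:
    """Run one spoken exchange: audio -> text -> reply text -> audio."""
    heard = transcribe(audio_in)
    reply = respond(heard)
    audio_out = synthesize(reply)
    return VoiceTurn(heard, reply, audio_out)


# Stand-in stages so the sketch runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    # Assumption: pretend the "audio" bytes are already UTF-8 words.
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    # Assumption: pretend encoded text is "audio".
    return text.encode("utf-8")


turn = run_voice_turn(b"hello", fake_transcribe, fake_respond, fake_synthesize)
print(turn.heard_text)
print(turn.reply_text)
```

In a real system, each stand-in would be replaced by a call to an actual speech recognition, chat, or speech synthesis model. Keeping the three stages as separate functions shows why the voice feature needs both a speech recognizer and a text-to-speech model around the chat model.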

Ricky Nave

In college, Ricky studied physics & math, won a prestigious research competition hosted by Oak Ridge National Laboratory, started several small businesses including an energy chewing gum business and a computer repair business, and graduated with a thesis in algebraic topology. After graduating, Ricky attended grad school at Duke University in the mathematics PhD program where he worked on quantum algorithms & non-Euclidean geometry models for flexible proteins. He also worked in cybersecurity at Los Alamos during this time before eventually dropping out of grad school to join a startup working on formal semantic modeling for legal documents. Finally, he left that startup to start his own in the finance & crypto space. Now, he helps entrepreneurs pay less capital gains tax.
