Speech Recognition: The customer's speech is captured and transcribed into text, the text is translated into the agent's preferred language, and the translated text is converted back into speech using the DeepL Voice2Voice API.
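The three stages described above (transcribe, translate, synthesize) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `VoicePipeline` class, its three callables, and the stub functions are hypothetical names invented for this example, and none of them represent the actual DeepL API, whose endpoints and signatures are not given here. In a real integration, each stub would presumably be replaced by a call to the corresponding service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech -> text -> translated text -> speech.

    All three callables are hypothetical placeholders, not real DeepL calls.
    """
    transcribe: Callable[[bytes, str], str]      # (audio, source_lang) -> text
    translate: Callable[[str, str, str], str]    # (text, source_lang, target_lang) -> text
    synthesize: Callable[[str, str], bytes]      # (text, target_lang) -> audio

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.transcribe(audio, source_lang)
        translated = self.translate(text, source_lang, target_lang)
        return self.synthesize(translated, target_lang)


# Stub stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes, lang: str) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    # Tag the text with the language pair instead of translating it.
    return f"[{src}->{dst}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"hola", "es", "en")
```

Passing the stages in as callables keeps the flow testable on its own: the customer's language (`"es"` here) and the agent's preferred language (`"en"`) are just parameters, and any stage can be swapped without touching the others.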