How to recognize speech - Speech service - Azure Cognitive Services

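Create a speech configuration

To call the Speech service by using the Speech SDK, you need to create a SpeechConfig instance. This class includes information about your subscription, such as your key and associated location/region, endpoint, host, or authorization token.
Create a SpeechConfig instance by using your key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource.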
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
class Program 
{
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
    }
}
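You can initialize SpeechConfig in a few other ways:
Use an endpoint, and pass in a Speech service endpoint. A key or authorization token is optional.
Use a host, and pass in a host address. A key or authorization token is optional.
Use an authorization token with the associated region/location.
Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you always create a configuration.
Recognize speech from a microphone
To recognize speech by using your device microphone, create an AudioConfig instance by using FromDefaultMicrophoneInput(). Then initialize SpeechRecognizer by passing audioConfig and speechConfig.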
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
class Program 
{
    async static Task FromMic(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        Console.WriteLine("Speak into your microphone.");
        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        await FromMic(speechConfig);
    }
}
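If you want to use a specific audio input device, you need to specify the device ID in AudioConfig. Learn how to get the device ID for your audio input device.
Recognize speech from a file
If you want to recognize speech from an audio file instead of a microphone, you still need to create an AudioConfig instance. But instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path: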
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
class Program 
{
    async static Task FromFile(SpeechConfig speechConfig)
    {
        using var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        await FromFile(speechConfig);
    }
}
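Recognize speech from an in-memory stream
For many use cases, your audio data likely comes from blob storage, or it's already in memory as a byte[] instance or a similar raw data structure. The following example uses PushAudioInputStream, which is essentially an abstracted memory stream, to recognize speech. The sample code does the following:
Writes raw audio data (PCM) to PushAudioInputStream by using the Write() function, which accepts a byte[] instance.
Reads a .wav file by using a BinaryReader for demonstration purposes. If you already have audio data in a byte[] instance, you can skip directly to writing the content to the input stream.
Uses the default format, which is 16-bit, 16-kHz mono PCM. To customize the format, you can pass an AudioStreamFormat object to CreatePushStream() by using the static function AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels).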
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
class Program 
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        // Dispose the reader (and the underlying file handle) when the method exits.
        using var reader = new BinaryReader(File.OpenRead("PathToFile.wav"));
        using var audioConfigStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioConfigStream);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        // Copy the file into the push stream in 1-KB chunks until no bytes remain.
        byte[] readBytes;
        do
        {
            readBytes = reader.ReadBytes(1024);
            audioConfigStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }
    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        await FromStream(speechConfig);
    }
}
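When you use a push stream as input, the audio data is assumed to be raw PCM with any headers skipped. The API still works in certain cases if the header has not been skipped. For the best results, consider implementing logic to read off the headers so that the byte[] data begins at the start of the audio data.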
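One way to skip the header is to let a WAV parser locate the audio data for you. The following is a minimal sketch in Python rather than C#, using only the standard-library wave module and no Speech SDK calls. The helper names read_pcm_frames, describe_format, and make_silent_wav are illustrative assumptions. The first helper yields raw PCM chunks with the header already removed, and the second returns the values you would need if the file is not in the default 16-bit, 16-kHz mono format.

```python
import io
import wave


def read_pcm_frames(wav_source, chunk_frames=1024):
    """Yield raw PCM chunks from a WAV source, with the RIFF header skipped.

    `wav_source` can be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        while True:
            chunk = wav.readframes(chunk_frames)
            if not chunk:
                break
            yield chunk


def describe_format(wav_source):
    """Return (sample_rate, bits_per_sample, channels) read from the WAV header."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def make_silent_wav(seconds=1, sample_rate=16000):
    """Build an in-memory 16-bit mono WAV file of silence for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buffer.getvalue()


if __name__ == "__main__":
    wav_bytes = make_silent_wav()
    pcm = b"".join(read_pcm_frames(io.BytesIO(wav_bytes)))
    print(len(wav_bytes) - len(pcm), "header bytes skipped")
    print(describe_format(io.BytesIO(wav_bytes)))
```

Each chunk that read_pcm_frames yields is header-free PCM, so it could be written to a push stream as is.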

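The previous examples only get the recognized text from result.Text. To handle errors and other responses, you need to write some code to handle the result. The following code evaluates the result.Reason property and:
Prints the recognition result: ResultReason.RecognizedSpeech.
If there is no recognition match, informs the user: ResultReason.NoMatch.
If an error is encountered, prints the error message: ResultReason.Canceled.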
switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
        }
        break;
}
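The three-way branching above can be exercised without a service connection. The following Python sketch uses stand-in types: Reason and FakeResult are hypothetical and are not Speech SDK classes. It maps each outcome to one log line and includes a fallback for any reason the code does not handle.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Reason(Enum):
    """Stand-in for the three result reasons discussed in this section."""
    RECOGNIZED_SPEECH = auto()
    NO_MATCH = auto()
    CANCELED = auto()


@dataclass
class FakeResult:
    """Hypothetical result object; a real SDK result carries more fields."""
    reason: Reason
    text: str = ""
    error_details: Optional[str] = None


def summarize(result: FakeResult) -> str:
    """Map each outcome to one log line, mirroring the switch statement."""
    if result.reason is Reason.RECOGNIZED_SPEECH:
        return f"RECOGNIZED: Text={result.text}"
    if result.reason is Reason.NO_MATCH:
        return "NOMATCH: Speech could not be recognized."
    if result.reason is Reason.CANCELED:
        details = result.error_details or "no error details"
        return f"CANCELED: {details}"
    return f"UNHANDLED: {result.reason.name}"


if __name__ == "__main__":
    print(summarize(FakeResult(Reason.RECOGNIZED_SPEECH, text="hello")))
    print(summarize(FakeResult(Reason.NO_MATCH)))
    print(summarize(FakeResult(Reason.CANCELED, error_details="bad key")))
```

Returning a string instead of printing keeps the branching logic easy to unit test.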
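Use continuous recognition
The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
In contrast, you use continuous recognition when you want to control when to stop recognizing. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.
Start by defining the input and initializing SpeechRecognizer: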
using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
Then create a TaskCompletionSource<int> instance to manage the state of speech recognition:
var stopRecognition = new TaskCompletionSource<int>();
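Next, subscribe to the events that SpeechRecognizer sends:
Recognizing: Signal for events that contain intermediate recognition results.
Recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
SessionStopped: Signal for events that indicate the end of a recognition session (operation).
Canceled: Signal for events that contain canceled recognition results. These results indicate a recognition attempt that was canceled as a result of a direct cancellation request. Alternatively, they indicate a transport or protocol failure.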
speechRecognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
};
speechRecognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
    else if (e.Result.Reason == ResultReason.NoMatch)
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
};
speechRecognizer.Canceled += (s, e) =>
{
    Console.WriteLine($"CANCELED: Reason={e.Reason}");
    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
    }
    stopRecognition.TrySetResult(0);
};
speechRecognizer.SessionStopped += (s, e) =>
{
    Console.WriteLine("\n    Session stopped event.");
    stopRecognition.TrySetResult(0);
};
With everything set up, call StartContinuousRecognitionAsync to start recognizing:
await speechRecognizer.StartContinuousRecognitionAsync();
// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });
// Make the following call at some point to stop recognition:
// await speechRecognizer.StopContinuousRecognitionAsync();
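The start, wait, and stop sequence above is a general pattern: event callbacks run on a background thread, and the caller blocks on a signal that either the canceled or the session-stopped callback can set. The following Python sketch illustrates only that pattern. FakeRecognizer is a hypothetical stand-in that simulates events and is not a Speech SDK class, and threading.Event plays the role that TaskCompletionSource<int> plays in the C# sample.

```python
import threading
import time


class FakeRecognizer:
    """Hypothetical stand-in that fires callbacks from a background thread."""

    def __init__(self, phrases):
        self._phrases = list(phrases)
        self.on_recognized = None
        self.on_session_stopped = None
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        for phrase in self._phrases:
            time.sleep(0.01)  # simulate audio being processed
            if self.on_recognized:
                self.on_recognized(phrase)
        if self.on_session_stopped:
            self.on_session_stopped()

    def stop(self):
        if self._thread:
            self._thread.join()


def transcribe(phrases, timeout=5.0):
    """Collect final results until the session-stopped callback sets the event."""
    done = threading.Event()
    results = []

    recognizer = FakeRecognizer(phrases)
    recognizer.on_recognized = results.append
    recognizer.on_session_stopped = done.set  # setting twice is harmless

    recognizer.start()
    finished = done.wait(timeout)  # block the caller until signaled or timed out
    recognizer.stop()
    return finished, results


if __name__ == "__main__":
    print(transcribe(["first phrase", "second phrase"]))
```

Like TrySetResult, Event.set can be called from more than one handler without raising, which matters because both a cancellation and a session stop may fire for the same session.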
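Change the source language
A common task for speech recognition is specifying the input (or source) language. The following example shows how to change the input language to Italian. In your code, find your SpeechConfig instance and add this line directly below it: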
speechConfig.SpeechRecognitionLanguage = "it-IT";
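The SpeechRecognitionLanguage property expects a language-locale format string. See the list of supported speech-to-text locales.
You can use language identification with speech-to-text recognition when you need to identify the language in an audio source and then transcribe it to text.
For a complete code sample, see Language identification.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.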
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig.EndpointId = "YourEndpointId";
var speechRecognizer = new SpeechRecognizer(speechConfig);
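Change how silence is handled
If a user is expected to speak faster or slower than usual, the default behaviors for non-speech silence in input audio might not produce the results you expect. Common problems with silence handling include:
Fast speech that chains many sentences together into a single recognition result, instead of breaking sentences into individual results
Slow speech that splits parts of a single sentence into multiple results
A single-shot recognition that ends too quickly while waiting for speech to begin
You can address these problems by setting one of the following timeout properties on the SpeechConfig instance that's used to create a SpeechRecognizer:
Segmentation silence timeout adjusts how much non-speech audio is allowed within a phrase that's currently being spoken before that phrase is considered "done."
    Higher values generally make results longer and allow longer pauses from the speaker within a phrase, but they make results take longer to arrive. When set too high, they can also combine separate phrases into a single result.
    Lower values generally make results shorter and ensure more prompt and frequent breaks between phrases. When set too low, they can also cause a single phrase to be split into multiple results.
    This timeout can be set to an integer value between 100 and 5000, in milliseconds, with 500 as a typical default.
Initial silence timeout adjusts how much non-speech audio is allowed before a phrase, before the recognition attempt ends in a "no match" result.
    Higher values give speakers more time to react and start speaking, but they can also result in slow responsiveness when nothing is spoken.
    Lower values ensure a prompt "no match" for a faster user experience and more controlled audio handling, but they might cut off a speaker too quickly when set too low.
    Because continuous recognition generates many results, this value determines how often "no match" results arrive but doesn't otherwise affect the content of recognition results.
    This timeout can be set to any non-negative integer value, in milliseconds, or set to 0 to disable it entirely. 5000 is a typical default for single-shot recognition, and 15000 is a typical default for continuous recognition.
Because there are tradeoffs when you modify these timeouts, change the settings only when you observe a problem related to silence handling. The default values handle most spoken audio well, and only uncommon scenarios should encounter problems.
Example: Users who speak a serial number like "ABC-123-4567" pause between character groups long enough for the serial number to be split into multiple results. In this case, setting the segmentation silence timeout to a higher value, such as 2000 ms, might help: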
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");
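Example: A recorded presenter speaks fast enough that several sentences in a row are combined, and large recognition results arrive only once or twice per minute. In this case, setting the segmentation silence timeout to a lower value, such as 300 ms, might help: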
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
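Example: A single-shot recognition that asks a speaker to find and read a serial number ends too quickly while the speaker is still looking for the number. In this case, a longer initial silence timeout, such as 10000 ms, might help: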
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
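Because the property values in the preceding examples are passed as strings, an out-of-range number is easy to miss. The following Python sketch validates a timeout against the ranges stated in this section before converting it to a string. The helper names are hypothetical, and the sketch assumes the documented ranges (100 to 5000 ms for segmentation, any non-negative integer for initial silence); the service itself might enforce limits differently.

```python
def _require_int(milliseconds):
    """Reject non-integer input (bool is excluded even though it subclasses int)."""
    if isinstance(milliseconds, bool) or not isinstance(milliseconds, int):
        raise ValueError("timeout must be an integer number of milliseconds")


def segmentation_timeout_value(milliseconds):
    """Return the string form of a segmentation silence timeout.

    Raises ValueError outside the 100-5000 ms range stated in this section.
    """
    _require_int(milliseconds)
    if not 100 <= milliseconds <= 5000:
        raise ValueError("segmentation silence timeout must be between 100 and 5000 ms")
    return str(milliseconds)


def initial_silence_timeout_value(milliseconds):
    """Return the string form of an initial silence timeout (0 disables it)."""
    _require_int(milliseconds)
    if milliseconds < 0:
        raise ValueError("initial silence timeout must be non-negative")
    return str(milliseconds)


if __name__ == "__main__":
    print(segmentation_timeout_value(2000))
    print(initial_silence_timeout_value(10000))
```

The returned strings are in the same form as the second argument to SetProperty in the examples above.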
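Reference documentation | Package (NuGet) | Additional samples on GitHub
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Create a speech configuration
To call the Speech service by using the Speech SDK, you need to create a SpeechConfig instance. This class includes information about your subscription, such as your key and associated location/region, endpoint, host, or authorization token.
Create a SpeechConfig instance by using your key and region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource.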
using namespace std;
using namespace Microsoft::CognitiveServices::Speech;
auto speechConfig = SpeechConfig::FromSubscription("YourSpeechKey", "YourSpeechRegion");
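You can initialize SpeechConfig in a few other ways:
Use an endpoint, and pass in a Speech service endpoint. A key or authorization token is optional.
Use a host, and pass in a host address. A key or authorization token is optional.
Use an authorization token with the associated region.
Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you always create a configuration.
Recognize speech from a microphone
To recognize speech by using your device microphone, create an AudioConfig instance by using FromDefaultMicrophoneInput(). Then initialize SpeechRecognizer by passing audioConfig and speechConfig.
using namespace Microsoft::CognitiveServices::Speech::Audio;
auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig, audioConfig);
cout << "Speak into your microphone." << std::endl;
auto result = speechRecognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;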
If you want to use a specific audio input device, you need to specify the device ID in AudioConfig. Learn how to get the device ID for your audio input device.
Recognize speech from a file
If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig instance. But instead of calling FromDefaultMicrophoneInput(), you call FromWavFileInput() and pass the file path:
using namespace Microsoft::CognitiveServices::Speech::Audio;
auto audioConfig = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig, audioConfig);
auto result = speechRecognizer->RecognizeOnceAsync().get();
cout << "RECOGNIZED: Text=" << result->Text << std::endl;
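Recognize speech by using the Recognizer class
The Recognizer class in the Speech SDK for C++ exposes a few methods that you can use for speech recognition.
Single-shot recognition asynchronously recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed. Here's an example of asynchronous single-shot recognition via RecognizeOnceAsync: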
auto result = speechRecognizer->RecognizeOnceAsync().get();
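You need to write some code to handle the result. This sample evaluates result->Reason and:
Prints the recognition result: ResultReason::RecognizedSpeech.
If there is no recognition match, informs the user: ResultReason::NoMatch.
If an error is encountered, prints the error message: ResultReason::Canceled.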
switch (result->Reason)
{
    case ResultReason::RecognizedSpeech:
        cout << "We recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
            if (cancellation->Reason == CancellationReason::Error) {
                cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
            }
        }
        break;
    default:
        break;
}
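Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.
Start by defining the input and initializing SpeechRecognizer:
auto audioConfig = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig, audioConfig);
Next, create a variable to manage the state of speech recognition. Declare promise<void> because at the start of recognition, you can safely assume that it's not finished: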
promise<void> recognitionEnd;
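Next, subscribe to the events that SpeechRecognizer sends:
Recognizing: Signal for events that contain intermediate recognition results.
Recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
SessionStopped: Signal for events that indicate the end of a recognition session (operation).
Canceled: Signal for events that contain canceled recognition results. These results indicate a recognition attempt that was canceled as a result of a direct cancellation request. Alternatively, they indicate a transport or protocol failure.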
speechRecognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << std::endl;
    });
speechRecognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
            cout << "RECOGNIZED: Text=" << e.Result->Text << std::endl;
        else if (e.Result->Reason == ResultReason::NoMatch)
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
    });
speechRecognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
    {
        cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
        if (e.Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                 << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                 << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
            recognitionEnd.set_value(); // Notify to stop recognition.
        }
    });
speechRecognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped.";
        recognitionEnd.set_value(); // Notify to stop recognition.
    });
With everything set up, call StartContinuousRecognitionAsync to start recognizing:
// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
speechRecognizer->StartContinuousRecognitionAsync().get();
// Waits for recognition end.
recognitionEnd.get_future().get();
// Stops recognition.
speechRecognizer->StopContinuousRecognitionAsync().get();
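Change the source language
A common task for speech recognition is specifying the input (or source) language. The following example shows how to change the input language to German. In your code, find your SpeechConfig instance and add this line directly below it: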
speechConfig->SetSpeechRecognitionLanguage("de-DE");
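SetSpeechRecognitionLanguage is a method that takes a string as an argument. See the list of supported speech-to-text locales.
You can use language identification with speech-to-text recognition when you need to identify the language in an audio source and then transcribe it to text.
For a complete code sample, see Language identification.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.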
auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig->SetEndpointId("YourEndpointId");
auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);
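Reference documentation | Package (Go) | Additional samples on GitHub
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Recognize speech to text from a microphone
Use the following code sample to run speech recognition from your default device microphone. Replace the variables subscription and region with your speech key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource. Running the script starts a recognition session on your default microphone and outputs text.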
package main
import (
	"bufio"
	"fmt"
	"os"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)
func sessionStartedHandler(event speech.SessionEventArgs) {
	defer event.Close()
	fmt.Println("Session Started (ID=", event.SessionID, ")")
}
func sessionStoppedHandler(event speech.SessionEventArgs) {
	defer event.Close()
	fmt.Println("Session Stopped (ID=", event.SessionID, ")")
}
func recognizingHandler(event speech.SpeechRecognitionEventArgs) {
	defer event.Close()
	fmt.Println("Recognizing:", event.Result.Text)
}
func recognizedHandler(event speech.SpeechRecognitionEventArgs) {
	defer event.Close()
	fmt.Println("Recognized:", event.Result.Text)
}
func cancelledHandler(event speech.SpeechRecognitionCanceledEventArgs) {
	defer event.Close()
	fmt.Println("Received a cancellation: ", event.ErrorDetails)
	fmt.Println("Did you set the speech resource key and region values?")
}
func main() {
	subscription := "YourSpeechKey"
	region := "YourSpeechRegion"
	audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer audioConfig.Close()
	config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer config.Close()
	speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer speechRecognizer.Close()
	speechRecognizer.SessionStarted(sessionStartedHandler)
	speechRecognizer.SessionStopped(sessionStoppedHandler)
	speechRecognizer.Recognizing(recognizingHandler)
	speechRecognizer.Recognized(recognizedHandler)
	speechRecognizer.Canceled(cancelledHandler)
	speechRecognizer.StartContinuousRecognitionAsync()
	defer speechRecognizer.StopContinuousRecognitionAsync()
	// Keep recognizing until the user presses Enter.
	bufio.NewReader(os.Stdin).ReadBytes('\n')
}
Run the following commands to create a go.mod file that links to components hosted on GitHub:
go mod init quickstart
go get github.com/Microsoft/cognitive-services-speech-sdk-go
Now build and run the code:
go build
go run quickstart
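For detailed information, see the reference content for the SpeechConfig class and the reference content for the SpeechRecognizer class.
Recognize speech to text from an audio file
Use the following sample to run speech recognition from an audio file. Replace the variables subscription and region with your speech key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource. Additionally, replace the variable file with the path to a .wav file. Running the script recognizes speech from the file and outputs the text result.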
package main
import (
	"fmt"
	"time"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)
func main() {
	subscription := "YourSpeechKey"
	region := "YourSpeechRegion"
	file := "path/to/file.wav"
	audioConfig, err := audio.NewAudioConfigFromWavFileInput(file)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer audioConfig.Close()
	config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer config.Close()
	speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer speechRecognizer.Close()
	speechRecognizer.SessionStarted(func(event speech.SessionEventArgs) {
		defer event.Close()
		fmt.Println("Session Started (ID=", event.SessionID, ")")
	})
	speechRecognizer.SessionStopped(func(event speech.SessionEventArgs) {
		defer event.Close()
		fmt.Println("Session Stopped (ID=", event.SessionID, ")")
	})
	task := speechRecognizer.RecognizeOnceAsync()
	var outcome speech.SpeechRecognitionOutcome
	// Wait for the recognition outcome, but give up after 5 seconds.
	select {
	case outcome = <-task:
	case <-time.After(5 * time.Second):
		fmt.Println("Timed out")
		return
	}
	defer outcome.Close()
	if outcome.Error != nil {
		fmt.Println("Got an error: ", outcome.Error)
		return // Don't read the result after a failed recognition.
	}
	fmt.Println("Got a recognition!")
	fmt.Println(outcome.Result.Text)
}
Run the following commands to create a go.mod file that links to components hosted on GitHub:
go mod init quickstart
go get github.com/Microsoft/cognitive-services-speech-sdk-go
Now build and run the code:
go build
go run quickstart
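For detailed information, see the reference content for the SpeechConfig class and the reference content for the SpeechRecognizer class.
Reference documentation | Additional samples on GitHub
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Create a speech configuration
To call the Speech service by using the Speech SDK, you need to create a SpeechConfig instance. This class includes information about your subscription, such as your key and associated location/region, endpoint, host, or authorization token.
Create a SpeechConfig instance by using your key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource.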
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
    }
}
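You can initialize SpeechConfig in a few other ways:
Use an endpoint, and pass in a Speech service endpoint. A key or authorization token is optional.
Use a host, and pass in a host address. A key or authorization token is optional.
Use an authorization token with the associated region.
Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you always create a configuration.
Recognize speech from a microphone
To recognize speech by using your device microphone, create an AudioConfig instance by using fromDefaultMicrophoneInput(). Then initialize SpeechRecognizer by passing speechConfig and audioConfig.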
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromMic(speechConfig);
    }
    public static void fromMic(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        System.out.println("Speak into your microphone.");
        Future<SpeechRecognitionResult> task = speechRecognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}
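If you want to use a specific audio input device, you need to specify the device ID in AudioConfig. Learn how to get the device ID for your audio input device.
Recognize speech from a file
If you want to recognize speech from an audio file instead of using a microphone, you still need to create an AudioConfig instance. But instead of calling fromDefaultMicrophoneInput(), you call fromWavFileInput() and pass the file path: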
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class Program {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        fromFile(speechConfig);
    }
    public static void fromFile(SpeechConfig speechConfig) throws InterruptedException, ExecutionException {
        AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        Future<SpeechRecognitionResult> task = speechRecognizer.recognizeOnceAsync();
        SpeechRecognitionResult result = task.get();
        System.out.println("RECOGNIZED: Text=" + result.getText());
    }
}
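The previous examples only get the recognized text by using result.getText(). To handle errors and other responses, you need to write some code to handle the result. The following example evaluates result.getReason() and:
Prints the recognition result: ResultReason.RecognizedSpeech.
If there is no recognition match, informs the user: ResultReason.NoMatch.
If an error is encountered, prints the error message: ResultReason.Canceled.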
// Case labels use the unqualified ResultReason constant names,
// which compile on all Java versions.
switch (result.getReason()) {
    case RecognizedSpeech:
        System.out.println("We recognized: " + result.getText());
        break;
    case NoMatch:
        System.out.println("NOMATCH: Speech could not be recognized.");
        break;
    case Canceled: {
            CancellationDetails cancellation = CancellationDetails.fromResult(result);
            System.out.println("CANCELED: Reason=" + cancellation.getReason());
            if (cancellation.getReason() == CancellationReason.Error) {
                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                System.out.println("CANCELED: Did you set the speech resource key and region values?");
            }
        }
        break;
    default:
        break;
}
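Use continuous recognition
The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
In contrast, you use continuous recognition when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.
Start by defining the input and initializing SpeechRecognizer:
AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
Next, create a variable to manage the state of speech recognition. Declare a Semaphore instance (java.util.concurrent.Semaphore) at the class scope: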
private static Semaphore stopTranslationWithFileSemaphore;
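Next, subscribe to the events that SpeechRecognizer sends:
recognizing: Signal for events that contain intermediate recognition results.
recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
sessionStopped: Signal for events that indicate the end of a recognition session (operation).
canceled: Signal for events that contain canceled recognition results. These results indicate a recognition attempt that was canceled as a result of a direct cancellation request. Alternatively, they indicate a transport or protocol failure.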
// First initialize the semaphore.
stopTranslationWithFileSemaphore = new Semaphore(0);
speechRecognizer.recognizing.addEventListener((s, e) -> {
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});
speechRecognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
    else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});
speechRecognizer.canceled.addEventListener((s, e) -> {
    System.out.println("CANCELED: Reason=" + e.getReason());
    if (e.getReason() == CancellationReason.Error) {
        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
        System.out.println("CANCELED: Did you set the speech resource key and region values?");
    }
    stopTranslationWithFileSemaphore.release();
});
speechRecognizer.sessionStopped.addEventListener((s, e) -> {
    System.out.println("\n    Session stopped event.");
    stopTranslationWithFileSemaphore.release();
});
With everything set up, call startContinuousRecognitionAsync to start recognizing:
// Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
speechRecognizer.startContinuousRecognitionAsync().get();
// Waits for completion.
stopTranslationWithFileSemaphore.acquire();
// Stops recognition.
speechRecognizer.stopContinuousRecognitionAsync().get();
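Change the source language
A common task for speech recognition is specifying the input (or source) language. The following example shows how to change the input language to French. In your code, find your SpeechConfig instance and add this line directly below it:
speechConfig.setSpeechRecognitionLanguage("fr-FR");
setSpeechRecognitionLanguage is a method that takes a string as an argument. See the list of supported speech-to-text locales.
You can use language identification with speech-to-text recognition when you need to identify the language in an audio source and then transcribe it to text.
For a complete code sample, see Language identification.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.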
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig.setEndpointId("YourEndpointId");
SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);
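Reference documentation | Package (npm) | Additional samples on GitHub | Library source code
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Create a speech configuration
To call the Speech service by using the Speech SDK, you need to create a SpeechConfig instance. This class includes information about your subscription, such as your key and associated location/region, endpoint, host, or authorization token.
Create a SpeechConfig instance by using your key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource.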
const speechConfig = sdk.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");
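You can initialize SpeechConfig in a few other ways:
Use an endpoint, and pass in a Speech service endpoint. A key or authorization token is optional.
Use a host, and pass in a host address. A key or authorization token is optional.
Use an authorization token with the associated location/region.
Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you always create a configuration.
Recognize speech from a microphone
Recognizing speech from a microphone isn't supported in Node.js. It's supported only in a browser-based JavaScript environment. For more information, see the React sample on GitHub and its implementation of speech-to-text from a microphone. The React sample shows design patterns for the exchange and management of authentication tokens. It also shows how to capture audio from a microphone or a file for speech-to-text conversion.
If you want to use a specific audio input device, you need to specify the device ID in AudioConfig. Learn how to get the device ID for your audio input device.
Recognize speech from a file
To recognize speech from an audio file, create an AudioConfig instance by using fromWavFileInput(), which accepts a Buffer object. Then initialize SpeechRecognizer by passing audioConfig and speechConfig.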
const fs = require('fs');
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");
function fromFile() {
    let audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("YourAudioFile.wav"));
    let speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    speechRecognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
        speechRecognizer.close();
    });
}
fromFile();
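Recognize speech from an in-memory stream
For many use cases, your audio data likely comes from Blob storage, or it's already in memory as an ArrayBuffer or a similar raw data structure. The following code:
Creates a push stream by using createPushStream().
Reads a .wav file by using fs.createReadStream for demonstration purposes. If you already have audio data in an ArrayBuffer, you can skip directly to writing the content to the input stream.
Creates an audio configuration by using the push stream.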
const fs = require('fs');
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const speechConfig = sdk.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");
function fromStream() {
    let pushStream = sdk.AudioInputStream.createPushStream();
    fs.createReadStream("YourAudioFile.wav").on('data', function(arrayBuffer) {
        pushStream.write(arrayBuffer.slice());
    }).on('end', function() {
        pushStream.close();
    });
    let audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
    let speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
    speechRecognizer.recognizeOnceAsync(result => {
        console.log(`RECOGNIZED: Text=${result.text}`);
        speechRecognizer.close();
    });
}
fromStream();
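Using a push stream as input assumes that the audio data is raw PCM with any headers skipped. The API still works in certain cases if the header has not been skipped. For the best results, consider implementing logic to read off the headers so that fs begins reading at the start of the audio data.
The previous examples only get the recognized text from result.text. To handle errors and other responses, you need to write some code to handle the result. The following code evaluates the result.reason property and:
Prints the recognition result: ResultReason.RecognizedSpeech.
If there is no recognition match, informs the user: ResultReason.NoMatch.
If an error is encountered, prints the error message: ResultReason.Canceled.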
switch (result.reason) {
    case sdk.ResultReason.RecognizedSpeech:
        console.log(`RECOGNIZED: Text=${result.text}`);
        break;
    case sdk.ResultReason.NoMatch:
        console.log("NOMATCH: Speech could not be recognized.");
        break;
    case sdk.ResultReason.Canceled:
        const cancellation = sdk.CancellationDetails.fromResult(result);
        console.log(`CANCELED: Reason=${cancellation.reason}`);
        if (cancellation.reason == sdk.CancellationReason.Error) {
            console.log(`CANCELED: ErrorCode=${cancellation.ErrorCode}`);
            console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
            console.log("CANCELED: Did you set the speech resource key and region values?");
        }
        break;
}
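Use continuous recognition
The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
In contrast, you use continuous recognition when you want to control when to stop recognizing. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.
Start by defining the input and initializing SpeechRecognizer:
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("YourAudioFile.wav"));
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
Next, subscribe to the events that SpeechRecognizer sends:
recognizing: Signal for events that contain intermediate recognition results.
recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
sessionStopped: Signal for events that indicate the end of a recognition session (operation).
canceled: Signal for events that contain canceled recognition results. These results indicate a recognition attempt that was canceled as a result of a direct cancellation request. Alternatively, they indicate a transport or protocol failure.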
speechRecognizer.recognizing = (s, e) => {
    console.log(`RECOGNIZING: Text=${e.result.text}`);
};
speechRecognizer.recognized = (s, e) => {
    if (e.result.reason == sdk.ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
    }
    else if (e.result.reason == sdk.ResultReason.NoMatch) {
        console.log("NOMATCH: Speech could not be recognized.");
    }
};
speechRecognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);
    if (e.reason == sdk.CancellationReason.Error) {
        console.log(`CANCELED: ErrorCode=${e.errorCode}`);
        console.log(`CANCELED: ErrorDetails=${e.errorDetails}`);
        console.log("CANCELED: Did you set the speech resource key and region values?");
    }
    speechRecognizer.stopContinuousRecognitionAsync();
};
speechRecognizer.sessionStopped = (s, e) => {
    console.log("\n    Session stopped event.");
    speechRecognizer.stopContinuousRecognitionAsync();
};
With everything set up, call startContinuousRecognitionAsync to start recognizing:
speechRecognizer.startContinuousRecognitionAsync();
// Make the following call at some point to stop recognition:
// speechRecognizer.stopContinuousRecognitionAsync();
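Change the source language
A common task for speech recognition is specifying the input (or source) language. The following example shows how to change the input language to Italian. In your code, find your SpeechConfig instance and add this line directly below it: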
speechConfig.speechRecognitionLanguage = "it-IT";
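The speechRecognitionLanguage property expects a language-locale format string. See the list of supported speech-to-text locales.
You can use language identification with speech-to-text recognition when you need to identify the language in an audio source and then transcribe it to text.
For a complete code sample, see Language identification.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.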
// Uses the same sdk module reference as the earlier samples.
const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig.endpointId = "YourEndpointId";
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig);
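Reference documentation | Package (download) | Additional samples on GitHub
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Install the Speech SDK and samples
The Azure-Samples/cognitive-services-speech-sdk repository contains samples written in Objective-C for iOS and Mac. Select a link to see the installation instructions for each sample:
Recognize speech from a microphone in Objective-C on macOS
Recognize speech in Objective-C on iOS
More samples for Objective-C on iOS
For more information, see the Speech SDK for Objective-C reference.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.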
SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:@"YourSubscriptionKey" region:@"YourServiceRegion"];
speechConfig.endpointId = @"YourEndpointId";
SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] init:speechConfig];
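Reference documentation | Package (download) | Additional samples on GitHub
This how-to guide describes how to recognize and transcribe human speech (often called speech-to-text).
Install the Speech SDK and samples
The Azure-Samples/cognitive-services-speech-sdk repository contains samples written in Swift for iOS and Mac. Select a link to see the installation instructions for each sample:
Recognize speech in Swift on macOS
Recognize speech in Swift on iOS
For more information, see the Speech SDK for Swift reference.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.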
// These initializers can throw; call them inside a do/catch block or a throwing function.
let speechConfig = try SPXSpeechConfiguration(subscription: "YourSubscriptionKey", region: "YourServiceRegion")
speechConfig.endpointId = "YourEndpointId"
let speechRecognizer = try SPXSpeechRecognizer(speechConfiguration: speechConfig)
Reference documentation | Package (PyPi) | Additional samples on GitHub
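Create a speech configuration
To call the Speech service by using the Speech SDK, you need to create a SpeechConfig instance. This class holds information about your subscription, such as your speech key and associated location/region, endpoint, host, or authorization token.
Create a SpeechConfig instance by using your speech key and location/region. Create a Speech resource in the Azure portal. For more information, see Create a new Azure Cognitive Services resource.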
speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourSpeechRegion")
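You can initialize SpeechConfig in a few other ways:
Use an endpoint: pass in a Speech service endpoint. A speech key or authorization token is optional.
Use a host: pass in a host address. A speech key or authorization token is optional.
Use an authorization token: pass in an authorization token and the associated region.
Whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you always create a configuration.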
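The initialization options listed above map onto keyword arguments of the SpeechConfig constructor (subscription, region, endpoint, host, auth_token). The following is a minimal sketch, not part of the SDK, that picks one combination from environment variables. The variable names SPEECH_KEY, SPEECH_REGION, SPEECH_ENDPOINT, SPEECH_HOST, and SPEECH_TOKEN and the helper speech_config_kwargs are assumptions made for illustration, and the sketch covers only the key-based variants of the endpoint and host options.

```python
import os


def speech_config_kwargs(env=None):
    """Choose keyword arguments for speechsdk.SpeechConfig.

    Hypothetical helper: the environment variable names are assumptions
    for this sketch, not names that the Speech SDK reads by itself.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")

    # Endpoint or host: the speech key is optional for both.
    for name, variable in (("endpoint", "SPEECH_ENDPOINT"), ("host", "SPEECH_HOST")):
        if env.get(variable):
            kwargs = {name: env[variable]}
            if key:
                kwargs["subscription"] = key
            return kwargs

    # Speech key plus region: the default shown in this article.
    if key and region:
        return {"subscription": key, "region": region}

    # Authorization token plus region.
    if env.get("SPEECH_TOKEN") and region:
        return {"auth_token": env["SPEECH_TOKEN"], "region": region}

    raise ValueError("Set a key and region, a token and region, an endpoint, or a host.")


# With the SDK installed, the result would be passed on like this:
#   import azure.cognitiveservices.speech as speechsdk
#   speech_config = speechsdk.SpeechConfig(**speech_config_kwargs())
print(speech_config_kwargs({"SPEECH_KEY": "YourSpeechKey", "SPEECH_REGION": "YourSpeechRegion"}))
```

Reading the values from the environment keeps the key out of source code; any other configuration source would work the same way.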
Recognize speech from a microphone
To recognize speech by using your device microphone, create a SpeechRecognizer instance without passing AudioConfig, and then pass speech_config:
import azure.cognitiveservices.speech as speechsdk
def from_mic():
    speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourSpeechRegion")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    print("Speak into your microphone.")
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)
from_mic()
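If you want to use a specific audio input device, you need to specify the device ID in AudioConfig and pass it to the audio_config parameter of the SpeechRecognizer constructor. Learn how to get the device ID for your audio input device.
Recognize speech from a file
If you want to recognize speech from an audio file instead of using a microphone, create an AudioConfig instance and use the filename parameter: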
import azure.cognitiveservices.speech as speechsdk
def from_file():
    speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourSpeechRegion")
    audio_config = speechsdk.AudioConfig(filename="your_file_name.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)
from_file()
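The previous examples only get the recognized text from result.text. To handle errors and other responses, you need to write some code to handle the result. The following code evaluates the result.reason property and:
Prints the recognition result: speechsdk.ResultReason.RecognizedSpeech.
If there is no recognition match, informs the user: speechsdk.ResultReason.NoMatch.
If an error is encountered, prints the error message: speechsdk.ResultReason.Canceled.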
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
        print("Did you set the speech resource key and region values?")
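Use continuous recognition
The previous examples use single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end, or when a maximum of 15 seconds of audio has been processed.
In contrast, you use continuous recognition when you want to control when to stop recognizing. It requires you to connect to EventSignal to get the recognition results. To stop recognition, you must call stop_continuous_recognition() or stop_continuous_recognition_async(). Here's an example of how to perform continuous recognition on an audio input file.
Start by defining the input and initializing SpeechRecognizer: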
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
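Next, create a variable to manage the state of speech recognition. Set the variable to False, because at the start of recognition you can safely assume that it hasn't finished.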
done = False
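Now, create a callback to stop continuous recognition when an evt is received. Keep these points in mind:
When an evt is received, the evt message is printed.
After an evt is received, stop_continuous_recognition() is called to stop recognition.
The recognition state is changed to True.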
def stop_cb(evt):
    # nonlocal works when done lives in an enclosing function; use "global done" at module level.
    nonlocal done
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    done = True
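The following code sample shows how to connect callbacks to events sent from SpeechRecognizer. The events are:
recognizing: Signal for events that contain intermediate recognition results.
recognized: Signal for events that contain final recognition results, which indicate a successful recognition attempt.
session_started: Signal for events that indicate the start of a recognition session (operation).
session_stopped: Signal for events that indicate the end of a recognition session (operation).
canceled: Signal for events that contain canceled recognition results. These results indicate a recognition attempt that was canceled as a result of a direct cancellation request. Alternatively, they indicate a transport or protocol failure.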
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
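The lambdas above print each whole event object. To keep only the transcript, a handler can read evt.result.text from every recognized event and collect it. The following is a minimal sketch; make_transcript_collector is a hypothetical helper, and the SimpleNamespace objects are stand-ins for SDK events so that the sketch runs without the SDK or any audio input.

```python
from types import SimpleNamespace


def make_transcript_collector():
    """Return (handler, lines); the handler appends each final result's text to lines."""
    lines = []

    def on_recognized(evt):
        text = evt.result.text
        if text:  # skip results that carry no text
            lines.append(text)

    return on_recognized, lines


# With a real recognizer, the handler would be connected like this:
#   on_recognized, lines = make_transcript_collector()
#   speech_recognizer.recognized.connect(on_recognized)

# Stand-in events (an assumption for this sketch) that mimic evt.result.text.
on_recognized, lines = make_transcript_collector()
for text in ("What's the weather like?", "", "Thank you."):
    on_recognized(SimpleNamespace(result=SimpleNamespace(text=text)))

print(" ".join(lines))
```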
With everything set up, you can call start_continuous_recognition():
import time  # needed for time.sleep below
speech_recognizer.start_continuous_recognition()
# Poll until stop_cb sets done to True.
while not done:
    time.sleep(.5)
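The polling loop above can also be written with threading.Event from the Python standard library, which avoids both the shared done flag and the sleep interval. This is an alternative sketch rather than the approach used in the SDK samples; make_stop_callback is a hypothetical helper, and a timer thread stands in for the SDK firing session_stopped.

```python
import threading


def make_stop_callback(stop_recognition):
    """Return (stop_cb, done); done is set once stop_cb has run."""
    done = threading.Event()

    def stop_cb(evt):
        print('CLOSING on {}'.format(evt))
        stop_recognition()
        done.set()

    return stop_cb, done


# With a real recognizer (not run here):
#   stop_cb, done = make_stop_callback(speech_recognizer.stop_continuous_recognition)
#   speech_recognizer.session_stopped.connect(stop_cb)
#   speech_recognizer.canceled.connect(stop_cb)
#   speech_recognizer.start_continuous_recognition()
#   done.wait()

# Stand-in: a timer thread plays the role of the SDK raising the event.
calls = []
stop_cb, done = make_stop_callback(lambda: calls.append("stopped"))
threading.Timer(0.1, stop_cb, args=("fake session_stopped event",)).start()
finished = done.wait(timeout=5)
print("finished:", finished)
```

done.wait() blocks until the callback runs, so the main thread wakes as soon as recognition stops instead of up to half a second later.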
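Change the source language
A common task for speech recognition is specifying the input (or source) language. The following example shows how to change the input language to German. In your code, find your SpeechConfig instance and add this line directly below it: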
speech_config.speech_recognition_language="de-DE"
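The speech_recognition_language property expects a language-locale format string such as "de-DE". See the list of supported speech-to-text locales.
When you need to identify the language in an audio source and then transcribe it to text, you can combine language identification with speech-to-text recognition.
For a complete code sample, see Language identification.
Use a custom endpoint
With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. The following example shows how to set a custom endpoint.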
speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
speech_config.endpoint_id = "YourEndpointId"
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
Speech-to-text REST API reference | Speech-to-text REST API for short audio reference | Additional samples on GitHub
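Convert speech to text
At a command prompt, run the following command. Insert the following values into the command:
Your subscription key for the Speech service.
Your Speech service region.
The path to the input audio file. You can generate audio files by using text to speech.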
curl --location --request POST 'https://INSERT_REGION_HERE.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: audio/wav' \
--data-binary @'INSERT_AUDIO_FILE_PATH_HERE'
You should receive a response with a JSON body like the following:
{
    "RecognitionStatus": "Success",
    "DisplayText": "My voice is my passport, verify me.",
    "Offset": 6600000,
    "Duration": 32100000
}
For more information, see the speech-to-text REST API reference.
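The curl call and the JSON response above can be mirrored in Python with only the standard library. The following is a minimal sketch: build_recognition_request and summarize_response are hypothetical helper names, "westus" is an example region, and the request is only constructed, not sent. Offset and Duration are reported in 100-nanosecond units, so the sketch converts them to seconds.

```python
import json
import urllib.request

TICKS_PER_SECOND = 10_000_000  # Offset and Duration use 100-nanosecond units


def build_recognition_request(region, key, audio_bytes, language="en-US"):
    """Build (but don't send) the same POST request as the curl command."""
    url = (
        "https://{}.stt.speech.microsoft.com/speech/recognition/"
        "conversation/cognitiveservices/v1?language={}"
    ).format(region, language)
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


def summarize_response(body):
    """Extract the status and text, and convert the timings to seconds."""
    result = json.loads(body)
    return {
        "status": result["RecognitionStatus"],
        "text": result.get("DisplayText", ""),
        "offset_seconds": result["Offset"] / TICKS_PER_SECOND,
        "duration_seconds": result["Duration"] / TICKS_PER_SECOND,
    }


# Sending would look like this (needs a real key, region, and WAV file):
#   with open("INSERT_AUDIO_FILE_PATH_HERE", "rb") as f:
#       request = build_recognition_request("westus", "INSERT_SUBSCRIPTION_KEY_HERE", f.read())
#   with urllib.request.urlopen(request) as response:
#       print(summarize_response(response.read()))

request = build_recognition_request("westus", "INSERT_SUBSCRIPTION_KEY_HERE", b"")
sample_body = (
    '{"RecognitionStatus": "Success", '
    '"DisplayText": "My voice is my passport, verify me.", '
    '"Offset": 6600000, "Duration": 32100000}'
)
summary = summarize_response(sample_body)
print(request.full_url)
print(summary)
```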
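Speech to text from a microphone
Plug in and turn on your PC microphone. Turn off any apps that might also use the microphone. Some computers have a built-in microphone, whereas others require configuration of a Bluetooth device.
Now you're ready to run the Speech CLI to recognize speech from your microphone. From the command line, change to the directory that contains the Speech CLI binary file. Then run the following command: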
spx recognize --microphone
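The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source de-DE to recognize German speech.
Speak into the microphone, and you can see your words transcribed into text in real time. The Speech CLI stops after a period of silence, or when you select Ctrl+C.
Speech to text from an audio file
The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any WAV file (16 kHz or 8 kHz, 16-bit, mono PCM) that contains English speech. If you want a quick sample, download the whatstheweatherlike.wav file and copy it to the same directory as the Speech CLI binary file.
Use the following command to run the Speech CLI to recognize speech found in the audio file: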
spx recognize --file whatstheweatherlike.wav
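The Speech CLI defaults to English. You can choose a different language from the speech-to-text table. For example, add --source de-DE to recognize German speech.
The Speech CLI shows a text transcription of the speech on the screen.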
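The WAV format named above (16 kHz or 8 kHz, 16-bit, mono PCM) can be checked before a file is passed to the Speech CLI. The following is a minimal sketch that uses Python's standard wave module; check_wav_format is a hypothetical helper, and the two files it inspects are silent test files that the sketch writes into a temporary directory. The wave module raises wave.Error for files that aren't uncompressed PCM, which the sketch doesn't catch.

```python
import os
import tempfile
import wave


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected mono, found {} channels".format(wav.getnchannels()))
        if wav.getsampwidth() != 2:
            problems.append("expected 16-bit samples, found {}-bit".format(8 * wav.getsampwidth()))
        if wav.getframerate() not in (8000, 16000):
            problems.append("expected 8 kHz or 16 kHz, found {} Hz".format(wav.getframerate()))
    return problems


def write_silence(path, channels, rate, frames):
    """Write a silent 16-bit PCM WAV file, used only to exercise the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * frames)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, channels=1, rate=16000, frames=1600)
    write_silence(bad, channels=2, rate=44100, frames=441)
    good_problems = check_wav_format(good)
    bad_problems = check_wav_format(bad)

print("good.wav:", good_problems)
print("bad.wav:", bad_problems)
```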
Try the speech-to-text quickstart
Improve recognition accuracy with Custom Speech
Use batch transcription