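I have used both sonic and soundtouch, and they are used in much the same way. The project shipped with soundtouch. In video editing the OpenGL side was already enough trouble, and the audio side turned out to be even harder. The hard parts are speed change and mixing.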

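Changing audio speed is much more complicated than changing video speed: handle it slightly wrong and the output is full of crackling noise. This post records how I got soundtouch working.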

How the story began

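I searched through a pile of material, a typical article being 《当一个 Android 开发玩抖音玩疯了之后(一)》, got the feature basically working, and shipped it. Which library I originally copied from is recorded on my own laptop (I will add it here once I find it).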

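1. Speeding up works well, but slowing down has no effect?

2. Stereo sounds fine, but mono comes out with a comically wrong pitch?

3. Slowing down occasionally crashes?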

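1. Slowing down has no effect: the 8-bit to 16-bit conversion was probably not handled properly.

SoundTouch supports two sample types: 16-bit signed integers and 32-bit floats. The default is 32-bit float, and I had switched it to 16-bit signed integers. On the Java side, reading the file and decoding with MediaCodec produces a stream of byte arrays, so the question is how to turn 8-bit Java bytes into the 16-bit signed integers that the C++ code operates on.
At first I used a forced cast in JNI, reinterpret_cast, which of course was wrong...
Then I tried converting the bytes to shorts in Java and passing them through JNI as jshort, so that the short received in C++ would already be 16 bits. The code for converting between byte and short is as follows: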

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Interprets each pair of little-endian bytes as one 16-bit sample.
    public static short[] bytesToShort(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        short[] shorts = new short[bytes.length / 2];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
        return shorts;
    }

    // Writes each 16-bit sample back as two little-endian bytes.
    public static byte[] shortToBytes(short[] shorts) {
        if (shorts == null) {
            return null;
        }
        byte[] bytes = new byte[shorts.length * 2];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(shorts);
        return bytes;
    }
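Slowing down still had no effect: speeding up sounded fine, but the audio would not slow down. The only option left was to look for the answer in the C++ code, since Java is not really suited to processing audio and video data.... It turns out this is how they do the conversion: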
#include <jni.h>
#include <cassert>

// Converts raw PCM bytes to floats in [-1.0, 1.0). BUFF_SIZE is the number of samples.
static void convertInput(jbyte* input, float* output, const int BUFF_SIZE,
        int bytesPerSample) {
    switch (bytesPerSample) {
    case 1: { // unsigned 8-bit
        unsigned char *temp2 = (unsigned char*) input;
        double conv = 1.0 / 128.0;
        for (int i = 0; i < BUFF_SIZE; i++) {
            output[i] = (float) (temp2[i] * conv - 1.0);
        }
        break;
    }
    case 2: { // signed 16-bit
        short *temp2 = (short*) input;
        double conv = 1.0 / 32768.0;
        for (int i = 0; i < BUFF_SIZE; i++) {
            short value = temp2[i];
            output[i] = (float) (value * conv);
        }
        break;
    }
    case 3: { // signed 24-bit; reads 4 bytes at a time and masks (assumes little-endian)
        char *temp2 = (char *) input;
        double conv = 1.0 / 8388608.0;
        for (int i = 0; i < BUFF_SIZE; i++) {
            int value = *((int*) temp2);
            value = value & 0x00ffffff;             // take 24 bits
            value |= (value & 0x00800000) ? 0xff000000 : 0; // extend minus sign bits
            output[i] = (float) (value * conv);
            temp2 += 3;
        }
        break;
    }
    case 4: { // signed 32-bit
        int *temp2 = (int *) input;
        double conv = 1.0 / 2147483648.0;
        assert(sizeof(int) == 4);
        for (int i = 0; i < BUFF_SIZE; i++) {
            int value = temp2[i];
            output[i] = (float) (value * conv);
        }
        break;
    }
    }
}
What is bytesPerSample? It is the number of bytes in one sample. Oh, we never took this parameter into account...
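To make the role of bytesPerSample concrete, here is a minimal Java sketch of the same idea as the C++ conversion above. The class name `PcmToFloat` and the method `convert` are made up for illustration and are not part of SoundTouch; the sketch assumes little-endian PCM input.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmToFloat {
    /** Converts little-endian PCM bytes to floats in [-1, 1), given the bytes per sample. */
    static float[] convert(byte[] input, int bytesPerSample) {
        if (bytesPerSample < 1 || bytesPerSample > 4) {
            throw new IllegalArgumentException("unsupported bytesPerSample: " + bytesPerSample);
        }
        int sampleCount = input.length / bytesPerSample;
        float[] output = new float[sampleCount];
        ByteBuffer buf = ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < sampleCount; i++) {
            switch (bytesPerSample) {
                case 1: // unsigned 8-bit, centered on 128
                    output[i] = ((buf.get() & 0xFF) - 128) / 128.0f;
                    break;
                case 2: // signed 16-bit
                    output[i] = buf.getShort() / 32768.0f;
                    break;
                case 3: { // signed 24-bit, assembled from three bytes
                    int lo = buf.get() & 0xFF;
                    int mid = buf.get() & 0xFF;
                    int hi = buf.get(); // the top byte keeps its sign
                    output[i] = ((hi << 16) | (mid << 8) | lo) / 8388608.0f;
                    break;
                }
                default: // 4: signed 32-bit
                    output[i] = (float) (buf.getInt() / 2147483648.0);
                    break;
            }
        }
        return output;
    }
}
```

The point is that the same byte array yields a different number of samples, and different values, depending on bytesPerSample, so the parameter cannot be ignored.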
2. Mono audio sounds strange

I had set the channel count with SoundTouch's setChannels, but it was still wrong?

So let's check the putSamples and getSamples methods again:
extern "C"
JNIEXPORT void JNICALL Java_com_netease_glsoundlib_SoundTouch_putSamples
        (JNIEnv *env, jobject thiz, jshortArray samples, jint length, jlong objectPtr) {
    SoundTouch *soundTouch = (SoundTouch *) objectPtr;
    // Get a native pointer to the Java array
    jshort *input_samples = env->GetShortArrayElements(samples, NULL);
    soundTouch->putSamples(input_samples, length);
    // Release the native array (avoids a memory leak)
    env->ReleaseShortArrayElements(samples, input_samples, 0);
}
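What was being passed as length? The size of one chunk, fixed at 4096. That does not look right. The official description is "Adds 'numSamples' pcs of samples from the 'samples' memory position into  the input of the object.", and the argument is named "nSamples". This is how the official code computes it:
    const int BUFF_SIZE = length / bytesPerSample;
    int nSamples = BUFF_SIZE / channels;
Here length is the size of the input data block, i.e. sample.length, and it is nSamples, not length, that should be passed to putSamples.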
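The arithmetic above can be checked with a small Java sketch. The class `SampleCount` and the method `framesPerChannel` are hypothetical names used only for this illustration.

```java
final class SampleCount {
    /** Number of samples per channel contained in a PCM block of byteLength bytes. */
    static int framesPerChannel(int byteLength, int bytesPerSample, int channels) {
        if (bytesPerSample <= 0 || channels <= 0) {
            throw new IllegalArgumentException("bytesPerSample and channels must be positive");
        }
        int totalSamples = byteLength / bytesPerSample; // BUFF_SIZE in the snippet above
        return totalSamples / channels;                 // nSamples in the snippet above
    }
}
```

For a 4096-byte chunk of 16-bit audio this gives 1024 for stereo and 2048 for mono, so passing the fixed value 4096 overstates the sample count in both cases.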
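3. Slowing down occasionally crashes

The problem is probably that after slowing down, the block of data received from soundtouch is larger than the input, so an array overflows. This piece of code is not right...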
    soundTouch.putBytes(resultByteArray);
    while (true) {
        byte[] modified = new byte[4096];
        int count = soundTouch.getBytes(modified);
        if (count > 0) {
            // Writes the whole 4096-byte buffer regardless of count
            bufferedOutputStream.write(modified);
        } else {
            break;
        }
    }
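One possible way to tighten the loop above is to reuse a single buffer and write only the bytes that were actually returned. This is a sketch, not the fix used in the project: `ByteSource` is a hypothetical stand-in for the wrapper's getBytes, and it assumes the return value is the number of bytes written into the buffer (0 when nothing is left).

```java
import java.io.IOException;
import java.io.OutputStream;

final class DrainLoop {
    /** Hypothetical stand-in for the wrapper: fills out and returns the byte count, 0 when empty. */
    interface ByteSource {
        int getBytes(byte[] out);
    }

    /** Drains the source in fixed-size chunks, writing only the valid bytes of each chunk. */
    static int drain(ByteSource source, OutputStream sink, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize]; // one buffer reused for every read
        int total = 0;
        int count;
        while ((count = source.getBytes(chunk)) > 0) {
            sink.write(chunk, 0, count); // never write past the bytes actually returned
            total += count;
        }
        return total;
    }
}
```

Because slowed-down audio produces more output than input, the loop has to keep reading until the source reports nothing left, however many chunks that takes.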
