Problem

This is my "weekend" hobby problem.

I have some well-loved single-cycle waveforms from the ROMs of a classic synthesizer.

They are 8-bit samples (256 possible values).

Because they are only 8 bits, the noise floor is quite high. That's due to quantization error, and quantization error is pretty weird: it messes up all the frequencies a bit.

I'd like to take these cycles and make "clean" 16-bit versions of them. (Yes, I know people love the dirty versions, so I'll let the user interpolate between dirty and clean to whatever degree they like.)

It seems impossible, since the low 8 bits are lost forever. But this has been in the back of my mind for a while, and I'm pretty sure it can be done.

These are single-cycle waveforms that just get repeated over and over for playback, so this is a special case. (Of course, the synth does all kinds of things to make the sound interesting, including envelopes, modulation, cross-fade filters, and so on.)

For each individual byte sample, what I really know is that it's one of 256 possible values in the 16-bit version. (Imagine the reverse process, where a 16-bit value is truncated or rounded down to 8 bits.)

My evaluation function is trying to get the minimum noise floor. I should be able to judge that with one or more FFTs.
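For example, one way I could score a candidate cycle (a rough sketch assuming NumPy; treating the K strongest partials as "signal" and everything else as noise is just an illustrative choice, not something I've settled on):

```python
import numpy as np

def noise_floor_db(cycle, n_signal_partials=16):
    """Score one waveform cycle: energy outside the K strongest FFT bins.

    Illustrative assumption: the K strongest bins are the intended
    partials; everything else is counted as noise.  Lower is cleaner.
    """
    spectrum = np.abs(np.fft.rfft(cycle))
    spectrum[0] = 0.0                        # ignore DC
    order = np.argsort(spectrum)[::-1]       # bins sorted by magnitude
    signal_energy = np.sum(spectrum[order[:n_signal_partials]] ** 2)
    noise_energy = np.sum(spectrum[order[n_signal_partials:]] ** 2)
    return 10.0 * np.log10(noise_energy / (signal_energy + 1e-30) + 1e-30)
```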

Exhaustive testing would probably take forever, so I could do a low-resolution pass first. Or do I just randomly nudge randomly chosen values (within the known range that keeps the same 8-bit version), run the evaluation, and keep the cleaner version? Or is there something faster I can do? Do I risk falling into a local minimum when there's a better minimum elsewhere in the search space? I've had that happen in other, similar situations.

Is there an initial guess I can make by looking at neighboring values?


Edit: A few people have pointed out that the problem is easier if I drop the requirement that the new waveform sample back down to the original. That's true. In fact, if all I wanted was a cleaner sound, the solution would be trivial.


Solution

Going with the approach in your question, I would suggest looking into hill-climbing algorithms and the like.

http://en.wikipedia.org/wiki/Hill_climbing has more information on it and the sidebox has links to other algorithms which may be more suitable.
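A minimal hill-climbing sketch for this particular problem might look like the following (NumPy assumed; the cost function and iteration count are illustrative placeholders, not prescribed by this answer):

```python
import numpy as np

def cost(cycle16):
    """Rough 'noisiness' cost: energy outside the 16 strongest partials.
    (Illustrative choice of evaluation, not fixed by the question.)"""
    mag = np.abs(np.fft.rfft(cycle16))
    mag[0] = 0.0
    top = np.sort(mag)[-16:]
    return np.sum(mag ** 2) - np.sum(top ** 2)

def hill_climb(cycle8, iters=20000, rng=None):
    """Randomly nudge low bytes; keep any change that lowers the cost.

    Constraint from the question: truncating the 16-bit result back to
    8 bits must reproduce the original sample, so each value may only
    move within [s * 256, s * 256 + 255].
    """
    rng = np.random.default_rng() if rng is None else rng
    lo = cycle8.astype(np.int32) * 256       # lower bound per sample
    hi = lo + 255                            # upper bound per sample
    current = lo + 128                       # start mid-range
    best = cost(current)
    for _ in range(iters):
        i = rng.integers(len(current))
        candidate = current.copy()
        candidate[i] = rng.integers(lo[i], hi[i] + 1)
        c = cost(candidate)
        if c < best:
            current, best = candidate, c
    return current
```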

AI is like alchemy - we never reached the final goal, but lots of good stuff came out along the way.

Other tips

You could put your existing 8-bit sample into the high-order byte of your new 16-bit sample, and then use the low-order byte to linearly interpolate some new 16-bit data points between each original 8-bit sample.

This would essentially connect a 16 bit straight line between each of your original 8-bit samples, using several new samples. It would sound much quieter than what you have now, which is a sudden, 8-bit jump between the two original samples.
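A rough sketch of that idea, assuming NumPy and an unsigned 8-bit input cycle (the upsampling factor is arbitrary):

```python
import numpy as np

def upsample_linear(cycle8, factor=4):
    """Shift the 8-bit samples into the high byte of signed 16-bit values,
    then insert `factor - 1` linearly interpolated points between each pair
    of original samples (wrapping around, since the cycle repeats)."""
    x16 = (cycle8.astype(np.float64) - 128.0) * 256.0  # center, then high byte
    n = len(x16)
    x_new = np.arange(n * factor) / factor       # new sample positions
    x_old = np.arange(n + 1)                     # original positions + wrap point
    y_old = np.append(x16, x16[0])               # repeat first sample for wrap-around
    return np.interp(x_new, x_old, y_old).astype(np.int16)
```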

You could also try applying some low-pass filtering.

Well, I would expect some FIR filtering (IIR if you really need to save processing cycles, but FIR can give better results without instability) to clean up the noise. You would have to play with it to get the effect you want, but the basic problem is smoothing out the sharp edges in the audio created by sampling it at 8-bit resolution. I would give a wide berth to the center frequency of the audio, do a low-pass filter, and then listen to make sure I didn't make it sound "flat" with the filter I picked.

It's tough, though; there is only so much you can do. The lower 8 bits are lost, and the best you can do is approximate them.
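A hedged sketch of the FIR approach, assuming SciPy is available and the cycle has already been shifted into 16-bit range (the cutoff and tap count are arbitrary starting points to tune by ear):

```python
import numpy as np
from scipy import signal

def fir_smooth(cycle16, cutoff=0.35, numtaps=63):
    """Low-pass FIR smoothing of one cycle.

    cutoff is a fraction of Nyquist (illustrative value).  The cycle is
    tiled so the filter sees continuous periodic data, and the middle
    copy is returned to avoid edge transients.
    """
    taps = signal.firwin(numtaps, cutoff)            # linear-phase low-pass FIR
    tiled = np.tile(cycle16.astype(np.float64), 3)   # periodic extension
    filtered = signal.filtfilt(taps, [1.0], tiled)   # zero-phase filtering
    n = len(cycle16)
    return filtered[n:2 * n]
```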

It's almost impossible to get rid of noise that looks like your signal. If you start tweaking things in your frequency band, it will take out the signal of interest.

For upsampling, since you're already using an FFT, you can add zeros to the end of the frequency-domain signal and do an inverse FFT. This completely preserves the frequency and phase information of the original signal, although it spreads the same energy over more samples. If you shift it 8 bits to be 16-bit samples first, this won't be too much of a problem. But I usually kick it up by an integer gain factor before doing the transform.
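A minimal sketch of that zero-padding interpolation, assuming NumPy and an unsigned 8-bit input cycle (the gain factor here just compensates for the longer inverse FFT, as described):

```python
import numpy as np

def fft_upsample(cycle8, factor=4):
    """Upsample by zero-padding the spectrum and doing a longer inverse FFT.

    The 8-bit cycle is first shifted into 16-bit range, and a gain factor
    is applied before the inverse transform so the output keeps its
    amplitude when the energy is spread over more samples.
    """
    x = (cycle8.astype(np.float64) - 128.0) * 256.0   # 8-bit -> signed 16-bit range
    X = np.fft.rfft(x)
    n_out = len(x) * factor
    X_padded = np.zeros(n_out // 2 + 1, dtype=complex)
    X_padded[:len(X)] = X * factor                    # gain before the inverse
    return np.fft.irfft(X_padded, n=n_out)
```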

Pete

Edit: The comments are getting a little long, so I'll move some to the answer.

The peaks in the FFT output are harmonic spikes caused by the quantization. I tend to think of them differently than the noise floor. You can dither, as someone mentioned, which eliminates the amplitude of the harmonic spikes and flattens out the noise floor, but you lose overall signal-to-noise on the flat part of your noise floor. As far as the FFT is concerned, when you interpolate using that method, it retains the same energy spread over more samples, which reduces the amplitude. So before doing the inverse, give your signal more energy by multiplying by a gain factor.
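To see that trade-off in isolation, here is a small self-contained experiment (my own illustration, not code from this discussion): quantize a test sine to 8 bits with and without triangular (TPDF) dither and compare the error spectra.

```python
import numpy as np

# Dither removes the harmonic spikes caused by quantization but raises the
# flat part of the noise floor, as described above.
rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)
x = np.sin(2 * np.pi * 16 * t / n)                    # clean test tone, 16 cycles

plain = np.round(x * 127) / 127                       # straight 8-bit quantization
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)  # +/- 1 LSB triangular
dithered = np.round(x * 127 + tpdf) / 127

for name, q in (("plain", plain), ("dithered", dithered)):
    err_spectrum = np.abs(np.fft.rfft(q - x))
    print(name,
          "max error bin:", round(float(err_spectrum.max()), 3),
          "median error bin:", round(float(np.median(err_spectrum)), 3))
```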

Are the signals simple/complex sinusoids, or do they have hard edges, i.e. triangle or square waves? I'm assuming they have continuity from cycle to cycle; is that valid? If so, you can also increase your FFT resolution to more precisely pinpoint frequencies by increasing the number of waveform cycles fed to your FFT. If you can precisely identify the frequencies used, assuming they are somewhat discrete, you may be able to completely recreate the intended signal.

The 16-bit to 8-bit via truncation requirement will produce results that do not match the original source. (Thus making finding an optimal answer more difficult.) Typically you would produce a fixed-point waveform by attempting to get the closest match, which means rounding to the nearest number (truncation is a floor operation). That is most likely how they were originally generated. Adding 0.5 (in this case, 0.5 is 128) and then truncating the output would allow you to generate more accurate results. If that's not a worry, then OK, but it definitely will have a negative effect on accuracy.

UPDATED: Why? Because the goal of sampling a signal is to reproduce it as closely as possible. If the conversion threshold is set poorly during sampling, all of your error is to one side of the signal instead of being well distributed and centered about zero. On such systems you typically try to maximize use of the available dynamic range, particularly if you have low resolution such as an 8-bit ADC.
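As a small illustration of the rounding-versus-truncation point (a sketch assuming unsigned 16-bit input; the function names are mine):

```python
import numpy as np

def to_8bit_truncate(x16):
    """Plain truncation: a floor operation, so all the error lies on one side."""
    return (x16.astype(np.int32) // 256).astype(np.uint8)

def to_8bit_round(x16):
    """Add half an 8-bit step (128 in 16-bit units) before truncating, so the
    error is centered about zero, as suggested above."""
    x = x16.astype(np.int32)
    return (np.minimum(x + 128, 65535) // 256).astype(np.uint8)
```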

Band-limited versions? If they are filtered at different frequencies, I'd suspect it was to allow you to play the same sound without distortion when you went too far from the other variation, kind of like mipmapping in graphics. I suspect the two are the same signal with different anti-aliasing filters applied; this may be useful in reproducing the original. They should be the same base signal with different convolutions applied.

There might be a simple approach taking advantage of the periodicity of the waveforms (a code sketch of these steps appears after the notes below). How about if you:

  1. Make a 16-bit waveform where the high bytes are the waveform and the low bytes are zero - call it x[n].

  2. Calculate the discrete Fourier transform of x[n] = X[w].

  3. Make a signal Y[w] = (dBMag(X[w]) > Threshold) ? X[w] : 0, where dBMag(k) = 10*log10(real(k)^2 + imag(k)^2), and Threshold is maybe 40 dB, based on 8 bits being roughly 48 dB dynamic range, and allowing ~1.5 bits of noise.

  4. Inverse transform Y[w] to get y[n], your new 16 bit waveform.

  5. If y[n] doesn't sound nice, dither it with some very low level noise.

Notes:

A. This technique only works if the original waveforms are exactly periodic!

B. Step 5 might be replaced with setting the "0" values to random noise in Y[w] in step 3, you'd have to experiment a bit to see what works better.

This seems easier (to me at least) than an optimization approach. But truncated y[n] will probably not be equal to your original waveforms. I'm not sure how important that constraint is. I feel like this approach will generate waveforms that sound good.
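Here is the promised sketch of steps 1-4, assuming NumPy; applying the threshold relative to the strongest partial is my own interpretation of step 3, and the optional dither of step 5 is left out:

```python
import numpy as np

def clean_cycle(cycle8, threshold_db=40.0):
    # 1. High byte = original sample, low byte = 0 (centered to signed 16-bit).
    x = (cycle8.astype(np.float64) - 128.0) * 256.0
    # 2. Discrete Fourier transform of x[n].
    X = np.fft.rfft(x)
    # 3. Zero every bin more than `threshold_db` below the strongest bin.
    mag_db = 20.0 * np.log10(np.abs(X) + 1e-12)
    Y = np.where(mag_db > mag_db.max() - threshold_db, X, 0.0)
    # 4. Inverse transform back to a 16-bit waveform y[n].
    y = np.fft.irfft(Y, n=len(x))
    return np.clip(np.round(y), -32768, 32767).astype(np.int16)
```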
