Sorry if this is a stupid question, but this is the third day I have spent trying to figure this out, and I just can’t find the solution.

Let’s say I have an array $A$ of size 4 and an array $B$ of size 8, and I need to spread all 8 values from $B$ evenly across the 4 places in $A$.

In that case it’s quite easy to imagine the solution, but let’s say $A$ has size 44100 and $B$ has size 48000. How do I express the 48000 values of $B$ with the 44100 values of $A$?

I know there are a lot of free, ready-to-use resampling algorithms, but using them is not what I need. I need to understand the mathematical basis of those algorithms, and not only for audio but for other purposes as well. I need it for my studies. Could anyone help me?
Best regards.

3 Answers

The math is straightforward enough. Let's start with "rational" sampling rate conversion. Rational means the ratio of the two sample rates can be expressed as a fraction of two (preferably small) integers. Example: going from 48 kHz to 40 kHz has a ratio of 5/6 (output rate over input rate), and the lowest frequency that both rates divide evenly is 240 kHz.
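As a small illustration of the ratio arithmetic, here is a sketch in Python. The function name `rate_ratio` and the names `L` (up-sampling factor) and `M` (down-sampling factor) are just labels chosen for this example.

```python
from fractions import Fraction

def rate_ratio(fs_in, fs_out):
    """Return (L, M, fs_common): up-sample by L, then down-sample by M."""
    ratio = Fraction(fs_out, fs_in)   # Fraction reduces to lowest terms
    L, M = ratio.numerator, ratio.denominator
    fs_common = fs_in * L             # least common multiple of both rates
    return L, M, fs_common

print(rate_ratio(48000, 40000))   # (5, 6, 240000)
print(rate_ratio(48000, 44100))   # (147, 160, 7056000)
```

The second call shows why 48 kHz to 44.1 kHz is less pleasant: the common rate is about 7 MHz.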

The steps are the following:

Up-sample by the numerator to the common multiple frequency. This is done by inserting zeros between the samples of the input signal.

Up-sampling results in periodic repetition of the spectrum in the frequency domain. It creates so-called mirror spectra. In our example we have up-sampled from 48 kHz to 240 kHz.

In order to get rid of the mirror spectra you need to low-pass filter the signal. The cutoff of that low-pass filter needs to be chosen so that there is no aliasing, i.e. below the smaller of the two Nyquist frequencies.

Downsample the low-pass-filtered signal. This is simply done by throwing away the samples you don't need. In our example, you would only keep every sixth sample.
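The three steps above can be sketched directly in plain Python. This is only an illustration under assumptions of my own: the Hann-windowed sinc design, the names `windowed_sinc` and `resample_naive`, and the `taps_per_phase` parameter are choices made for the example, and a real design would be driven by the requirements of the application.

```python
import math

def windowed_sinc(num_taps, cutoff):
    """Hann-windowed sinc low-pass. cutoff is a fraction of the sampling rate (0..0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def resample_naive(x, L, M, taps_per_phase=16):
    """Rational resampling by L/M, done the slow, literal way."""
    # Step 1: up-sample by inserting L - 1 zeros after every input sample.
    up = [0.0] * (len(x) * L)
    up[::L] = x
    # Step 2: low-pass below the smaller of the two Nyquist frequencies.
    # The gain of L makes up for the energy lost to the inserted zeros.
    cutoff = 0.5 / max(L, M)              # relative to the high rate
    h = [L * c for c in windowed_sinc(taps_per_phase * L + 1, cutoff)]
    delay = (len(h) - 1) // 2             # compensate the filter delay
    filtered = []
    for n in range(len(up)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n + delay - k
            if 0 <= i < len(up):
                acc += hk * up[i]
        filtered.append(acc)
    # Step 3: down-sample by keeping every M-th sample.
    return filtered[::M]

y = resample_naive([1.0] * 60, L=5, M=6)   # 48 kHz -> 40 kHz
print(len(y))                              # 50
```

A constant input should come out close to the same constant away from the edges, which is a quick sanity check on the filter gain.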

This process tends to be expensive: you are running a very steep low-pass filter at a very high sampling rate. There are a few ways to make it a lot more efficient, especially if the lowpass filter is an FIR filter. There are two properties you can exploit:

You don't need to calculate the samples that you are going to throw away anyway

After up-sampling, most of your input samples into the low-pass filter are zero, so you can eliminate these from the convolution calculation

In essence, you calculate each output sample by filtering the appropriate input samples with an FIR filter. The FIR filter is time-varying, i.e. it's different from one sample to the next. You can also interpret the whole process as filtering with a time-variant fractional delay: the filter you need is a function of the time offset between a specific output sample and the input sequence. In our example you would need to sample the input at the times $0, 1.2, 2.4, 3.6, 4.8, 6.0, 7.2$, etc. (in units of the input sample period).
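A minimal sketch of this shortcut follows. It computes only the outputs that are kept and only multiplies by the non-zero (original) input samples. To keep it short, a triangular kernel stands in for the low-pass filter; that is an assumption of the example (it is equivalent to linear interpolation and far too gentle for real audio work), and the names `triangular_lowpass` and `resample_polyphase` are hypothetical.

```python
def triangular_lowpass(L):
    """Stand-in low-pass at the high rate, DC gain L (equivalent to linear interpolation)."""
    return [1 - abs(k - (L - 1)) / L for k in range(2 * L - 1)]

def resample_polyphase(x, h, L, M):
    """Rational resampling by L/M without building the zero-stuffed signal."""
    n_out = (len(x) * L + M - 1) // M
    delay = (len(h) - 1) // 2
    y = []
    for m in range(n_out):
        t = m * M + delay       # position on the high-rate grid
        phase = t % L           # selects the sub-filter h[phase], h[phase + L], ...
        base = t // L           # newest input sample that contributes
        acc = 0.0
        for j, k in enumerate(range(phase, len(h), L)):
            i = base - j
            if 0 <= i < len(x):
                acc += h[k] * x[i]
        y.append(acc)
    return y

x = [float(n) for n in range(10)]          # a ramp: x[n] = n
y = resample_polyphase(x, triangular_lowpass(5), L=5, M=6)
print([round(v, 3) for v in y[:6]])
```

With the ramp input, the first outputs land at approximately 0, 1.2, 2.4, 3.6, ..., i.e. the input evaluated at the fractional times listed above. The variable `phase` only ever takes `L` distinct values, one per sub-filter.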

The number of different filters you need is the numerator of the ratio. That's okay for 48 kHz to 40 kHz but less convenient for 48 kHz to 44.1 kHz, where the ratio can only be reduced to 147/160. In many cases the ratio cannot reasonably be expressed as an integer ratio, and often it is also time-variant (different master clocks). In this case we need to do irrational sample rate conversion.

Irrational sample rate conversion typically also uses a polyphase filter with some reasonable number of phases (say 32 or thereabouts). For each output sample you then calculate the exact phase (or fractional delay) and linearly interpolate between the two closest phases that you have.
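Here is one way this could look, as a rough sketch only. The Hann-windowed sinc prototype, the default of 16 taps per phase, and the names `make_prototype` and `resample_arbitrary` are all assumptions made for the example. The interpolation is applied to the filter coefficients of the two neighbouring phases, which by linearity gives the same result as interpolating between the two filter outputs.

```python
import math

def make_prototype(num_phases, taps_per_phase, cutoff):
    """Hann-windowed sinc, sampled num_phases times per input sample.
    cutoff is relative to the input sampling rate (0..0.5)."""
    n = num_phases * taps_per_phase + 1
    mid = (n - 1) / 2
    h = []
    for k in range(n):
        t = (k - mid) / num_phases        # time in input-sample units
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        h.append(ideal * (0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))))
    return h

def resample_arbitrary(x, ratio, num_phases=32, taps_per_phase=16):
    """ratio = output rate / input rate. taps_per_phase is assumed to be even."""
    cutoff = 0.5 * min(1.0, ratio)        # below the smaller Nyquist frequency
    h = make_prototype(num_phases, taps_per_phase, cutoff)
    half = taps_per_phase // 2
    y = []
    for m in range(int(len(x) * ratio)):
        t = m / ratio                     # output instant in input-sample units
        n = int(t)
        p = (t - n) * num_phases          # exact (fractional) phase
        p0 = min(int(p), num_phases - 1)  # closest stored phase below
        a = p - p0                        # blend weight towards phase p0 + 1
        acc = 0.0
        for j in range(taps_per_phase):
            i = n + half - j
            if 0 <= i < len(x):
                k = p0 + j * num_phases
                acc += ((1 - a) * h[k] + a * h[k + 1]) * x[i]
        y.append(acc)
    return y

y = resample_arbitrary([1.0] * 100, 44100 / 48000)
print(len(y))   # 91
```

Because `ratio` is just a float here, it could also change from call to call, which is the situation with two independent master clocks.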

Now comes the tricky part: The design and choice of the low-pass filter needs to be carefully matched to your specific application requirements. We can't use an ideal low-pass filter since it's infinitely long and non-causal. So when designing this filter you need to know things like

What signal to noise ratio are you shooting for?

What overall delay can you tolerate?

What is the highest "good" frequency you want?

Are you sensitive to phase distortion or phase modulation?

How much aliasing can you tolerate?

How much pre-ringing is acceptable?

How much CPU and memory can be spent on this?

Then you need to feed all of this into the filter design process, which is a non-trivial problem.

Define "quite simple". Resampling can appear quite counterintuitive sometimes, especially in your original example of eight samples being reduced to four (if it were a sound wave, it would mean cutting off a large share of the frequency domain).

As a rule, interpolation (and PCM sampling as its most extreme case) is an operator applied to a vector such that its result is expected to minimize a quality functional. For sound wave resampling, the quality being optimized is usually how much the two waves (the original and the resampled one) differ in the common part of the spectrum, which lies below the lower of their Nyquist frequencies (more exactly, how similar their spectra are if we convert both waves back to analog). In other words, resampling should create neither distortion nor aliasing; additionally, when upsampling, introducing content above the original Nyquist frequency is undesired as well (at least in the range commonly regarded as audible). To achieve this goal, a typical resampling algorithm involves intermediate upsampling to a high sample rate and elaborate filtering that introduces as little distortion and aliasing as possible.

There can be other quality functionals as well. E.g. one may need to minimize... Well, one may need to minimize anything, so for instance simple linear interpolation could be the best choice.
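For reference, simple linear interpolation between two array sizes can be sketched as below. The function name `resample_linear` and the convention that the first and last input samples map onto the first and last output samples are assumptions of this example.

```python
def resample_linear(x, n_out):
    """Linearly interpolate x (at least 2 samples) onto n_out evenly spaced points."""
    if n_out == 1:
        return [x[0]]
    step = (len(x) - 1) / (n_out - 1)    # input samples per output sample
    y = []
    for m in range(n_out):
        t = m * step                     # fractional position in the input
        i = min(int(t), len(x) - 2)      # left neighbour (clamped at the end)
        frac = t - i
        y.append((1 - frac) * x[i] + frac * x[i + 1])
    return y

# The 8 -> 4 case from the question, with a ramp as input
print(resample_linear([0, 1, 2, 3, 4, 5, 6, 7], 4))
```

Each output uses only two input samples, so when downsampling nothing removes the content above the new Nyquist frequency. Whether that matters depends on which quality functional you care about.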

The problem is, there is no exact value. Your choice of values is completely up to what you want to get; each time it's different. And BTW, an interpolation formula, when written down in terms of matrix multiplication, usually takes into account much more than one source element per output sample.
– bipll Jul 6 '18 at 19:25

So let's say I want to resample an audio signal; how could I calculate those values? Of course I suppose there should be some loop that goes from $0$ to $47999$ (through all $B$ values). But how do I distribute those values in $A$ to get the best results and, of course, avoid distortion or aliasing?
– pajczur Jul 6 '18 at 19:34

As bipll said, you think about which part of the information in the original spectrum you can still contain in the fewer samples, and cut away the rest. Since you can't do that perfectly, you need to compromise. The quality of that compromise should be measured as said, and the error, however YOU define that, minimized.
– Marcus Müller Jul 7 '18 at 1:17

Thanks for all the answers. I need to study them, but in the meantime I have written my own answer. Please give me any feedback. Many thanks in advance.

Instead of trying to understand any commonly known resampling method (I will certainly do that in the future), I went my own contrary way: I tried to figure it out on my own using common sense.
I know there is the Lagrange interpolation method, and a method that uses the sinc function, but all of that seems complicated.

And the whole problem seems quite easy to manage. I am probably wrong, but I tried to solve it with common sense.

And I’ve found my own way to do the interpolation. Of course I don’t claim I have invented anything new that nobody knows. I am not even sure my method is correct.
That’s why I want to ask you: which commonly known method is mine closest or most similar to?
And at which point could I improve it?

So let me try to explain it. Please don’t treat it as a finished algorithm; I am just trying to explain the idea.
Let’s say we have an input signal X of size 11 and an output Y of size 7.

Let me illustrate it:

My first thought was to graphically shrink (just for a better view) the input signal X to fit the output Y, so now it looks like this:

So to me it looks like I should share each element of the input evenly among the outputs.
For example, all of x0 goes to y0.
But for x1, some part should go to y0 and the other part to y1. Likewise, some part of x2 should go to y1 and the other part to y2.
x3 goes to y1 and to y2.
x4 goes to y2 and to y3.
And so on…

By “some part” and “other part” I mean an exact value calculated from the ratio $\frac{Ysize - 1}{Xsize - 1}$. Let’s call it $p$. Of course it is only the base, and for each input I need to calculate those “parts” separately. In this example $p = \frac{6}{10} = 0.6$. I am not going to go through all the calculations for each input here; I think it is very easy and everybody knows how to do it. I have already calculated all the “parts”, and now I can illustrate it like this:

And now I average each output value. One could think “hmm… three values go to y1 (x1, x2 and x3), so I need to divide y1 by 3”. But of course not. It is not three whole values that go there, only parts of those values. So I divide:

$\frac{y0}{1 + 0.4}$;

$\frac{y1}{0.6 + 0.8 + 0.2}$;

$\frac{y2}{0.2 + 0.8 + 0.6}$

… and so on.
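To make the idea concrete, here is how I would sketch it in Python. It is only my reading of the description above, not a finished algorithm: the name `spread_and_normalize` is made up for the example, and it assumes the output is not longer than the input (otherwise some outputs could receive no “parts” at all).

```python
def spread_and_normalize(x, n_out):
    """Spread each input sample onto its two nearest outputs, then normalize.
    Assumes 2 <= n_out <= len(x). Returns (outputs, sum of parts per output)."""
    acc = [0.0] * n_out      # accumulated weighted input values
    parts = [0.0] * n_out    # accumulated "parts" (weights) per output
    for i, xi in enumerate(x):
        pos = i * (n_out - 1) / (len(x) - 1)   # position of x[i] on the output grid
        k = min(int(pos), n_out - 2)           # left output neighbour
        frac = pos - k
        acc[k] += (1 - frac) * xi
        parts[k] += 1 - frac
        acc[k + 1] += frac * xi
        parts[k + 1] += frac
    return [a / w for a, w in zip(acc, parts)], parts

y, parts = spread_and_normalize([float(n) for n in range(11)], 7)
print([round(w, 3) for w in parts[:3]])   # the denominators for y0, y1, y2
```

For 11 inputs and 7 outputs the first three denominators come out as about 1.4, 1.6 and 1.6, i.e. $1 + 0.4$, $0.6 + 0.8 + 0.2$ and $0.2 + 0.8 + 0.6$. Each output is a weighted average of its neighbouring inputs, so a constant input stays constant.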

What do you think about that? I think it could be good for a frequency representation, for example for an audio graph analyser. But I am not sure if it’s OK for a plain audio signal. Could anyone comment?
Thanks in advance.