Need an algorithm for a continuity check: select a list of integers with the best "coverage"

Let's say there is a sorted list of integers L1:
1. the total length of the list, N, is known (e.g. N could be 1e7)
2. all the elements lie between two known boundaries, A and B (A <<< B)
e.g. L1 = [2, 5, 10, 15, 18, 19, 21, ...]

Now I need to select a subset of the elements of L1 to form a new list L2 of total length M (M < N)
(e.g. M could equal N / 10)

The condition is that the new list L2 must have the best "coverage".
By "coverage" I mean that the integers in L2 need to be distributed over L1's range, [A, B], as evenly as possible
(i.e. an unbiased sub-sampling method).

Any help is deeply appreciated.

Thanks for everyone's help and ideas. I have tried to simplify the problem so that everyone, even without the background knowledge, can understand it. To define a rule for the goodness of the coverage:

The ultimate goal is to achieve the following:

In the list L2, for any two neighboring elements J and K, take the difference |J - K|; the sum of these differences needs to be minimized.

Apply a given window of total length Q (Q < M) to the list L2: the number of elements within the window needs to be either equal everywhere (the ideal situation) or almost equal.
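One way to make the two rules above measurable is to score a candidate L2 directly. Below is a minimal Python sketch, assuming L2 is sorted, its elements are integers in [A, B], and the window of length Q is applied as consecutive, non-overlapping windows (where the window is placed is not specified above, so this is an assumption; `coverage_stats` is a hypothetical helper). Note that for a sorted L2 the plain sum of |J - K| always equals max(L2) - min(L2), so the sketch also reports the largest gap as one possible reading of "minimized".

```python
from bisect import bisect_left


def coverage_stats(l2, a, b, q):
    """Score a sorted candidate list l2 against the two coverage rules.

    Assumption: the length-q window is applied as consecutive,
    non-overlapping windows [a, a+q), [a+q, a+2q), ... that fit
    entirely inside the integer range [a, b].
    """
    if len(l2) < 2 or not 1 <= q <= b - a + 1:
        raise ValueError("need at least two elements and 1 <= q <= b - a + 1")
    gaps = [k - j for j, k in zip(l2, l2[1:])]  # |J - K| for sorted neighbours
    counts = []
    start = a
    while start + q - 1 <= b:  # only full windows are counted
        counts.append(bisect_left(l2, start + q) - bisect_left(l2, start))
        start += q
    return {
        "sum_of_gaps": sum(gaps),  # always equals l2[-1] - l2[0]
        "max_gap": max(gaps),
        "min_count": min(counts),
        "max_count": max(counts),
    }


print(coverage_stats([0, 10, 20, 30, 40], a=0, b=49, q=10))
# → {'sum_of_gaps': 40, 'max_gap': 10, 'min_count': 1, 'max_count': 1}
```

A perfectly even L2 gives min_count == max_count; the wider the spread between them, the worse the coverage under this reading.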

thanks

1 answer

My idea is to use bucket sort with a bucket size of (B - A) / M. After mapping each element of L1 to its corresponding bucket, pick one element at random from each bucket to form the new list. If the new list is shorter than M, I repeat the process. I implemented this in Python.
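A minimal Python sketch of the bucket idea described above (not the answer's own implementation; `bucket_subsample` and its parameters are hypothetical names). It assumes every element of L1 lies in [A, B] and M <= N, and it reads "repeat the process" as: take another random element from each bucket that still has unused elements, until M elements have been chosen.

```python
import random


def bucket_subsample(l1, m, a, b, seed=None):
    """Pick m elements of l1 (values in [a, b]) spread across the range.

    Split [a, b] into m buckets of width (b - a) / m, take one random
    element from every non-empty bucket, and repeat on the leftover
    elements until m have been chosen.
    """
    if not 0 < m <= len(l1):
        raise ValueError("need 0 < m <= len(l1)")
    rng = random.Random(seed)
    width = (b - a) / m
    buckets = {}
    for x in l1:
        k = min(int((x - a) / width), m - 1)  # x == b goes into the last bucket
        buckets.setdefault(k, []).append(x)
    for members in buckets.values():
        rng.shuffle(members)  # so pop() below returns a random member
    chosen = []
    order = sorted(buckets)
    while len(chosen) < m:
        # One pass = one random pick per bucket that still has elements.
        for k in order:
            if buckets[k]:
                chosen.append(buckets[k].pop())
                if len(chosen) == m:
                    break
    return sorted(chosen)


sample = bucket_subsample(list(range(0, 1000, 3)), m=10, a=0, b=999, seed=42)
print(sample)  # 10 values, one from each tenth of [0, 999]
```

Design note: when some buckets are empty, the top-up passes visit buckets in ascending order and stop as soon as M is reached, so the last few picks slightly favour the lower part of the range.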

I understand that we construct the max-heap in the same array that we are sorting, so no extra memory is required there (from here).

But my question is about the heapify function (which is recursive when I follow Introduction to Algorithms, CLRS), which we use while building the heap, and also after swapping the first element of the heap (the largest) with the last element and then re-heapifying the array.

In the worst case, this recursive function can take O(log(n)) extra space, because each recursive call pushes a new stack frame.

So, when this wiki article says that heapsort is an in-place sorting algorithm, does it implicitly assume a non-recursive heapify function (which takes only constant extra space)? Or is there something I'm missing?

Will the space complexity of heapsort, which uses a recursive heapify function, be O(log(n))?
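For what it's worth, the recursive call in CLRS's MAX-HEAPIFY is the last statement of the procedure (a tail call), so it can be turned into a loop that needs only O(1) extra space. CPython does not optimise tail calls, so the loop form matters there. A minimal Python sketch of heapsort with such a loop-based heapify, using a 0-indexed array (so the child indices differ from CLRS's 1-indexed pseudocode):

```python
def sift_down(arr, i, size):
    """Iterative MAX-HEAPIFY: restore the heap property below index i."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and arr[left] > arr[largest]:
            largest = left
        if right < size and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest  # continue where the recursive call would have gone


def heapsort(arr):
    """Sort arr in place using only a constant number of extra variables."""
    n = len(arr)
    # Build the max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    # Repeatedly move the largest element to the end and re-heapify the rest.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)


data = [5, 3, 8, 1, 9, 2]
heapsort(data)
print(data)  # [1, 2, 3, 5, 8, 9]
```

With the recursive version, the extra space is the recursion depth, which is bounded by the heap height, O(log(n)); with the loop it is constant.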

Hello, I need to solve a problem in C++.
I need to find the shortest path in a matrix, for example:

0100
1110
1011
1010

The start position is (0,0) and the finish position is (3,3); for this example the correct answer is 2.
The algorithm can move right, left, up and down.
I tried to solve this problem with DFS in a recursive function, but I don't get the correct answer.
I don't know how to mark that a cell has already been visited.
My code is:

The matrix is N x N; int N, int currentSmallCost and char map[N][N] are global variables, and the function bool isValidLocation() only checks that the algorithm doesn't go out of bounds and that the current cost is not greater than currentSmallCost.
If I don't mark the current position, I get a stack overflow.
I hope you can help me with a suggestion on how to mark the path for each possible solution, or with another way to solve this problem.
Thanks.
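The question does not say how the cost is counted. One reading that is consistent with the stated answer of 2 is that every cell is walkable and each '0' cell on the path costs 1 (start and finish included), so the task is to minimise the number of '0' cells visited. Under that assumption, a 0-1 breadth-first search can replace the recursive DFS; a minimal Python sketch (`min_zero_cost` is a hypothetical name):

```python
from collections import deque


def min_zero_cost(grid, start, goal):
    """0-1 BFS over a grid given as strings of '0'/'1'.

    Assumption: every cell is walkable and a path costs the number of
    '0' cells on it, start and finish included.
    """
    rows, cols = len(grid), len(grid[0])

    def cost(r, c):
        return 1 if grid[r][c] == "0" else 0

    dist = [[float("inf")] * cols for _ in range(rows)]  # doubles as "visited"
    sr, sc = start
    dist[sr][sc] = cost(sr, sc)
    queue = deque([(sr, sc)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):  # right, left, down, up
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                w = cost(nr, nc)
                if dist[r][c] + w < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + w
                    # Free moves go to the front, paid moves to the back.
                    if w == 0:
                        queue.appendleft((nr, nc))
                    else:
                        queue.append((nr, nc))
    return dist[goal[0]][goal[1]]


grid = ["0100", "1110", "1011", "1010"]
print(min_zero_cost(grid, (0, 0), (3, 3)))  # 2
```

Instead of a separate visited flag, the `dist` table records the cheapest known cost for each cell, and a cell is only queued again when a strictly cheaper cost is found, so the search cannot recurse or loop forever the way an unmarked DFS does.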

I've been assigned the task of creating a dashboard based on Accuracy for our planners. Some of them have ~2000 SKUs, while others have only ~100, which makes the flat accuracy measure somewhat unfair, based on the hypothesis that having fewer SKUs allows those planners to dedicate more time to deeper planning.
After several discussions, I've been asked to add some weighting to the formula, but I haven't been able to do so.

I am trying to replicate this figure with the true underlying function, which is also given there (see also the code below).

I was wondering how the author came up with this figure (which at first glance looks easy to replicate). If I look, e.g., at the first component of (11), f(X_1) = 8*sin(X_1), I cannot see how the author obtains the corresponding graph, which has negative function values (as far as I understand the paper, the X's take values in the range 0 to 3). The same confusion applies to the last, linear component.
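A quick numeric check supports the confusion: since 3 < pi, 8 sin(x) is non-negative on [0, 3]. One common explanation, offered here as an assumption about how the figure was drawn, is that additive-model plots show each component centred to mean zero, because with an intercept in the model each component is only determined up to an additive constant. A minimal NumPy sketch:

```python
import numpy as np

x1 = np.linspace(0, 3, 301)  # assumed domain of X_1, as read from the paper
f1 = 8 * np.sin(x1)          # first component of (11)

# 3 < pi, so sin(x) >= 0 on [0, 3]: the raw component is never negative.
print(f1.min())              # 0.0 (at x1 = 0)

# Assumption: the plotted curve is the component minus its mean.
f1_centred = f1 - f1.mean()
print(f1_centred.min() < 0, abs(f1_centred.mean()) < 1e-12)
```

The same centring would also shift the linear component so that it crosses zero inside the range.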

If I use beta = array([[1],[2],[3]]), which has shape (3,1), np.dot(X, beta) gives me a wrong answer, although the dimensions seem to work.
If I use array([[1,2,3]]), which is a row vector, the dimensions don't match for the dot product, neither in NumPy nor in linear algebra.

So I am wondering why, for an N x K times K x 1 product in NumPy, we have to use (N,K) dot (K,) instead of (N,K) dot (K,1). What makes np.array([1, 0.1, 10]) work with numpy.dot() while np.array([[1], [0.1], [10]]) doesn't?

Thank you very much.
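np.dot itself computes the same numbers for a (K,) and a (K,1) beta; only the result's shape differs, (N,) versus (N,1). One likely cause of the "wrong answer", assuming the product is later added to a 1-D array such as a noise or response vector, is broadcasting: (N,1) + (N,) silently becomes (N,N). A minimal NumPy sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # N = 5 observations, K = 3 regressors
e = rng.normal(size=5)                   # noise, shape (5,)

beta_1d = np.array([1, 0.1, 10])         # shape (3,)
beta_col = np.array([[1], [0.1], [10]])  # shape (3, 1)

print(X.dot(beta_1d).shape)              # (5,)
print(X.dot(beta_col).shape)             # (5, 1)

# Same numbers, different shapes.
print(np.allclose(X.dot(beta_col).ravel(), X.dot(beta_1d)))  # True

# The trap: (5, 1) + (5,) broadcasts to (5, 5) instead of raising an error.
print((X.dot(beta_1d) + e).shape)        # (5,)
print((X.dot(beta_col) + e).shape)       # (5, 5)

# A (1, 3) row vector does not align with X's 3 columns, so dot raises.
try:
    X.dot(np.array([[1, 0.1, 10]]))
except ValueError as err:
    print("ValueError:", err)
```

If a column beta is preferred, flattening the product with `.ravel()` (or keeping every vector as a column) avoids the accidental (N,N) result.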

Update

Sorry about the confusion: the data in the Statsmodels code is randomly generated, so I tried fixing X and got the following input:

However, in 3D there can be shearing in multiple planes at once, for example XY, XZ and YZ. While I could express each of those as rotation, scale, rotation, that would be a total of 6 rotations and 3 scaling operations. I have an intuition that all the shearing can be handled at once with just one rotation, one non-uniform scale and one rotation, but the math involved is over my head.

I'm not sure what constitutes shearing versus rotation when looking at an arbitrary affine matrix (I think there are infinitely many ways to split it up?), so I guess solving the problem for "arbitrary shearing along multiple planes" is the same as solving it for affine matrices (without translation) in general. Either way, anything that can help me along the way is appreciated.
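The rotation, non-uniform scale, rotation form described above is what the singular value decomposition (SVD) gives for any real square matrix, so a shear in several planes at once needs only one such triple. A minimal NumPy sketch, applied to the 3x3 linear part of the affine matrix (translation left out); `rotation_scale_rotation` is a hypothetical helper name and the shear values are arbitrary:

```python
import numpy as np


def rotation_scale_rotation(m):
    """Split a 3x3 linear map into rotation @ diag(scale) @ rotation via SVD.

    numpy's SVD gives m = u @ diag(s) @ vh with u, vh orthogonal; either may
    contain a reflection (det = -1), which is moved into the sign of s[-1].
    """
    u, s, vh = np.linalg.svd(m)
    if np.linalg.det(u) < 0:
        u[:, -1] *= -1
        s[-1] *= -1
    if np.linalg.det(vh) < 0:
        vh[-1, :] *= -1
        s[-1] *= -1
    return u, s, vh


# Simultaneous shear in the XY, XZ and YZ planes (arbitrary example values).
shear = np.array([[1.0, 0.5, 0.3],
                  [0.0, 1.0, 0.2],
                  [0.0, 0.0, 1.0]])
r1, scale, r2 = rotation_scale_rotation(shear)
print(np.allclose(r1 @ np.diag(scale) @ r2, shear))  # True
print(np.linalg.det(r1), np.linalg.det(r2))          # both approximately 1
```

If the matrix has a negative determinant (it includes a mirror), one scale factor comes out negative, because a proper rotation cannot absorb a reflection. When the singular values are distinct, the factors are unique apart from sign flips of matching columns; repeated singular values allow more freedom.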

In Matlab I have a real, symmetric n x n matrix A, where n > 6000. Although A is positive definite, it is close to singular. As a particular variable is changed, A goes from positive definite to singular to indefinite, and I need to determine when A becomes singular. I don't trust the determinants, so I am looking at the eigenvalues, but I don't have the memory (or time) to calculate all n eigenvalues, and I am only interested in the smallest one, in particular when it changes sign from positive to negative. I've tried

D = eigs(A,1,'smallestabs')

by which I lose the sign of the eigenvalue, and by

D = eigs(A,1,'smallestreal')

Matlab cannot get the lowest eigenvalue to converge. I then tried defining a shift value, like

for i = 1:10
    if i == 1
        D(i) = eigs(A,1,0)
    else
        D(i) = eigs(A,1,D(i-1))
    end
end

where I search near the previous lowest eigenvalue. However, the eigenvalues seem to behave oddly, and I am not sure whether I actually find the true lowest one.

So, any ideas on how to

reliably find the smallest eigenvalue with 'eigs', or

determine in another way when A becomes singular (as a variable in A changes)?
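Since A is symmetric, a Cholesky factorization succeeds exactly when A is positive definite (up to rounding), which turns the question into a yes/no test that needs no eigenvalues; in MATLAB, `[R,p] = chol(A)` returns p = 0 in that case. A minimal NumPy sketch of the idea bisects on the variable. `build_matrix` is a hypothetical stand-in for however A depends on the variable, and the sketch assumes A switches from positive definite to not positive definite exactly once inside the starting bracket.

```python
import numpy as np


def is_positive_definite(a):
    """True if the Cholesky factorisation of symmetric a succeeds."""
    try:
        np.linalg.cholesky(a)
        return True
    except np.linalg.LinAlgError:
        return False


def find_transition(build_matrix, t_pd, t_not_pd, tol=1e-10):
    """Bisect on the parameter t to locate where A(t) stops being PD.

    Assumes A(t) is positive definite at t_pd, not at t_not_pd, and
    changes definiteness only once in between.
    """
    if not is_positive_definite(build_matrix(t_pd)):
        raise ValueError("A(t_pd) is not positive definite")
    if is_positive_definite(build_matrix(t_not_pd)):
        raise ValueError("A(t_not_pd) is still positive definite")
    while abs(t_not_pd - t_pd) > tol:
        mid = 0.5 * (t_pd + t_not_pd)
        if is_positive_definite(build_matrix(mid)):
            t_pd = mid
        else:
            t_not_pd = mid
    return 0.5 * (t_pd + t_not_pd)


# Hypothetical stand-in for the real A(t): eigenvalues 1 - t and 3 - t,
# so A becomes singular at t = 1.
def build_matrix(t):
    return np.array([[2.0, 1.0], [1.0, 2.0]]) - t * np.eye(2)


print(find_transition(build_matrix, t_pd=0.0, t_not_pd=2.0))  # close to 1.0
```

Each step costs one dense factorization of A, and the bracket halves every step, so the number of factorizations grows only with log2 of (bracket width / tol).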