To mitigate the problem, we can try a completely different way of searching for the right set of parameters, based on neither least squares nor gradient descent: the Theil-Sen estimator. This non-parametric method is very simple: set the slope \(m\) of the model to the median of the slopes defined by each pair of training points (i.e. \((y_j − y_i)/(x_j − x_i)\) for every pair \((i, j)\) with \(x_i \neq x_j\)). Once the slope is set, find the intercept \(b\) in a similar way, as the median of \(y_i − m x_i\) over all \(i\). The results show that the solution obtained is indeed more robust, as it is less sensitive to outliers. The naive version of this algorithm, which is used here, is nice because it fits in two lines of Python, but it wouldn't work very well on a bigger dataset because of its quadratic running time (we're enumerating all \(\binom{n}{2}\) pairs of points). There are, however, more efficient algorithms, based on sorting, that can do it in \(O(n \log n)\).
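
To make the procedure concrete, here is a minimal sketch of the naive, quadratic version using only the standard library. The function name `theil_sen` and the toy data are assumptions made for illustration, not the code used to produce the results above, and skipping pairs that share the same \(x\) is one possible way to handle undefined slopes.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Naive O(n^2) Theil-Sen fit; returns (slope, intercept)."""
    points = list(zip(xs, ys))
    # Slope: median of the slopes over all pairs of points.
    # Assumption: pairs with identical x are skipped, since their slope is undefined.
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi
    ]
    m = median(slopes)
    # Intercept: median of the offsets y_i - m * x_i.
    b = median(y - m * x for x, y in points)
    return m, b


# Hypothetical data: points on y = 2x + 1, with one gross outlier at x = 4.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 100, 11]
m, b = theil_sen(xs, ys)
print(m, b)  # 2.0 1.0
```

On this toy dataset, ten of the fifteen pairwise slopes are exactly 2, so the median ignores the five slopes contaminated by the outlier and the fit recovers the underlying line.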