Quick Math Intuitions
http://quickmathintuitions.org
Sharing quick intuitions for math ideas
Mon, 14 May 2018 12:16:26 +0000

Finding paths of length n in a graph
http://quickmathintuitions.org/finding-paths-length-n-graph/
Tue, 06 Mar 2018 09:02:12 +0000

Suppose you have a non-directed graph, represented through its adjacency matrix. How would you discover how many paths of length $n$ link any two nodes?

For example, in the graph aside there is one path of length 2 that links nodes A and B (A-D-B). How can this be discovered from its adjacency matrix?

It turns out there is a beautiful mathematical way of obtaining this information! It is not how the problem is usually solved in practice, where Breadth First Search from a starting node is the common tool, but it is still very nice.

PROP. $(A^n)_{ij}$ holds the number of paths of length $n$ from node $i$ to node $j$.

Let’s see how this proposition works. Consider the adjacency matrix of the graph above:

With $n = 2$ we should find paths of length 2. So we first need to square the adjacency matrix:

Back to our original question: how to discover that there is only one path of length 2 between nodes A and B? Just look at the value $(A^2)_{12}$, which is 1 as expected! Another example: $(A^2)_{22} = 3$, because there are 3 paths that link B with itself: B-A-B, B-D-B and B-E-B.

This will work with any pair of nodes, of course, as well as with any power to get paths of any length.
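The proposition is easy to check by machine. The figure and the matrix are not reproduced in this copy, so the edge list below is an assumption: a hypothetical 5-node graph chosen only to agree with the facts stated above (A and B share exactly the neighbour D, and B has the three neighbours A, D and E). Note that, as in B-A-B, the paths counted here may revisit nodes.

```python
# Hypothetical graph: the edge list is an assumption consistent with the text,
# not necessarily the graph drawn in the original post.
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "E"), ("C", "D"), ("D", "E")]


def adjacency_matrix(nodes, edges):
    """Symmetric 0/1 matrix of a non-directed graph."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def mat_mul(x, y):
    """Plain square-matrix product: entry (i, j) is row i of x dot column j of y."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(matrix, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(matrix)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, matrix)
    return result


A = adjacency_matrix(NODES, EDGES)
A2 = mat_pow(A, 2)
a, b = NODES.index("A"), NODES.index("B")
print(A2[a][b])  # 1: the single path A-D-B
print(A2[b][b])  # 3: B-A-B, B-D-B and B-E-B
```

Calling `mat_pow(A, 3)` in the same way gives the counts for paths of length 3.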

Why does it work?

Now to the intuition on why this method works. Let’s focus on $n = 2$ for the sake of simplicity, and let’s look, again, at paths linking A to B. $(A^2)_{12}$, which is what we look at, comes from the dot product of the first row with the second column of $A$:

Now, the result is non-zero due to the fourth component, in which both vectors have a 1. Now, let us think what that 1 means in each of them:

– first row -> first node (A) is linked to fourth node (D)

– second column -> second node (B) is linked to fourth node (D)

So overall this means that A and B are both linked to the same intermediate node, they share a node in some sense. Thus we can go from A to B in two steps: going through their common node.

The same intuition will work for longer paths: when the two vectors in a dot product agree on some component, it means that the two nodes are both linked to another common node. For paths of length three, for example, instead of thinking in terms of two nodes, think in terms of paths of length 2 ending at some node: when a 2-path ends at a node that is linked to another node, there is a 3-path!
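Written out, the dot-product argument reads as follows. The index notation (entries $A_{ik}$, with $k$ running over the intermediate nodes) is my own shorthand and may differ from the original post's:

```latex
(A^2)_{ij} = \sum_{k} A_{ik} A_{kj},
\qquad
(A^3)_{ij} = \sum_{k} (A^2)_{ik} A_{kj}
```

In the first sum each term is 1 exactly when $k$ is linked to both $i$ and $j$, so the sum counts the intermediate nodes. In the second, each 2-path from $i$ to $k$ is extended by one edge from $k$ to $j$.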

On the relationship between L^p spaces and C_c functions for p = infinity
http://quickmathintuitions.org/on-the-relationship-between-lp-spaces-and-c_c-functions-for-p-infinity/
Wed, 05 Jul 2017 11:10:24 +0000

Very quick post on the relationship between $L^\infty$, $C_0$ and $C_c$. I will assume you already know what I am talking about; I’ll just be sharing some intuition on what those mean, and won’t bother with details. It’s more a reminder for me than something that intends to be useful, actually, but there’s almost nothing on the Internet about this!

When we discover that $C_c$ (continuous functions with compact support) is dense in $L^p(\Omega)$, we also discover that it does not hold if $p = \infty$ and $\Omega = \mathbb{R}^n$.

What that intuitively means is that if you take away functions in $C_c$ from $L^p$, you take away something fundamental for $L^p$: you are somehow taking away a net that keeps the ceiling up.

The fact that it becomes false for limitless spaces ($\Omega = \mathbb{R}^n$) and $p = \infty$ means that the functions in $L^\infty$ do not need functions in $C_c$ to survive.

This is reasonable: functions in $L^\infty$ are not required to exist only in a specific (compact) region of space, whereas functions in $C_c$ are. Functions in $L^\infty$ are simply bounded – their image keeps below some value, but they can go however far they want in the x direction. Very roughly speaking, they have a limit on their height, but not on their width.
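A single function is enough to see the failure of density. This is a sketch under the assumption that the domain is all of $\mathbb{R}^n$; the constant function is my choice of example, not the original post's:

```latex
f(x) = 1 \in L^\infty(\mathbb{R}^n),
\qquad
\| f - g \|_\infty \ge 1 \quad \text{for every } g \in C_c(\mathbb{R}^n)
```

Any $g$ with compact support vanishes outside a bounded region, and out there $|f - g| = 1$. So no compactly supported function ever gets closer than 1 to $f$: unlimited width cannot be approximated by something of limited width.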

What we find out, however, is that the following chain of inclusions holds:

$$C_c \subset C_0 \subset L^\infty$$

That’s reasonable! Think about it:

Functions in $C_c$ live in a well defined area of space – a confined area of space.

Functions in $C_0$ are allowed to live everywhere, with the constraint that they become more and more negligible the farther and farther we go. They are not required to ever be zero though.

Functions in $L^\infty$ are simply required to have an upper bound (a finite one, obviously).

I’m not saying this is simple (advanced analysis is at least as difficult as driving a nail with a needle for a hammer), but after careful thinking, it’s just the way it should be, given the definitions.

The meaning of F Value in the Analysis of Variance for Linear regression
http://quickmathintuitions.org/meaning-f-value-analysis-variance-linear-regression/
Mon, 05 Jun 2017 09:36:02 +0000

This is a sample output for linear regression: [output table not reproduced here]

The F Value is computed by dividing the value in the Mean Square column for Model by the value in the Mean Square column for Error. In our example, it’s .

There are two possible interpretations for the F Value in the Analysis of Variance table for the linear regression.

We are comparing the variances of the model and of the error.
The two factors each represent the numerator of the variance of the model and of the error. What do we want? The only hypothesis of the linear regression model is that the error $\varepsilon$ is a normal variable with zero mean. Thus we want a small variance for the error, so we can say the errors are close to zero.

We are comparing the model with all the variables with the model with only the intercept as variable.

This ambiguity exists because the Model sum of squares can either be seen as the numerator of the variance of the fitted values, or as a comparison between the complete model and the reduced model in which only the intercept is used.
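Both readings can be checked on a toy dataset. The sketch below assumes a simple regression with a single predictor and uses made-up data; the function name `anova_f_value` and the variable names are mine. It computes the F Value as Mean Square Model over Mean Square Error, and verifies that the Model sum of squares equals the drop in residual sum of squares when going from the intercept-only model to the complete one.

```python
from fractions import Fraction


def anova_f_value(xs, ys):
    """F value for the simple linear regression y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    x_mean = Fraction(sum(xs), n)
    y_mean = Fraction(sum(ys), n)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * x for x in xs]

    ss_model = sum((f - y_mean) ** 2 for f in fitted)         # explained by the model
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residuals, complete model
    ss_reduced = sum((y - y_mean) ** 2 for y in ys)           # residuals, intercept only

    mean_square_model = ss_model / 1         # 1 degree of freedom: one predictor
    mean_square_error = ss_error / (n - 2)   # n - 2 degrees of freedom
    return mean_square_model / mean_square_error, ss_model, ss_error, ss_reduced


# Made-up data, only to exercise the formulas.
f_value, ss_model, ss_error, ss_reduced = anova_f_value([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f_value)                            # 9/2
print(ss_model == ss_reduced - ss_error)  # True
```

Exact fractions are used instead of floats so that the equality between the two interpretations can be tested without rounding issues.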

On the meaning of hypothesis and p-value in statistical hypothesis testing
http://quickmathintuitions.org/meaning-hypothesis-p-value-statistical-hypothesis-testing/
Thu, 01 Jun 2017 09:36:29 +0000

Statistical hypothesis testing is really an interesting topic. I’ll just briefly sum up what statistical hypothesis testing is about, and what you do to test a hypothesis, but will assume you are already familiar with it, so that I can quickly cover a couple of A-HA moments I had.

In statistical hypothesis testing, we

have some data, whatever it is, which we imagine as being values of some random variable;

make a hypothesis about the data, such as that the expected value of the random variable is $\mu_0$;

find a distribution for an affine transformation of the random variable we are making inference about – this is the test statistic;

run the test, i.e. numerically say how probable our observations were in relation to the hypothesis we made.

I had a couple of A-HA moments I’d like to share.

There is a reason why this is called hypothesis testing and not hypothesis choice. There are indeed two hypotheses, the null and the alternative hypothesis. However, their roles are widely different! 90% of what we do, both from a conceptual and a numerical point of view, has to do with the null hypothesis. They really are not symmetric. The question we are asking is “With the data I have, am I certain enough my null hypothesis no longer stands?” and not at all “With the data I have, which of the two hypotheses is better?”

In fact, the alternative hypothesis is only relevant in determining what kind of alternative we have: whether it’s one-sided (and which side) or two-sided. This affects calculations. But other than that, the math doesn’t really care about the specific value of the alternative. In other words, the two following tests are really equivalent:

This accounts for why, when evaluating a p-value, we reject the null hypothesis only for very low figures. The way I first thought about it had been: “Well, the p-value is, intuitively, a measure of the proximity of the observed data to the null hypothesis. Then, if I get something around , I should reject the null hypothesis and switch to the alternative, as it seems a better theory.” But this is a flawed argument indeed. To see if the alternative was really better I should run a test using it as principal hypothesis! We reject for very low p-values because that means our null hypothesis really is no longer any good, and should be thrown in the bin. Then we need to care about finding another good theory that can suit the data.

However, before throwing the current theory out of the window, we don’t accept all kinds of evidence against it: we want very strong evidence. We don’t want to discard the current theory for another that could only be marginally better. It must be crushingly better!
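The remark that the alternative only enters through its side can be seen directly in code. The sketch below assumes a z-test with known standard deviation (my choice of test, the post does not name one) and made-up numbers; `z_test_p_value` is a hypothetical helper. The only thing it ever asks about the alternative is whether it is one-sided or two-sided.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test_p_value(sample_mean, mu0, sigma, n, alternative):
    """p-value of a z-test of H0: mean == mu0, with known sigma.

    `alternative` is "greater", "less" or "two-sided". No specific value of
    the alternative mean appears anywhere in the computation.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":
        return 1.0 - normal_cdf(z)
    if alternative == "less":
        return normal_cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - normal_cdf(abs(z)))
    raise ValueError("unknown alternative: " + alternative)


# Made-up numbers: 25 observations with sample mean 10.5, H0 mean 10, sigma 1.
p_greater = z_test_p_value(10.5, 10.0, 1.0, 25, "greater")
p_two_sided = z_test_p_value(10.5, 10.0, 1.0, 25, "two-sided")
print(p_greater, p_two_sided)
```

With these numbers the statistic is z = 2.5, so both p-values are small and the null hypothesis would be rejected at the usual 5% level, whatever the exact value of the alternative was.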

Why hash tables should use a prime-number size
http://quickmathintuitions.org/why-hash-tables-should-use-prime-number-size/
Thu, 01 Jun 2017 08:57:36 +0000

I read in several books and online pages that hash tables should use a prime number for the size. Nobody really justified this statement properly. Here’s my attempt!

I believe that it just has to do with the fact that computers work in base 2. Just think about how the same thing works for base 10:

8 % 10 = 8

18 % 10 = 8

87865378 % 10 = 8

2387762348 % 10 = 8

It doesn’t matter what the number is: as long as it ends with 8, its remainder modulo 10 will be 8. You could pick a huge power of 10 as modulus, such as 10^k (with k > 10, let’s say), but

you would need a huge table to store the values

the hash function is still pretty stupid: it just trims the number retaining only the first k digits starting from the right.

However, if you pick a different number as modulus, such as 12, then things are different:

8 % 12 = 8

18 % 12 = 6

87865378 % 12 = 10

2387762348 % 12 = 8

We still have a collision, but the pattern becomes more complicated, and the collision is just due to the fact that 12 is still a small number.

Picking a big enough, non-power-of-two number will make sure the hash function really is a function of all the input bits, rather than a subset of them.
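The same effect can be watched in binary. This is a minimal sketch with a toy hash function (`bucket` is a hypothetical name, and real hash tables usually scramble the key before reducing it): with a power-of-two size, the modulo simply keeps the lowest bits of the key.

```python
def bucket(key, table_size):
    """Toy hash function: the key modulo the table size."""
    return key % table_size


keys = [8, 18, 87865378, 2387762348]

# Base-10 analogy from the text: only the last decimal digit matters.
print([bucket(k, 10) for k in keys])  # [8, 8, 8, 8]

# Power-of-two size: the modulo just keeps the lowest 4 bits of the key.
print(all(bucket(k, 16) == (k & 0b1111) for k in keys))  # True

# Keys that differ only in their high bits all collide with a power-of-two size...
high_bit_keys = [5 + (i << 8) for i in range(6)]
print([bucket(k, 256) for k in high_bit_keys])  # [5, 5, 5, 5, 5, 5]

# ...but land in different buckets with a size that is not a power of two.
print([bucket(k, 367) for k in high_bit_keys])
```

The last line prints six distinct buckets, because with 367 every bit of the key influences the remainder.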

For example, with 367:

8 % 367 = 8

18 % 367 = 18

87865378 % 367 = 73

2387762348 % 367 = 160

What is worth noting is that there may be a pattern even with modulo 367, but it would be way less trivial than with modulo 10 (or with modulo 2 in binary). We don’t really need a prime number: a big number that is not a power of two is enough. Having a prime number, obviously, is just a guaranteed way of satisfying those conditions.

Metaphysics on geometric distribution in probability theory
http://quickmathintuitions.org/metaphysics-on-geometric-distribution-probability-theory/
Tue, 31 Jan 2017 11:38:33 +0000

I realized the geometric distribution is not exactly about the time needed to get the first success in a given number of trials. This is a very odd feeling. It is probably a feeling applied mathematicians get sometimes, when they feel they are doing the best they can, and yet the theory is not perfect.

This may be a naive post, I warn you, but I was really stunned when I realized this.

Geometric distribution is not about the first success

Let’s jump to the point. We know (or at least, I was taught) that the geometric distribution is used to calculate the probability that the first success in $k$ trials (all independent and of probability $p$) will happen precisely at the $k$-th trial.

Remember that a geometric distribution is a random variable $X$ such that its distribution is

$$P(X = k) = (1 - p)^{k-1} p$$

How can we relate the above distribution with the fact that it matches the first success? Well, we need to have one success, which explains the $p$ at the end. Moreover, we want to have just one success, so all other trials must be unsuccessful, which explains the $(1 - p)^{k-1}$.

But hey, where would first ever be written? Unless you do probability in a non-commutative ring (in which case, I don’t know what you are doing), multiplication is commutative. So who can tell the order between the events in a Bernoulli process?

In fact, $(1 - p)^{k-1} p$ could just as well refer to having unsuccessful outcomes for the first $k - 1$ trials and then a successful one at the $k$-th trial, as to having a success in the very first attempt and then all failures. As long as we have one (and only one) success at some fixed position among the $k$ attempts, the same value comes out!

Apparently then, the geometric formula is about the time of first success, but it is not just about that. It encompasses way more cases, all equally likely: it is the probability of any one specific sequence of $k$ trials in a Bernoulli process that contains exactly one success. (The probability of exactly one success at any position whatsoever is the sum over the $k$ possible positions, $k (1 - p)^{k-1} p$.)

The universe does not care about the order of events (in a Bernoulli process, at least). As long as we do $k$ trials, regardless of when the success happens, the universe does not care. This stuns me!
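A brute-force enumeration makes the point concrete. The values of `p` and `k` below are arbitrary choices of mine, and exact fractions are used so the comparison is not blurred by rounding. Every sequence with a single success has the same probability, wherever the success sits, and adding them up gives the probability of exactly one success overall.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(outcomes, p):
    """Probability of one specific sequence of independent trials.

    `outcomes` is a tuple of booleans (True = success), `p` the success probability.
    """
    prob = Fraction(1)
    for success in outcomes:
        prob *= p if success else 1 - p
    return prob


p = Fraction(1, 3)  # arbitrary success probability, chosen for illustration
k = 4               # number of trials

# All sequences of k trials containing exactly one success.
single_success = [seq for seq in product([True, False], repeat=k) if sum(seq) == 1]
probs = [sequence_probability(seq, p) for seq in single_success]

print(probs)       # four times 8/81, i.e. p * (1 - p) ** (k - 1) each time
print(sum(probs))  # 32/81, i.e. k * p * (1 - p) ** (k - 1)
```

The first line is the "order does not matter" observation; the second is what one gets when the position of the success is left free.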

Random variables: what are they and why are they needed?
http://quickmathintuitions.org/random-variables-why-are-they-needed/
Thu, 24 Nov 2016 20:02:11 +0000

This article aims at providing some intuition for what random variables are and why random variables are useful and needed in probability theory.

Intuition for random variables

Informally speaking, random variables encode questions about the world in a numerical way.

How many heads can I get if I flip a coin 3 times?

How many people will vote for the Democrats in the US presidential election?

I want to make pizza. What is the possible overall cost of the ingredients, considering all combinations of different brands of them?

These are all examples of random variables. What a random variable does, in plain words, is to take a set of possible world configurations and map each of them to a number. What I mean when I say world configurations will be clearer soon, when talking about the sample space (which, appropriately, is also called universe).

I just wanted to provide a very brief informal description of random variables, but stick with me and we will dive deeper in the matter with an example!

A simple random variable example

Suppose we flip a (balanced) two-sided coin three times. If we write down all possible outcomes, we obtain the universe (or sample space) $\Omega$:

$$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$

In which we have identified head with H and tail with T. The first element corresponds to three heads, the following three elements correspond to two heads, the following three more correspond to one head, and the last one to no heads.

Let’s take a second to notice that $\Omega$ is made up of $2^3 = 8$ items.

Now, what if I asked you how many heads you can get overall by flipping a coin three times? You would answer me by exhibiting the following set (who wouldn’t reply exhibiting a set, really!):

$$\{0, 1, 2, 3\}$$

Notice that this set is made up of only 4 elements, whereas $\Omega$ had 8: we have reduced the amount of data to handle. (Also, $\Omega$ was made up of more complex data, because each of its 8 elements was made up of 3 letters.)

And lo! We have stumbled upon a random variable. We had a universe of possible configurations and, passing through a question, we have mapped them in a numerical way that’s relevant for our question. This is crucial, so I will say it once again: from $\Omega$, which contained a lot more information than we needed, we managed to extract the part of the data that was relevant to our study.

In a way, every time you study a phenomenon through some data, you are always using random variables to do it, because you only look at the data that’s relevant and ignore what’s not important for you at that moment. In our case, for example, we don’t care in what order the heads came, we just want two of them.

Of course, we can ask a variety of questions about the same phenomenon. In the case of the 3-coins-flipping, apart from “How many heads could we get?” we could also ask “How many tails could we get?”. It was a trivial phenomenon so there’s not much we can study about it, but try to think about a medical experimentation: there is a lot of data and several questions can be asked about it.

Why is a random variable useful?

At this point, a random variable just seems like a very useful concept, but one could argue that reducing the amount of data is not a good enough reason to introduce a new idea.

But random variables are defined in probability theory, so they must have something to do with probabilities! Imagine we were interested in the following question: “What’s the likelihood of getting 2 heads (flipping a balanced coin 3 times)?”. What is beautiful about random variables is that they work in perfect tune with the probability measure we have on $\Omega$!

As long as we talk about discrete cases (meaning numbers are integer: we cannot get 1.5 heads), it may look like the concept of a random variable is superfluous, because we could always go look at $\Omega$ and see how many cases satisfy our question and how many do not. However, this is impractical for huge amounts of data, not to mention the fact that more often than not the universe is not even explicitly known. But most importantly, random variables are essential when dealing with continuous quantities and, above all, when asking more complex questions (which may involve combinations of more than one variable, for example).
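For the three-coin example the whole construction fits in a few lines. This is only a sketch with names of my choosing (`omega`, `number_of_heads`, `distribution`): the random variable is literally a function from outcomes to numbers, and the probability of each number is inherited from the outcomes mapped to it.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three flips (H = head, T = tail).
omega = ["".join(flips) for flips in product("HT", repeat=3)]


def number_of_heads(outcome):
    """The random variable: maps an outcome such as 'HTH' to a number."""
    return outcome.count("H")


# Image of the random variable: the possible answers to "how many heads?".
values = sorted({number_of_heads(w) for w in omega})

# With a balanced coin every outcome has probability 1/8, so the probability
# of each value is the share of outcomes that are mapped to it.
counts = Counter(number_of_heads(w) for w in omega)
distribution = {x: Fraction(counts[x], len(omega)) for x in values}

print(len(omega), values)  # 8 [0, 1, 2, 3]
print(distribution[2])     # 3/8
```

So the answer to the question about 2 heads is 3/8, obtained without ever looking at the order in which the heads came.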

This post aims at providing some intuition and meaning for the following algebra relationship:

Reduced ring – Radical ideal – Nilpotent

A basic fact of ring theory is that if you take a ring and quotient it by a (two-sided) radical ideal you get a reduced ring. Let us suppose A is a commutative ring and understand why this fact is true.

Nilpotent element

Def. $a \in A$ is nilpotent $\iff \exists\, n \geq 1 : a^n = 0$

Informally, a nilpotent element is like a road ending in the middle of nowhere, collapsing in the depth of an abyss. You are driving on it, following the powers of $a$, and then all of a sudden, with no explanation, your road ends in a big black hole. Indeed, the zero really acts as some kind of black hole, attracting nilpotent-made roads at some point or another: we can think of nilpotent roads as spiraling into the zero.

Reduced ring

Def. $A$ is a reduced ring if the only nilpotent is $0$.

With the road-analogy, we can think of a reduced ring as a city where all roads lead somewhere and never end in a giant hole. We can see how desirable it is to have a reduced ring rather than a non-reduced one, because it is not nice to pick a road and end up in the rabbit hole unexpectedly. However, it is worth noting that there still is one hole corresponding to the zero element, but this is not exactly a road since it does not even start, let alone have the intention to bring you anywhere.

The way I imagine a reduced ring is like a big hole in the middle of a city, with roads going around in circles or in straight lines crossing the city, but never getting through the big hole in the center.

Given the premises, we now ask two questions:

Given a quotient ring, is there any way we can tell whether it is reduced?

Given a ring, is there any way we can get rid of its nilpotents? If yes, what’s the best way to do that?

1. Reduced property for quotient rings

To inspect whether a quotient ring is reduced or not, it is possible to inspect the ideal that was used to form the quotient [1]. This is useful when dealing with a quotient which you know the genesis of: if you know what ideal was used to quotient what ring, then it’s easier to inspect the ideal properties rather than the quotient ones, which are usually difficult to deal with.

Now, the proof of the theorem stating that a radical ideal gives rise to a reduced ring is quite straightforward, but the intuitive reason why it happened eluded me at first. Let me share my intuitions.

Radical ideal

Def. $I \subseteq A$ is a radical ideal $\iff$ ($a^n \in I$ for some $n \geq 1 \implies a \in I$)

Or, in words: taken an element in $A$, the presence of some power of the element in $I$ guarantees its presence in $I$. You could also see a radical ideal as containing the roots of all its elements.

The reason why using a radical ideal to form a quotient gives a reduced ring as result is actually quite straightforward to the point that it is wonderful. As Hamed points out, every ideal contains zero! But zero is a power of any nilpotent element (look again at the definition, there must be an $n$ for which $a^n$ is zero), so indeed there always is at least a power of each nilpotent element in a radical ideal, because all nilpotent elements turn to zero at some point. But thanks to the radical ideal definition, we know that if some power of an element is in the ideal, so is its base!

So we have an ideal which, for sure, contains at least all nilpotent elements. Thus, when we form a quotient with that ideal, we are identifying all its elements with zero. That’s why nilpotents vanish with a radical ideal, and I find it amazing that it all comes from the fact that all ideals contain zero and, of course, from the definition of radical ideal.

The question one may ask is: right, but are we getting rid of nilpotents only? Isn’t there the risk of affecting non-nilpotent elements? And indeed, it is true that being a radical ideal guarantees that the quotient is reduced, but it doesn’t guarantee that we have got rid of the nilpotent elements only, and that no innocent element of the ring has been destroyed in our zeal of building the perfect city. In other words, we may be using a bazooka to shoot a fly! Let’s see if we can refine our doings and come up with a good way of doing this.

2. Build a reduced ring from an ordinary one

We have got to our second question: given an ordinary ideal $I$, are we capable of building another ideal that is radical and yet as similar as possible to $I$ (i.e. doing the smallest damage possible to $A$)?

Yes we can, and, surprisingly, we already have roughly all we need. The only thing we are lacking is the definition of the radical of an ideal.

Radical of an ideal

Def. Given an ideal $I \subseteq A$, its radical is $\sqrt{I} = \{a \in A : a^n \in I \text{ for some } n \geq 1\}$

From what we have said earlier, we will need an ideal which (at least) contains all nilpotents. It turns out that taking the radical of the ideal does the job!

In fact, $\sqrt{I}$ is exactly what we are looking for! $A/\sqrt{I}$ is a ring which is as close to $A/I$ as we can get, and yet does not have any nilpotent elements! In particular, starting from the zero ideal, $\sqrt{(0)}$ is precisely the set of nilpotents of $A$, so $A/\sqrt{(0)}$ is the reduced ring closest to $A$ itself.

Getting back to the city-road-holes analogy, it seems like we are able to make the nilpotent spiral-made roads collapse into the zero, thus destroying that fake road!
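A small concrete city helps here. The ring of integers modulo n is my choice of example (it does not appear in the discussion above), and `nilpotents` is a hypothetical helper that finds the roads ending in the black hole by brute force.

```python
def nilpotents(n):
    """Nilpotent elements of Z/nZ: those a with a**k == 0 (mod n) for some k >= 1."""
    found = []
    for a in range(n):
        power = a % n
        for _ in range(n):  # n steps are more than enough to reach 0, if ever
            if power == 0:
                found.append(a)
                break
            power = (power * a) % n
    return found


print(nilpotents(12))  # [0, 6]: 6 * 6 = 36 is 0 modulo 12
print(nilpotents(6))   # [0]: Z/6Z is reduced
print(nilpotents(8))   # [0, 2, 4, 6]
```

In Z/12Z the radical of the zero ideal is {0, 6}. Identifying 6 with 0 turns Z/12Z into Z/6Z, which has no nilpotents besides 0, and nothing beyond the nilpotent road has been destroyed.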

I’ve detailed the intuitions on the reasons why a ring modulo a prime ideal gives a domain[2] and (some bit) why a ring modulo a maximal ideal gives a field on math.stackexchange.com.

Footnotes

1. Inspecting a quotient’s properties by looking at the ideal used to quotient doesn’t seem very interesting to me, as I can’t see a valid real-world reason to inspect the properties of a quotient (and even if there were, I don’t believe the ideal used to quotient would be known explicitly). I don’t know, it all seems done just to create exercises to solve in exams, so the second question looks much more interesting to me.

2. It’s worth noting that in this case, there is not a unique, best choice for the ideal that will build the domain ring. For example, in , both and are good choices (in fact, equivalent choices).

Overdetermined and underdetermined systems of equations put simply
http://quickmathintuitions.org/intuition-for-overdetermined-and-underdetermined-systems-of-equations/
Fri, 08 Jul 2016 13:14:32 +0000

This article aims at providing real world examples and links for overdetermined and underdetermined systems of equations. Before starting, we will suppose that all over- and underdetermined systems are obtained from square systems which admit one and only one solution (i.e. come from a coefficient matrix with non-zero determinant).

Overdetermined systems

When a system of linear equations has more equations than unknowns, we say it is overdetermined. It means what it says: too many rules at once are being imposed, and some of them may be conflicting. Still, it is false to say that an overdetermined system does not have any solutions: it may or it may not.

Intuition

Intuitively, we can think of a system of equations as a set of requests. Imagine you have a group of people in front of you (the unknowns), and you are supposed to give each person something specific to do. If you give more commands than the number of people, then we have an overdetermined system. It is clear that when this happens, at least one person must have received more than one command. However, there are two possible scenarios.

If you give more orders than people, but the surplus commands are just reformulations of other orders, then this is not a problem, the system does have a solution.

Take this example:

George, get me one bottle of water

Lisa, solve that equation

George, get me two-minus-one bottles of water

There are two people, and we have given three commands. But look, we told George to fetch a water bottle twice! We told him a bit differently the second time, but the meaning was just the same. Thus, we can say the last command is irrelevant. So the idea is that an equation that is proportional to another one already in the system is basically like a game in which the second rule is to respect the first rule!

So if the system had a solution when there were as many commands as people, it still does now, because we can throw the superfluous commands out of the window, since they are unneeded.

Instead, if you give more orders than people, and some commands are conflicting, then the system does not have a solution.

Take this example:

George, get me one bottle of water

Lisa, solve that equation

George, go buy me a swimming pool

George is likely to be confused, and will ask what we want him to do, either to get a water bottle or a swimming pool. The point is that we want George to do both things at the same time. The fact that it is impossible for George to do two things at once expresses the fact that the corresponding system of equations does not have a solution.

In this case, it is just like having a game where the second rule says not to follow the first rule: it is impossible to play a game like that.

Math Example: the simplest overdetermined system is

which doesn’t have a solution because we are asking to be and at the same time. This is just like asking your neighbor to be male and female at the same time!

Underdetermined systems

If you give fewer orders than there are people, then we have an underdetermined system. When this happens, at least one person must have not received any command. This time, the idea is that people who don’t receive any commands are free to do whatever they want.

For example, imagine having George, Lisa, Bob and Alice in front of you. If your commands are:

George, get me one bottle of water

Lisa, solve that equation

Alice, build a car with those Lego

then Bob hasn’t received any command, and will assume he is free to do whatever he wants.

Final remarks

However, notice that everything can happen: what appears to be an overdetermined system could actually turn out to be an underdetermined one (see ex. 1) and an overdetermined system could have no solution (see ex. 2).

in An apparently overdetermined system which is actually underdetermined and doesn’t even have a solution.

in An overdetermined system which doesn’t have a solution.

Finally, notice that in reality commands are usually addressed to more than one person at a time. A system of equations in real life is something like:

Here intuition gets trickier, because each command mixes at least two people, and can’t be rendered in natural language. (You can think of knowing what two people should do, but can’t know exactly who should do what without additional information, which is indeed carried by the mathematical expressions.)

Still, the commands analogy is useful in understanding what underdetermined and overdetermined systems are and why they have infinitely many or no solutions.
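The three situations (superfluous commands, conflicting commands, people left without commands) can be told apart mechanically. The sketch below is my addition: it compares the rank of the coefficient matrix with the rank of the augmented matrix (the Rouché–Capelli criterion, which the analogy above does not name), and the three systems are made up for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank_count = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a row, not yet used as pivot, with a non-zero entry in this column.
        pivot = next((r for r in range(rank_count, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank_count], m[pivot] = m[pivot], m[rank_count]
        for r in range(rank_count + 1, len(m)):
            factor = m[r][col] / m[rank_count][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank_count])]
        rank_count += 1
    return rank_count


def classify(coefficients, constants):
    """Return 'none', 'unique' or 'infinite' for the linear system."""
    augmented = [row + [c] for row, c in zip(coefficients, constants)]
    r_coeff, r_aug = rank(coefficients), rank(augmented)
    if r_coeff < r_aug:
        return "none"  # some command contradicts the others
    return "unique" if r_coeff == len(coefficients[0]) else "infinite"


# x + y = 2, x - y = 0, 2x + 2y = 4: the third command repeats the first.
print(classify([[1, 1], [1, -1], [2, 2]], [2, 0, 4]))  # unique
# x + y = 2, x - y = 0, x + y = 3: the third command conflicts with the first.
print(classify([[1, 1], [1, -1], [1, 1]], [2, 0, 3]))  # none
# x + y + z = 1, x - y = 0: two commands for three people.
print(classify([[1, 1, 1], [1, -1, 0]], [1, 0]))       # infinite
```

Counting equations against unknowns only says what a system looks like; the ranks say what it actually is.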

Quick method to find line of shortest distance for skew lines
http://quickmathintuitions.org/quick-method-to-find-line-of-shortest-distance-for-skew-lines/
Sun, 26 Jun 2016 05:48:55 +0000

In linear algebra it is sometimes needed to find the equation of the line of shortest distance for two skew lines. What follows is a very quick method of finding that line.

Let’s consider an example. Start with two simple skew lines:

(Observation: don’t make the mistake of using the same parameter for both lines. Each line exists on its own, there’s no link between them, so there’s no reason why they should be described by the same parameter. If this doesn’t seem convincing, get two lines you know to be intersecting, use the same parameter for both and try to find the intersection point.)

The directional vectors are:

So they clearly aren’t parallel. They aren’t incident either, because the only possible intersection point is for , but when , is at , which doesn’t belong to . It does indeed make sense to look for the line of shortest distance between the two, confident that we will find a non-zero result.

The idea is to consider the vector linking the two lines in their generic points and then force the perpendicularity with both lines. We will call the line of shortest distance . In our case, the vector between the generic points is (obtained as difference from the generic points of the two lines in their parametric form):

Imposing perpendicularity gives us:

Solving the two simultaneous linear equations we obtain as solution .

This solution allows us to quickly get three results:

The equation of the line of shortest distance between the two skew lines: just replace and in with the values found. In our case, .

The intersection point between and : just replace in the parametric equation of . In our case, .

The intersection point between and : just replace in the parametric equation of . In our case, .
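The whole method can be sketched in a few lines of code. The two lines below are made up for illustration (the lines of the worked example are not reproduced in this copy), and the names `closest_points`, `t` and `s` are mine: the vector between the generic points is forced to be perpendicular to both directions, and the resulting two equations are solved with Cramer's rule.

```python
from fractions import Fraction


def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def point_on_line(point, direction, parameter):
    """Point of the line `point + parameter * direction`."""
    return tuple(p + parameter * d for p, d in zip(point, direction))


def closest_points(p, u, q, v):
    """Feet of the common perpendicular of the lines p + t*u and q + s*v.

    Imposes w . u = 0 and w . v = 0 on w = (p + t*u) - (q + s*v) and solves
    the resulting 2x2 linear system for t and s.
    """
    d = tuple(a - b for a, b in zip(p, q))
    a, b, c = dot(u, u), dot(u, v), dot(v, v)
    e, f = dot(d, u), dot(d, v)
    det = b * b - a * c  # zero exactly when the two directions are parallel
    if det == 0:
        raise ValueError("the lines are parallel")
    t = Fraction(e * c - b * f, det)
    s = Fraction(b * e - a * f, det)
    return point_on_line(p, u, t), point_on_line(q, v, s)


# Made-up skew lines: (1, 2, 0) + t (1, 0, 0) and (0, 1, 1) + s (0, 1, 0).
foot_1, foot_2 = closest_points((1, 2, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0))
print(foot_1, foot_2)
```

The line of shortest distance is the one through the two returned points: here (0, 2, 0) and (0, 2, 1), so the distance between the two lines is 1.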