Listening to Data

More fallacies in data: inequality of income and employment

This is a continuation of my last post, whose introduction applies equally here. Clean and properly governed data in our data stores may still suffer from something comparable to the fallacies identified in grammar, logic, or rhetoric in classical trivium education.

The trivium focuses on valid forms of persuasion for the purposes of debating policies, where the persuasion (or rhetoric) employs valid forms of logic using grammatically correct statements. I suggested that a dedomenocracy that uses statistical analytics on trusted data to automate policy decisions needs a similar approach to challenge the content (or meaning) of data. Instead of fallacies in different levels of human languages, there are fallacies in different levels of information content in data. The project of seeking out such data fallacies corresponds to my own concept of the profession of data science.

My last post provided examples that I describe as grammatical in nature, where modern uses of terms no longer have the same meaning as older uses of the same terms. The fallacy arises when both usages occur in the same data set, as will happen with data warehouses, data lakes, or other long-term storage of data. Over time the same label values of a field may have mutually exclusive meanings.

In an earlier post, I labeled as spark data the data deliberately introduced to distract attention from the more important and solvable problems (where the solutions will be painful). I describe this kind of data as comparable to a rhetorical fallacy. The data are valid and their meaning is unambiguous, but they are irrelevant to the project of government because the problems they describe have no solutions and they distract attention from more pressing issues that actually have real solutions we can adopt. As with classical rhetorical fallacies, these data fallacies are deliberately introduced for unfair manipulation of the government.

To round out my analogy, I want to identify an analogy to logical fallacy in data content. I think my earlier discussions of income inequality and workforce participation may be described as logical fallacies. Both cases infer a label by considering the negative of some data. Income inequality is the conclusion that lower incomes can be higher because of evidence that some people are making a lot more money. Similarly, low workforce participation is the conclusion that these people can be employed because of evidence that other people have jobs. We can derive quantities of lower incomes or of non-participating workers as the remainder of the population after subtracting those with higher salaries or with jobs respectively. The fallacy is the assertion that this remainder population has a homogeneous and consistent meaning, in particular that these are always worse than the alternative.
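The remainder calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `has_job` flag, and the self-described situations are all hypothetical values made up for this example, not real data.

```python
from collections import Counter

# Hypothetical records: (person_id, has_job, self_described_situation).
# Every value here is invented for illustration only.
population = [
    (1, True, "employed"),
    (2, True, "employed"),
    (3, False, "between jobs"),
    (4, False, "discouraged job-seeker"),
    (5, False, "not interested in working"),
    (6, False, "caring for family"),
]


def negative_remainder(records):
    """Return everyone who is not in the positive (has_job) category."""
    return [r for r in records if not r[1]]


remainder = negative_remainder(population)
employed_count = sum(1 for r in population if r[1])

# The negative category is simply the total minus the positive category.
assert len(remainder) == len(population) - employed_count

# A single negative label hides several distinct situations.
situations = Counter(r[2] for r in remainder)
print(len(remainder), "not employed;", len(situations), "distinct situations")
```

The data only supports the subtraction. Treating the four remaining people as one homogeneous group is an interpretation added afterward.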

Certainly, we can have labels for the opposite of well-compensated or fully-employed. The problem comes when we assign meaning to these negative labels. All we really know is that they are not in the positive category. There may be many reasons to be in the negative category and there may be many consequences. The problem with the single negative label is that it demands a definition or an interpretation. We readily accept that poorly-compensated workers or non-working adults are in need of more money or more employment.

This assumption seems to be a syllogistic fallacy (such as the illicit major). For example:

People with high incomes have good lives

People with low incomes do not have high incomes

Thus, people with low incomes do not have good lives
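One way to see that this form of argument is invalid is to build a small counterexample in which both premises hold and the conclusion still fails. The sketch below uses Python sets. The individuals and their group memberships are hypothetical placeholders, chosen only to expose the logical gap.

```python
# Hypothetical individuals; names and memberships are placeholders.
high_income = {"alice"}
low_income = {"bob", "carol"}
good_life = {"alice", "bob"}  # bob has a good life without a high income

# Premise 1: people with high incomes have good lives.
premise_1 = high_income <= good_life

# Premise 2: people with low incomes do not have high incomes.
premise_2 = low_income.isdisjoint(high_income)

# Conclusion: people with low incomes do not have good lives.
conclusion = low_income.isdisjoint(good_life)

print(premise_1, premise_2, conclusion)
```

Both premises are true in this arrangement and the conclusion is false, so the premises alone cannot justify the conclusion.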

When we identify a population with a label of low incomes we imply that their lives would be better if they had higher incomes. This meaning is similar to the above syllogistic fallacy of the illicit major. While there is no doubt that many poor people would desire higher incomes, there are many who choose lower incomes because of some other benefit they get from the jobs. The jobs may be less demanding, or may involve the kind of work they find more enjoyable.

A similar observation applies to the working-age population that is not participating in the workforce. All we know is that they are not working. We do not know anything about their condition other than the fact that they are not working (at least not in a way that we can measure).

In both examples, we can observe historic trends where in the recent past these populations were fewer in number. The trend suggests that more people would like higher pay (relative to the rest of the population) or more people would like jobs. However, there could be some other development where more recent culture has made attractive new opportunities for people to seek lower incomes or to avoid work obligations.

Public policy debates address the negative labels instead of the positive ones. For the employment example, the policy focuses on the negative group of those who do not have jobs. This negative group includes under-employed, unemployed between jobs, long-term unemployed, discouraged job-seekers, and people who have no interest in working at all (for any number of reasons).

We are motivated to pursue job-creating policies in order to benefit the jobless. It is not clear that this is a legitimate policy goal because all we know about the jobless is that they are not in the group that has jobs.

For an illustration, a frequently proposed job-creation policy involves funding infrastructure projects that will create new (though temporary) demand for labor. However, these jobs will require trade skills that may be very rare in the ranks of the not-employed, and very few in this group are eager to pursue these skills in time to take advantage of the new opportunity.

Another major issue with the non-workers is that they reside in locations that are too remote from suitable jobs. They remain out of the workforce because the jobs are not going to come to them, and they are not going to go to the jobs. This is especially true in rural areas with low population density. While some policy may be possible to make work projects locally, these are less likely to provide lasting economic benefits compared with building or repairing fixed-location highways, bridges, tunnels, or pipelines. Unless the jobs do show up locally, these people are not going out of their way for jobs no matter how plentiful they become.

The negative label in data (poorly compensated, or not employed) represents a catchall label for all other possibilities. The negative label is “none of the above”. All we can say about the label is that it is not one of the positive labels. The individuals in the negatively labeled category may prefer a more positive label for their status, but none of the available positive labels fit. The fallacy is that the individuals in the negatively labeled category need to move into one of the known positive labels: low-income people want higher incomes, and the not-employed want jobs. The reality may be that they would prefer that their current conditions be recognized accurately with new positive labels that do fit.

I’ll end this discussion here because I need some more time to develop a concept of logical fallacy lurking in content of data. The grammar-level or rhetorical-level fallacies seem more obvious to me than logical-level fallacies. I think this is a result of not thinking about it enough. The logical fallacies involve the interpretation of negative labels. Negative labels occur frequently and often become the targets of policy. I need more time to think about my own experiences with negative labels and how they interfere with decision-making.

3 thoughts on “More fallacies in data: inequality of income and employment”

Making the rounds today is a statement from Gallup describing how misleading the official unemployment rate is. The statement argues that the actual rate is much higher:

Gallup defines a good job as 30+ hours per week for an organization that provides a regular paycheck. Right now, the U.S. is delivering at a staggeringly low rate of 44%, which is the number of full-time jobs as a percent of the adult population, 18 years and older. We need that to be 50% and a bare minimum of 10 million new, good jobs to replenish America’s middle class.

I have no problem with the official rate because it does have meaning in terms of employment. The rate reflects the number of people who believe they have a chance at a job but are under-served by the employment market. There are certainly other ways to measure frustration with the job market. I think they all suffer from the same fallacy of defining a negative: lumping together a group of people who do not have some quality.

The positive measure is the employed person who has found an acceptable job. This is easy to measure because it resolves the question of intention. The negative measure is everyone else, and no matter how it is measured it is going to collect into one category a wide range of prospects and intentions. No negative definition is going to be satisfying for making policy. I argue one step further that the negative category itself is a data fallacy. The negative category equates to “the people whose work prospects we don’t understand”.

I belong to the negative category of working-age people who do not have a job. I started this blog as a distraction from job-searching, so I lack the active job search that qualifies as unemployment. That places me in the non-participating category, I guess. The fallacy (even on a personal level) is defining what kind of job I am lacking. For example, when I started this blog, I called myself a data scientist based on my actual experience, but then I learned that what I was doing is not what modern practitioners call data science. It was a fallacy to label myself as an unemployed data scientist because I lacked the positive confirmation that someone would employ me as one. Later, I took to calling myself a dedomenologist, which to the best of my knowledge is a job category that does not exist. Certainly, no one seems intrigued enough to check out my LinkedIn profile that boasts of this profession.

This mythical description of the job I’m unemployed from demonstrates the fallacy of giving a label to the kind of unemployment I’m currently enjoying. Eventually (maybe) I will get employed at something, and that would give me positive identification of what it was that I was unemployed at, but only in retrospect. There is a freedom in being unemployed because there is no longer a job to constrain what one is. My unemployment will ultimately be from whatever it is that I’ll do next. This is what I mean by a negative category: it defies its own stand-alone definition and instead depends on being none of the other options.