I am looking for a way to validate a dictionary where one constraint is an "is in" constraint whose valid values come from the dictionary being validated itself.

For instance, imagine the following pseudo-schema:

{
"notions" : [ string ],
"category" : [ is in notions ]
}

To be totally clear, here are the constraints of this pseudo-schema spelled out. These are the constraints I want to validate, where d is the dictionary to validate:

1. set(d.keys()) == {"notions", "category"}

2. isinstance(d["notions"], list)

3. all(isinstance(notion, str) for notion in d["notions"])

4. isinstance(d["category"], list)

5. all(element in d["notions"] for element in d["category"])
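For reference, the five constraints can be written as one plain-Python check. This is only a sketch of the manual approach that a schema should replace; the function name is_valid is made up for illustration, and the key names are taken from the pseudo-schema ("notions" and "category"):

```python
def is_valid(d):
    """Return True if d satisfies all five constraints of the pseudo-schema."""
    # Constraint 1: exactly the two expected keys.
    if not isinstance(d, dict) or set(d.keys()) != {"notions", "category"}:
        return False
    notions, category = d["notions"], d["category"]
    # Constraints 2 and 4: both values are lists.
    if not isinstance(notions, list) or not isinstance(category, list):
        return False
    # Constraint 3: every notion is a string.
    if not all(isinstance(notion, str) for notion in notions):
        return False
    # Constraint 5: every category element appears in the notions list.
    return all(element in notions for element in category)


print(is_valid({"notions": ["a", "b"], "category": ["a"]}))  # True
print(is_valid({"notions": ["a", "b"], "category": ["c"]}))  # False
```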

Don't ask whether this specific data structure makes any sense. It does not. I just made it up to create a minimal example of my problem. My actual dictionary schema is much more complex and has multiple references to values from the dict. That is why I would like to avoid defining and validating the constraints manually, and would prefer a schema-based solution.

I have looked into some schema validation libraries, but I haven't found this feature in any of them. Is there a solution based on an existing library, maybe with a small tweak? I would rather not reinvent the wheel.

Well, ok, we don't need to talk about the schema. But I have no idea what isinstance(notion, string) for notion in d["notions"] (point 3) is supposed to do.
– roganjosh Mar 14 at 22:37

@roganjosh Sorry, this was supposed to be isinstance(notion, str) for notion in d["notions"], it should mean that all items of the list d["notions"] are strings.
– jonathan.scholbach Mar 14 at 22:39

I kinda half expect the typing module to cover some of this when building the dictionary but not all of it. This is pretty extensive testing; I assume this is a test suite and not part of the regular flow?
– roganjosh Mar 14 at 22:44

d.keys() == ["notions", "categories"] might suffer from dictionaries being unordered in Python < 3.7 (you might get away with 3.6) and then element is in d["notion"] for element in d["category"] just keeps growing.
– roganjosh Mar 14 at 22:47

@roganjosh: typing deals with object types, not values. str is an object type, all(v in d['notion'] for v in d['category']) is a value restriction.
– Martijn Pieters♦ Mar 15 at 10:10

The general objection is that making the validation schema dependent on the data it is validating makes it hard to keep validation context-free (which makes implementation easier, and makes validation in parallel a lot easier), and it makes static analysis of schemas much harder (as the schema changes with the data at runtime).

That said, the Colander project can do what you want, as it allows you to trivially define validators in Python code.

Note that the validator is defined at the level where both notions and category are defined, as a validator only has access to the 'local' section of the data being validated (with all child node validations having already taken place). If you defined the validator only for category, you couldn't access the notions list; defined at this level, you can count on the notions list already having been validated. The validator raises an Invalid exception, and its first argument is the category schema node, to lay the blame squarely on the values in that list.

A Colander schema validates as it de-serializes; you can see the input of the Schema.deserialize() method as unvalidated data (a colander serialization) and the output as application-ready data (appdata), validated and cleaned up. That's because Colander will also put default values in place if missing, can produce tuples, sets, datetime values, and more, and also supports data preparation (cleaning up HTML, etc.) as you process it with the schema.

With some demo input the above schema validates and returns the validated structure if successful:

If your dictionary is that complex, then you're doing this all wrong. Consider creating classes and storing objects of those classes in the dictionary. Those classes can also hold objects of other classes. This way you'll avoid nesting dictionaries. Create methods within the classes to validate their data.

Nesting of dictionaries? Where is that happening?
– roganjosh Mar 14 at 22:47

"You're doing it all wrong" is a pretty strong statement and I don't think it's substantiated. The OP is just putting in a lot of checks to ensure the dict is consistent with what they expect.
– roganjosh Mar 14 at 22:49

My first approach was actually to create a bunch of classes to tackle the validation. After finishing this, I thought: "There must be an easier solution." (My use case has 6 levels of nesting.) And I really think it is not good style to create a lot of classes for this validation. Whenever the schema changes, or I want additional schemas, I need to write a bunch of new classes. This does not seem very DRY to me.
– jonathan.scholbach Mar 14 at 22:55

OK, let me take a step back. In any case, you have to keep some sort of documentation of where and when you're implementing those validation checks. I would still write separate classes in separate files and document them. At least that way my code is cleaner and easier to read.
– Lawrence Khan Mar 15 at 3:30

This is a common problem to solve when validating data de-serialised from a standardised, programming-language-neutral data exchange format of key-value pairings and sequences of values, like JSON or YAML. You can transform such a structure into instances, but you may want to validate the data first, as early as possible, before you do so. What the OP is asking for is entirely reasonable.
– Martijn Pieters♦ Mar 15 at 12:20