YAML allows language-specific tags so that arbitrary local objects can be created by a parser that supports those tags. Any YAML parser that performs this kind of object instantiation opens the potential for an injection attack.

Consequently, it appears that it is potentially unsafe to parse untrusted input using some YAML parsers.

Can anyone explain how to safely parse untrusted YAML input in Rails, Python, and Perl? Are there YAML parsing libraries that are safe, or ways to invoke them that ensure they are safe even if the input is from an untrusted source?

2 Answers

There is unfortunately no built-in safe mode for Ruby. I wrote the SafeYAML gem to plug this hole for now, and there is a discussion going on about adding this functionality to Psych, Ruby's YAML-parsing engine as of 1.9.2.

For now if you're a Ruby app developer, your best bet is likely to use SafeYAML or find a similar library to suit your needs.

Right there in that same Wikipedia article is the answer you're looking for:

Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists.

(emphasis added)

The YAML spec allows for full-fidelity serialization and deserialization of arbitrary data structures, which includes the ability to deserialize (and therefore instantiate) any object defined in the application. Think of it like Python's built-in serialization routine pickle, only with different syntax.

The safe_load function in PyYAML was created specifically to address the fact that this is inherently, disastrously unsafe. It allows you to deserialize only to universal, simple data-oriented types which are known not to have side effects (e.g. number, string, list, etc.).
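To make that concrete, here is a minimal sketch of how untrusted input might be handled with PyYAML. It assumes the third-party PyYAML package is installed; the helper name parse_untrusted and the sample payload are made up for illustration and are not part of the library.

```python
import yaml  # third-party PyYAML package (assumed to be installed)

# Hypothetical hostile document: it uses a Python-specific tag that an
# unrestricted loader would resolve by calling os.system.
UNTRUSTED = "!!python/object/apply:os.system ['echo pwned']"


def parse_untrusted(text):
    """Parse YAML with safe_load; return None if the document is rejected.

    parse_untrusted is an illustrative helper, not a PyYAML function.
    """
    try:
        # safe_load only knows how to construct plain data types
        # (mappings, sequences, strings, numbers, booleans, null, ...).
        return yaml.safe_load(text)
    except yaml.YAMLError:
        # Unknown tags such as !!python/object/apply end up here,
        # as do documents that are simply malformed.
        return None


if __name__ == "__main__":
    print(parse_untrusted("name: example\nports: [80, 443]"))
    print(parse_untrusted(UNTRUSTED))
```

Returning None on rejection is just one possible design; a real application would more likely log the failure or re-raise an application-specific error.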

As for implementations in other languages, they may be restricted (i.e. "safe") by default, or may be unrestricted by default. You'd have to check the associated documentation. I only know about the Python implementation.

Right, that does it for Python. Still interested in Ruby and Perl. (I did see it in Wikipedia but figured it couldn't hurt to have it documented here on this question as well.) Anyway, thank you!
– D.W. Jan 9 '13 at 7:20

What I see regarding Perl isn't promising, but it may simply be inherently safer with Perl (I'm not certain about that). It was invented for Perl after all. Similar luck with Ruby; it may be only the Python people who actually recognize this issue. Personally, I'd recommend using JSON for serialization anyway. Doing so solves a lot of problems.
– tylerl Jan 9 '13 at 7:55
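As a rough illustration of the JSON suggestion in the comment above: Python's standard json module only ever produces dicts, lists, strings, numbers, booleans, and None by default, so there is no tag mechanism to abuse. The helper name parse_untrusted_json is hypothetical.

```python
import json


def parse_untrusted_json(text):
    """Parse JSON from an untrusted source; return None if it is invalid.

    parse_untrusted_json is an illustrative helper name.
    """
    try:
        # By default json.loads builds only plain data types; it has no
        # equivalent of YAML's language-specific object tags.
        return json.loads(text)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    print(parse_untrusted_json('{"name": "example", "ports": [80, 443]}'))
    # A YAML-style object tag is simply not valid JSON.
    print(parse_untrusted_json('!!python/object/apply:os.system ["echo pwned"]'))
```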