Shortcuts

Related Links

UpdateFrom and Encoding

Hi, I installed MVCToolkit and played with tutorials. I tried to create some Product object (Linq To Sql) in an action method: But encoding for string values in Request.Form didn't happen and I successfully saved javascript code as product name :(

Re: UpdateFrom and Encoding

If you need to apply custom logic to the data that's assigned, the current UpdateFrom() method might not be a good approach to you. There isn't a way to control programmatically any encoding or access restrictions (besides passing an array of property names).

If would be nice if, in future, there was an extended version of UpdateFrom() that would take a delegate to call back some custom logic to handle each member. It could look like this:

Re: UpdateFrom and Encoding

quick question, I'm a little confused, but why would want to by default HtmlEncode an input and stuff that into your DomainObject and ultimately into the database? shouldn't it be the other way around, you HtmlEncode it for displaying purpose, and store
it as is?

Re: UpdateFrom and Encoding

It can be either way - but the safest bet is "protect inbound" since you never know where that info will turn up again. For instance if you're object is a ProductReview and you forget to encode the output. It happens.

Also - it's not a very expensive operation, but it is string manipulation and therefore has a cost to it. If you encode on every output it can add up and hurts scaling.

Re: UpdateFrom and Encoding

Rob, please tell me you're not serious about HTML-encoding all input by default. I appreciate that you're trying to protect people from their own ignorance or forgetfulness, but as shinakuma indicated, this causes more problems than it solves.

[1] It's an inappropriate mix of technologies. HTML encoding is only a concern of displaying HTML - it has no place in a SQL database. It's ugly when querying, interferes with full-text indexing, and clashes with other applications that might be reading from
the same DB. What about when I'm not outputting to HTML (e.g. when rendering to a plain-text email, WinForms app, or 3rd party system) - do I have to unescape it first? Yuck!

[2] It leads to more serious security vulnerabilities. You say it's safer to "protect inbound", but here the opposite is true. Remember that by saying "we'll protect inbound", you're also saying "you must not protect outbound" (otherwise you'll double-escape
and & will render as &amp;). So now we are forced to trust that all systems writing to the DB remember to encode all strings exactly once, and any single vulnerability spreads to all parts of the system because nothing protects outbound. "Perimiter security"
is not safer when it actually prevents you from adding real security.

[3] It's inconsistent with how decent ASP.NET controls work. Well-written controls take responsibility for HTML-encoding their own output, so pre-escaped strings would cause display problems.

[4] It's inconsistent with how other aspects of the MVC framework work. You're not pre-escaping all values passed to IHttpRequest.Form, so myObject.Value = form['key'] would behave differently to myObject.UpdateFrom(form).

If someone on my team encoded strings before putting them into the DB, then depended on that when rendering them later, I'd have to tell them off and ask them to change it. Please don't encourage this behaviour!

Re: UpdateFrom and Encoding

ScottGu and I talked a bit about this and HTMLEncoding a security concern so we have to take this into consideration when deciding how to deal with these things.

SteveSanderson1

It's an inappropriate mix of technologies. HTML encoding is only a concern of displaying HTML - it has no place in a SQL database. It's ugly when querying, interferes with full-text indexing, and clashes with other applications that might be reading from the
same DB.

In terms of it's "place" - I highly disagree. Anything that is being reported back to a web front end (usually a browser) should, at some level, be checked so that you don't inadvertantly sabotage your user. Database, flatfile, cache - the storage medium
has nothing to do with this :). FullText indexing ignores anything in <>.

One thing you can do here is to Encode on the way out - but again that's a perf thing you'd need to think about. I hear you though about the other readability issues, and we can have an override to turn it off.

SteveSanderson1

"Perimiter security" is not safer when it actually prevents you from adding real security.

Yep. Security comes at a cost at some level and I agree with you in principle here. However you're talking about a 1% issue at best (shared DB input) and many architects would completely cringe at sharing a DB with a web site for this very reason. However
your point is valid and again - we'll provide an override so the decision to turn this off if you need.

SteveSanderson1

It's inconsistent with how other aspects of the MVC framework work.

Hopefully not - but if we're not encoding properly we need to take a look at that.

Ultimately it boils down to line of responsibility - should Microsoft, by default, disable security or should you, as the developer, disable it for yourself? I'm all for trying to find a better answer here - would another route be acceptable - something
like "UpdateFromEncode(Request.Form)"? The thing is - and I'm going to stick to this - the decision to turn encoding off needs to be the developer's.

--Iniviting ScottGu and DamienG to this thread - it's a good one and it's important.

Re: UpdateFrom and Encoding

WebForms by default validates the request for injection attacks, it does NOT however by default html encode everything to be passed along "securely" into your DB. It's unreasonable to assume that your data will be in a silo consumed only by your web frontend
when dealing with enterprise applications. I don't want to have to worry about html decoding when I import these data into my warehouse or sync over to a legacy system, or like someone said output to a non-html format. That is just backwards. The default behavior
should be you encode for display, not for storage.

Re: UpdateFrom and Encoding

Any helper functions that output HTML should encode values using HttpUtility.HtmlEncode for literals and HttpUtility.HtmlAttributeEncode for attributes such as value on input tags.

If users want to output unencoded HTML then let them do that themselves without a helper as it's not easy to deal with.

Where you are bringing data back for processing through a form you don't need to do anything - the request variable will contain it unencoded.

Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever
wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying &lt; in text
boxes.

Re: UpdateFrom and Encoding

Rob (and the others), thanks for giving this issue such careful consideration. I appreciate that you're trying to do the right thing by "enabling security by default".

Many arguments could be made, but for me the key thing is "what kind of a future are you creating". As you're in the priviledged position of designing a Microsoft-endorsed framework, the mindset you take is going to influence countless web developers for
many years to come (otherwise, this would just be a minor technicality and I wouldn't really care either way). If you proceed with the idea "we'll encode the data inbound", you are not just perpetuating ignorance about encoding it outbound, you're actually
*enforcing inaction*. It means that developers learn like monkeys they must *not* encode the data they send, because some black magic somewhere in the system does that and they have to trust it to avoid double-escaping. Do you think they are going to behave
differently when working with different data sources? It just creates unpredictability, instead of the simple rule: "escape any strings that you don't want to render as HTML".

In other words, it only enables an illusion of security, and ultimately causes a greater loss of security (as well as millions of wasted person-hours trying to remember which pieces of data are pre-encoded and which ones aren't). So yes, it does boil down
to responsibility - responsibility for our future, and our children's future... ([+o(])

What I would ultimately like, though I appreciate this would turn a few things upside-down, would be to change the meaning of <%= ... %> on MVC ViewPages, so it escapes the strings you pass it. You would force developers to use more keystrokes when they
want to avoid escaping (instead of fewer), maybe using <%!= ... %> or even <%= RawHtml(...) %>.

robconery

However you're talking about a 1% issue at best (shared DB input) and many architects would completely cringe at sharing a DB

Considering the other comments posted, the people who hang out on this forum are somewhat architecturally minded - sharing databases, legacy data storage, and interacting with external systems aren't 1% issues for us - it's what we do every day. Perhaps
we're not representative of your wider customer base, but we are the opinion shapers.

robconery

One thing you can do here is to Encode on the way out - but again that's a perf thing you'd need to think about

Have you done any benchmarking of this? As a quick test, my humble desktop can HTML-encode 1.5 million 30-character strings per second (in which about every 5th character needs encoding), and about half of that processing time goes into allocating a new
string on the heap. Compared to the cost of a database query, a few hundred encodings is completely insignificant, and if you're pushing your webserver that hard, I'd hope you're using output caching anyway.

robconery

UpdateFromEncode(Request.Form)

Yeah, great! Or just, UpdateFrom(Request.Form, Encoding.HtmlEncodeAllFields), as long as the default is not to encode.

Re: UpdateFrom and Encoding

HTML encoding is an output-only issue - there should NOT be any facility to encode data going into the database. 100%, no exceptions, not even as an option the developer can specify.

If you HTML encode data into the database you force web developers into writing unsafe code because if they do HTML encode now it will be double-encoded and will display as a mess of &quot; etc. all over the screen/output/XML/RSS.

Of course there are times when HTML is desired in the database - rich text might be one option - but that's a totally different game and in fact requires neither encoding or decoding which would actually break that. In those scenarios you require careful
scrubbing to ensure only the tags you wanted in there get kept.

You effectively turn what was a simple rule - encode data into the output stream correctly - into a guessing game by doing this.

Re: UpdateFrom and Encoding

Think about situation when Products are displayed on the web and in the desktop application.

If you will encode INPUT it will display really "incorrect" data in the desktop application.

For example, user edits product online and sets its name to "Visual Studio 2008 <XMAS EDITION>".
And the desktop application will display it as "Visual Studio 2008 &lt;XMAS EDITION&gt;".
So to aviod it, the DESTOP APP SHOULD ALSO DECODE the data which is nonsense unless its not a CMS system.

Re: UpdateFrom and Encoding

Gents I think you're misunderstanding me :). First - Damien thanks for weighing in here - much appreciated. Let's break this down a bit:

1) This isn't webforms anymore, and there are no server controls to auto-encode your raw text

2) I fully realize the implication of encoded text in the DB

3) I also fully realize the hole you create when unencoded text is sent to the browser. ScottGu just did it on his blog (yes, I know it was a demo, but how often do we work under the "this is just a quicky" thing)

4) I'm not *forcing* anyone to do anything :)

OK - that said, let me address your issues Steve...

SteveSanderson

As you're in the priviledged position of designing a Microsoft-endorsed framework

I have two jobs really - the first is this Toolkit which isn't necessarily "Microsoft-endorsed" - it's just plain Micrsoft. As such, security is top of mind.

SteveSanderson

If you proceed with the idea "we'll encode the data inbound", you are not just perpetuating ignorance about encoding it outbound, you're actually *enforcing inaction*. It means that developers learn like monkeys they must *not* encode the data they send, because
some black magic somewhere in the system does that and they have to trust it to avoid double-escaping

Monkey's and BlackMagic aside (are we in Oz?) - I hear what you're saying about double-escaping. It's a problem to be sure, and I don't think that giving developers a choice with a default to the secure side of things is *perpetuating ignorance*. If anything,
it's helping the "monkeys" keep from opening their site up to XSS.

SteveSanderson

Of course not - and I understand that. But 99% of the developers I need to support don't do this sort of thing. They make the sites that get targetted...

SteveSanderson

Have you done any benchmarking of this

String manipulation comes at a cost and even if it's minor; but this isn't my point - we do encode everything coming through the HtmlHelper bits.

I understand the anxiety, but I don't think you're grokking that I'm offering alternatives here so you can do what you like - specifically turning *off* encoding as-needed. I think *my* default stance here comes from the countless times I've built an application
that I've forgotten to encode - and I've paid for it. You've done it too - so has Damien. The issue here is what does the platform do by default - and I don't agree that I'm *perpetuating ignorance* as you put it.

So - more specifically, you'll be able to do this:

MyObject.UpdateFrom(Request.Form) which will encode everything or

MyObject.UpdateFromRaw(Request.Form)

The difference here is that you're selecting the less-secure method - the choice is yours, as it should be.

Finally - setting <%=%> up to auto-encode doesn't work either because of <pre> and <code> blocks.

Re: UpdateFrom and Encoding

damieng

If you HTML encode data into the database you force web developers into writing unsafe code because if they do HTML encode now it will be double-encoded and will display as a mess of &quot; etc. all over the screen/output/XML/RSS.

Agreed and I'm not suggesting that this is done *always* - only by *choice* - your choice :).

Re: UpdateFrom and Encoding

I totally disagree with HTML-encoding data going *into* the database even as an option or choice.

If by your own measure they aren't going to reuse the database elsewhere then what is wrong with the MVC toolkit helpers just encoding on output to the page? The problem is entirely solved there and then.

By encoding values into the database you cause problems for yourself in the MVC toolkit development because your helpers won't know whether to encode or not on output and nor will anyone else.

You end up protecting a very singular scenario - poorly written sites that deal only with your encoded data. Any other type of data - generated, request objects, sessions, cookies etc. are not protected and well written websites crumble under the double-encoding
problem.

Developers ends up never knowing what to encode on output and what not to once down this path.

The problem is purely one of output and should be treated as such. Any hack to encode data going into the database is going to cause more grief and chaos than it solves.

Re: UpdateFrom and Encoding

robconery

Gents I think you're misunderstanding me :)

No, you are not listening to the community.

robconery

1) This isn't webforms anymore, and there are no server controls to auto-encode your raw text

So why not extend the asp.net compiler to auto encode <%= %>, if its not webforms no backwards compatibility problems there? If people could make pre-compilers that add runat="server" for <asp: tags I am sure you could come up with something to wrap all
calls <%= %> in Html.Encode(). Or just switch to Boo, it can do it no problem.

robconery

2) I fully realize the implication of encoded text in the DB

No you don't or the community would not have responded the way it did.

robconery

3) I also fully realize the hole you create when unencoded text is sent to the browser. ScottGu just did it on his blog (yes, I know it was a demo, but how often do we work under the "this is just a quicky" thing)

Bad developers, bad testing, bad framework design, maybe just a bug? Don't make the same mistakes you did in webforms fix the real problem. Html Encoding
output not input.

robconery

4) I'm not *forcing* anyone to do anything :)

Yes you are, you are forcing EVERYONE on this thread to call UpdateFromRaw. You are forcing me to write fxcop rules and do code analysis ensuring that I always call UpdateFromRaw and never call UpdateFrom. You are forcing me to educate my developers
on using non standard method calls. Why would a dev know to use UpdateFromRaw that just sounds weird, whats raw? You are forcing me to write some precompiler process to replace all UpdateFrom to UpdateFromRaw because I have no good way of testing that my
database does not contain encoded data. You are forcing microsoft to upset a lot of people by not listening to the community from the begining. Why ask for feedback if you don't listen?

And you may be forcing me to stick with Monorail...

Wow... I need to stop sounding like Bellware [:)]... I guess I would need a little RoR speak to really sound like him.

This is a harsh response please excuse the tone. I appreciate the work the whole team is doing. I want this project to succeed, I just think this is the wrong solution and it is not solving the real problem.

Re: UpdateFrom and Encoding

damieng

I totally disagree with HTML-encoding data going *into* the database even as an option or choice.

Agreed - 100%. What I'm looking at is the issue where someone does a "myObject.UpdateFrom(Request.Form)" and then spins it back to the browser on error, validation, or...? You have to agree that there is a hole here that I need to consider.

damieng

You end up protecting a very singular scenario - poorly written sites that deal only with your encoded data.

Unfortunately that singular scenario dominates :(.

damieng

The problem is purely one of output and should be treated as such.

I just read your posts on this issue and I have to say you're violating your own rule:

damieng

Vulnerable code often looks like this:

myLabel.Text = Request.Form["Something"];

In this example you show how a developer will take a Request value and spin it back to the page. This is the hole I speak of.

So - I'd like to ask you guys - what is the problem with having two methods? UpdateFrom() and UpdateFromRaw()? From your responses I think you guys are positioning me as dictating "all things

must be encoded" when that's not the case. Would you rather it be UpdateFromEncoded()?

I'd like to focus on this rather than my ignorance if that's OK with you guys ;).

Re: UpdateFrom and Encoding

robconery

I'd like to focus on this rather than my ignorance if that's OK with you guys ;).

Sorry Rob wasn't trying to beat you up personally, I just feel strongly about this one.

My vote is drop UpdateFrom completely and make two methods, UpdateFromEncoded and UpdateFromDecoded. Keeping UpdateFrom with an unexpected side effect just makes the application more difficult to read, understand, and debug.

Re: UpdateFrom and Encoding

abombss

abombss

No you don't or the community would not have responded the way it did.

I'd like to see if we can avoid inflammatory remarks. You might be frustrated the things I'm mentioning here, but comments like these don't help resolve anything.

abombss

Wow... I need to stop sounding like Bellware

Yep.

abombss

This is a harsh response please excuse the tone. I appreciate the work the whole team is doing. I want this project to succeed, I just think this is the wrong solution and it is not solving the real problem.

Good to hear. Now if you can consider my point of view on this, and that I used to be YOU 2 months ago (before I go this job), we can come to a solution. In reading everything there, can I sum up your post in "I want UpdateFrom() to load unencoded text,
and UpdateFromEncoded() to encode the text" - is that right?

Again I'm going to ask you Good Fellas if you can help me with this :). If, by default, we do an UpdateFrom() that's not encoded, how can we avoid the problem where the user does this:

Re: UpdateFrom and Encoding

robconery

But, they may also output an error message:

<%=o.Subject%> is invalid!

And this would be a problem.

This is why it would great to make switch in mvc that <%= wraps the output in HtmlEncode() . It would avoid this problem and force developers to think about when then don't their data encoded versus when it needs to be encoded.

Re: UpdateFrom and Encoding

I'm talking to Phil about an operator overload that could do just this - it's a pretty sweeping change but I think it makes sense. So one of the two could happen, we could overload ! to encode everything, leaving = as unencoded, or the opposite.

Re: UpdateFrom and Encoding

The error-message scenario you mention is a good one to consider, but please recognise this is just a special case. The protection you're proposing isn't going to help if, rather than using UpdateFrom(), the coder has user myObject.Value = Request.Form["something"],
nor will it help if they are echoing back something from the querystring, from a cookie, a user-agent string or whatever. These are more common cases.

In other words, this is a bigger problem and UpdateFrom()-with-encoding doesn't help, it just muddies the waters and prevents anyone from learning the rule that output should be encoded. I hate the idea of that confusion (not to mention the basic architectural
wrongness of it) - pre-encoding sounds like a disaster in the making. There are so many benefits to the encode-on-display method that I hope you don't let this odd special case get in the way. I appreciate your earlier points that you did propose to enable
an option not to encode, but what many of us have learned over the years is that the
only thing that matters is defaults - this isn't the open-source world where devs make intelligent choices on their own, remember!

Coming back to the idea of changing the meaning of <%= ... %>, which would actually solve the problem in a much bigger way, I've written up
some ideas in a blog post, including a demo implementation of the behaviour and mechanism for retaining the same MVC Toolkit syntax. Not sure what you meant about <pre> and <code> blocks - could you elaborate?

Is a change to <%= ... %> too radical a step, or does the MVC framework give you an opportunity to truly improve matters?

Re: UpdateFrom and Encoding

For all to know - I've been on an internal discussion for the last day on this, and this thread is being followed :). This discussion means something for all who feel you're not being heard :).

SteveSanderson

The protection you're proposing isn't going to help if, rather than using UpdateFrom()...

Yes - agreed.

SteveSanderson

pre-encoding sounds like a disaster in the making

Again - I understand this - and I'm suggesting we force anything on anyone. All I want at the end of the day is the most secure solution (primarily) that is as close to acceptable and reasonable as we can get. Security in some ways leads to more work.

SteveSanderson

only thing that matters is defaults - this isn't the open-source world where devs make intelligent choices on their own, remember!

Which is my point entirely. If you default to the secure side, you're safer. I understand (once again) the ramifications of that thought and please know it's part of the ongoing discussion. One solution offered is to not have a default and go with a 50/50
option: UpdateFromEncoded() and UpdateFromRaw() - naming TBD :).

SteveSanderson

Is a change to <%= ... %> too radical a step, or does the MVC framework give you an opportunity to truly improve matters?

Initially I was thinking this might be doable but it actually can be problematic. I've asked Phil to weigh in on it so I'll let him respond.

Re: UpdateFrom and Encoding

I personally agree that incoming data should not be encoded automatically in general. I've worked on multiple systems in which the same data needed to be formatted for multiple outputs, not just HTML. From
my experience it's actually quite common.

Aside: One crazy idea is to have an encoded column and a raw column for each displayable field. I don't like it personally, but just tossing that out there.

So let's focus on the output situation.

The question that was asked is if we can override <%= %> to Html Encode output.

Right now, the ASP.NET page parser converts <%= "Foo" %> to a call to Response.Write("Foo"); I don't think we want to change Response.Write() to default to HTML encoding because that would cause the entire page template to be an unencoded mess since literals
within an aspx page are clumped together into Response.Write calls.

There are two approaches I can think of that might not require changes to the underlying ASP.NET forms engine (which would be a huge breaking change).

1. Use the approach that Steve (last name?)
does here. He effectively implements his own CSharpCodeCompiler to intercept the code generation phase I mentioned. Very neat approach.

2. Another theoretical approach (as in I think it would work, but I haven't tried it) is to hook in your own
PageParserFilter. Ideally, this approach might allow for introducing an alternative syntax for unencoded HTML such as the <%!= Html.TextBox(...) %>

The next question is whether the ASP.NET team should make this the default behavior for ASP.NET MVC?

From this thread, I get the vague sense that people here are saying yes! [;)]

One question for you:

Would it make more sense to not change the semantics of <%= and introduce <%!= as the encoded version? Or perhaps <%%= (since % is an encoding character and might make more sense semantically?) The reason I bring this up
is that <%= has a looooong history in multiple template engines, not just ASP.NET. For example, RoR does not automatically encode <%=.

What would cause the least surprise for developers?

As for whether we include it in ASP.NET MVC, I'm sure there are issues I haven't even thought about. I personally think it's worth considering if its possible without huge underlying changes. It does seem to adhere to the general security principle of exposing
the least surface area and requiring opt-in when exposing something unsafe.

Re: UpdateFrom and Encoding

I'm for changing <%= default behaviour to cause HTML encoding by default because I believe Microsoft should be secure by default which I thought was their stance since IIS6 - even if it does mean breaking or knowing things (i.e. switching on ASPX ability)

If they are worried about breaking existing solutions then switch it on in the web.config as default for new projects via the MVC VS templates.

How easy is it to make it contextually aware? I deally if <%= %> is filling in an attribute of a HTML tag it should end up calling HttpUtility.HtmlAttributeEncode rather than ending up at HttpUtility.HtmlEncode.

Re: UpdateFrom and Encoding

Haacked

What would cause the least surprise for developers?

I understand the argument for making it the default (<%= %>), but I think the least amount of surprise will be introducing <%!= %> as encoding, since none of the other frameworks do encoding by default. would be a surprise for someone not to expect that.

Re: UpdateFrom and Encoding

When I looked at Ruby on Rails and security, I was surprised to see that it didn't automatically encode the output. It was a prefect opportunity to start making websites secure by default and sadly even with h() being very simply to implement, in books
its still only discussed around chapter 6-7.

I'm for secure by default, however it does come at a cost of maybe developers not being aware of the issues involved. However, those developers would also be the developers releasing insecure applications anyway. By having secure by default, we are protecting
against those people, or should I say we are protecting users from those people.

I also don't think we should be against changing the way <%= works due to the way it worked in the past / in other frameworks. If its wrong, we should change it.....

For me, I would like to see:

<%= Encoded %> <%!= Not Encoded %>

ASP.net 2.0 auto encodes for GridView etc and no one seems to complain....

Oh, on a sidenote. HtmlUtility.HtmlEncode is not the preferred way of securing your application as it uses blacklisting instead of whitelisting. No reported problems in 2.0 (there was in 1.1) but Microsoft have released the AntiXSS Library as the preferred
approach... :)

Re: UpdateFrom and Encoding

The developer does have to take some culpability in security as well. I don't think it's a fair assessment to assume not wanting to change the behavior of <%= is due to a lack of awareness or attention to security. It's not such a black and white decision.

We've had some interesting discussions internally sparked by this thread. One point I should bring up is that <%= %> is neither inherently secure nor insecure. It's a shorthand for <% Response.Write() %>. Should we automatically encode Response.Writes now?
Certainly not.

Here's another thing to chew on. Suppose we change <%= to html encode just for MVC. What happens when you have projects with both WebForms and MVC templates in the same project?

Inconsistency is just as bad (if not worse) for security as requiring opt-out. Now, when you're on some pages, <%= automatically encodes while other pages <%= doesn't encode. That's not a good situation.

One might conclude that <%= should be a universal change across ASP.NET. While I don't take security lightly, I also don't take a massive breaking change across nearly every site that uses ASP.NET lightly either.

Here are some ideas for improving the situation:

<div mce_keep="true">You as a developer can use the approach that David wrote-up to change <%= yourself.</div>

<div mce_keep="true">An extension method ToEncode() or ToHtmlEncode() is pretty nice as its discoverable and only two more key presses (with VS Intellisense).</div>

<div mce_keep="true">Better code samples and more scrutiny of code samples. Samples should not include XSS flaws as well as disclaimers about not using in prod.</div>

<div mce_keep="true">More education.</div>

<div mce_keep="true">Studies. How often are XSS exploits discovered and exploited on ASP.NET? This is important data for this discussion.</div>

<div mce_keep="true">Declarative controls for MVC that do the right thing in regards to encoding.</div>

After all, even if we did do <%= we haven't really prevented very common XSS attack avenues. For example, sites that allow users to submit HTML and display HTML. Encoding doesn't help in those cases. Only HTML scrubbing, white listed html tags, etc...

For example, we might consider a ToSafeHtml() extension method that doesn't encode HTML, but strips out common exploit tags. Or .ToSafeHtml(string[] allowedTags);

In any case, this is a very interesting discussion and thinking about helping to prevent XSS is certainly on our radar. Ultimately though, I think providing consistency and not too much magic under the hood is required.

We could do all the magic in the world, and a bad developer will always be able to write a page with an exploit. By at least remaining consistent, good developers know what to expect and can apply tried and true practices for writing secure web pages without
worrying about the framework circumventing their efforts.

Speculating, but I imagine that's why RoR and other frameworks chose to not encode <%=.

Re: UpdateFrom and Encoding

The more I read the thread and think about it, the more it makes sense not to change the default behavior.

Haacked

The developer does have to take some culpability in security as well

+1

My feeling is, keep as is and don't encode any input. If Microsoft wanted to include an extension for encoding with <%!= thats cool, otherwise I am sure MVC Contrib could pick up something like that too. This is something that should be consistent with
other frameworks, not innovative or creative... changing this will just makes things confusing.

Another good tool would be an httpmodule that could filter output and throw warnings for potential xss attacks when in a test environment. Not perfect but certainly better than nothing.

Re: UpdateFrom and Encoding

Phil, You make some very good points. No solution is fully secure (ValidateRequest has a number of holes) and I have to say I didn't think about WebForms + MVC in the same project which does put a different view on the idea of changing <%.

The solution needs to be simply and easy for the newbie to find. If its an external method (like HtmlEncode) then they are unlikely to come across it unless they go looking at security - I wonder how many developers bypass that topic. I have saw a number
of rails applications which have not used the h() simply because the developers wasn't aware of XSS, so even making it simple doesn't mean it will be implemented. This then comes down to your point about making sure all the samples are secure. Too many books
publish security flaws in their samples.

I tried to ask the rails team their decision not to be secure by default, I forget the extract quote but I think it was something like its upto the developers to make their applications secure and follow best practices, we don't want to enforce what they
should and shouldn't do. (from memory)

Re: UpdateFrom and Encoding

Absolutely the developer should be taking some responsibility but one oversight is all it takes. I've been encoding my HTML since '99 and have still missed it a few times in ASP and now in ASP.NET because of the src/img/href issue in HtmlControls.

There are so many changes between the MVC stack and WebForms such as loosing server-side controls, viewstate, postback, events that adding encoding by default to <%= %> is going to be a footnote in comparison.

You're right we shouldn't encoded Response.Write but I think we do need a strategy for building HTML from inside code for controls/helpers. I think the HtmlControls are a good start although better constructors and that bug fix would be needed.

I think a solution such as Steves whereby you can specify an option to enable it in the web.config for both MVC and ASP.NET applications switched on for MVC because it's new and the right thing to do. Whether it is switched on by default for new ASP.NET
projects is something for Microsoft teams to think about but being able to switch it on for those is a good start.

I'm not sure ToEncode or ToHtmlEncode add anything other than people attempting to write out HTML from code using strings rather than a recommended set of controls designed for building HTML. It's still easy to get things wrong such as standards compliance
and the whole thing would fall under the 'Primitive Obsession' smell because we are thinking too much about the serialisation format (string) rather than a logical abstraction (dom / htmlcontrols).

Robust supplied .NET functionality for white-listing HTML would be fantastic, +1 on that.

I don't think we can draw from the RoR philosophy and do nothing. As RoR matures as a platform I think they are going to come back and regret that decision - I bet there are exploitable RoR systems out there already.

Re: UpdateFrom and Encoding

Thanks again to everyone putting effort into discussing and thinking this through.

I'd love it if you achieved "minimum keystrokes == maximum safety", so I have a strong preference that <%= ... %> should encode its output.

Yes, it is a breaking change, but ASP.NET has had breaking changes in the past and the world didn't end. When request validation was shipped as an automatic update in .NET 1.1, the first my last company knew about it was when the production web app started
breaking. That was unpleasant, but they changed the web.config and got on with business. What you're proposing here is far less dramatic - this is a whole new platform which nobody even has in production yet. And you could enable changing <%= ... %> back to
its old behaviour in web.config.

As Damien points out, this is nothing compared to the breaking of nearly all existing web controls, which has no workaround (which I totally understand and support). If this isn't your perfect opportunity to make a big improvement to security, I fear there
will never be one.

What do you think about the idea of not actually forcing the developer to choose between encoding and not encoding - and letting the framework make the choice by default? It could be determined by the type of the evaluation result. If it's a plain old string,
encode it, if it's a special variant on string (let's call it RawHtml), then don't encode it. The HTML helper methods would return RawHtml of course.

Sounds weird, but the key point is that it makes it very difficult for the developer to do the wrong thing. They just write <%= ... %> and the right thing happens, except in the rare case where they want to print HTML verbatim *without* using a HTML helper
method, in which case they write <%= (RawHtml)value %>, or <%= value.toRawHtml() %> - a deliberately non-concise syntax to discourage unnecessary use. Even then it chose the safe default until they added the typecast. I know it's less obvious at first but
it's very simple when you know it. This is what I implemented in my
demo by the way. In my view this is more enlightened than NVelocity's style of forcing the developer to manually choose between $ and $! all the time (they're just one keystroke different, you can pick the wrong one without thinking about it).

But even if you go with the manual choice approach (with <%!= ... %>), this could be one of the best features of ASP.NET MVC.

Re: UpdateFrom and Encoding

I'm not too fond of encoding the input by default for all the reasons already mentioned. I like the idea of adding to <%=%>, but not changing it. As others have said, that would cause some confusion. If something like <%% = %%> did HTML encoding by default
it would not break the old model, and it would get used since it's easy. Also, it would be an opportunity to educate developers on XSS. Every one would say to always use <%% = %%> and when devs asked why they could learn about XSS. Education on the matter
is probably more important than the framework handling everything.

I also like the idea of having extension methods to handle the different encodings for HTML, XML, and JavaScript. They are discoverable and easy to use.

Re: UpdateFrom and Encoding

Ghotiman, I certainly agree that education is important. I'd be sad if we force people to type a big trainwreck of punctuation (<%% ... %%>) just to get the behaviour they should "always use". If other developers are as lazy as me, they're just going to press
the fewest number of keys needed to say their project is finished, which means <%= ... %> ("seems to work fine").

Re: UpdateFrom and Encoding

SteveSanderson1, I don't care to much what the syntax should be, just that it be different than the current <%= %>. If they change the behavoir of <%= %> then most of the developers who need to be educated on this will never know anything about it. Also,
there will be the inconsistant behavior between forms and mvc, or old code will break. If there is a new syntax, it can serve as the talking point for XSS. The new syntax should be easy, I think everyone can agree on that. I just think it should be different.

Re: UpdateFrom and Encoding

It absolutely baffles me to think that anyone would encourage default encoding. I know not everyone here has worked in PHP, but that language is a unorganized mess because of features such as "Magic_Escape_Strings" and "strip_Slashes" etc. The lesson learned?
Leave the encoding to be done by the developer, not the framework implementation.

We are all professionals here right? There is no need for hand holding.

Re: UpdateFrom and Encoding

It is incredibly easy to miss or forget with alternate syntax, especially if you also maintain ASP.NET which won't have the new syntax. If you could rely on everybody being professional you wouldn't have this problem - the fact is you can't rely on that.

As for hand-holding the fact is the majority of the time <%= %> is used to output simple text not HTML building.

Re: UpdateFrom and Encoding

I believe that the only responsibility of Microsoft is to provide the means to encode and decode HTML. The responsibility to use it is placed at the feet of the developer.

Sure, ignorance leads to crappy, vulnerable sites....But that is not Microsoft's fault, and that is not really their problem. As you can see from the PHP community, default features such as this only provide a false sense of security and enable the crappy
developer to continue to develop crappy sites. It does not magically turn them into good developers. Would you tell the ADO.NET team to only allow parameterized queries? Every decent developer knows better than using string concatenation to build a query,
but you will never be able to convince the ADO.NET team that removing the ability to do queries that way is a good idea.

Re: UpdateFrom and Encoding

I'm all in favour of trying to eliminate all those poorly developed sites. And if we can take a huge bite out of them by html encoding on input then it's possibly not such a bad idea.

However, I'm developing a WinForms interface which accesses the same database so encoding on input is obviously not an option.

The question then is "which to do by default?".

Well, if encode on input is OFF by default, will a novice turn it on? I'm not so sure that they would. If they are lazy enough not to bother about security they aren't even going to care that such an option exists.

If it's ON by default, will we turn if off? Of course we will. We'll check the comments of our web.config and turn it off. All we need is good documentation!

To be honest I'm not too concerned about which way they implement it as long as its extensible, overridable, configurable, and most of all it should be obvious what it's doing.

Re: UpdateFrom and Encoding

Encode on input is so very very wrong it's hard to put it into words. You would be forcing developers to do the wrong thing on input, and the wrong thing on output (no encoding) which means they'll be used to no encoding and when they pick up content that
hasn't processed through your input encoder is wide open for attack again.

Re: UpdateFrom and Encoding

Great thread. Nice to see how this discussion was brought back from a down spiral.

I'd like to add that although security is a cross-cutting concern,
it needs to be dealt in each component at the component's context. By that I mean, if the UI needs to be secure (it does) then UI security measures need to be taken in the UI code (HtmlEncode in the outputs for example), never let UI concerns happen at
the controller level (if you see any HtmlEncode calls in the controller code, think about it a little more.) And, please, no UI encoded stuff in my database, OK? :)

An example of security concern that could be dealt at the controller level would be to inspect the user input before assigning it to the models and later to the database. In practical terms a good example of that is making sure only the fields you expect
to be in the Response.Form gets populated in the model, so be careful on how you use myModel.UpdateFrom(Request.Form) so that you don't allow hand-crafted form posts to update sensitive properties.

Re: UpdateFrom and Encoding

damieng

Encode on input is so very very wrong it's hard to put it into words. You would be forcing developers to do the wrong thing on input, and the wrong thing on output (no encoding) which means they'll be used to no encoding and when they pick up content that hasn't
processed through your input encoder is wide open for attack again.

One thing I'd like to reiterate at this point in the thread is that I completely agree with no encoded data in the DB. My point (and others who like the idea of encoding on input) is that you protect one of the main vectors of XSS attack - regurgitation
of input to the screen (search results being one of the main culprits).

So, to be clear: no one is suggesting you must input encoded text into your Db. Especially me. This discussion started with regards to a method that binds an object from Request.Form and the decision to encode/decode being your or ours.

Re: UpdateFrom and Encoding

I'd like it to be overloaded.. but by default I would prefer not to encode..

I can kind of see the reason for the default to be encoded.. defaults targetting the most common scenario.. and I imagine you would love your 'product' to do as much as possible for you by default.. but stay highly custimizable and configurable..

Either way I wouldn't be too bothered what the default is as there is some option.. as otherwise it would just add to the code in .NET that becomes useless outside of its default scenario.. (disclaimer: most of the code in .NET I think it done great)..

My personal argument for the output encoding is that it should be handler by the output generators.. but it would be great if we had a simpler way to do primative output encoding (such as the <%!= idea).