I don't really want to publish the code yet since I am overriding functions and parameters on an "as required" basis and haven't done a review of the ext api. However if there is sufficient interest in doing this then I'd be happy to share it and work with other people.

Yeah - after an afternoon of digging deeper, it seems that htmlEncode is the answer. Adding ':htmlEncode' over all my XTemplates has been easy. Now I get to play with the trees and grids. Hopefully it won't be too painful.
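For anyone following along, here is a rough stand-alone sketch of what the ':htmlEncode' suffix buys you. In a real app the template would be an Ext.XTemplate with a placeholder like {name:htmlEncode}. The htmlEncode and render functions below are hypothetical stand-ins written so the snippet runs without Ext loaded; they are not Ext's implementation.

```javascript
// Hypothetical stand-in for Ext.util.Format.htmlEncode, so this runs without Ext.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Toy renderer that understands {field} and {field:htmlEncode} placeholders,
// loosely modelled on the XTemplate placeholder syntax.
function render(template, data) {
  return template.replace(/\{(\w+)(:htmlEncode)?\}/g, function (match, field, encode) {
    var value = data[field];
    return encode ? htmlEncode(value) : String(value);
  });
}

var user = { name: 'I <3 puppies <script>alert(1)</script>' };

// Without the suffix the script tag lands in the markup as-is.
var unsafeMarkup = render('<div>{name}</div>', user);
// With the suffix the value is shown as text instead of being parsed as HTML.
var safeMarkup = render('<div>{name:htmlEncode}</div>', user);

console.log(unsafeMarkup);
console.log(safeMarkup);
```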

This sounds like a good solution.

In the general case however it is critical that the server side validate input before it ever makes it into the database - this is certainly not the presentation layer's job. Any server side app that allows tainted input like that mentioned is not doing its job correctly.

One could make a reasonable argument that the presentation layer could also attempt to guard against this in case of man-in-the-middle attacks.

I will disagree, but only in an orthogonal way. You are correct that the server should validate data. However, that doesn't mean that the server should consider the presentation layer encoding issues. That is something that only EXT-JS and the web browser need be concerned with.

Data has a type, and the server should validate that data conforms to the requirements of that type. Consider a field that represents a user's age. The business requirements probably require this to be a numeric field, and thus the server should validate the value before accepting it. (A well-designed client should probably also check for a number before sending it to the server, but this really depends on how the UI is designed. The authoritative validation is on the server.)
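To make the age example concrete, a server-side type check might look something like the sketch below. The function name, the result shape and the upper bound of 150 are my own assumptions for illustration; your business rules will differ.

```javascript
// Hypothetical server-side check. The name validateAge and the 0-150 range
// are assumptions made for this example.
function validateAge(input) {
  var text = String(input).trim();
  // Accept only a run of digits: this is a data type check, not HTML scrubbing.
  if (!/^\d+$/.test(text)) {
    return { ok: false, error: 'age must be a whole number' };
  }
  var age = parseInt(text, 10);
  if (age > 150) {
    return { ok: false, error: 'age out of range' };
  }
  return { ok: true, value: age };
}

console.log(validateAge('42'));
console.log(validateAge('<b>42</b>'));
console.log(validateAge('200'));
```

Note that the check rejects "<b>42</b>" because it is not a number, not because it contains markup.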

This is a data type check, one that happens to be HTML safe. A user's name has a completely different range of values, and may or may not be HTML safe. "I <3 puppies" might just be a valid name for your application. Could "Norman Richards <orb@nostacktrace.com>" be valid for an email address field? Maybe, maybe not. It depends on your server-side business concerns. What about, for the sake of illustration, a field that asks for the user's favorite HTML tag? "<blink>" it is!

The point here is that server-side data validation does not imply HTML scrubbing or HTML encoding. To do so would probably be wrong. If you are storing your data in an XML file, maybe it gets encoded that way. If you are storing it in a database, encoding it that way would almost always be wrong. Maybe your business validation rules require it for a type, maybe they don't. Almost any non-trivial application will have fields whose values are not necessarily HTML safe.

Ok, so now we've established that server side validation has nothing to do with presentation layer encoding. They are related for sure, and in some cases (numeric values) they could be functionally equivalent, but that should be seen as a coincidence, and not one I think an experienced developer should rely upon.

So, now let's switch to the client side. The web client gets data from the server. This data is a value that should be presented to the user. In an extjs app, it is typically inserted into the DOM somewhere, usually in a way that treats it as HTML. That is to say we have data that almost always is not HTML itself, and at best is only coincidentally HTML-safe. It's data that, unless it is explicitly known to be typed as HTML and intended to be inserted directly into the DOM, must be properly escaped to avoid all sorts of errors, involving both security and just plain correctness of the app.

Remember, having HTML-unsafe values is NOT (necessarily) an error. An experienced web developer should recognize these values as the norm and not as the exception, and I would argue that any sane framework should always escape values before inserting them into the document unless explicitly told not to by the developer.
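Here is a small sketch of what "escape unless explicitly told not to" could look like. None of this is extjs API: htmlEncode, trusted and renderCell are hypothetical names I made up to illustrate the idea of an explicit opt-out.

```javascript
// Hypothetical encoder, standing in for whatever the framework provides.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Marker wrapper: the developer must opt in explicitly to insert raw HTML.
function TrustedHtml(html) {
  this.html = html;
}
function trusted(html) {
  return new TrustedHtml(html);
}

// Renders a table cell. Everything is encoded unless wrapped with trusted().
function renderCell(value) {
  var body = value instanceof TrustedHtml ? value.html : htmlEncode(value);
  return '<td>' + body + '</td>';
}

console.log(renderCell('I <3 puppies'));
console.log(renderCell(trusted('<b>admin</b>')));
```

The safe path is the one you get by doing nothing; the unsafe path requires typing trusted() and is easy to find in a code review.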

A framework that doesn't do this by default is setting its users up for problems. A framework that makes this difficult or obscure is being negligent. The good news here is that it doesn't seem that hard to do with extjs. It's just not well documented, and from my skim of the forums not even very well understood, which is quite scary.

That applies to any security issue. The problem with web app security is not that it's that hard to do, it's that you have to know what you're doing. If you look at the typical novice user on the ext forums, they simply don't have the technical background to know that they should be entity-encoding their values. I don't think it's reasonable to expect them to do what they don't know they should be doing.

Surely this is a no-brainer. The server should not send unsafe data to the client, whether or not there are security measures turned on by javascript, because once this unsafe data is on the client any knowledgeable person will be able to disable any security measures you have taken. Hence this change, as a way of tightening security, is a waste of time.

Of course, there is a benefit to displaying html tags inside text fields, for example.

One could make a reasonable argument that the presentation layer could also attempt to guard against this in case of man-in-the-middle attacks.

If there's a man in the middle attack, you have serious problems no matter what.

My strategy is to scrub the content before sending to the browser. It can be stored in the database as-is, as long as you're scrubbing it for SQL injection.

It's a consistent pattern to use, and it makes sense to me. Consider you have a user table in your database that contains encrypted password and salt and email address and other sensitive information. If you're going to send an array of user information to the browser to display in a grid or tree or something, you're going to scrub it by removing the password, salt, and email address from each record.
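As a rough sketch of that scrubbing step: the field names below are assumptions based on the columns mentioned above, and toPublicRecord is a hypothetical helper, not part of any library.

```javascript
// Field names are assumptions taken from the columns mentioned in the post.
var SENSITIVE_FIELDS = ['password', 'salt', 'email'];

// Returns a copy of the record without the sensitive fields.
function toPublicRecord(user) {
  var result = {};
  Object.keys(user).forEach(function (key) {
    if (SENSITIVE_FIELDS.indexOf(key) === -1) {
      result[key] = user[key];
    }
  });
  return result;
}

// Made-up sample rows standing in for a database result.
var rows = [
  { id: 1, name: 'I <3 puppies', password: 'hash1', salt: 's1', email: 'a@example.com' },
  { id: 2, name: 'Norman', password: 'hash2', salt: 's2', email: 'b@example.com' }
];

// What would be sent to the grid or tree in the browser.
var payload = JSON.stringify(rows.map(toPublicRecord));
console.log(payload);
```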

As long as you're scrubbing those kinds of things, you may as well scrub the content of the fields for anything a user may have input that could be malicious.

Consider this vb4 software. In the admincp, you can set a prefix and suffix for user name display. Administrators can be shown in bold+blue, premium members in bold+black, and so on. If admincp were an ExtJS app, there'd be textfields that would allow the administrator to enter <span style="font-weight: bold; color: blue;"> for prefix and </span> for suffix. Why would you want ExtJS to be scrubbing those textfields?
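One way to picture the prefix/suffix case is the sketch below, where the administrator's markup is inserted as-is and only the user name is encoded. The object shape and the function names are hypothetical; this is not how vb4 or ExtJS actually does it.

```javascript
// Hypothetical encoder for values that are plain data, not markup.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical group settings as an administrator might enter them in admincp.
var adminGroup = {
  prefix: '<span style="font-weight: bold; color: blue;">',
  suffix: '</span>'
};

// Prefix and suffix are markup by design; the user name is data and gets encoded.
function renderUserName(group, userName) {
  return group.prefix + htmlEncode(userName) + group.suffix;
}

var markup = renderUserName(adminGroup, 'I <3 puppies');
console.log(markup);
```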

Posts, on the other hand, are the obvious places where the public can do malicious things. If you use ExtJS' HTMLEditor to edit posts, anyone can basically add <script> tags (or img tags or whatever) that can do very bad things to other users who view those posts. So you have a business logic case where you might allow those things to be entered, but you would scrub the post content before displaying them to users.

It's obviously not specific to posts - users can leave visitor messages, post in member groups, send private messages, add fields to their profile, and so on.

Chods - I think you are confusing issues. Let's take the "I <3 puppies" example again. Is this a valid data value, perhaps for a user's display name in an application? It depends on your application. It's possible that it's an illegal value, but it's quite reasonable to say that it is a valid value. (If you don't like it for user display name, think "user info" or something where it would be.)

This is a valid value. It is safe to put this value in the database exactly like that. If you don't agree, ask yourself how you would search for all users that heart ("<3") something in their name. You certainly wouldn't html encode it and then search for "&lt;3", right? Or, ask yourself how you would include this value from the database in an email, a pdf or an iphone app. None of these applications should need to deal with html encoded data.
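To illustrate the search problem, here is a small sketch with an in-memory array standing in for the database. The sample names and the findByText helper are made up for the example.

```javascript
// Hypothetical encoder, used only to show what storing encoded values would do.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Simple substring search over a list of stored values.
function findByText(values, needle) {
  return values.filter(function (value) {
    return value.indexOf(needle) !== -1;
  });
}

// Made-up display names, stored raw.
var rawNames = ['I <3 puppies', 'Norman', 'cats <3 me'];
// The same names as they would look if stored html encoded.
var encodedNames = rawNames.map(htmlEncode);

var rawMatches = findByText(rawNames, '<3');
var encodedMatches = findByText(encodedNames, '<3');

console.log(rawMatches.length);     // 2
console.log(encodedMatches.length); // 0
```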

Ok, so now hopefully we are all on board that this is both a valid value and that it's appropriate to store it in the database directly. (That is, the database should know "I <3 puppies" not "I &lt;3 puppies" as the value.) The next step is getting this information from the server to the client. Normally this will be through some sort of web service or remoting layer. The value is transferred through some sort of encoding. Maybe you use DWR and the results are encoded in a JSON-ish way. Maybe you use XML and you encode the data in an XML-friendly way. Maybe your data value is returned as a single raw text/plain HTTP response. Whatever mechanism you chose, your server would encode the raw value in a way appropriate to the transport. Some of these may be equivalent to html encoding, some might not.

No matter what way it is encoded, in the browser it would be decoded from the transport representation back into the raw value. In other words, it should be possible to have a displayName javascript var that has exactly the text "I <3 puppies", right? If you disagree, ask yourself how you would take the length of the string or reverse the string in the client if you didn't have the raw value. Maybe that is the only purpose of the data. You aren't displaying it on the screen - you are performing a computation on the data, and for that you need the raw data, right? Whatever you are doing with the data, the raw data is needed.
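A quick sketch of the length and reverse cases, comparing the raw value with an html encoded copy. The encoded string is written out by hand here just to show the difference.

```javascript
// The raw value, and the same value as it would look if the client only
// ever received an html encoded copy.
var rawName = 'I <3 puppies';
var encodedName = 'I &lt;3 puppies';

// Reverses a string character by character (fine for this ASCII example).
function reverse(text) {
  return text.split('').reverse().join('');
}

console.log(rawName.length);       // 12
console.log(encodedName.length);   // 15, not the length of what the user typed
console.log(reverse(rawName));     // seippup 3< I
console.log(reverse(encodedName)); // the entity is torn apart and no longer decodes
```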

Now, let's consider two tasks you might have for this completely valid and reasonable (but html unsafe) data value. You may want to include this value in a URL, maybe as a query parameter. In this case, the value needs to be URL encoded, right? This is different from HTML encoding, and you wouldn't really want to URL encode an html encoded value, right? How would the server that the URL points to deal with that? It wouldn't make sense. You encode the raw data in the way appropriate to its use.
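The URL case can be sketched with the standard encodeURIComponent function. The /search path and the name parameter are made up, and htmlEncode is again a hypothetical stand-in.

```javascript
// Hypothetical encoder, used here only to show the double-encoding mistake.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var displayName = 'I <3 puppies';

// Correct: URL encode the raw value. The path and parameter name are made up.
var goodUrl = '/search?name=' + encodeURIComponent(displayName);

// Wrong: URL encoding a value that was already html encoded.
var badUrl = '/search?name=' + encodeURIComponent(htmlEncode(displayName));

// What a server would see after decoding the query parameter.
var goodParam = decodeURIComponent(goodUrl.split('=')[1]);
var badParam = decodeURIComponent(badUrl.split('=')[1]);

console.log(goodUrl);   // /search?name=I%20%3C3%20puppies
console.log(goodParam); // I <3 puppies
console.log(badParam);  // I &lt;3 puppies
```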

Finally, we get to the point that we want to include this valid and reasonable (but html unsafe) data value in the HTML document. This is done in javascript on the browser. Since the data is not HTML already (and almost none of our raw data would be) it should be HTML encoded, right? Surely it's wrong to just put potentially unsafe values into the DOM.

Whenever we cross a data boundary, the data needs to be encoded properly for where it is going. When a raw value is sent from the server, it needs to be encoded in a way appropriate to the transport. It's then decoded on the client. When the client wants to cross another boundary, such as from a javascript variable into raw html, it must be encoded for html. Not encoding for HTML is a failure, and it is wrong. It might not happen to cause problems in some cases, but it is still wrong. The only case where it is right to NOT encode is when you are specifically returning data that is already HTML. If your server API is getDisplayNameAsHtml instead of getDisplayName, then you could reasonably insert the value into the DOM directly. But, designing an API like this is a pretty poor technique.
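Putting the boundaries together, here is a minimal sketch that assumes JSON as the transport. The displayName property and the span markup are made up for illustration, and htmlEncode is a hypothetical stand-in for the framework's encoder.

```javascript
// Hypothetical encoder for the javascript-to-html boundary.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The raw value as the server knows it.
var rawValue = 'I <3 "puppies"';

// Boundary 1 (server to wire): encode for the transport. JSON escapes the
// quotes but leaves "<" alone, so this is not html encoding.
var wire = JSON.stringify({ displayName: rawValue });

// Boundary 2 (wire to client): decode back to the raw value.
var received = JSON.parse(wire).displayName;

// Boundary 3 (javascript variable to html): encode for html at the last moment.
var markup = '<span class="name">' + htmlEncode(received) + '</span>';

console.log(wire);
console.log(received === rawValue); // true
console.log(markup);
```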

Ok, so now we can ask ourselves clearly whose responsibility it is to html encode data that goes into the DOM. Unless you are dealing with a (IMHO poorly designed) API that is only targeting HTML clients and never does any client-side processing of that data other than inserting it directly into the DOM, this is not the server's responsibility. The server encodes for the transport, not for the (possibly unknown) user of the API. The responsibility lies squarely on the user, on the client code you write running in the browser. In our context, that is our code using extjs, and extjs should make it easy to do this. (And, to re-iterate, after investigating, it's not all that hard. It's just not well documented.)

I would argue further that extjs should do this by default and never allow the incorrect and unsafe behavior unless the developer asks for it, but that's a matter of style and perhaps backwards compatibility. If you are writing extjs apps and you aren't html encoding almost everything you get from the server, I would suggest you are probably making serious mistakes somewhere in your app.

@mschwartz: your strategy of encoding before sending to the browser only works if all the clients are web clients. I am writing apps that have a multi-client model, where some of the clients calling the web services are not web apps.

OK. That's all fine, but as a solution to XSS this does not solve the problem, as it's easy for any hacker to disable the encoding function, whatever that may be, and the hacker has access to the dirty data. As per the title of this thread, the solution to resolve the issue on the client does not work.