Sending platform agnostic objects from a server

Hi, I have a smartphone application that I'm working on as a hobby. I thought I'd change the architecture of the application to offload to a server some of the work that is currently being done by the application. I figure this could reduce the development of code that would have to be unique for multiple smartphone platforms/languages.

In my mind, the application would make a request to the server. The response to the request would be a collection of objects that the application would use. If I understand SOAP at all, it would be the simplest way to go for this. The SOAP protocol sends/receives data as XML and each platform that makes a request could take the XML response and deal with it in its own way.

Are there other simpler alternatives to SOAP that could be easily accessed with multiple languages?

Is there a particular reason that you're looking to emit XML rather than JSON? Also, anyone who tells you that the S in SOAP really stands for simple* is not your friend and has no clue what they're talking about.

Are there other simpler alternatives to SOAP that could be easily accessed with multiple languages?

They are legion: YAML, JSON, and XML of your own devising are all common (perhaps not YAML anymore).

JSON would likely be less pain than SOAP, I would think (disclosure: I've never *used* JSON in anger, though I have with SOAP[1]).

Take care with thinking of the returned things as being "objects", that way lies attempts at transparent RPC mechanisms, and that way lies madness. Treat them as reasonably simple chunks of data (the more simple the less mapping nastiness you have in different environments).

Think very carefully before you put in anything that represents something with implicit state (DateTimes and file system paths for example) as such things tend to cause issues down the line.

Adding to Shuggy's suggestion to distinguish between the data you transmit and "real objects", keep in mind that JSON doesn't support anything directly analogous to pointers, so if you want several references to the same "object", you'll have to do that manually (refer to IDs and separately map IDs to objects), which isn't pretty. If all you want to transfer is average-ass data, you should be fine, but if you are trying to serialize "real objects", it's a major pitfall.
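As a rough sketch of the "refer to IDs and separately map IDs to objects" approach (Python; the `customers`/`orders` names and the payload layout are made up for illustration, not taken from any real API):

```python
import json

# Two orders share one customer object in memory.
customer = {"id": "c1", "name": "Joe Smith"}
orders = [
    {"id": "o1", "customer": customer},
    {"id": "o2", "customer": customer},
]

def encode(orders):
    """Replace each shared object with its ID and ship the objects in a separate table."""
    customers = {}
    flat_orders = []
    for order in orders:
        cust = order["customer"]
        customers[cust["id"]] = cust
        flat_orders.append({"id": order["id"], "customer_id": cust["id"]})
    return json.dumps({"customers": customers, "orders": flat_orders})

def decode(text):
    """Rebuild the references by looking each ID up in the table."""
    payload = json.loads(text)
    customers = payload["customers"]
    return [
        {"id": o["id"], "customer": customers[o["customer_id"]]}
        for o in payload["orders"]
    ]

restored = decode(encode(orders))
# Both orders point at the same customer object again.
assert restored[0]["customer"] is restored[1]["customer"]
```

It works, but every client has to repeat the `decode` step by hand, which is the "isn't pretty" part.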

Is there a particular reason that you're looking to emit XML rather than JSON? Also, anyone who tells you that the S in SOAP really stands for simple* is not your friend and has no clue what they're talking about.

* "more simple than CORBA" is at once true and meaningless

I'm thinking about using XML vs JSON because I really don't know anything about JSON. But, I can look into it. Thanks.

Are there other simpler alternatives to SOAP that could be easily accessed with multiple languages?

Take care with thinking of the returned things as being "objects", that way lies attempts at transparent RPC mechanisms, and that way lies madness. Treat them as reasonably simple chunks of data (the more simple the less mapping nastiness you have in different environments).

Think very carefully before you put in anything that represents something with implicit state (DateTimes and file system paths for example) as such things tend to cause issues down the line.

All of the data that I plan to transmit would just be strings. The strings wouldn't be reinterpreted as other types of objects. They would simply be assigned to member variables in an object instantiated by the application.

I didn't want to get into anything like RPC/CORBA/RMI. From what little I remember of RMI/CORBA, you need to create platform-specific stubs. That seems like way too much overhead for transmitted string data (or stuff that can be protected with CDATA sections).

If HTTP is your transport, you should consider REST + JSON. The HTTP verbs cover the CRUD (create, retrieve, update, delete) part of the message, while the JSON contains the objects you are acting upon. If you're using a dynamic language on the server-side, you can often map JSON "objects" to server-side objects -- e.g. a JSON object like this:

Code:

{"first_name":"Joe", "last_name":"Smith"}

Could get parsed into a "Person" object on the server-side pretty easily in Python or Ruby.
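For instance, in Python it might look roughly like this. The `Person` class and `person_from_json` helper are hypothetical; only the JSON payload comes from the example above:

```python
import json
from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str

def person_from_json(text):
    data = json.loads(text)
    # Pick the fields explicitly so unexpected keys don't leak into the object.
    return Person(first_name=data["first_name"], last_name=data["last_name"])

joe = person_from_json('{"first_name":"Joe", "last_name":"Smith"}')
```

Reading the fields by name, rather than splatting the whole dict into the constructor, keeps the mapping obvious when the payload later grows extra keys.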

You can also use the HTTP Headers to negotiate the content type, so that if you want to use JSON for some platforms, or XML for others, or a native HTML interface on your API, you have a built-in mechanism for content type negotiation.
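A deliberately naive sketch of that negotiation, assuming a hypothetical `render` helper that is handed the raw `Accept` header value. It takes media types in the order listed and ignores q-values, which a real server or framework would honour:

```python
import json
from xml.etree import ElementTree as ET

def render(person, accept):
    """Return (content_type, body) for the first media type in Accept we support."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in ("application/json", "*/*"):
            return "application/json", json.dumps(person)
        if media_type in ("application/xml", "text/xml"):
            root = ET.Element("person")
            for key, value in person.items():
                ET.SubElement(root, key).text = value
            return "application/xml", ET.tostring(root, encoding="unicode")
    # Nothing matched: fall back to JSON (a real server might answer 406 instead).
    return "application/json", json.dumps(person)

content_type, body = render(
    {"first_name": "Joe", "last_name": "Smith"}, "application/xml"
)
```

The same data comes out in either format, and the client decides which one it gets.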

The OP might not be using Java. And if (s)he is, why not use GSON? It seems simpler and lighter than Jackson.

I assumed an Android application (perhaps because he was talking about SOAP and I was imagining sad enterprisey Java people). If it's not, enjoy all the crappy JSON parsing stuff in Obj-C land! (Seriously it's pretty much all gross, I've previously considered using libyaml-cpp to parse JSON on iOS because it's less horrible.)

As far as Java goes, GSON is fairly slow--probably worse on Dalvik--generates a good bit of garbage--definitely worse on Dalvik--and its API for deserialization is gross (or was, maybe it's improved these days). Jackson 2.0 is pretty much The Way To Go as far as I'm concerned.

The OP might not be using Java. And if (s)he is, why not use GSON? It seems simpler and lighter than Jackson.

Actually, I was thinking of using Java on the server side:

1) Because I am familiar with Java/Servlets.
2) Because if I go the SOAP route, at the very least, Android/Java and Cocoa have native XML parsers.
3) While there is the argument that you should use the best tool for the job, I am inclined to minimize the number of languages involved. Also, part of the reason that this project is a hobby is because the stuff being done on the server is brittle. Since I know how to create unit tests in Java, I can alert myself when the brittle part breaks. Including a server component also means centralizing this brittle part.

If I were to use JSON, how well does it handle transmitting HTML? For that matter, from what reading I've done, JSON has a very simple key/value pair syntax, so how well does JSON do with transmitting large (on the order of kilobytes or megabytes) "values"?

If I were to use JSON, how well does it handle transmitting HTML snippets?

If you just stick such a thing in a string then it's pretty trivial...
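To make "pretty trivial" concrete, here is a small round trip using Python's standard `json` module (the `body` key and the sample markup are invented). The encoder escapes the quotes and newline for you, and a value of a megabyte or so is just a longer string:

```python
import json

snippet = '<p class="note">Tom & "Jerry"</p>\n'
wire = json.dumps({"body": snippet})

# The markup survives the trip unchanged.
assert json.loads(wire)["body"] == snippet

# Size is not a syntax problem either: a 1 MB value round-trips the same way.
big = "x" * 1_000_000
assert json.loads(json.dumps({"body": big}))["body"] == big
```

Most parsers hold the whole document in memory, though, so very large values are a memory question on the phone rather than a format question.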

The design considerations of embedding markup in something that's intended to be 'just data' is another thing entirely. I'm sure there are situations where it is reasonable (supplying a fully independent document within a request for example). I suspect many times it is indicative of a flawed approach to the system as a whole though.

2) Because if I go the SOAP route, at the very least, Android/Java and Cocoa have native XML parsers.

If you are comfortable with the idea of sending XML, send XML. SOAP presents very real pain points when you try to get SOAP messaging working coherently across different platforms, as their parsers don't play well with one another as often as you'd think.

One partner system we were working with not only had problems with SOAP messages from our (.Net) platform to theirs (Java), they had problems with SOAP internally (from JBoss to WebSphere). Everyone was positive the other guy was sending malformed messages.

The design considerations of embedding markup in something that's intended to be 'just data' is another thing entirely. I'm sure there are situations where it is reasonable (supplying a fully independent document within a request for example). I suspect many times it is indicative of a flawed approach to the system as a whole though.

I cast a very squinty eye at any JSON responses that incorporate HTML markup strings.

There needs to be a very strong reason that the data is coming across that way. IMHO HTML markup strings in JSON are a very strong indicator that the presentation layer is leaking. It's very hard to be a "platform agnostic data object" if you have pieces of your presentation implementation meshed in.

If HTML needs to be served, then it should either live at another endpoint, or be served as actual HTML fragments from your web services through the use of content negotiation. That's generally easily done these days on almost every modern platform.

Even better (IMHO) is to generate the HTML on the client side, directly from the data using a template.
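A minimal sketch of the template idea, written in Python only to keep one language across the examples in this thread (a phone client would use its own platform's templating). The `ROW` template and `render_people` helper are hypothetical names:

```python
from html import escape
from string import Template

ROW = Template("<li>$first_name $last_name</li>")

def render_people(people):
    # Escape every value so the data can't inject markup into the page.
    rows = [ROW.substitute({k: escape(v) for k, v in p.items()}) for p in people]
    return "<ul>" + "".join(rows) + "</ul>"

page = render_people([{"first_name": "Joe", "last_name": "Smith & Sons"}])
```

The server only ever ships plain data, and each client owns its own presentation.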

Hi, I have a smartphone application that I'm working on as a hobby. I thought I'd change the architecture of the application to offload to a server some of the work that is currently being done by the application. I figure this could reduce the development of code that would have to be unique for multiple smartphone platforms/languages.

In my mind, the application would make a request to the server. The response to the request would be a collection of objects that the application would use. If I understand SOAP at all, it would be the simplest way to go for this. The SOAP protocol sends/receives data as XML and each platform that makes a request could take the XML response and deal with it in its own way.

Are there other simpler alternatives to SOAP that could be easily accessed with multiple languages?

Thanks,

Jason Mazzotta

Don't use SOAP. The acronym is a lie.

SOAP isn't simple and it isn't light.

If you plan on passing simple strings, JSON or YAML are probably more appropriate.

JSON or XML is fine. XML can be appropriate if your data types don't match up well with JSON (anything more than numbers and strings and JSON goes to shit fast). But I have to agree with the others: SOAP is overkill.

Use whatever has good bindings in your language(s). Don't believe the purists on either side.

Truly. There's a choice between RPC and REST, yeah- but further than that, once you pick one, it doesn't really matter much which library you use, as long as it works OOB.

I'm more of an RPC person myself, and when you find a good client/server library combination, it is pretty much painless and quick to implement. I can see the point of REST more if you are doing very CRUDdy things or if you really want to have lots of different clients/platforms accessing your endpoints, but I suspect a lot of REST usage in the wild is really RPC done by hand (I have faced many "REST" interfaces which were "POST me some arguments, I'll give you some JSON results"...).

And what's so complex about SOAP? Yeah, there's tonnes of optional extras, but jeez, strip it to its basics and an HTTP GET plus params returning basic XML is perfectly valid as a SOAP call.

Plus pretty much every major language has tools that can cook SOAP WSDL into objects so you may not even need to actually parse XML. I've used both and JSON doesn't appear to be any easier than SOAP. Certainly anything involving serializing dates in JSON is just hilariously terribad.

Plus pretty much every major language has tools that can cook SOAP WSDL into objects so you may not even need to actually parse XML

Yeah, and then it normally chokes if you're communicating between two different SOAP implementations[1], sometimes even on trivial stuff.

Yes encoding things like Dates is in fact quite complex to get right, but the expectations of *what is right* is in fact the problem. Once you know the desired behaviour it's literally a question of picking the already existing encoding that does what you desire (that part is a solved problem unless you're doing something insane) so at that stage it's best to do the encode/decode to strings yourself unless the default works properly (which is hopefully ISO 8601).
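A small sketch of "do the encode/decode to strings yourself", assuming ISO 8601 with an explicit UTC offset is the behaviour you decided on. The `encode_event`/`decode_event` helpers and the `when` field are hypothetical:

```python
import json
from datetime import datetime, timezone

def encode_event(name, when):
    # Refuse naive datetimes: the receiver could only guess at the offset.
    if when.tzinfo is None:
        raise ValueError("refusing to send a datetime without a UTC offset")
    return json.dumps({"name": name, "when": when.isoformat()})

def decode_event(text):
    data = json.loads(text)
    # The wire value is just a string; turning it into a date is our job.
    data["when"] = datetime.fromisoformat(data["when"])
    return data

sent = datetime(2012, 5, 1, 12, 30, tzinfo=timezone.utc)
wire = encode_event("release", sent)
assert decode_event(wire)["when"] == sent
```

The JSON library never sees a date at all, so there is no library-specific date encoding to disagree about.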

JSON has the distinct advantage over XML for this that the semantics of its composition operators (arrays and Objects) have a clearly defined expectation, that of what it would be if you just eval()[2] it in javascript. This makes it less useful in some ways (xml has a bit more flexibility, and complexity - especially wrt things like namespaces and entity references), but much simpler to get it working the same everywhere.

In XML you could think of attributes as giving you the Object semantics, but they don't, because attributes don't nest. Arrays are tricky because there is not even a meaningful convention for them in general XML, and SOAP hardly helps matters by saying things like this (emphasis mine):

Quote:

The representation of the value of an array is an ordered sequence of elements constituting the items of the array. Within an array value, element names are not significant for distinguishing accessors. Elements may have any name. In practice, elements will frequently be named so that their declaration in a schema suggests or determines their type.

It goes on with lots more MAYs and mushy language...

As far as I'm concerned if you're trying to make something *simple* then any standard about it should have almost no MAY, or SHOULD and constitute a bunch of MUST. That keeps things simple because it eliminates many sources of ambiguity, or places an implementation can misunderstand.

I'm not saying JSON is perfect, far from it. But the *critical* thing[4] that you want on an interchange protocol that assumes most of the encode/decode will be done for you by the library is that it is ubiquitous, and highly interoperable. It matters not if, say, YAML is better[3], YAML hasn't anything like the coverage of SOAP and JSON. Further, if you're going with something that is attempting to be an interop bridge between two different platforms/domains then it is in fact *desirable* that it be extremely limited in scope for (automatic) serialisation of object graphs, (or any constructed type for that matter) because the moment your abstraction leaks, and code out of your control maps something that doesn't have a clear native representation in the protocol to something not *quite* what matches in the rest of the world you will end up with pain.

1. I may be out of date on this one, maybe everyone got together and thrashed this out nicely. And I'll take my bacon sandwich with wings.
2. obviously that is a VeryBadThingToDo(tm)
3. I honestly have no idea either way, I'm just using it as an example
4. after correctness, I'm assuming the stuff *works* in and of itself of course.

JSON has the distinct advantage over XML for this that the semantics of its composition operators (arrays and Objects) have a clearly defined expectation, that of what it would be if you just eval()[2] it in javascript. This makes it less useful in some ways (xml has a bit more flexibility, and complexity - especially wrt things like namespaces and entity references), but much simpler to get it working the same everywhere.

Again you keep mentioning things you don't have to use. Cisco's example easily translates into:

It is not about having to use it. It's about a fully conforming implementation having to handle it if another implementation might send it. The moment there's ambiguity that's another (potentially costly/complex) code path that might go wrong. Worse it might go wrong only in very specific circumstances (just one element in your array, no probs!)
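To illustrate the "just one element in your array" trap, here is a hand-rolled XML-to-dict converter of the kind people write when there is no agreed array convention. This is a hypothetical helper, not how any particular SOAP toolkit behaves:

```python
from xml.etree import ElementTree as ET

def naive_to_dict(xml_text):
    """Collapse child elements into a dict, promoting repeated names to lists."""
    result = {}
    for child in ET.fromstring(xml_text):
        if child.tag in result:
            # Second occurrence: only now do we learn it was an array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(child.text)
        else:
            result[child.tag] = child.text
    return result

many = naive_to_dict("<r><item>a</item><item>b</item></r>")
one = naive_to_dict("<r><item>a</item></r>")

assert many == {"item": ["a", "b"]}  # a list, as the caller expects
assert one == {"item": "a"}          # a bare string: same schema, different type
```

Code that loops over `item` passes every test with two elements and breaks in production on one. In JSON, `{"item": ["a"]}` is unambiguously an array.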

If you're making your SOAP responses by hand then that's not so much of a problem, but at that stage why the hell use SOAP, why not just do your own XML? It's like writing XML by hand: it's daft, there's tools to do it for you and do it right, use them.

At one stage the two most widely used SOAP implementations out there, Axis on the Java side and Microsoft's, would frequently fail to talk to each other for the most basic of object graphs (and their envelope handling was dire). This is ludicrous, your protocol is not 'simple', don't kid yourself with the name...

If you are not, in fact, arguing for the use of *SOAP* and instead are saying use really basic hand rolled XML then that's a *very* different matter, that's not what I'm arguing about certainly, I do that all the time. It is SOAP that is, IMHO, an abomination designed by people for whom the task was way beyond them.

Plus pretty much every major language has tools that can cook SOAP WSDL into objects so you may not even need to actually parse XML

Yeah, and then it normally chokes if you're communicating between two different SOAP implementations[1], sometimes even on trivial stuff.

I haven't seen issues with different SOAP implementations actually failing to parse each other's SOAP messages. I have seen plenty of issues with other stuff, namely WSDL interoperability, and data type issues that will probably never be fixed.

I thought most of the envelope issues were worked out like a decade ago.

Quote:

Yes encoding things like Dates is in fact quite complex to get right, but the expectations of *what is right* is in fact the problem. Once you know the desired behaviour it's literally a question of picking the already existing encoding that does what you desire (that part is a solved problem unless you're doing something insane) so at that stage it's best to do the encode/decode to strings yourself unless the default works properly (which is hopefully ISO 8601).

Actually, even dates aren't that simple, and ISO 8601 is hardly trouble free. It can't do Julian dates, it can't represent GPS time properly, etc.

Quote:

In XML you could think of attributes as giving you the Object semantics but it doesn't because it doesn't nest.

You shouldn't, because that's not how XML works (perhaps unfortunately).

Quote:

I'm not saying JSON is perfect, far from it. But the *critical* thing[4] that you want on an interchange protocol that assumes most of the encode/decode will be done for you by the library is that it is ubiquitous, and highly interoperable.

That's only true if you need to interoperate with the world generally. While that's not a bad default assumption to make, it's hardly true all of the time.

Quote:

Further, if you're going with something that is attempting to be an interop bridge between two different platforms/domains then it is in fact *desirable* that it be extremely limited in scope for (automatic) serialisation of object graphs, (or any constructed type for that matter) because the moment your abstraction leaks, and code out of your control maps something that doesn't have a clear native representation in the protocol to something not *quite* what matches in the rest of the world you will end up with pain.

What's important is that whatever code interprets the serialization format enforces the proper and necessary constraints. "Limited in scope" isn't the relevant concern, but it's probably easier to enforce constraints on limited scope formats.

Taken literally, your statement could be construed as an argument against things like image and video formats. Even if you regard them as necessary evils, they're still necessary.

Taken literally, your statement could be construed as an argument against things like image and video formats. Even if you regard them as necessary evils, they're still necessary.

Surely image and video formats aren't trying to be general object graph (or tree) serialisation mechanisms. They are targeted at a specific problem domain (often with many additional requirements based on the limitations of the external world). I am *very specifically* arguing against the complexities present within SOAP for the use case for which it is actually designed. It's horrid on so many levels.

It appears that implementations of it have stabilised far more since I had the displeasure of using it. So perhaps some of my ire is now misplaced.

If you're in a limited domain why bother with SOAP unless it's the easiest available library (I did acknowledge the utility of ubiquity, I'd just always try to pick something other than SOAP if an alternative option existed!)

Quote:

Actually, even dates aren't that simple, and ISO 8601 is hardly trouble free. It can't do Julian dates, it can't represent GPS time properly, etc.

I didn't say dates were simple, I said picking what about them you want to represent was *hard*. If you know you need Julian/GPS time then you should know enough to know that a general purpose tool is never going to encode that information for you! I don't know of a formal standard for Julian date string representation wrt timezones, but so long as the epoch is clear and you define negative expressions as illegal, formatting a Julian day (as a pair of integers of arbitrary length) seems pretty simple. If your platforms can't handle arbitrary precision then you are already having to restrict the actual domain to the representable one in some way (or have some non standard parsing code injected) so again, a one size fits many approach isn't going to cut it.

This is what I'm trying to (repeatedly) get at. Either have something domain specific, that then *can* safely do more automatically on the basis of that knowledge of the domain. Or go general and *incredibly* simplistic. Even strings are a potential minefield given encoding, but since it's text protocols we are discussing you can simply sidestep it and say whatever aspect of the protocol determines the format in use also prescribes everything contained in the payload, best decide if embedded nulls are legal or not though...

Sure, but the line between "general" and specific is so vague it's basically a cop-out to exclude them on that basis. The same practical issues occur, with many of the same consequences, regardless of whether I'm trying to solve a specific problem, or provide a generalized serialization format.

Plainly, I don't think any of the problems you're describing have anything to do with the fact that SOAP is a "generalized" serialization format. Their problem is that in many places they refused to make an adequate definition, stick with it, and encourage vendors to enforce those definitions.

If anything, their problem is precisely because they tried (in some sense) to do what you suggest: keep things simple. By not defining the array ADT, they've picked a simple definition that will accept virtually anything that's ever been called an array! Unfortunately, that sort of simplicity isn't really what's warranted.

Quote:

I didn't say dates were simple, I said picking what about them you want to represent was *hard*.

Yes, but you went on to also state that unless you're doing something "insane", there's already going to be a string representation that works for you. That's just not true for dates, nor a whole mess of other things where this problem commonly occurs.

I don't think any of the problems you're describing have anything to do with the fact that SOAP is a "generalized" serialization format.

Neither do I, sorry if I gave that impression. It's because it's a bad "generalized" serialization format.

I think you and I differ on the simple part. I believe simple as it applies to a specification is to be *restrictive*. The more you restrict, the simpler translating it into an actual implementation because less extra effort is required of the author of at least one of the implementations (encoding or decoding). The (possible) increased cost to the author of the other implementation is offset by the reduction in the need for communication between them to avoid bugs (if that is even possible), I believe this will almost always lead to a net win, especially over the lifetime of the protocol (including if it leads to an orderly demise as it ceases to be appropriate).

You do limit yourself as a result. It all depends on the use case and trade off. XML allows you to use '' or "" for attributes. This is really useful if you want to hand write/edit some of it where you have significant chance of having one or the other character embedded in the value, but it comes at an implementation cost (including idiots who think they can hand parse it with their own implementation that miss this). I'm saying SOAP made many miserable decisions that don't even have a potential positive trade off, especially around arrays, I mean rectangular! seriously? How in hell is that simple?
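For what it's worth, a conforming parser hides the quoting choice entirely, which is why the cost lands only on people who hand-parse. A quick check with Python's standard-library ElementTree (the element and attribute names are invented):

```python
from xml.etree import ElementTree as ET

# Single-quoted attribute: embedded double quotes need no escaping.
single = ET.fromstring("<p name='Joe \"J\" Smith'/>")
# Double-quoted attribute: the same value needs &quot; entities.
double = ET.fromstring('<p name="Joe &quot;J&quot; Smith"/>')

assert single.get("name") == double.get("name") == 'Joe "J" Smith'
```

Two spellings on the wire, one value after parsing. A homemade regex that only looks for `"` sees just one of them.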

there's already going to be a string representation that works for you. That's just not true for dates

I didn't mean that there would always be an off the shelf one. But the process of determining what it is about the date that matters should lead you to determine whether any existing ones are suitable (or can be modified to be), or at least provide the basis of a spec to write your own. If nothing exists to deal with it then *someone* has to write it. This is the only solution, my way just gets you there better than "oh crap there's no time zone to represent this nasty thing we are using and some, but not all, of our historical forms are not parseable without guessing!"

But my take is that, apart from defaulting to ISO 8601 with as much info as is available to the system at the point of converting the datetime value, there is no reasonable default.

If your DateTime variables contain timezone info (including "UTC"/"not UTC" and some indication of precision[1]) great, include it. But the consuming system may have no concept of the timezone, nor the precision to represent it. This is inescapable, and so frankly I think it's sensible to consider, in a protocol intended to be utterly ubiquitous, saying:

Quote:

Dates and times are complex, you should do it yourself, we suggest via strings unless you want the slight speed improvement over numeric only formats. If you know you fit in the pretty simple case here's a link to how to encode/parse ISO 8601 in every language under the sun, and library implementations of the protocol API should provide these utility functions specific to your platform where ever possible, including a 'Safe' parsing one which, when presented with range/precision outside the bounds of what the default platform representation of DateTime can provide, treats this as an error.

1. and if this matters it's likely not a decimal friendly precision so encoding as a human readable string is never going to fly anyway
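A rough sketch of what such a 'Safe' parse might look like in Python, where the platform `datetime` keeps at most microseconds. The `safe_parse` name and the six-digit limit check are assumptions for illustration; depending on the version, the standard parser either rejects or silently truncates longer fractions, so the check is done up front:

```python
import re
from datetime import datetime

# Python's datetime keeps at most microseconds (6 fractional digits).
MAX_FRACTION_DIGITS = 6
_FRACTION = re.compile(r"\.(\d+)")

def safe_parse(text):
    """Parse an ISO 8601 timestamp, refusing input we could only store lossily."""
    match = _FRACTION.search(text)
    if match and len(match.group(1)) > MAX_FRACTION_DIGITS:
        raise ValueError(f"{text!r} carries more precision than datetime can hold")
    return datetime.fromisoformat(text)

ok = safe_parse("2012-05-01T12:30:00.123456+00:00")
```

A timestamp with seven fractional digits (100 ns resolution, for example) raises instead of being quietly rounded.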

I think you and I differ on the simple part. I believe simple as it applies to a specification is to be *restrictive*. The more you restrict, the simpler translating it into an actual implementation because less extra effort is required of the author of at least one of the implementations (encoding or decoding).

I really don't think any of those statements are actually true. JSON is not restrictive under any definition I can think of. In my mind, JSON would be restrictive if it mandated all names within an object were in alphabetical order. It is however, simple, because it only has a few elements. I think when you say restrictive you really mean "well-formed" or "unambiguous". The few structures provided by JSON are certainly either of those things: there's only one way to represent an "object", and only one way to represent an "array".

Actual restrictions tend to add complexity simply because they have to be enforced.

Quote:

XML allows you to use '' or "" for attributes. This is really useful if you want to hand write/edit some of it where you have significant chance of having one or the other character embedded in the value, but it comes at an implementation cost (including idiots who think they can hand parse it with their own implementation that miss this).

If you can't get this right, you're not going to get the harder parts right, so I'm not sure it's a good argument.

But my take is that, apart from defaulting to ISO 8601 with as much info as is available to the system at the point of converting the datetime value, there is no reasonable default.

OK, fine, that's a bit better for most applications. The "much info as available" part is pretty important here. Just because you don't need subsecond precision doesn't mean I don't as well.

The problem with this advice is that while it's reasonable, lots of people don't realize date/time standards exist, don't bother to look them up, and then write out some hard to parse garbage on the wire.

Mostly though, I was giving you a hard time. I'm especially bitter because I've worked with a lot of interchanges where if someone had bothered to:

You are right that I could change restrictive to "unambiguous" and it would probably be a more concrete and reliable statement.

Quote:

I'm not sure what you mean by precision here, especially given your footnote.

So, for example, .Net DateTime has a (uniform) granularity limit of 100 nanoseconds, but most people using one will in fact be operating with values that actually only really have a useful precision of much less than this (likely in the milliseconds). You don't want to just throw anything away in the "I have no domain specific knowledge as to the actual precision" case, so a general solution has no recourse other than to provide the full (almost) 64 bits of info encoded as a string.

It *may* choose to not include trailing zeros to make the general representation more compact. This is appealing (again, for most people this will make it more readable and save space, and for anyone hand editing it will make it easier to get it right). However, consuming code may take "X.0000" to mean something different than "X", or indeed "X.00". Do you want people taking on information about the implied precision when reading it?

Even if the code does have an idea of the precision which should be used in the representation it is likely that the precision would not actually be a 'nice' round multiple of 10 (though people might choose to fudge it to the closest one to make their life easier) so indicating the precision 'inline' by including sufficient decimal zeros as required wouldn't even be sufficient anyway, some other technique is needed.
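The trailing-zero problem is easy to see with Python's `decimal` module, which keeps the written digits, next to `float`, which throws them away. This is only an illustration of how two readers can disagree about the same text, not a recommendation for either type:

```python
from decimal import Decimal

a, b = Decimal("1.0000"), Decimal("1")

assert a == b                          # same numeric value
assert str(a) != str(b)                # but the written digits are remembered
assert a.as_tuple().exponent == -4     # a reader could take this as "good to 1e-4"
assert float("1.0000") == float("1")   # a float-based reader discards the hint
```

So whether "X.0000" means more than "X" depends entirely on which type the consumer happens to parse into.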

I can see why you're bitter. My world went to using GPS sourced leap second corrected time for *everything* long ago such that, on encountering a timestamp nowadays that wasn't externally sourced it must be from there unless it is explicitly called out as being anything else. Though the legacy of SQL Server's default (and horrid) DateTime type pre-DateTime2 does still haunt me a little.

However consuming code may take "X.0000" to mean something different than "X", or indeed "X.00", do you want people taking on information about the implied precision when reading it?

OK, I see what you're saying. In general, I'd consider systems that tried to indicate precision by significant figures broken (even for the simple case of "My datetime class doesn't carry any more digits"). As you note, precision is rarely that simple.

Apart from it evalling() as a string object not a date object? How do you eval it as a date object?

You should do your own parsing, or use a library capable of doing it for you, which you trust to be correct for your current needs. Further you should be okay with it potentially blowing up in your face if you ever go cross platform in future (this is a cost benefit analysis only you can do).

I completely agree that when designing the initial spec for JSON they should have mandated that the *default* Date encoding should be ISO 8601. This would have led to happy synchronicity in the common case.

If you're sending tons of dates about and this sort of thing really matters then maybe JSON isn't right for you if the libraries available to you don't handle this cleanly.

If you're in a monoculture then hand rolled XML still beats out SOAP unless your format is intended to change significantly over time and the effort of maintaining the parsing layer at either end begins to outweigh the effort of keeping things interoperable.