Sex, software, politics, and firearms. Life's simple pleasures…

RFC: Action stamps

This is a request for comment on a convention for uniquely identifying user actions on the Internet. The motivating context was identifying commit changesets in version-control systems in a way independent of the specific VCS. It is anticipated that this format will have uses in recording many other similar sorts of transactions, including actions on web interfaces, where we want a simple cookie identifying “who did this and when”.

The proposed format is designated an “action stamp” and consists of an RFC3339 timestamp in Zulu time, followed by an exclamation point, followed by an RFC822 address (technically, an “addr-spec” as defined in section 6.1 of RFC822).

Thus: 2011-10-25T15:11:09Z!fred@foonly.com
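As a rough illustration of the format, here is a minimal Python sketch that builds such a stamp. The function name `make_action_stamp` is an assumption made for this example and is not part of the proposal; the sketch emits whole-second precision only.

```python
from datetime import datetime, timezone

def make_action_stamp(when: datetime, address: str) -> str:
    """Format an action stamp: RFC3339 Zulu timestamp, '!', then the address."""
    if when.tzinfo is None:
        # A naive datetime carries no zone, so it cannot be converted reliably.
        raise ValueError("need a timezone-aware datetime to convert to UTC")
    utc = when.astimezone(timezone.utc)
    # Whole-second precision; the proposal also allows optional subsecond digits.
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ") + "!" + address

stamp = make_action_stamp(
    datetime(2011, 10, 25, 15, 11, 9, tzinfo=timezone.utc), "fred@foonly.com"
)
print(stamp)  # 2011-10-25T15:11:09Z!fred@foonly.com
```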

Advantages of this format include:

* Uniqueness: In distributed VCSes and elsewhere, email addresses are widely accepted as primary identity keys. The timestamp can be extended to subsecond precision to lower collision probability as far as desired.

* Ease of parsing: The format is textual, syntactically unambiguous, and easily mined from surrounding text. It is readily distinguishable from (a) a plain timestamp, (b) a filename, or (c) a standalone email address. Humans can read it easily.

* Good sorting properties: RFC3339 is a profile of ISO8601, which is designed so that lexical sort order coincides with timestamp order. Action stamps inherit this.

* Simplicity and compactness: Really, how could it get any simpler? Exactly one character of overhead.
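To give a feel for the "easily mined from surrounding text" claim, here is a hedged Python sketch. The pattern `STAMP_RE` and the helper `find_action_stamps` are hypothetical, and the address part is deliberately much simpler than the full RFC822 addr-spec grammar (it would, for instance, swallow trailing punctuation).

```python
import re

# Hypothetical pattern: an RFC3339 Zulu timestamp with optional fractional
# seconds, a '!', then a simplified "something@something" address.
STAMP_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)"
    r"!"
    r"(?P<address>[^\s@]+@[^\s@]+)"
)

def find_action_stamps(text: str) -> list[tuple[str, str]]:
    """Return (timestamp, address) pairs for every stamp found in text."""
    return [(m.group("time"), m.group("address")) for m in STAMP_RE.finditer(text)]

log = "Fixed in 2011-10-25T15:11:09Z!fred@foonly.com and reviewed by bob@example.com later."
found = find_action_stamps(log)
print(found)  # [('2011-10-25T15:11:09Z', 'fred@foonly.com')]
```

The bare address `bob@example.com` is skipped because no timestamp and `!` precede it, which is the "distinguishable from a standalone email address" property in miniature.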

Now I’ll address some possible points of contention with this proposal.

1. Why an exclamation point? Because nothing else uses it. Not since bang-path addresses went out of style, anyway.

2. Bletch. The T in the timestamp is ugly. Yes, it is. Live with it; whitespace there would break the action stamp into two tokens, which is undesirable for a cookie that we want to be readily machine-parsable. Also the T is a useful cue that the reader is looking at RFC3339/ISO8601.

3. Is the trailing ‘Z’ on the timestamp really necessary? Yes, it is. Local times are ambiguous without a location – you want something in the action stamp that makes it clear what timezone is intended, and the easiest way to do that is to mandate Zulu time (UTC/GMT) in the format. The Z is a reminder that it’s Zulu time.

4. Email addresses change, expire, and one person may have several. True, but the same would be true of any other identity token we could use in these. Email addresses have the desirable properties of being simple, universal, and implicitly describing a communication channel to the person. Experience with DVCSes, PGP keys, and ssh keys has taught us that the edge cases are manageable.

5. Er, there isn’t really much to this proposal, is there? That’s right. Brutal simplicity is part of the point.

Comments and criticism welcome. If reaction is positive I might try to turn this into an actual IETF RFC.

71 thoughts on “RFC: Action stamps”

Not sure about requiring the email address in that format. I would prefer (svc)!(host)!(user) to prevent mindless spambots from harvesting.
Email!mit-oz.arpa!ralphw still allows you to represent the same idea, as well as newer monikers like facebook!facebook.com!computed

This argument indicts bare email addresses as well – but we don’t consider fred@foonly.com to be a bad format because spambots might harvest it. How and where action stamps should be visible is orthogonal to how they should be composed.

So you have (when)!(who) covered. As long as you accept the Single Universe theory and stick to the Gregorian Calendar, you’re ok. Well, except for the difference between Terrestrial Time and International Atomic Time, and not accounting for the effects of time dilation due to General Relativity.

As for turning it into an RFC, while you *can* do an RFC without also being a de facto standard, you’ll get a lot more traction in the IETF if you have a good-sized community using it before you write the RFC.

A couple of minor concerns I see with this (none are critical but some might merit addressing):

*Use of Zulu time*: Is it a server time or a local client time?

Server time is easier (server software usually has little problem knowing its timezone) but may be a problem if you need to display this in a user-friendly way later on – the client needs to be smart enough to convert and know the user’s desired display TZ. Another problem with server time is when you need the timestamp of when the transaction was initiated/entered – client time has to be used for that due to possible lag between that and commit time.

True, but not really an issue for the range of applications I have in mind – identification of DVCS commits is the paradigm case here. For those, the main desideratum is simply that the ID be unique within the userbase of the application using it and implicitly point back to a meatspace person. Knowing when two such identities tie to the same meatspace person is occasionally interesting, but usually less important.

Issues about multiple addresses and addresses going invalid are going to come up with whatever identity cookie we use, so they’re not really a strike against using email addresses. DVCSes do after all successfully use email addresses for this purpose.

>Also as ralphw said, this format doesn’t allow for multi-source/multi-protocol ID. What if I prefer to use my IM handle, or my LinkedIn username, or OpenID, or MSN/MS passport ID, or Google identity?

I meant to reply to this earlier. My reply is: Don’t overcomplicate life! All those other kinds of ID are occasionally interesting, but an email address is the one form of Internet ID that is effectively universal.

Yes, technically. But if you’re looking for one of these in the normal way, parsing forward, your state machine will consume the Z! and not be confused. Not be confused even if the address part contains a ‘!’, actually.

The point of Zulu time is that regardless of the native time zone of the server *or* the client, the time is recorded relative to a single time zone, in this case, UTC. Thus, as I write this at 0959 EDT, regardless of the server time that ibiblio uses, the post will be recorded at 1359UTC.
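The conversion described above can be sketched in Python. The calendar date here is an assumption (the comment gives only the time of day), and EDT is modelled as a fixed UTC-4 offset purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# EDT as a fixed offset of UTC-4; a real application would likely use a
# proper timezone database instead.
EDT = timezone(timedelta(hours=-4), "EDT")

# 09:59 EDT on an assumed date.
local = datetime(2011, 10, 26, 9, 59, tzinfo=EDT)
zulu = local.astimezone(timezone.utc)
zulu_text = zulu.strftime("%Y-%m-%dT%H:%M:%SZ")
print(zulu_text)  # 2011-10-26T13:59:00Z
```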

To add to my above comment, the only identity protocol you have to define is RFC822, but if proprietary implementations want to use proprietary identity protocols, then they can without breaking quite as many things. Also, someone might well decide to use OpenID rather than email if it’s web-based (not likely for a DVCS now, but how long will it be before someone writes an IDE in a browser?).

That’s why optional subsecond precision in the ISO8601 part is useful. The issuing application can check to see if it’s the same second as the last timestamp it issued for this user, and if so append subsecond precision to the new one.
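A minimal sketch of that check, assuming a single issuing application that keeps per-address state in memory. The class `StampIssuer` and its method names are hypothetical; microsecond precision is an arbitrary choice, and two actions within the same microsecond would still collide.

```python
from datetime import datetime, timezone

class StampIssuer:
    """Hypothetical issuer that remembers the last whole second stamped per address."""

    def __init__(self):
        self._last_second = {}  # address -> last whole-second string issued

    def issue(self, when: datetime, address: str) -> str:
        # Expects a timezone-aware datetime.
        utc = when.astimezone(timezone.utc)
        second = utc.strftime("%Y-%m-%dT%H:%M:%S")
        if self._last_second.get(address) == second:
            # Same second as the previous stamp for this address:
            # append microseconds to disambiguate.
            time_part = f"{second}.{utc.microsecond:06d}Z"
        else:
            time_part = second + "Z"
        self._last_second[address] = second
        return f"{time_part}!{address}"

issuer = StampIssuer()
a = issuer.issue(datetime(2011, 10, 25, 15, 11, 9, 0, tzinfo=timezone.utc), "fred@foonly.com")
b = issuer.issue(datetime(2011, 10, 25, 15, 11, 9, 530000, tzinfo=timezone.utc), "fred@foonly.com")
print(a)  # 2011-10-25T15:11:09Z!fred@foonly.com
print(b)  # 2011-10-25T15:11:09.530000Z!fred@foonly.com
```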

I second the objection that email address is not good precisely because of harvesting by spambots.

First, someone could make a spambot that is designed to checkout/clone open source repositories.

Second, I could see this being useful beyond VCS’s. It would be neat if, say, Atom/RSS could be extended to build in the comment feed. With unique user and universal timestamp identification, this could provide a universal export/import for blogs or blog-like data. Even without import/export, one can see a future whereby an RSS/Atom reader can also incorporate blog comments, and allow the user to add comments to the blog without having to leave the RSS reader. This is particularly nice on mobile devices; I absolutely hate to leave my Google Reader app for Android to go to some blog and make a comment.

It’s a shame OpenID is so cumbersome; I could see something like this being useful otherwise.

>First, someone could make a spambot that is designed to checkout/clone open source repositories.

Then we’re already screwed; they’re going to mine addresses out of the commit metadata, so using email addresses in action stamps does nothing to make things worse.

>Second, I could see this being useful beyond VCS’s.

Indeed. And the same reply to “ZOMG! What about address harvesters!” applies to all of these. There’s a fundamental opposition, which simply cannot be reconciled, between “it’s a publicly useful identity token” and “I want to hide it from harvesters”. You cannot have both of these; it’s mistaken to object to email addresses because they don’t square that circle.

For those complaining about potential for spamming: additional email addresses, especially for arbitrary strings, aren’t hard to get. Get an unreadable address that you use only for commits, whitelist the servers/projects you post commits to, and send everything else automatically to spam-bin.

>Yes, technically. But if you’re looking for one of these in the normal way, parsing forward, your state machine will consume the Z! and not be confused. Not be confused even if the address part contains a ‘!’, actually.

Yes, but if a parser is looking only for email addresses (for example, maybe somebody is writing a script to make a list of all contributors to a project) then it might catch 'Z!fred@foonly.com' as a match.

Wouldn't it be better to have a character (or characters) that is not valid in email addresses as the separator?
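This concern can be illustrated with two hypothetical email-matching patterns in Python. Neither is a complete RFC822 parser; `PERMISSIVE` allows the punctuation that RFC822 permits in an unquoted local part (including `!`), while `CONSERVATIVE` resembles the narrower patterns often seen in everyday scripts.

```python
import re

# '!' is a legal character in an RFC822 local-part atom, so a permissive
# pattern may accept it; many everyday patterns do not.
PERMISSIVE = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+@[A-Za-z0-9.-]+")
CONSERVATIVE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")

stamp = "2011-10-25T15:11:09Z!fred@foonly.com"

permissive_hits = PERMISSIVE.findall(stamp)
conservative_hits = CONSERVATIVE.findall(stamp)
print(permissive_hits)    # ['09Z!fred@foonly.com']
print(conservative_hits)  # ['fred@foonly.com']
```

With these particular patterns, the permissive one picks up the tail of the timestamp as part of the address, while the conservative one recovers only the address. Which behaviour a given harvesting script shows depends on how strictly it follows the RFC.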

So “2011-10-25T15:11:09Z!Muhammed.(I am the greatest) Ali @(the)Vegas.WBA” would be allowed, using RFC822’s example? I guess not since no internal whitespace would be preferable. So it’s RFC822’s addr-spec but with non-RFC822 lexing rules that also need documenting? Might be worth ditching RFC822 and stating what’s allowed another way?

>I looked at RFC822 before writing this up. The set of available characters under that criterion is small and they all have obvious problems.

What about an @ sign?

Z@me@somewhere.com is not a valid email address. Z@me is not a valid email address. The only part of it that is valid is ‘me@somewhere.com’, so this would not trip up an address parser. It also makes a certain kind of sense given the current trend for identifying users and people with a leading @.

>Indeed. And the same reply to “ZOMG! What about address harvesters!” applies to all of these. There’s a fundamental opposition, which simply cannot be reconciled, between “it’s a publicly useful identity token” and “I want to hide it from harvesters”. You cannot have both of these; it’s mistaken to object to email addresses because they don’t square that circle.

No, my objection isn’t that I want something that is public and not public. It’s that the specific function of email address goes beyond just a unique identifier:

If a spambot has my email address, he can send me spam.

If a spambot has my OpenID or something, he can’t really do anything.

I’m not arguing just for myself — I’ve had aaron@traas.org for many years, and it’s publicly visible on my web site among other places. This is for more casual users that don’t want to bother setting up aggressive spam filters.

Do you really want to cite RFC822 and not RFC 2822 – Internet Message Format?
As 2822 says: This standard supersedes the one specified in Request For Comments (RFC) 822, “Standard for the Format of ARPA Internet Text Messages” [RFC822], updating it to reflect current practice and incorporating incremental changes that were specified in other RFCs [STD3].

System A says to System B, “Hey, X did this at time T”. System B has no way of checking if that’s true, and has to place all its trust in System A. This is somewhat concerning given that it is tied to a real world address, and that makes it easier to believe for humans.

For example, if you used this system for your comments, this comment would be tagged that it was entered by “rverghes@gmail.com”. But it was really entered by “someone using rverghes@gmail.com” who may or may not be the actual owner of that email address.

The lack of verification did lead to a lot of problems for email (fraud, spam, etc.), and I wonder if you aren’t setting yourself up for some of the same problems with this system.

I would contend that both user, and machine/terminal/point of presence identifiability is a necessary feature for audit trail (and to preserve commits for organizational/group commits/stamps where a unique RFC822 address is not used, and a shared/group/organizational address is preferred); and that, although it would be a considerably longer token, the format should be {RFC822 address}{802.3 address}{RFC3339 timestamp}, in whatever order, and using whatever notional separators are decided on.

I don’t like using an 802.3 address either, but I can’t think of a better choice to provide pseudounique machine identifiability.

Of course many would be wary of exposing their MAC to the universe. A better choice may be to include a hash of the MAC.

Honestly, I’d rather use a public key hash for the machine, and a public key hash for the user; but that would give up a lot of human readability, would require all users and machines to have a public key (although you could make it optional. Those users and machines without a public key would revert to email and mac, those with keys could use the hashes… but that would complicate things unnecessarily)…

Using hashes in general isn’t a bad thought though. Use a standard hash of the MAC in particular, to shorten it, but to preserve the sortability etc…

>I would contend that both user, and machine/terminal/point of presence identifiability is a necessary feature for audit trail

This, children, is called “mission creep”. It is one of the banes of every effort to write clear, simple specifications and technical standards. Learn to recognize it; then, lest you be stuck forever in the land of Get Nothing Done, learn to shoot it ruthlessly through the head.

Please consider defining a global syntax early, preferably using an existing approach such as the URI and URN syntaxes. In general my experience indicates that every time you create a purely local name representation it inevitably escapes into the wild in a variety of incompatible ways (as well as God killing a kitten). You don’t need to use the global syntax in a DVCS file, but it would be nice to have something standard in place up front for uses in other contexts.

This would probably also make any IETF process go more smoothly. They seem to like it when naming integrates into their preexisting schemes.

When you say “uniquely identifying user actions on the Internet”, then “2011-10-25T15:11:09Z!fred@foonly.com” is not enough – there could easily be multiple servers on which fred@foonly.com has made an action in the same second, and they don’t necessarily know of each other so they don’t know to use the subsecond precision. This makes me think that your requirement is softer, maybe it would help if you tried to formalize the use of these identifiers a bit more.

In particular, to what extent do you expect different systems to want to exchange these identifiers and know only from them what’s being talked about? And given that your proposal is not extensible, i.e., the two systems always exchange a time and a user, why put them together in a single string?

Further, maybe it would be useful for a system unsuspecting of these identifiers to be able to find out what’s being talked about?

If you’re designing identifiers for things (or actions, in this case) on the internet, you really, really should use the URI syntax. In the case here, a urn:action-stamp:time!email would be preferable to time!email, I suspect. 17 characters more is not a significant overhead, considering the other metadata of the event that is being identified.

Moving to something other than email, or some other time format (non-gregorian/interstellar etc.) could unambiguously happen through something like urn:action-stamp2:.
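For what it is worth, the URN wrapping suggested here is mechanically trivial. The sketch below treats `urn:action-stamp:` as the commenter's proposed prefix, not a registered URN namespace, and the helper names `to_urn` and `from_urn` are hypothetical.

```python
# The commenter's suggested prefix; not a registered URN namespace.
URN_PREFIX = "urn:action-stamp:"

def to_urn(stamp: str) -> str:
    """Wrap a bare action stamp in the suggested URN form."""
    return URN_PREFIX + stamp

def from_urn(urn: str) -> str:
    """Recover the bare action stamp, rejecting anything without the prefix."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not an action-stamp URN: " + urn)
    return urn[len(URN_PREFIX):]

urn = to_urn("2011-10-25T15:11:09Z!fred@foonly.com")
print(urn)              # urn:action-stamp:2011-10-25T15:11:09Z!fred@foonly.com
print(len(URN_PREFIX))  # 17
```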

>In particular, to what extent do you expect different systems to want to exchange these identifiers and know only from them what’s being talked about?

Well, my motivating cases are (a) identification of commits in a DVCS and (b) identification of comments in a bug tracker. In both cases there’s an implied scope within which these stamps refer to a unique object, with no need to care that the user might at the same time be leaving cookies in other DVCSes or other trackers.

>why put them together in a single string?

To facilitate parsing by tools like repository or issue-tracker browsers.

Looking at your preceding post (to which I got later), I suspect you don’t want to “identify user actions on the Internet” but identify user actions in a single system. I don’t mean centralized and synchronous, but I mean a system where it’s reasonable to expect a single user to have up to one action in one second. And the identifiers are only considered meaningful by themselves in this one system; elsewhere they’d need to be qualified, and you’re not proposing syntax for such qualification.

The tighter scope would remove the need for a URI syntax (for which I argued above), and for coordination (actual or in the thought process) with “uses in recording many other similar sorts of transactions, including actions on web interfaces, where we want a simple cookie identifying ‘who did this and when’.”

“Well, my motivating cases are (a) identification of commits in a DVCS and (b) identification of comments in a bug tracker…”

Yes but there’s a crying need in the brains of a lot of people who would want to track everything that happens on the ‘net. Anything you invent will not stay in the tidy world of open-source software generation. They’ll grab it and stick it anywhere. Be careful what you wish for.

If you insist on this, then I’d suggest that you have every open-source project issue a private-public key pair to all its bonafide members, so that commits and such can be signed digitally. No email addresses to harvest, and I think more security.

>Well, my motivating cases are (a) identification of commits in a DVCS and (b) identification of comments in a bug tracker. In both cases there’s an implied scope within which these stamps refer to a unique object, with no need to care that the user might at the same time be leaving cookies in other DVCSes or other trackers.

It does not seem to cover the case where a project has more than one repository and you may want to refer to commits in one repository from another or refer to two commits in different repositories for the same bug ticket.

One of the projects I work on has (at present) 6 repositories all of which are nominally covered by one issue tracker, so this case holds some interest to me.

I’ve noticed that the action stamp addresses who and when, but not what. I’m guessing that this is intentional in order to keep “what” being more or less freeform, but it might be useful to extend the action stamp to include a standardized “what”. So you could have:

2011-10-25T15:11:09Z!fred@foonly.com!action

For a DVCS, for example, action might include a commit, merge, branch, etc. You could even have URL-style paths or something like:

>“2011-10-25T15:11:09Z!fred@foonly.com” is not enough – there could easily be multiple servers on which fred@foonly.com has made an action in the same second

If fred@foonly.com makes more than one action in the same second, then all actions after the first must include a fractional seconds part of sufficient precision to disambiguate the actions.

Action the First of the Second in question: 2011-10-25T15:11:09Z!fred@foonly.com
Action the Second of the Second in question: 2011-10-25T15:11:09.53Z!fred@foonly.com
Because . and digits sort before alphas, problem solved.
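One detail may be worth checking here: in ASCII, `.` also sorts before `Z`, so under plain string comparison the stamp with a fractional part lands ahead of the whole-second one. A minimal Python sketch, in which `stamp_time` is a hypothetical helper rather than anything from the proposal, compares raw lexical order with order by parsed time:

```python
from datetime import datetime

first = "2011-10-25T15:11:09Z!fred@foonly.com"
second = "2011-10-25T15:11:09.53Z!fred@foonly.com"

# Plain string comparison: '.' (0x2E) < 'Z' (0x5A), so the ".53" stamp sorts first.
lexical = sorted([first, second])
print(lexical == [second, first])  # True

def stamp_time(stamp: str) -> datetime:
    """Parse the timestamp part of a stamp, with or without fractional seconds."""
    time_part = stamp.split("!", 1)[0].rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in time_part else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(time_part, fmt)

# Sorting on the parsed time restores chronological order.
chronological = sorted([second, first], key=stamp_time)
print(chronological == [first, second])  # True
```

If that reading is right, tools that need strict chronological order across mixed precisions would want to sort on the parsed time, or always emit a fixed number of fractional digits.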

Furthermore, rather than just limiting the second part to an email address, let us allow for other URLs:

My understanding is that the action tag [time!email] is a declaration by the author that she performed this action at this time. It is not a cryptographically signed audit trail, and it is also not meant to identify the computer where the action was performed.

So most of the problems thrown up do not apply. It is the responsibility of the author to make sure her action tags are unique and refer to the correct author. So I see no reason to include a URI for the computer/service.

>In practical views, it makes little or no difference, but go with the more modern interpretation.

I don’t know about the email situation in particular, but in general, when a specification is changed in the real world, some people stick with the older one. So older specifications, when still workable, tend to be more inclusive.

>Then you can just use your email address. But if Fred sees a reason to do so, why not let him?

I think date/time and email are all that is required, possibly with the up-converted (destination repository) commit-id tagged on the end?
[john@example.com, 2011-10-25 12:27, 0AFG1348-964BE211]

Of course, it’s supposed to be able to be applied to other actions, such as web actions. Meaning that the commit-id isn’t really all that useful by itself; it needs a qualifier to tell some action parser that doesn’t deal with DVCS commits to not parse it.

I like the idea, but I agree with the comments by various others that this needs:

a) some thought given to extensibility/future-friendliness. In particular, when something comes up (and it will eventually) that makes this format undesirable for whatever reason, how will it be replaced? What is the “upgrade path”? And how can people who want to use this for something you haven’t thought of yet adapt it to their use case? The best standards are those which can be the basis for things their authors never imagined.

b) some way of specifying what sort of action this represents. This could be optional, but if you don’t say something about it now, people will add it in a jillion incompatible ways later to fit their own use cases, leading to all kinds of incompatible implementations.

URI/URN compatibility would be nice from the perspective that there’s a lot of software out there already that deals with them, so they could be easily modified to account for this particular format.

Adding a machine identifier is at best unnecessary, and at worst a really bad idea. This is the Internet, people. Who _you_ are doesn’t depend on how you get here.

All three of your concerns (extensibility, future friendliness, and flexibility for alternative uses) are best served by making a format as simple as possible while still doing what it needs. If you can think of a simpler format for Eric’s purpose than the one he has suggested, then show us.

What if I have a set of transactions that are (logically) applied atomically, as a unit, but for some reason aren’t submitted as one? What timestamps do we use? Or are we required to roll them up into one to report in this format?

That doesn’t really happen with a DVCS, but might with some other possible cases, I think?

1. We don’t know enough to make an action stamp url that meets a reasonable set of use-cases, because we lack the use-cases. Some of them are already catered for by the tag URI (blog posts) anyway.
2. It’s overkill for this purpose
3. It’s not as good as datetime/email for this purpose because it isn’t as readable.

The concept of a universally consistent action stamp is the antithesis of distributed networking, because an action is never just one thing, but rather a composition of links to potentially dynamic data.

Understood. I suppose my error is there are cases where the commit hash would not be found in the current copy, but an action stamp could.

Or you may also (or only) be making the point that the action stamp is always an unambiguous key, even if the corresponding commit can’t be located in the current copy.

However, consider my logic was a cost vs. benefit analysis, given those (afaics) rare obscure cases where an action stamp provides any operational advantage (given afaik the main point of referencing is the user needs to be able to locate the code that was changed), and the disadvantages (some noted in this thread) of conflating an author+timestamp with what is (“was” in case of reposurgeon) semantically a reference to set of code changes.

The above stands on its own, and it appears to be a subjective design choice.

Tangentially, I am pondering that if DVCS would “propagate” the morphed commit hashes, they would survive all the cases where an action stamp would locate a commit in the current copy.

Apologies if I have butchered some key understanding of DVCS. I am not an expert user of them.