The proposed Deterministic ID concept has been mis-characterized, and rejected on that basis, from the beginning. However, the discourse (before yesterday's attempt to shut down active discussions amongst other voices in the community) has surfaced a much bigger issue from my perspective.

On the one hand, the initial attempts to reach a reasonable compromise on the Identity specification, by simply allowing the object creator to choose the method for generating an RFC 4122-compliant UUID, were rejected on the basis of "one way of doing things". Yet these same voices then go on to state that something as critical as the criteria distinguishing a new object from a revision to an existing object is totally subjective and left up to each individual object creator.

As an implementor (i.e., a programmer), I have wrestled with IDs in the current MITRE Python libraries. After trying many things, my own (drastic!) solution was to monkey-patch the libraries so that I had full control over IDs.

I think the core question is this: "Do we compare IDs to determine object equivalence, or must we compare multiple attributes?" It's classic computer science.

The reason I say that it doesn’t align with our versioning proposal is that, in that approach, we specifically chose not to define that any particular change to an object would necessarily result in a new object. The determination of what constituted a “material change” resulting in a new object was left entirely to the producer, mainly because of the problem Bret points out with identifying these “immutable” fields. Any deterministic ID approach that we standardize on would explicitly define which changes would result in a new version, taking that “material change” decision out of the hands of the producers and putting it into the hands of us as specification authors.

Re: "Deterministic IDs will break the proposed versioning approach, which has had very good agreement both on slack and at the F2F. I say this because versioning requires that we give producers the ability to determine what constitutes a material change and, when they determine that, let them create a new ID for the construct. Using a deterministic ID precludes this…ID changes would be defined by changes to the fields included in your hash."

Again this is a mis-characterization of my proposal. The proposed approach is to use the Immutable properties of an Object. Any changes to these immutable properties are by definition a new object, not a revision.

Why not have a required identifier based on UUID4 (my preference), and if vendors want a deterministic ID, they can include an optional attribute for it? Systems that want to use that optional identifier could then support it.

If we decide to do deterministic IDs, then I think we will need to redo versioning, for the reasons that John points out and others that we do not yet fully realize. And if we are going to go back and redo all of these things, then we should probably go back and redo timestamps (there seem to be more people who dislike our timestamps than who dislike UUIDv4-based IDs). This would keep parity and fairness in the community.

If we do deterministic IDs, then we will need to determine which fields in each TLO will be used, on a TLO-by-TLO basis. We will also need to add a field to each TLO that includes the namespace and other things that are used for the deterministic IDs.

1. We define an approach to doing deterministic IDs in the standard, and require that people use it. I believe this is what Eric is suggesting.

2. We define that we will not do deterministic IDs, use UUID4, and those wanting content hashes/correlation IDs can do it in a custom field. This is what the current text states.

3. We leave it open to implementors to decide whether they do deterministic or non-deterministic IDs. This is Pat’s suggestion.

Given those choices, in order of preference I would say: 2, 1, 3. My reasoning:

We’re a standards body and should whenever possible identify standard ways of doing things. Leaving things like this open will lead to divergent practices and usages of the ID field that will cause incompatibilities down the line. We should pick one way and do it, and let those who want to do custom things use custom fields to do so. So, if we do deterministic IDs, we should go all out and actually do it. Having some people doing it one way, other people doing it a different way, and most people doing UUID4 is IMO not a good argument for the standard. We lose out on a lot of the value of deterministic IDs if we make it optional instead of required.

Deterministic IDs will break the proposed versioning approach, which has had very good agreement both on slack and at the F2F. I say this because versioning requires that we give producers the ability to determine what constitutes a material change and, when they determine that, let them create a new ID for the construct. Using a deterministic ID precludes this…ID changes would be defined by changes to the fields included in your hash.

The use case for deterministic IDs is definitely iffier for me than in CybOX (i.e., Pat’s IP address example is for CybOX, not STIX). What would it look like for indicator, observation, sighting, actor, campaign, etc.? Those are evolving concepts, and defining a set of fields that you use in every case to match exactly and do correlation might not be the right approach.

@All: I retract my assertion that there has been no discourse on this and look forward to engaging with any in the community interested in further exploring this topic. I'll rename the subject so those not interested can discriminate.

Eric,

Many thanks for engaging in the discourse.

Re: "What I would also observe is the sentiment that "we can have an opaque UUID and a deterministic UUID and if people want to or eventually discover the usefulness of a deterministic UUID in the future" is flawed. Unless people who do not think they need a deterministic UUID generator happen to use the deterministic UUID generation algorithm, old STIX databases will not be compatible with a new, post-deterministic UUID regime."

I'd like to better understand where you see potential issues. From a practical DB perspective, the only operational differences I can see are:

(1) Deterministic IDs are guaranteed to have no collisions. Depending on the implementation of the "random" generator, this is not always the case for Version 4 UUIDs (as described in the RFC).

(2) Any "random" UUIDs generated by Version 4 are easily discernible in the UUID.

(3) One can make inferences from a Version 5 UUID in a "Community of Trust" where Source Namespaces (used to "sign") and objects are shared. Otherwise, one can infer nothing else from a Version 4 or Version 5 UUID.

Given the primary structural reasons for Object IDs (as unique references) what potential STIX incompatibilities do you foresee?

What Patrick M/P Patrick [I could not resist] point out is that this deterministic UUID generation algorithm is (1) useful, (2) known to be useful because it is being used, and (3) easy to implement, because it is being used.

What I would also observe is the sentiment that "we can have an opaque UUID and a deterministic UUID and if people want to or eventually discover the usefulness of a deterministic UUID in the future" is flawed. Unless people who do not think they need a deterministic UUID generator happen to use the deterministic UUID generation algorithm, old STIX databases will not be compatible with a new, post-deterministic UUID regime.

Now the true academic answer is to not have a UUID at all, because people will get it wrong and any time there is redundant data in a message there will be the opportunity to screw it up.

Now the practitioner in me says we should have a deterministic UUID because experience is showing it is useful in practice, and it simplifies STIX implementations by having one and only one way of doing it. If you don’t think you need a deterministic UUID generator, think of it as getting a UUID generator for free: you won’t have to think about it.

The requirements for fractional seconds were self-evident to anyone collecting, correlating, and analyzing Cyber Threat Intelligence in November 2013. For example, 10GbE (10×10^9, or 10 billion bits per second) was widely deployed across distributed enterprise backbones/server farms. Packet Capture and Netflow Collection were integral to our Security Operations and CSIRT Investigations.
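To put the 10GbE figure in perspective, here is a rough calculation in Python. It assumes a fully loaded link carrying minimum-size Ethernet frames, each occupying 64 bytes of frame, 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire. Those overheads are standard Ethernet framing figures, not numbers taken from this thread.

```python
# Back-of-the-envelope arithmetic for a saturated 10GbE link.
LINK_BITS_PER_SECOND = 10 * 10**9  # 10GbE line rate

# Assumption: minimum-size frames; on the wire each one occupies
# 64 bytes of frame + 8 bytes preamble/SFD + 12 bytes inter-frame gap.
MIN_FRAME_ON_WIRE_BITS = (64 + 8 + 12) * 8  # 672 bits

frames_per_second = LINK_BITS_PER_SECOND / MIN_FRAME_ON_WIRE_BITS
frames_per_millisecond = frames_per_second / 1_000
frame_time_ns = MIN_FRAME_ON_WIRE_BITS / LINK_BITS_PER_SECOND * 1e9

print(f"{frames_per_second:,.0f} frames/s")
print(f"{frames_per_millisecond:,.0f} frames per millisecond")
print(f"{frame_time_ns:.1f} ns per frame")
```

Under these assumptions, roughly 14,900 frames can arrive within a single millisecond, one about every 67 ns. A millisecond-resolution timestamp therefore cannot order individual packets.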

Community members provided real-world use cases and challenges, for example: (1) "My" products operate at these speeds, (2) "My" use cases require sub-millisecond timestamps, etc. To paraphrase, the arguments made against the requested change included: (1) "My" product doesn't operate at these speeds, (2) "My" use cases don't require sub-millisecond timestamps, (3) I'm using epoch time in my application, so I don't see the need for fractional seconds, and (4) We're not going to pay attention to 'fringe' cases.

"This is the wrong way to go and here is why":

A small faction should not be able to summarily reject a stated community open requirement on the basis of "we don't have or understand this requirement".

re: "We have debated this issue almost as long as we have debated timestamps."

No, we have not even engaged in an open, inclusive discourse on this proposal. I've responded before that the core assumptions you have presented to the list as the basis for rejecting the concept are not accurate. Paul Patrick has gotten it right.

The proposed concept is that the Deterministic UUID is generated using an RFC 4122 Version 5 hash of the (1) Namespace, (2) Object Type, and (3) Immutable properties of the Object. So for an Object describing the IP Address "1.2.3.4", the Immutable property of the Object = "1.2.3.4". The "1.2.3.4" value would NEVER change for this Object. Any change to an object's immutable properties would be a new object, not a new version.
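Here is a minimal sketch of this derivation using Python's standard `uuid` module. Several details are illustrative assumptions, not settled STIX text:

- the namespace domain `example.com`;
- deriving the organization namespace with `uuid.NAMESPACE_DNS`;
- the object type string `ipv4-addr`;
- joining the name components with `|`.

```python
import uuid


def deterministic_stix_id(org_domain: str, object_type: str, *immutable_values: str) -> str:
    """Sketch of a Version 5 (SHA-1, name-based) STIX identifier.

    Assumptions (not from any spec): the organization namespace UUID is
    derived from its DNS name, and the name string is the object type
    followed by the immutable values, joined with "|".
    """
    # Per-organization namespace UUID, itself name-based under the DNS namespace.
    org_namespace = uuid.uuid5(uuid.NAMESPACE_DNS, org_domain)
    name = "|".join((object_type, *immutable_values))
    return f"{object_type}--{uuid.uuid5(org_namespace, name)}"


a = deterministic_stix_id("example.com", "ipv4-addr", "1.2.3.4")
b = deterministic_stix_id("example.com", "ipv4-addr", "1.2.3.4")
c = deterministic_stix_id("example.com", "ipv4-addr", "1.2.3.5")
print(a)
print(a == b, a == c)
```

Describing the same Object twice yields the same identifier. Changing the immutable value yields a different identifier, which under this proposal is a different Object.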

I'm not going to rehash the rest...

Bottom Line:

Current language

"An identifier uniquely identifies a STIX top-level object. Identifiers MUST follow the form [object-type]--[UUIDv4], where [object-type] is the exact value from the type field of the object being identified or referenced and [uuid] is an RFC 4122 compliant Version 4 UUID. The uuid field MUST be generated according to the algorithm(s) defined in RFC 4122, Section 4.4 (Version 4 UUID)."

"This is the wrong way to go and here is why":

This language unnecessarily and arbitrarily constrains the method for the generation of the RFC 4122 variant UUID to Version 4 (random/pseudo-random).

We "get" that the proposed adoption of Deterministic vs. Random IDs has been rejected. However, there's no reason to prevent those of us who have valid Use Cases for Deterministic IDs from using them as part of our internal implementation details. They will be indistinguishable from any other UUID, except that the most significant 4 bits of the time_hi_and_version field will be a "5" instead of a "4".
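The difference can be seen with Python's standard `uuid` module. The name `example.com` below is an arbitrary placeholder. Both UUIDs share the RFC 4122 variant and differ only in the version digit.

```python
import uuid

random_id = uuid.uuid4()
# Placeholder name standing in for a producer's namespace input.
deterministic_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

for label, u in (("v4", random_id), ("v5", deterministic_id)):
    text = str(u)
    # The version is the first hex digit of the third group (index 14 of the
    # canonical 8-4-4-4-12 string); uuid.UUID also exposes it as .version.
    print(label, text, "version digit:", text[14], "version:", u.version, "variant:", u.variant)
```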

Proposed Language

"An identifier uniquely identifies a STIX top-level object. Identifiers MUST follow the form [object-type]--[UUID], where [object-type] is the exact value from the type field of the object being identified or referenced and [uuid] is an RFC 4122 compliant UUID."
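To compare the current and proposed wording, here is a rough Python sketch of a consumer-side check. `parse_identifier`, `is_valid`, `CURRENT_TEXT` and `PROPOSED_TEXT` are hypothetical names for illustration only. The only difference between the two readings is which UUID versions are accepted.

```python
import uuid


def parse_identifier(identifier: str, allowed_versions: frozenset) -> tuple:
    """Split '[object-type]--[uuid]' and check the UUID part (hypothetical helper)."""
    object_type, sep, uuid_text = identifier.partition("--")
    if not sep or not object_type:
        raise ValueError(f"not of the form [object-type]--[uuid]: {identifier!r}")
    # Raises ValueError on malformed hex. Note: uuid.UUID is lenient and also
    # accepts braces, a "urn:uuid:" prefix, or missing hyphens.
    u = uuid.UUID(uuid_text)
    if u.variant != uuid.RFC_4122:
        raise ValueError("UUID is not the RFC 4122 variant")
    if u.version not in allowed_versions:
        raise ValueError(f"UUID version {u.version} not permitted")
    return object_type, u


def is_valid(identifier: str, allowed_versions: frozenset) -> bool:
    try:
        parse_identifier(identifier, allowed_versions)
        return True
    except ValueError:
        return False


CURRENT_TEXT = frozenset({4})                # "RFC 4122 compliant Version 4 UUID"
PROPOSED_TEXT = frozenset({1, 2, 3, 4, 5})   # "RFC 4122 compliant UUID"

v4_ident = f"indicator--{uuid.uuid4()}"
v5_ident = f"indicator--{uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com')}"
print(is_valid(v4_ident, CURRENT_TEXT), is_valid(v5_ident, CURRENT_TEXT))
print(is_valid(v4_ident, PROPOSED_TEXT), is_valid(v5_ident, PROPOSED_TEXT))
```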

I thought one of the primary goals of using deterministic IDs was de-duplication? The only way you can do de-duplication is if you hash all of the relevant fields, of which versioning information would be part.

We have debated this issue almost as long as we have debated timestamps. So, as with timestamps, there is a difference between "I just do not like it" and "this is the wrong way to go and here is why". If either the timestamps or IDs that we have proposed are wrong, or are going to make life miserable, please speak up and explain why.

I don’t mean to cause a stir or re-open topics, but on today’s call, during the discussion about identifiers, there appeared to be some confusion about how Patrick had proposed having deterministic identifiers. If I heard the comments correctly, there seemed to be the opinion that you had to hash the entire object, which would include the revision information, in order to generate the identifier. But as I read Pat’s proposal, that isn’t what he proposed if you look back below at his email.

In Pat’s proposal, he proposed using a tuple of organization namespace (aka domain name), type of object, and one or more value(s) as input to the uuid5 algorithm. The organization namespace and type of object were used to significantly reduce the possibility of collisions that could occur with the key value(s) (e.g., an IP address value) when the tuple contents were hashed using SHA-1 per the uuid5 algorithm. Because the entire object, including revision information, isn’t used to determine the hash, there isn’t the issue of breaking relationships.
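The distinction raised on the call can be sketched in Python. If only the (namespace, type, key values) tuple feeds uuid5, a new revision keeps its identifier, and relationships that reference it survive. Hashing the whole object, revision metadata included, changes the identifier on every revision. Several details are assumptions for illustration:

- the `example.com` namespace;
- the `|` separator;
- JSON canonicalization of the whole object;
- the `modified` field name for revision information.

```python
import json
import uuid

# Hypothetical organization namespace, derived from an assumed domain name.
ORG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def id_from_key_values(obj: dict, key_fields: tuple) -> uuid.UUID:
    """Pat-style sketch: hash only (namespace, type, key values)."""
    name = "|".join([obj["type"]] + [str(obj[f]) for f in key_fields])
    return uuid.uuid5(ORG_NAMESPACE, name)


def id_from_whole_object(obj: dict) -> uuid.UUID:
    """The reading discussed on the call: hash the entire object."""
    canonical = json.dumps(obj, sort_keys=True)
    return uuid.uuid5(ORG_NAMESPACE, canonical)


# "modified" is an assumed name for the revision information.
rev1 = {"type": "ipv4-addr", "value": "1.2.3.4", "modified": "2016-05-01T00:00:00Z"}
# A later revision: same immutable value, new revision metadata.
rev2 = dict(rev1, modified="2016-06-01T00:00:00Z")

print(id_from_key_values(rev1, ("value",)) == id_from_key_values(rev2, ("value",)))
print(id_from_whole_object(rev1) == id_from_whole_object(rev2))
```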

From an implementor's point of view, I've been using a very similar approach for a while without any issues. It avoids the need to maintain a list of identifiers that we’ve used before and thus simplifies the effort to support. In addition, the generated identifiers are valid UUIDs with the same number of bytes, etc. Per RFC 4122, there is no way, beyond checking that the timestamp portion of a time-based UUID is not in the future, to validate whether a UUID is valid, much less to enforce that it was generated using the uuid4 algorithm. So attempting to enforce that the uuid portion of identifiers is generated by the uuid4 algorithm is pointless.
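Python can illustrate why enforcement is impractical. A consumer can only inspect the variant and version bits, and nothing stops a producer from stamping version 4 onto bytes derived from a hash. The hashed name string is a hypothetical example. This is shown only to make the point about verification, not as a recommended way to generate identifiers.

```python
import hashlib
import uuid


def looks_like_v4(u: uuid.UUID) -> bool:
    # All a consumer can check: the RFC 4122 variant and the version bits.
    return u.variant == uuid.RFC_4122 and u.version == 4


genuine = uuid.uuid4()

# Deterministic bytes (SHA-1 of a hypothetical name), stamped as version 4.
# uuid.UUID(bytes=..., version=4) overwrites the variant and version fields.
NAME = b"example.com|ipv4-addr|1.2.3.4"
disguised = uuid.UUID(bytes=hashlib.sha1(NAME).digest()[:16], version=4)
disguised_again = uuid.UUID(bytes=hashlib.sha1(NAME).digest()[:16], version=4)

print(looks_like_v4(genuine), looks_like_v4(disguised), disguised == disguised_again)
```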

So I submit that if someone wants to use the uuid3 or uuid5 algorithm, they should not be prohibited from doing so, but our text should clearly state that it is RECOMMENDED that the uuid4 algorithm be used.

I’m happy to tweak the text in the document accordingly, but wanted to raise what I believe was confusion on this topic to the group before doing any text changes.

Are you planning, then, to build your deterministic IDs without the use of the versioning information? Because if you do use versioning information for your IDs, to help with all the things you outline below (which you would need to actually find real duplicates), your deterministic IDs will change with every revision of the object. And thus all of your relationships will break.

Thanks,

Bret

Bret Jordan CISSP
Director of Security Architecture and Standards | Office of the CTO
Blue Coat Systems
PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
"Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

re: "This is the start of the 2 day review process prior to the initial motion, so please provide any comments now so we can adjust. We can discuss comments on the Tuesday working call and, assuming we’re able to resolve them, make the motions to approve these sections on Wednesday morning."

I have added notional changes to Identifier to add previously requested support for Deterministic UUID Identifiers. The current language arbitrarily constrains the RFC 4122 UUID identifier to the UUIDv4 pseudo-random generation method.

Format and language are notional and will need corrections by the editors.

Basis:

If certain Vendor Factions, Communities of Trust, etc. want to use pseudo-random UUIDv4 generation

The method of generation of any UUID can of course be determined from the generated UUID.

-- Win-Win for both camps and with no cost that I can discern

4.5. Identifier
Type Name: identifier
Status: Review
MVP: Yes

An identifier uniquely identifies a STIX top-level object. Identifiers MUST follow the form [object-type]--[UUIDv4], where [object-type] is the exact value from the type field of the object being identified or referenced and [uuid] is an RFC 4122 compliant Version 4 UUID. The uuid field MUST be generated according to the algorithm(s) defined in RFC 4122, Section 4.4 (Version 4 UUID).

4.5.1 Version 4 UUID. The uuid field MAY be generated according to the algorithm(s) defined in RFC 4122, Section 4.4 (Version 4 UUID).

As we make progress on these specifications, it’s important to make sure that we have consensus on specification text and can document that consensus. To that end, the STIX co-chairs and editors would like to start a development cadence where we move content in the specifications through informal consensus, to review, to motions to approve the text.

In order to balance this desire with the desire to avoid hundreds of votes, we’d like to try the following process:

1. Content is developed by the SC and achieves some consensus, potentially in a mini-group (Status = Concept to Development).

2. We send a notice out to the cti-stix list saying that text is ready for review and formal acceptance (Status = Review).

3. After waiting 2 business days without hearing comments, we make a motion on the cti-stix list to accept the text as-is.

4. We’ll wait 5 business days to hear objections. If there are no objections, we’ll consider it accepted without a formal vote via unanimous consent (this will be made clear in the motions). If there are objections, depending on the type of objection and the exact circumstances, we’ll either move back to the development/review phase or hold a ballot to approve the text via a majority vote. Once the motion is passed, either via unanimous consent or via a ballot, we’ll move it to draft status (Status = Draft).

Draft status doesn’t mean that the text cannot change. We can make editorial changes throughout the process without going back to earlier phases, but if we make any material changes, we would move the concept back to the “Development” phase and start again. This is also not a replacement for the formal approval of the complete specification text when STIX 2.0 is done; it’s just a way to ensure that we have consensus at a more granular level as we move forward.

We hope this process gives you time to both have input prior to the official review phase and see what we’re moving to the review phase while at the same time avoiding votes on every single topic.

For this first round, please review the following sections in the STIX 2.0 specification:

This is the start of the 2 day review process prior to the initial motion, so please provide any comments now so we can adjust. We can discuss comments on the Tuesday working call and, assuming we’re able to resolve them, make the motions to approve these sections on Wednesday morning.

Thanks everyone! I realize this may sound overly formal to some of you but in practice I’d expect that it just means you have more defined things to be reviewing at any given time.