RxNorm Extension - an OHDSI resource to represent international drugs

The purpose of this document is to define the process, rules and resulting structure of incorporating international drug vocabularies into an overall RxNorm-like system, called RxNorm Extension.

Drug vocabularies contain drug products and their components. Only about a third of these products are identical in the drug markets of individual countries or jurisdictions of a drug marketing approval agency. Even if the active ingredients are the same, they can differ in their Drug Strengths, Drug Forms, Brand Names, package sizes and manufacturers or distributors.

Therefore, these vocabularies need to be incorporated into the existing Drug Domain in such a way that all existing drugs and their components are correctly mapped, and the missing ones added as new concepts. This includes a life cycle for each Concept, allowing for its generation, deprecation and update over time.

The processing script that follows these instructions can be found here.

General structure

The Drug Domain should be organized in the hierarchical structure described in Drug Domain. This structure is based on RxNorm, which also forms the core of the content. RxNorm comprehensively describes the drug market in the United States. It may not contain products sold in the markets of other countries. It also does not contain US medical food or food supplement products.

This structure contains at a minimum (from bottom to top):

| Concept Class | Composed of |
| --- | --- |
| Branded Drug | Ingredients, their strength, form, brand name |
| Clinical Drug | Ingredient, their strength, form |
| Branded Drug Form | Ingredient, form, brand name |
| Clinical Drug Form | Ingredient, form |
| Branded Drug Component | Ingredient, strength, brand name |
| Clinical Drug Component | Ingredient, strength |
| Dose Form | Form |
| Brand Name | Brand name |
| Ingredient | Ingredient |
| Drug Class | Drug class |

It may optionally contain

| Concept Class | Composed of |
| --- | --- |
| Marketed Product | Ingredients, their strength, form, supplier (brand name and box size are optional) |
| Quantified Branded Drug Box | Ingredients, their strength, form, brand name, size and box size |
| Quantified Clinical Drug Box | Ingredients, their strength, form, size and box size |
| Branded Drug Box | Ingredients, their strength, form, brand name and box size |
| Clinical Drug Box | Ingredients, their strength, form and box size |
| Quantified Branded Drug | Ingredients, their strength, form, brand name, size |
| Quantified Clinical Drug | Ingredients, their strength, form, size |
| Branded Pack | Branded Drugs, their number (box size is optional) |
| Clinical Pack | Clinical Drugs, their number (box size is optional) |
| Supplier | Supplier |

Currently not supported in the Standardized Vocabularies:

| Concept Class | Composed of | Note |
| --- | --- | --- |
| Precise Ingredient | Ingredient | Ingredient used instead |
| Multiple Ingredient | Ingredients | Single ingredients used instead |
| Dose Form Group | Dose Form | Explicit Dose Forms used instead |

The Concepts are connected through hierarchical and lateral relationships.

Combined target structure

To incorporate a new set of drug information, a structure should be achieved that contains every Concept only once and preserves the RxNorm structure, no matter which vocabulary the additional Concept is coming from. In a way, it should create a mixed RxNorm/drug vocabularies union.

In order to achieve this, any two equivalent Concepts have to be matched through their components: Ingredients to Ingredients, Forms to Forms, Suppliers to Suppliers, etc. Concepts are defined as matching if all components match. For example, a Clinical Drug matches another Clinical Drug if it contains the same Ingredients at the same strength and the same Dose Form.
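The component-wise matching rule for Clinical Drugs can be illustrated with a minimal Python sketch. The names `ClinicalDrug`, `make_clinical_drug` and `is_equivalent` are hypothetical and not part of the processing script, and the example compares strengths exactly, which is a simplification.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

# One component is (ingredient, strength value, strength unit).
Component = Tuple[str, float, str]


class ClinicalDrug(NamedTuple):
    components: FrozenSet[Component]
    dose_form: str


def make_clinical_drug(components: Iterable[Component], dose_form: str) -> ClinicalDrug:
    # A frozenset makes the comparison independent of ingredient order.
    return ClinicalDrug(frozenset(components), dose_form)


def is_equivalent(a: ClinicalDrug, b: ClinicalDrug) -> bool:
    # Two Clinical Drugs match only if ALL components match:
    # same Ingredients at the same strength, and the same Dose Form.
    return a.components == b.components and a.dose_form == b.dose_form


source_drug = make_clinical_drug(
    [("acetaminophen", 325.0, "mg"), ("codeine", 30.0, "mg")], "Oral Tablet"
)
existing_drug = make_clinical_drug(
    [("codeine", 30.0, "mg"), ("acetaminophen", 325.0, "mg")], "Oral Tablet"
)
print(is_equivalent(source_drug, existing_drug))
```

The same idea extends to the other Concept Classes by adding the relevant components (Brand Name, Supplier, quantity, box size) to the compared tuple.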

Rules for adding Concepts

To add a Concept for which there is an existing equivalent:

It should be recorded as a non-standard (source) Concept.

It should be mapped through a “Maps to” relationship to its standard equivalent.

All other relationships are optional and for QA and convenience. The standard Concept will take its place as the official representation.

To add a Concept that does not have an equivalent:

It should be recorded as a standard Concept (standard_concept = 'S'), with the exception of Brand Names, which should be recorded as non-standard Concepts.

It should have hierarchical and lateral relationships in the same manner as RxNorm Concepts do.

It should form relationships to relevant drug classes. The relationship_id values of these relationships do not have to follow the RxNorm standard, as they differ for every drug class. Classes are most often defined for Ingredients, but some non-Ingredients may directly designate a Concept Class and “jump over” the Ingredient or even Drug Forms or Drug Components. They will be inferred automatically.

Units used in the strength determination are not added. They must be mapped to a Standard UCUM Concept instead. If a unit is not present in the UCUM vocabulary it has to be added.

Challenges and problems

To implement a tool to create and maintain the above structure, a number of issues need to be taken care of:

Excipients: There is no general agreement on what is an active ingredient and what is an excipient. Therefore, some ingredients, such as gelatine, need to be declared as “semiactive”. Generally, if an ingredient can be biologically active but is not present in the preparation for its pharmaceutical properties, it should be considered an excipient. Excipients should be excluded from the list of a drug's ingredients.

Forms: These are not used the same way across drug vocabularies. For example, RxNorm has a Form “Cream”, but also “Ophthalmic cream”, “Vaginal cream”, “Rectal cream”, “Oral cream” and “Cutaneous cream”, making this Form ambiguous. Instead of a one-to-one mapping, a one-to-many mapping with an order of precedence is required to establish matching equivalence between Forms.

Strength: RxNorm normalizes weight units to “mg” and volume units to “mL”, but other vocabularies might not. There might be units like “µg”, “gram-%” or “volume-%”. Special unit conversion tables are needed instead of simple unit mappings. This approach becomes infeasible if units are used where the conversion depends on the molecule, like “mol” or “equivalent”.

Ingredient forms: Ingredients might have ambiguous chemical forms, which RxNorm calls “Ingredient” and “Precise Ingredient” (e.g. a salt of the active compound). They have to be mapped to the right Standard RxNorm Ingredient. If there is no RxNorm Ingredient to map to and the drug vocabulary to be added contains several ambiguous forms of the same Ingredient, one of them has to be declared Standard. In rare cases there might be several Standard duplicates of the same Ingredient. In those cases, mappings from source vocabularies must be made with precedence. Another problem might occur when the strength is given for a precise ingredient rather than for a standard ingredient. An ingredient that is presented as an aqueous or spirit extract should be considered the same Ingredient.
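The one-to-many Form mapping with an order of precedence, mentioned under Forms above, could look roughly like the following Python sketch. The mapping content, the precedence numbers and the function names are invented for illustration, and the sketch assumes that a lower precedence number is tried first.

```python
# Hypothetical one-to-many mapping: source form -> [(precedence, target form), ...]
FORM_MAP = {
    "Cream": [
        (2, "Vaginal cream"),
        (1, "Cutaneous cream"),
        (3, "Ophthalmic cream"),
    ],
}


def candidate_forms(source_form, form_map=FORM_MAP):
    """Return the target forms for a source form, ordered by precedence."""
    return [target for _, target in sorted(form_map.get(source_form, []))]


def resolve_form(source_form, ingredient, existing, form_map=FORM_MAP):
    """Return the first candidate form for which an existing
    (ingredient, form) combination is known, or None if there is none."""
    for form in candidate_forms(source_form, form_map):
        if (ingredient, form) in existing:
            return form
    return None


# Illustrative set of already existing (ingredient, form) combinations.
existing_combinations = {("clotrimazole", "Vaginal cream")}
print(resolve_form("Cream", "clotrimazole", existing_combinations))
```

Trying the candidates in order means the ambiguous source Form resolves to the most preferred target that actually yields a match, rather than always to the first target in the list.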

Implementation

1. Registering a new drug vocabulary

If a drug vocabulary gets added for the first time, it needs to get listed in the VOCABULARY table.

2. Creation of input tables

The new vocabulary should be prepared in the following tables:

DRUG_CONCEPT_STAGE

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| concept_name | Yes | string(255) | An unambiguous, meaningful and descriptive name for the Concept in English language |
| domain_id | Yes | string(20) | A foreign key to the DOMAIN table. The standard content is 'Drug', but for non-drugs it could be 'Device' or 'Observation' |
| vocabulary_id | Yes | string(20) | A foreign key to the VOCABULARY table. The value of this field should be identical for all records, indicating the new vocabulary being added. |
| concept_class_id | Yes | string(20) | One of the above listed RxNorm Concept Classes |
| concept_code | Yes | string(50) | The code in the source vocabulary. If the source vocabulary does not contain a code, e.g. for ingredients or dose forms, it will be created automatically (see below OMOP created codes) |
| source_concept_class_id | No | string(20) | Concept class that is given by the source vocabulary |
| possible_excipient | No | string(1) | A flag only relevant to ingredients, indicating that they may not be active ingredients and could be omitted from an ingredient list. |
| valid_start_date | No | date | Date when the Concept became valid. This may or may not coincide with the date the product went to market. |
| valid_end_date | No | date | Date when the Concept became invalid. Market withdrawal does not mean a Concept is invalid. |
| invalid_reason | No | string(1) | Flag indicating whether the Concept is active (today's date between valid_start_date and valid_end_date), or upgraded ('U') or deprecated ('D'). |

At a minimum, this table is expected to contain Concepts of the following Concept Classes:

Drug Product (Branded Drug, Clinical Drug, Marketed Product etc.)

Form

Brand Name

Ingredient

Unit

Supplier

Device (for source concepts falling outside of the Drug category)

It may contain Branded or Clinical Drug Forms or Components, but if not they will be derived (see below). Note that units should not have their own concept in the DRUG_CONCEPT_STAGE table. Instead, they should be used verbatim. If the precise Concept Class is not known, it can be included as “Drug Product” and the correct Concept Class will be assigned automatically during the incorporation, based on the availability of Strength, Dose Form, Brand Name, Supplier, Quantity and Box Size information.

Concepts that belong to the source vocabulary but do not belong to the Drug domain by OMOP rules should be classified as 'Device'. Typically, these belong to the following groups:

Blood products for transfusion, blood plasma, autologous and non-autologous transplants of any kind

Cosmetics, sunscreens, non-medicated shampoos and soaps, etc.

Surgical materials like bone cements

Topical/external disinfectants

Animal drugs can be handled as Drugs or Devices, depending on what their role in patient data can be expected to be.
Note that only concepts from the Drug domain can have attributes.

RELATIONSHIP_TO_CONCEPT
This table should contain the mapping between source codes and Standard Concepts for Ingredients, Brand Names, Dose Forms, Suppliers and Units. It may also contain mappings from source drugs to Standard Concepts for related ATC classes. All other relationships will be ignored.

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| concept_code_1 | Yes | string(255) | The source code |
| concept_id_2 | Yes | integer | The existing target Concept |
| precedence | No | integer | For multiple concept_code_1/concept_id_2 combinations, the order of precedence in which they should be considered for equivalence testing. The mapping with the highest prevalence among the drugs will be used for writing a record to the CONCEPT_RELATIONSHIP table. A missing precedence will be interpreted as precedence 1. Every precedence value should be unique per concept_code_1 |
| conversion_factor | No | float | The factor used to convert the source code to the target Concept. This is usually defined for units |

This table should contain all mappings from the new to existing Concepts and their precedence. It should also contain links between Ingredients and Drug Classification Concepts, particularly for the new Ingredients. Since ATC is used as an international standard, relationships to ATC are very desirable.

Units should be mapped to Standard Concept Units. Weight units should be converted to milligram, volume units to milliliter, and molar units to millimole, each with the right conversion factor. The source code field (concept_code_1) should contain the verbatim string of the unit.
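As an illustration of how a conversion_factor might be applied, here is a minimal Python sketch. The dictionary stands in for unit rows of RELATIONSHIP_TO_CONCEPT; the unit strings and the function name `normalize` are hypothetical, and target units are shown by name rather than by concept_id_2 to avoid guessing at concept identifiers.

```python
import math

# Hypothetical unit rows: verbatim source unit -> (target unit, conversion_factor)
UNIT_MAP = {
    "g": ("mg", 1000.0),
    "mg": ("mg", 1.0),
    "µg": ("mg", 0.001),
    "l": ("mL", 1000.0),
    "ml": ("mL", 1.0),
}


def normalize(value, unit, unit_map=UNIT_MAP):
    """Convert a source value to the target unit using its conversion factor.

    Raises KeyError for an unmapped unit, so that missing mappings are
    noticed instead of silently passed through.
    """
    target_unit, factor = unit_map[unit]
    return value * factor, target_unit


value, unit = normalize(0.5, "g")
print(value, unit)
assert math.isclose(value, 500.0) and unit == "mg"
```

Failing on an unknown unit is a deliberate choice in this sketch, since every verbatim unit is expected to have a mapping.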

INTERNAL_RELATIONSHIP_STAGE

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| concept_code_1 | Yes | string(255) | One source code of the pair |
| concept_code_2 | Yes | string(255) | The other source code of the pair |

This table should contain relationships for each Drug Concept: to the Ingredients (always), the Dose Form (if appropriate), the Supplier (if appropriate) and the Brand Name (if appropriate). All other relationships will be derived, and are ignored if they exist in the table. The relationships need not be symmetrical; only the one initiating from the Drug Concept is required.

If a Drug Product concept does not have an Ingredient attribute, it will not have any standard mapping target after processing. The Supplier attribute will not be considered for concepts without a DS_STAGE or PC_STAGE entry, since Marketed Product concepts cannot exist without dosage information.

DS_STAGE

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| drug_concept_code | Yes | string(255) | The source code of the Drug or Drug Component, either Branded or Clinical |
| ingredient_concept_code | Yes | string(255) | The source code for one of the Ingredients |
| amount_value | No | float | The numeric value for absolute content (usually solid formulations) |
| amount_unit | No | string(255) | The verbatim unit of the absolute content (solids) |
| numerator_value | No | float | The numerator value for a concentration (usually liquid formulations) |
| numerator_unit | No | string(255) | The verbatim numerator unit of a concentration (liquids) |
| denominator_value | No | float | The denominator value for a concentration (usually liquid formulations). It should contain a number for Quantified products, and null for everything else. |
| denominator_unit | No | string(255) | The verbatim denominator unit of a concentration (liquids) |
| box_size | No | integer | The amount of units per box |

This table contains the dose of each ingredient in each drug, as well as the box_size. For drugs that have no strength information, or have it only for some of their ingredients, the DS_STAGE record may be omitted. A value of '0' in DS_STAGE is only allowed for inert drugs.
The ingredients of a drug should match those in INTERNAL_RELATIONSHIP_STAGE.
If several ingredients of a drug are mapped to the same target in RELATIONSHIP_TO_CONCEPT, their dosages should be summed.
A drug should not contain ingredients in both solid (amount) and liquid (numerator/denominator) form. Such a combination indicates either an artifact of the source data or a drug pack.
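Two of these rules, summing dosages of ingredients that share a mapping target and detecting drugs that mix solid and liquid strength, can be sketched in Python as follows. The row layout mirrors the DS_STAGE field names, but the function names and the in-memory representation are assumptions made for illustration; units are assumed to be normalized already.

```python
from collections import defaultdict


def sum_by_target(ds_rows, ingredient_map):
    """Sum amount_value per (drug, target ingredient, unit).

    ds_rows: list of dicts keyed by DS_STAGE field names.
    ingredient_map: hypothetical lookup, source ingredient code -> target.
    """
    totals = defaultdict(float)
    for row in ds_rows:
        if row.get("amount_value") is None:
            continue  # only absolute (solid) amounts are summed in this sketch
        target = ingredient_map[row["ingredient_concept_code"]]
        key = (row["drug_concept_code"], target, row["amount_unit"])
        totals[key] += row["amount_value"]
    return dict(totals)


def mixes_solid_and_liquid(ds_rows):
    """Return the codes of drugs that have both amount-based and
    numerator-based strength records."""
    kinds = defaultdict(set)
    for row in ds_rows:
        if row.get("amount_value") is not None:
            kinds[row["drug_concept_code"]].add("solid")
        if row.get("numerator_value") is not None:
            kinds[row["drug_concept_code"]].add("liquid")
    return sorted(code for code, seen in kinds.items() if len(seen) == 2)


rows = [
    {"drug_concept_code": "D1", "ingredient_concept_code": "I1",
     "amount_value": 100.0, "amount_unit": "mg"},
    {"drug_concept_code": "D1", "ingredient_concept_code": "I2",
     "amount_value": 50.0, "amount_unit": "mg"},
]
print(sum_by_target(rows, {"I1": "T1", "I2": "T1"}))
```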

PC_STAGE

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| pack_concept_code | Yes | string(255) | The source code of the Pack, either Branded or Clinical |
| drug_concept_code | Yes | string(255) | The component drug product in the Pack |
| amount | No | integer | The number of units of the drug product in drug_concept_code |
| box_size | No | integer | The number of packs if the pack is boxed (several packs in a larger container) |

This table contains the composition of a Clinical or Branded Pack: The Clinical or Branded Drug and the number in each pack. If it is a boxed Pack, it will also contain the box size, since Packs have no records in DS_STAGE like the other drug products.

4. Concept Codes

Source systems may designate codes for different levels or Concept Classes. All Concepts that are inferred or do not come with a code have to be assigned one. These codes are constructed from the word “OMOP” and a running number. The running number should be unique across all vocabularies. That means each time a new vocabulary is added or refreshed, the next Concept Code should be the number of the last one (without the 'OMOP' string) plus 1.
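The running-number rule can be sketched in Python as below. The function name `next_omop_codes` is hypothetical, and the sketch assumes that codes consist of the string 'OMOP' followed directly by the number with no separator, which the text does not specify.

```python
import re

# Assumed code format: 'OMOP' followed directly by digits.
OMOP_CODE = re.compile(r"^OMOP(\d+)$")


def next_omop_codes(existing_codes, count):
    """Generate `count` new codes continuing after the highest existing
    OMOP running number across all supplied codes."""
    numbers = []
    for code in existing_codes:
        match = OMOP_CODE.match(code)
        if match:
            numbers.append(int(match.group(1)))
    start = max(numbers, default=0) + 1
    return [f"OMOP{n}" for n in range(start, start + count)]


# Codes that do not follow the OMOP pattern are ignored.
print(next_omop_codes(["OMOP7", "A01BC02", "OMOP12"], 2))
```

Because the number must be unique across all vocabularies, `existing_codes` would need to cover every vocabulary that uses OMOP-generated codes, not only the one being added.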

5. Quality of input tables

The input tables must meet the following quality requirements:

| Rule | If rule is violated |
| --- | --- |
| Each record should be unique in all tables. | The processing will fail. |
| Concept Codes should be unique and should not repeat for different products. | Only the highest Concept Code is retained, and the other ones are treated as non-standard Concepts and mapped to the highest. |
| Each product should have links (records in INTERNAL_RELATIONSHIP_STAGE) to all its Ingredients. | The product will be treated as if it had only the linked Ingredients. If no Ingredients are linked, the product will be processed into the CONCEPT_STAGE table, but as an orphan without any related Concept Classes. |
| Ingredients should be linked to their Standard Counterparts. | These Ingredients are treated as new Standard Ingredients. |
| Dose Forms should be linked to their Standard Counterparts. | The processing will fail. |
| Brand Names should be linked to their Valid Counterparts. | These Brand Names will be treated as new Concepts. |
| All % in source dosages should be converted into mg/ml (mg) unless it is a gas. | The drug will not be mapped to its Standard Concept. |
| A Marketed Product (a drug that has a relationship to its supplier in INTERNAL_RELATIONSHIP_STAGE) should have both dosage and Dose Form. | The product won't be processed into the CONCEPT_STAGE table. |
| A Boxed drug should have both dosage and Dose Form. | The product won't be processed into the CONCEPT_STAGE table. |
| Product ingredients should match in INTERNAL_RELATIONSHIP_STAGE and DS_STAGE. | The processing will fail. |
| When Ingredients, Dose Forms or other attributes are mapped to multiple targets, precedence values must be present and unique for each source concept. | |

For quality assurance of the input tables, you can use the drug_stage_tables_QA.sql script from the project's GitHub repository.
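To give an idea of what such checks involve, the following Python sketch implements two of the rules above on in-memory data: record uniqueness, and agreement of ingredients between INTERNAL_RELATIONSHIP_STAGE and DS_STAGE. It does not reproduce drug_stage_tables_QA.sql; the function names and data layout are assumptions for illustration only.

```python
from collections import Counter


def duplicate_records(rows):
    """Return each record (as a dict) that occurs more than once."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def ingredient_mismatches(irs_pairs, ds_pairs, ingredient_codes):
    """Return (drug, ingredient) pairs present in only one of the tables.

    irs_pairs: (concept_code_1, concept_code_2) pairs from
        INTERNAL_RELATIONSHIP_STAGE.
    ds_pairs: (drug_concept_code, ingredient_concept_code) pairs from DS_STAGE.
    ingredient_codes: codes whose Concept Class is Ingredient.
    Only drugs that have DS_STAGE records are compared, because DS_STAGE
    records may legitimately be omitted for drugs without strength.
    """
    ds_pairs = set(ds_pairs)
    drugs_with_strength = {drug for drug, _ in ds_pairs}
    irs_ingredient_pairs = {
        (drug, code)
        for drug, code in irs_pairs
        if code in ingredient_codes and drug in drugs_with_strength
    }
    # Symmetric difference: pairs missing on either side.
    return irs_ingredient_pairs ^ ds_pairs


irs = {("D1", "I1"), ("D1", "I2"), ("D1", "F1")}
ds = {("D1", "I1")}
print(ingredient_mismatches(irs, ds, {"I1", "I2"}))
```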

6. Processing

If all 5 tables DRUG_CONCEPT_STAGE, INTERNAL_RELATIONSHIP_STAGE, RELATIONSHIP_TO_CONCEPT, PC_STAGE and DS_STAGE are available, the new terminology can be built:

Inferring of missing Concept Classes

All missing Concept Classes are inferred from the existing ones, from bottom upwards.

| Concept Class | Defined by |
| --- | --- |
| Clinical Drug Component | Ingredient-strength. Note that Clinical Components are always single-Ingredient. This is in contrast to all other Concept Classes |
| Branded Drug Component | Ingredient-strength(s), Brand Name |
| Clinical Drug Form | Ingredient(s), Dose Form |
| Branded Drug Form | Ingredient(s), Dose Form, Brand Name |
| Clinical Drug | Ingredient-strength(s), Dose Form |
| Branded Drug | Ingredient-strength(s), Dose Form, Brand Name |
| Quantified Clinical Drug | Ingredient-strength(s), Dose Form, Quantity |
| Quantified Branded Drug | Ingredient-strength(s), Dose Form, Brand Name, Quantity |
| Clinical Drug Box | Ingredient-strength(s), Dose Form, Box size |
| Branded Drug Box | Ingredient-strength(s), Dose Form, Brand Name, Box size |
| Quantified Clinical Drug Box | Ingredient-strength(s), Dose Form, Quantity, Box size |
| Quantified Branded Drug Box | Ingredient-strength(s), Dose Form, Brand Name, Quantity, Box size |

Even though all Concept Classes are inferred, only those Concepts that have no mapping to an equivalent Standard Concept will be written to the CONCEPT table.
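How a Concept Class could be derived from the available attributes is sketched below in Python. The function `infer_concept_class` and its boolean parameters are hypothetical; the Marketed Product branch is an assumption based on the requirement that a Marketed Product needs a supplier, dosage and Dose Form, and the actual script may decide differently.

```python
def infer_concept_class(strength, dose_form, brand_name,
                        quantity=False, box_size=False, supplier=False):
    """Return a Concept Class name based on which attributes are available.

    All arguments are booleans stating whether the attribute is present.
    Returns None if neither strength nor Dose Form is available.
    """
    if supplier and strength and dose_form:
        return "Marketed Product"
    if strength and dose_form:
        name = "Branded Drug" if brand_name else "Clinical Drug"
        if quantity:
            name = "Quantified " + name
        if box_size:
            name += " Box"
        return name
    if dose_form:
        return "Branded Drug Form" if brand_name else "Clinical Drug Form"
    if strength:
        return "Branded Drug Component" if brand_name else "Clinical Drug Component"
    return None


print(infer_concept_class(strength=True, dose_form=True, brand_name=False))
print(infer_concept_class(strength=True, dose_form=True, brand_name=True,
                          quantity=True, box_size=True))
```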

Matching
This step is necessary to add inferred equivalence relationships between new and existing Standard Concepts. All matches are created. Links in the RELATIONSHIP_TO_CONCEPT table are ignored.

The matching considers all components (Ingredient-strength(s), Dose Form, Brand Name, Quantity, Box size) in the order of precedence and optionally for records where possible_excipient is set to 1. A mismatch of up to 10% between strength values is still considered a match. Matching between normal and Quantified products compares the Numerator Value of the non-quantified product to the Numerator Value divided by the Denominator Value of the Quantified product.
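The strength comparison can be illustrated with a short Python sketch. The text does not say against which value the 10% is measured, so the choice of the larger value as reference is an assumption, as are the function names.

```python
def strengths_match(a, b, tolerance=0.10):
    """Return True if two strength values differ by at most `tolerance`,
    measured relative to the larger of the two (an assumption)."""
    if a == b:
        return True
    return abs(a - b) / max(abs(a), abs(b)) <= tolerance


def quantified_matches(numerator_value, q_numerator_value, q_denominator_value,
                       tolerance=0.10):
    """Compare a non-quantified product to a Quantified one.

    The non-quantified Numerator Value is compared with the Quantified
    product's Numerator Value divided by its Denominator Value.
    """
    concentration = q_numerator_value / q_denominator_value
    return strengths_match(numerator_value, concentration, tolerance)


print(strengths_match(100.0, 95.0))            # within tolerance
print(strengths_match(100.0, 80.0))            # outside tolerance
print(quantified_matches(5.0, 500.0, 100.0))   # 500 / 100 equals 5
```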

Result
All records in the DRUG_CONCEPT_STAGE table are written to the CONCEPT_STAGE table as follows. The standard_concept field is set to 'S' for all products and Ingredients and Brand Names that have no match to existing Standard Concepts. Dose Forms are always written as non-standard.

All records linking drug products to their Ingredients, Dose Forms, Suppliers and Brand Names are written to the CONCEPT_RELATIONSHIP_STAGE table. Note that this can be a one or two step connection:

* Ingredients, Dose Forms, Suppliers and Brand Names that have no equivalent to RxNorm (and are therefore Standard Concepts): These are converted from the INTERNAL_RELATIONSHIP_STAGE table.
* Ingredients, Dose Forms and Brand Names that have at least one RxNorm equivalent are not written into the CONCEPT_RELATIONSHIP_STAGE table themselves. The RxNorm equivalent is written instead, using the records from the RELATIONSHIP_TO_CONCEPT table with the relationship_id = 'Has standard ing', 'Has standard brand' and 'Has standard form'.

Drug Products and their derivatives (Drug Forms and Components) are connected through CONCEPT_RELATIONSHIP_STAGE records with the following relationship_id values:

| Concept Class 1 | Concept Class 2 | Relationship ID |
| --- | --- | --- |
| Brand Name | Branded Drug | Brand name of |
| Brand Name | Branded Drug Comp | Brand name of |
| Brand Name | Branded Drug Form | Brand name of |
| Brand Name | Ingredient | Brand name of |
| Brand Name | Quant Branded Drug | Brand name of |
| Brand Name | Marketed Product | Brand name of |
| Branded Drug | Brand Name | RxNorm has ing |
| Branded Drug | Branded Drug Comp | Consists of |
| Branded Drug | Branded Drug Form | RxNorm is a |
| Branded Drug | Branded Pack | Contained in |
| Branded Drug | Clinical Drug | Tradename of |
| Branded Drug | Clinical Drug Comp | Consists of |
| Branded Drug | Dose Form | RxNorm has dose form |
| Branded Drug | Quant Branded Drug | Has quantified form |
| Branded Drug | Quant Clinical Drug | Tradename of |
| Branded Drug | Marketed Product | Has marketed form |
| Branded Drug Comp | Brand Name | RxNorm has ing |
| Branded Drug Comp | Branded Drug | Constitutes |
| Branded Drug Comp | Clinical Drug Comp | Tradename of |
| Branded Drug Comp | Quant Branded Drug | Constitutes |
| Branded Drug Form | Brand Name | RxNorm has ing |
| Branded Drug Form | Branded Drug | RxNorm inverse is a |
| Branded Drug Form | Clinical Drug Form | Tradename of |
| Branded Drug Form | Dose Form | RxNorm has dose form |
| Branded Drug Form | Quant Branded Drug | RxNorm inverse is a |
| Branded Pack | Branded Drug | Contains |
| Branded Pack | Clinical Drug | Contains |
| Branded Pack | Clinical Pack | Tradename of |
| Branded Pack | Dose Form | RxNorm has dose form |
| Branded Pack | Quant Branded Drug | Contains |
| Branded Pack | Quant Clinical Drug | Contains |
| Branded Pack | Marketed Product | Has marketed form |
| Clinical Drug | Branded Drug | Has tradename |
| Clinical Drug | Branded Pack | Contained in |
| Clinical Drug | Clinical Drug Comp | Consists of |
| Clinical Drug | Clinical Drug Form | RxNorm is a |
| Clinical Drug | Clinical Pack | Contained in |
| Clinical Drug | Dose Form | RxNorm has dose form |
| Clinical Drug | Quant Branded Drug | Has tradename |
| Clinical Drug | Quant Clinical Drug | Has quantified form |
| Clinical Drug | Marketed Product | Has marketed form |
| Clinical Drug | Marketed Product | Contained in |
| Clinical Drug Comp | Branded Drug | Constitutes |
| Clinical Drug Comp | Branded Drug Comp | Has tradename |
| Clinical Drug Comp | Clinical Drug | Constitutes |
| Clinical Drug Comp | Ingredient | Has precise ing |
| Clinical Drug Comp | Ingredient | RxNorm has ing |
| Clinical Drug Comp | Quant Branded Drug | Constitutes |
| Clinical Drug Comp | Quant Clinical Drug | Constitutes |
| Clinical Drug Form | Branded Drug Form | Has tradename |
| Clinical Drug Form | Clinical Drug | RxNorm inverse is a |
| Clinical Drug Form | Dose Form | RxNorm has dose form |
| Clinical Drug Form | Ingredient | RxNorm has ing |
| Clinical Drug Form | Quant Clinical Drug | RxNorm inverse is a |
| Clinical Pack | Branded Pack | Has tradename |
| Clinical Pack | Clinical Drug | Contains |
| Clinical Pack | Dose Form | RxNorm has dose form |
| Clinical Pack | Quant Clinical Drug | Contains |
| Clinical Pack | Marketed Product | Has quantified form |
| Dose Form | Branded Drug | RxNorm dose form of |
| Dose Form | Branded Drug Form | RxNorm dose form of |
| Dose Form | Branded Pack | RxNorm dose form of |
| Dose Form | Clinical Drug | RxNorm dose form of |
| Dose Form | Clinical Drug Form | RxNorm dose form of |
| Dose Form | Clinical Pack | RxNorm dose form of |
| Dose Form | Quant Branded Drug | RxNorm dose form of |
| Dose Form | Quant Clinical Drug | RxNorm dose form of |
| Dose Form | Marketed Product | RxNorm dose form of |
| Ingredient | Brand Name | Has brand name |
| Ingredient | Clinical Drug Comp | RxNorm ing of |
| Ingredient | Clinical Drug Form | RxNorm ing of |
| Marketed Product | Brand Name | Has brand name |
| Marketed Product | Branded Drug | Marketed form of |
| Marketed Product | Branded Pack | Marketed form of |
| Marketed Product | Clinical Drug | Marketed form of |
| Marketed Product | Clinical Drug | Contains |
| Marketed Product | Clinical Pack | Marketed form of |
| Marketed Product | Dose Form | RxNorm has dose form |
| Marketed Product | Quant Branded Drug | Marketed form of |
| Marketed Product | Quant Clinical Drug | Contains |
| Marketed Product | Quant Clinical Drug | Marketed form of |
| Marketed Product | Supplier | Has supplier |
| Quant Branded Drug | Brand Name | RxNorm has ing |
| Quant Branded Drug | Branded Drug | Quantified form of |
| Quant Branded Drug | Branded Drug Comp | Consists of |
| Quant Branded Drug | Branded Drug Form | RxNorm is a |
| Quant Branded Drug | Branded Pack | Contained in |
| Quant Branded Drug | Clinical Drug | Tradename of |
| Quant Branded Drug | Clinical Drug Comp | Consists of |
| Quant Branded Drug | Dose Form | RxNorm has dose form |
| Quant Branded Drug | Quant Clinical Drug | Tradename of |
| Quant Branded Drug | Marketed Product | Has marketed form |
| Quant Clinical Drug | Branded Drug | Has tradename |
| Quant Clinical Drug | Branded Pack | Contained in |
| Quant Clinical Drug | Clinical Drug | Quantified form of |
| Quant Clinical Drug | Clinical Drug Comp | Consists of |
| Quant Clinical Drug | Clinical Drug Form | RxNorm is a |
| Quant Clinical Drug | Clinical Pack | Contained in |
| Quant Clinical Drug | Dose Form | RxNorm has dose form |
| Quant Clinical Drug | Quant Branded Drug | Has tradename |
| Quant Clinical Drug | Marketed Product | Has marketed form |
| Quant Clinical Drug | Marketed Product | Contained in |
| Supplier | Marketed Product | Supplier of |

From the pack_content, build the name of the Branded and Clinical Packs.

Relationships between any Drug Concept Class and a Classification Concept Class are recorded through the “Drug has drug class” and “Drug class of drug” generic relationship pair.

Finally, a new DRUG_STRENGTH_STAGE table should be created from DS_STAGE and the unit conversions in RELATIONSHIP_TO_CONCEPT, so the content can be added to the DRUG_STRENGTH table. This includes only Drug Concepts that have no mapping to an existing Standard Concept and are now Standard themselves. The ingredient_concept_code field is either the RxNorm equivalent or, if no such equivalent is available, the Ingredient from the newly added vocabulary.