A Source is defined as something that refers to a Table in a Design context. It is crucial for Designs that refer to the same Table multiple times, because it serves as a unique identifier within the scope of the query.

A Table defines a direct link to a table in the database. It also defines and contains columns and aggregates, so you can select which ones to show.

A Table can be identified by the file naming schema: <name>.table.m5o and should be stored in the /model directory.

An Aggregate relates to a column whose value can be calculated via count, sum, or another aggregate function. Aggregates are limited to predefined methods; custom SQL is not supported here, since custom SQL is handled through transforms with dbt.

A Report is a saved state of selecting and analyzing a Design. It contains the subset of fields you selected from one or more tables, i.e., the selected analysis. It can also be generated from raw SQL.

A Report can be identified by the file naming schema: <name>.report.m5o and should be stored in the /model directory.

Meltano uses Singer Taps and Targets to extract data from various data sources and load it in raw format, i.e., as close as possible to the original format, into the Data Warehouse. Subsequently, the raw data is transformed to generate the dataset used for analysis and dashboard generation.
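To make this concrete, the Singer protocol exchanges JSON messages (one per line) between taps and targets. The sketch below shows what a tap's output might look like; the stream name, properties, and values are illustrative, not taken from a real tap:

```python
import json

# Illustrative Singer messages as a tap might emit them, one JSON object per line.
# The "users" stream and its properties are made up for this example.
schema_msg = {
    "type": "SCHEMA",
    "stream": "users",
    "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}},
    "key_properties": ["id"],
}
record_msg = {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}}
state_msg = {"type": "STATE", "value": {"bookmarks": {"users": {"id": 1}}}}

for msg in (schema_msg, record_msg, state_msg):
    print(json.dumps(msg))
```

The target consumes these messages from the tap's stdout: SCHEMA messages describe the shape of the data, RECORD messages carry the rows, and STATE messages allow extraction to resume where it left off.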

Meltano can be used in any ELT architecture by using the right taps and targets for the job. The strategies supported range from dumping the source data in a data lake, to keeping all historical versions of each record, to storing well-formatted, clean data in the target data store.

When considering which taps and targets Meltano will maintain, some assumptions are followed concerning how source data is stored in the target data store:

All extracted data is stored in the same Target Database, e.g., a database named RAW is used for storing all data extracted to Snowflake.

For each tap's data source, a schema is created for storing all the data that is extracted through that tap, e.g., the RAW.SALESFORCE schema is used for data extracted from Salesforce, and the RAW.ZENDESK schema is used for data extracted from Zendesk.

Every stream generated by a tap results in a table with the same name. That table is created in that tap's schema, based on the information sent in the SCHEMA message.
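The naming convention above can be sketched as a small helper. This is not Meltano's implementation; the function name and the defaults are illustrative, assuming the RAW database convention described here:

```python
def target_table(tap_name: str, schema_message: dict, database: str = "RAW") -> str:
    """Derive the fully qualified table a stream loads into.

    Follows the convention described above: one database for all raw data,
    one schema per tap, one table per stream (names are illustrative).
    """
    stream = schema_message["stream"]
    return f"{database}.{tap_name.upper()}.{stream.upper()}"

print(target_table("salesforce", {"stream": "opportunities"}))
# RAW.SALESFORCE.OPPORTUNITIES
```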

Meltano supports schema updates for when the schema of an entity changes during extraction. This happens when Meltano receives more than one SCHEMA message for a specific stream in the same extract-load run.

When a SCHEMA message for a stream is received, our Targets check whether there is already a table for the entity defined by the stream.

If the schema for the tap does not exist, it is created.

If the table for the stream does not exist, it is created.

If a table does exist, our Targets create a diff to check if new attributes must be added to the table or already defined attributes should have their data type updated. Based on that diff, the Targets make the appropriate schema changes.

If an unsupported type update is requested (e.g., float --> int), then an exception is raised.

Columns are never dropped; existing columns are only updated and new columns added.
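The diffing behavior described above can be sketched as follows. The type names and the table of allowed widenings are illustrative, not a specific target's rules; only the overall shape (add new columns, widen compatible types, raise on unsupported changes, never drop) reflects the text:

```python
def diff_schema(existing: dict, incoming: dict) -> tuple[dict, dict]:
    """Diff existing table columns against an incoming SCHEMA's properties.

    Returns (columns_to_add, columns_to_alter). Columns are never dropped.
    Raises on an unsupported type change such as float -> int.
    """
    # Type widenings we allow; any other change of type is unsupported.
    widenings = {("int", "float"), ("int", "string"), ("float", "string")}
    to_add = {col: typ for col, typ in incoming.items() if col not in existing}
    to_alter = {}
    for col, new_type in incoming.items():
        old_type = existing.get(col)
        if old_type is None or old_type == new_type:
            continue
        if (old_type, new_type) not in widenings:
            raise ValueError(f"unsupported type change for {col}: {old_type} -> {new_type}")
        to_alter[col] = new_type
    return to_add, to_alter
```

Based on such a diff, a target would emit the corresponding ALTER TABLE ... ADD COLUMN and ALTER COLUMN statements.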

Data is upserted when an entity has at least one primary key (key_properties not empty). If there is already a row with the same composite key (combination of key_properties), the new record updates the existing one.

If no key_properties are defined, the target works in append-only mode. In that case, the target tables store historical information, with entries for the same key differentiated by their __loaded_at timestamp.
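The two loading modes can be sketched with an in-memory stand-in for the target table. The function is hypothetical; only the branching on key_properties and the __loaded_at column name (target-snowflake's default, per below) follow the text:

```python
from datetime import datetime, timezone

def load_record(table: dict, record: dict, key_properties: list) -> None:
    """Upsert when key_properties is non-empty; append otherwise.

    `table` is an in-memory stand-in for the target table: keyed by
    composite key for upserts, or by a running row id in append-only mode.
    """
    row = dict(record, __loaded_at=datetime.now(timezone.utc).isoformat())
    if key_properties:
        key = tuple(record[k] for k in key_properties)
        table[key] = row           # insert or overwrite: upsert
    else:
        table[len(table)] = row    # append-only: keep every version
```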

If a timestamp_column attribute is not defined in the SCHEMA sent to the target for a specific stream, it is added explicitly, and each RECORD is given the timestamp at which the target received it as its value. As an example, target-snowflake names that attribute __loaded_at when no explicit name is provided in the target's configuration file.

When a target is set to work in append-only mode (i.e., no primary keys defined for the streams), the timestamp_column's value can be used to get the most recent information for each record.
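Retrieving the most recent version of each record from an append-only table could look like the sketch below, assuming __loaded_at as the timestamp column and a hypothetical key you know identifies the record (in a warehouse you would express the same thing with a window function over __loaded_at):

```python
def latest_per_key(rows: list, key_properties: list) -> list:
    """From append-only rows, keep only the most recent version of each
    record, comparing the __loaded_at timestamps (ISO 8601 strings sort
    chronologically, so string comparison is sufficient here)."""
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in key_properties)
        if key not in latest or row["__loaded_at"] > latest[key]["__loaded_at"]:
            latest[key] = row
    return list(latest.values())
```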

For targets loading data to relational data stores (e.g., Postgres, Snowflake, etc.), we unnest nested JSON data structures and follow a [object_name]__[property_name] naming approach, similar to what the Stitch platform does.

At the moment we do not deconstruct nested arrays. Arrays are stored as JSON or STRING data types (depending on the support provided by the target data store), with the relevant JSON representation stored as is, e.g., ["banana","apple"]. It can then be extracted and used in the Transform step.
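Both behaviors can be sketched in one small function. This is an illustration of the naming convention, not the actual target code; the example record is made up:

```python
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Unnest nested objects using the [object_name]__[property_name]
    convention; arrays are not deconstructed and are stored as their
    JSON string representation instead."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}__"))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # kept as-is for the Transform step
        else:
            flat[name] = value
    return flat

print(flatten({"id": 1, "address": {"city": "Ghent"}, "tags": ["banana", "apple"]}))
# {'id': 1, 'address__city': 'Ghent', 'tags': '["banana", "apple"]'}
```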

The Singer spec doesn't define how to handle concurrency at the ELT level.

Making the streams themselves concurrent is pretty straightforward, but making sure the state handles concurrent updates is the real challenge, and it is also source specific.
Some sources support pagination endpoints or a cursor-like API, but not all APIs are made equal.

Also, depending on the data source, there may be a limit on how concurrent you can be: for example, Salesforce limits you to 25 concurrent requests, while NetSuite allows only one query at a time per account.
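A per-source concurrency cap is commonly enforced with a semaphore. The sketch below is hypothetical client code, not part of Meltano; the limit of 25 mirrors the Salesforce example above, and fetch_page stands in for a real HTTP call:

```python
import threading

MAX_CONCURRENT = 25  # e.g., Salesforce's concurrent request limit
_slots = threading.Semaphore(MAX_CONCURRENT)

def fetch_page(url: str) -> str:
    """Fetch one page while respecting the source's concurrency limit."""
    with _slots:                 # at most MAX_CONCURRENT requests in flight
        return f"fetched {url}"  # placeholder for the real HTTP call
```

For a source like NetSuite, setting MAX_CONCURRENT to 1 reduces this to serialized access.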

Every time meltano elt ... runs, Meltano keeps track of the job and its success state in a log.

This log is stored in the Meltano system database.

Note: Out of the box, Meltano uses a SQLite database named meltano.db as its system database. However, this is customizable using the MELTANO_BACKEND=sqlite|postgresql environment variable, or using the (-B|--backend) switch at invocation.

Meltano UI consists of a Flask API and a Vue.js front-end application, both of which are included in the meltano package. In other words, the Flask API is not exposed at the project level, and any customizations needed must be done at the package level.

To run Meltano in production, we recommend using gunicorn as your HTTP server.