Details

QA Validation Status: Validated by QA

Description

Add JdbcConnectionUuid connect string parameter.

=== From an email to mondrian dev list ===

It's important to me that connection factories (the means by which Mondrian gets JDBC connections to the underlying databases... which include instances of javax.sql.DataSource, or (URL, username) credentials) can be represented as strings. It was a mistake to allow javax.sql.DataSource objects to be passed into Mondrian when creating a connection via the legacy API. olap4j made it more difficult to pass in non-Strings, and that made life painful for some people. I thought it would be possible to just register DataSources in JNDI and pass in the JNDI name, but as Marc pointed out, Pentaho has to run in containers (such as Tomcat) with read-only JNDI environments.

Mondrian already has a DataSourceResolver SPI. This is important, and this works. The one thing it doesn't do is tell Mondrian whether two data sources point to the same database.

Consider setting up a distributed cache. It's important that all of the participating instances of Mondrian know that they are looking at the same database instance. If they don't know it's the same database, they can't safely share their cache. If we used an SPI to determine equality, it would be difficult to ensure that the same SPI is being used on all machines. When I'm answering a support call, it's easy to forget to ask whether someone has overridden the default implementation of the SPI.

So, how to tell whether two connection factories are the same, without introducing an SPI? We introduce a new connect string parameter, JdbcConnectionUuid. (This complements existing parameters Jdbc, JdbcUser, JdbcPassword and DataSource.) If two mondrian connections have the same JdbcConnectionUuid, Mondrian will take the client at its word that the back-end databases are identical. It will not consider the other parameters in determining equality.

Determining whether two schemas are equal, and therefore candidates for sharing a cache, comes down to two parts: Are the connection factories equal (using JdbcConnectionUuid etc. as described above)? And are the contents of the XML schema files equal (using UseContentChecksum, Catalog, CatalogContent, DynamicSchemaProcessor, as today)? Both of these questions are answered by looking at a string.
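To make the "both questions are answered by looking at a string" point concrete, here is a minimal sketch of a cache-sharing key built from two strings. The class name SchemaShareKey and its fields are hypothetical and are not part of Mondrian; the sketch only shows that equality reduces to comparing a connection string key and a schema string key.

```java
import java.util.Objects;

/** Hypothetical illustration only; this is not a Mondrian class. */
final class SchemaShareKey {
    private final String connectionKey; // e.g. the JdbcConnectionUuid value
    private final String schemaKey;     // e.g. a checksum of the schema XML

    SchemaShareKey(String connectionKey, String schemaKey) {
        this.connectionKey = Objects.requireNonNull(connectionKey);
        this.schemaKey = Objects.requireNonNull(schemaKey);
    }

    /** Two keys are equal only when both string parts are equal. */
    @Override public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SchemaShareKey)) {
            return false;
        }
        SchemaShareKey that = (SchemaShareKey) o;
        return connectionKey.equals(that.connectionKey)
            && schemaKey.equals(that.schemaKey);
    }

    @Override public int hashCode() {
        return Objects.hash(connectionKey, schemaKey);
    }
}
```

Because both parts are plain strings, instances on different machines can reach the same answer without sharing any SPI implementation.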

JdbcConnectionUuid is optional in the connection parameters. If not specified, Mondrian would use the same connection factory matching rules as today. (Internally, Mondrian will generate a Uuid so that all connections have one.)

As its name suggests, it's a good idea if JdbcConnectionUuid is a UUID. But it doesn't need to be. It could be an MD5 hash. It could be anything the user likes. They should just make damn sure that it is unique.
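As a rough sketch of how a client might produce such a value, the snippet below generates a random UUID with java.util.UUID and appends it to a connect string. The helper name withUuid is an assumption made for illustration; only the JdbcConnectionUuid parameter name and the sample connect string come from this issue.

```java
import java.util.UUID;

/** Hypothetical helper; not part of Mondrian. */
final class ConnectStringExample {
    /** Appends a JdbcConnectionUuid parameter to an existing connect string. */
    static String withUuid(String connectString, String uuid) {
        String sep = connectString.endsWith(";") ? "" : ";";
        return connectString + sep + "JdbcConnectionUuid=" + uuid;
    }

    public static void main(String[] args) {
        // Generate the value once per database, then reuse the same value
        // on every Mondrian instance that points at that database.
        String uuid = UUID.randomUUID().toString();
        System.out.println(
            withUuid("Provider=mondrian;DataSource=SampleData", uuid));
    }
}
```

A freshly generated random value per connection would defeat the purpose, since connections are only treated as equal when the strings match.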

Activity

Have some doubts on this:
Should the current way of getting the schema-dependent part of the key be kept?
When JdbcConnectionUuid isn't provided, should the behavior be exactly the same as before?

To better frame the question, this is how RolapSchema.Pool.get returns a Schema:

1. Always a new one if UseSchemaPool=false
2. From mapMd5ToSchema, looked up by the schema's MD5, if UseContentChecksum=true
3. From mapUrlToSchema using the schema key otherwise.

and how the RolapSchema key is currently generated:

1. Full schema XML if either CatalogContent or DynamicSchemaProcessor is provided.
2. <catalogUrl>.external#<dataSourceInstanceId> if dataSource is provided
3. <catalogUrl>.<connectionKey>.<user>.<dataSourceString> otherwise

For the purpose of cache sharing, having the JdbcConnectionUuid enables us to merge the last two cases, but keeping the way the schema part of the key is created still leaves two incompatible key domains for schemas that can be equal and use the same database.
Using the full xml has the added problem of bloating log files.

So only if always using both JdbcConnectionUuid and UseContentChecksum=true would we have a good chance of using the same cache for the same database/schema. Is this right?
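For readers following along, the lookup and key rules listed above can be modeled roughly as follows. This is a toy sketch written from the description in this comment, not Mondrian's source: the class SchemaPoolSketch, its method signatures, and the use of plain Object as a stand-in for a schema are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the lookup described above; names are illustrative. */
final class SchemaPoolSketch {
    private final Map<String, Object> byMd5 = new HashMap<>();
    private final Map<String, Object> byKey = new HashMap<>();

    /** Builds the schema key following the three key rules listed above. */
    static String schemaKey(String catalogUrl, String inlineXml,
            String dataSourceInstanceId, String connectionKey,
            String user, String dataSourceString) {
        if (inlineXml != null) {
            // Rule 1: CatalogContent or DynamicSchemaProcessor supplied the XML.
            return inlineXml;
        }
        if (dataSourceInstanceId != null) {
            // Rule 2: a DataSource object was provided.
            return catalogUrl + ".external#" + dataSourceInstanceId;
        }
        // Rule 3: everything else.
        return catalogUrl + "." + connectionKey + "." + user
            + "." + dataSourceString;
    }

    /** Returns a schema stand-in following the three lookup rules. */
    Object get(boolean useSchemaPool, boolean useContentChecksum,
            String md5, String key) {
        if (!useSchemaPool) {
            return new Object(); // always a new one
        }
        if (useContentChecksum) {
            return byMd5.computeIfAbsent(md5, k -> new Object());
        }
        return byKey.computeIfAbsent(key, k -> new Object());
    }
}
```

In this model, two requests share an entry under UseContentChecksum=true whenever the MD5 matches, regardless of which key rule would otherwise have applied, which is the behavior the question above relies on.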

Tiago Gomes Ferreira
added a comment - 18/Sep/12 10:27 AM

Internally it should use (connection key, schema key) in all 3 cases. Sounds like that changes behavior in #1 – which is a good thing, since they could theoretically have provided the same catalog content on different databases.

Even if they don't provide a JdbcConnectionUuid, you can create one internally (e.g. using an MD5 hash of user name and JDBC connect string, if that's what they provided). That way all connections are identified using the Uuid.
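One possible way to derive such an internal identifier, sketched under assumptions: java.util.UUID.nameUUIDFromBytes produces a name-based (type 3, MD5-derived) UUID, so the same user name and JDBC connect string always yield the same value. The class name DerivedConnectionUuid and the newline separator are illustrative choices, not what Mondrian does.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper; not part of Mondrian. */
final class DerivedConnectionUuid {
    /** Derives a stable, MD5-based UUID from the connection credentials. */
    static String derive(String jdbcUser, String jdbcConnectString) {
        // The separator keeps ("ab", "c") from colliding with ("a", "bc").
        String name = jdbcUser + "\n" + jdbcConnectString;
        return UUID.nameUUIDFromBytes(
            name.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

Unlike a random UUID, a derived value needs no coordination between machines, but it only matches when the user name and connect string are character-for-character identical.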

Julian Hyde
added a comment - 18/Sep/12 11:19 AM
Does that answer your questions?

To validate from the BA Server:

1. Ensure the latest mondrian.jar is being used
2. Edit the datasources.xml from:
<DataSourceInfo>Provider=mondrian;DataSource=SampleData</DataSourceInfo>
to
<DataSourceInfo>Provider=mondrian;DataSource=SampleData;UseContentChecksum=true;JdbcConnectionUuid=SampleDataUUID</DataSourceInfo>
3. Using the xmla4js plugin, issue a query on the sampledata catalog
4. Check the mondrian.log file and search for a line containing
get: catalog=solution:steel-wheels/analysis/SampleData.mondrian.xml connectionKey=null, jdbcUser=<irrelevant>, dataSourceStr=<irrelevant>, dataSource=<irrelevant>, jdbcConnectionUuid=SampleDataUUID, useSchemaPool=true, useContentChecksum=true, map-size=<irrelevant>, md5-map-size=<irrelevant>

The relevant part is that both the jdbcConnectionUuid and the useContentChecksum values match the settings added to datasources.xml.

Pedro Vale
added a comment - 04/Oct/12 6:33 AM
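The log check in step 4 could be automated along these lines. This is a sketch only: the class LogLineCheck and its regular expression are assumptions based on the name=value layout of the log line quoted in this issue, and the actual log format may differ between Mondrian versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for scanning a mondrian.log line. */
final class LogLineCheck {
    // Matches name=value pairs; a value ends at a comma or whitespace.
    private static final Pattern FIELD =
        Pattern.compile("(\\w+)=([^,\\s]+)");

    /** Returns the value logged for the given field name, or null. */
    static String field(String line, String name) {
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return m.group(2);
            }
        }
        return null;
    }
}
```

Validation then amounts to checking that field(line, "jdbcConnectionUuid") and field(line, "useContentChecksum") return the values configured in datasources.xml.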