Pre-reading
The purpose of this new thread is to focus the discussion, garner support, and tweak the outlined draft proposal for the next iteration of XBMC database support.

I'll keep this first post as the WIP until the majority are content. This first post may be better facilitated by a wiki page but I'll leave that decision to the XBMC team.

Proposals are inline, in BLUE. These proposals are designed to become part of the official documentation as appropriate.

Ideally it would be great to get input from as many as possible to ensure we get it right first time round and that it is scalable from the word go.

Background
The current implementation utilises a separate database for each of Video, Music and Program information. Each database schema shares some similarities in structure and naming convention but do not conform to a strict standard.

A visual representation of each schema is provided in the attachments.

The database backend implementation is based on SQLite3 (SQLite version 3). SQLite[1] is an embedded SQL database engine, in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.

Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Moving Forward
The aim of this document is to standardise the design and structure of the database to facilitate the future scaling of XBMC in both a standalone configuration and when networked.

The current SQLite implementation stores the database information locally only. Each user profile has there own databases stored in userdata://Database

This works well for single XBMC setups, however, it does not scale well with multiple XBMC setups that utilise a single media repository. In the latter case each XBMC instance requires a dedicated database to store and retrieve information. Currently, to have the same information across all instances requires the databases to be manually replicated.

A solution to this problem is for XBMC to communicate to a variety of database backends; locally and remotely.

Database Servers
The additional database backends to be supported initially include the existing SQLite3 as well as MySQL[2] and PostgreSQL[3].

The inclusion of MySQL and PostgreSQL will allow for remote database connections that facilitate information storage and retrieval from a common database to multiple XBMC instances.

Data types
Due to the current SQLite implementation the data types are already suitably constrained to be well supported across multiple database backends. In SQLite the data type of a value is associated with the value itself, not with the container in which it is stored. Fortunately The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statement that work on statically typed databases should would the same way in SQLite.

The data types referenced in the existing schema are:

INTEGER

TEXT

BOOLEAN

DOUBLE

FLOAT

Proposed: The data types DOUBLE and FLOAT are to be consolidated to the use of FLOAT only.

Schema Naming
The current naming of columns appears to follow a standard for the majority but a number of fields are clearly not conforming.

Proposed: All table names are to utilise CamelCase with the initial identifier representing the data type of that column with the exception of Primary Key and Foreign Key identifiers which are always to be of type INTEGER. The identifiers for the data types are:

Map table names do not conform to any particular standard, for example in the Music database the table “exgenrealbum” maps the “genre” table to the “album” table, whilst the table “extragenresong” maps the “genre” table to the “song” table. In the Video database we have an “actorlinkmovie” table which maps the “actor” table to the “movie” table.

Proposed: Mapping tables are to be named “map_primarytable_secondarytable”. The previous tables would then be named as “map_genre_album”, “map_genre_song” and “map_actor_movie” respectively.

Indexs are to be prefixed with “idx_”.

Views are to be prefixed with “vw_”.

Database Joining
The current XBMC database implementation separates the Music, Video and Program information into separate databases. This does not easily support the ability to link a music video (from the Video database) to an artist (from the Music database).

Proposed: The Video, Music, and possible the Program, databases to be merged into one Media database. The existing conflicts of table names are to be addressed with a “music_”, “video_” or “program_” prefix were appropriate. The version table will be consolidated to a single table.

Database SettingsProposed: The advancedsettings.xml will need to be modified to specify the database backend you wish to communicate with. The default will be the existing SQLite3 implementation.

Database Abstraction Library
The current database abstraction library utilised dbiplus for which their only exists and SQLite3 driver.

Proposed: OpenDBX[4] will be used as the database abstraction library. OpenDBX supports the initially desired backends SQLite, MySQL and PostgreSQL. OpenDBX supports a number of other backends which may be supported at a later stage.

Missing Tables
In the current implementation there appears to be a number of tables missing due to the existence of mapping tables that link to undefined Primary Keys. These tables are director and writer.

First of all, great initiative & appreciate research/analysis done so far!

jmarshall & cptspiff are probably in the best position to comment on the above. The proposal seems sound to me...

To go one step further, and this can be addressed at a later stage I guess:
How do we manage local (i.e. sources not shared between clients) and network sources in case the db is used by multiple xbmc clients?
I.e. if we use a central database server for multiple xbmc clients, how do we ensure differentiation between 'local' and 'network' sources. Moreover, do the clients need consistent mapping of network drives?

DonJ Wrote:How do we manage local (i.e. sources not shared between clients) and network sources in case the db is used by multiple xbmc clients?
I.e. if we use a central database server for multiple xbmc clients, how do we ensure differentiation between 'local' and 'network' sources.

The table of interest here is the "path" table, both in the Music and Video databases. In this table the column strPath stores the path (eg. smb://media/movies/). We can add two fields here, namely a boolean to indicate its a local path (eg. bLocalPath) and a unique profile ID (eg. idProfile) or XBMC ID (eg. idXBMC). Where local media paths are found only ID's matching the ID doing the requesting will be processed.

This will need a new table that can list all XBMC instances in an XBMC network with persistent unique ID's.

If you are referring to the individual mapping between multiple XBMC instances to the network drive then I would think yes. However, right now I can't think of a case where consistent mapping would not be the norm.

Hi,
Looks great, your proposal solves many of the problems with the database naming when moving to a consolidated database.

As I mentioned in the other thread (http://forum.xbmc.org/showthread.php?tid=37449), there is already an abstraction layer in XBMC, so I am not sure that OpenDBX is really necessary. The existing code has hooks to pass the ip address and user id/password to the database wrapper layer (currently sqlite).

My plans are/were to expand this current layer to include mysql, however, conversion of my system from mythtv to XBMC has taken much longer than expected, so I haven't had time to work further on it. So if you beat me to it, that would be great.

Maybe this has already been discussed: Right now the database tells XBMC which media files are present. The file system also contains this info, and has the correct info. The database right now is leading for XBMC, requiring frequent library updates and cleaning, which from a user perspective I find slightly awkward.

Maybe a better way would be to do a quick directory listing on startup (or on network events, or ...), and then use the database to retrieve all the associated meta data, for the existing files only

Benefits:
- starting XBMC without a network connection will only mean you don't see those files which are on the network
- when removing a file from your HDD you will not need to clean your library, the meta data is still present in the database but not retrieved
- a "clean library" is only needed once in a while to purge the db from meta data for files no longer present in the file system.

Please don't hit me if this has been discussed without me being aware of it

eriksmith200 Wrote:Maybe this has already been discussed: Right now the database tells XBMC which media files are present. The file system also contains this info, and has the correct info. The database right now is leading for XBMC, requiring frequent library updates and cleaning, which from a user perspective I find slightly awkward.

Maybe a better way would be to do a quick directory listing on startup (or on network events, or ...), and then use the database to retrieve all the associated meta data, for the existing files only

Benefits:
- starting XBMC without a network connection will only mean you don't see those files which are on the network
- when removing a file from your HDD you will not need to clean your library, the meta data is still present in the database but not retrieved
- a "clean library" is only needed once in a while to purge the db from meta data for files no longer present in the file system.

Please don't hit me if this has been discussed without me being aware of it

Isn't simpler to just purge the library on deletion of the file?(when the network is reachable) And in case the net is not reachable we could just make a simple journal ( to just list what is to be deleted) to apply as soon as the network become reachable.

First of: Great analisys, you wheren't kidding when you spoke of the engineer in you

firnsy Wrote:Proposed: The Video, Music, and possible the Program, databases to be merged into one Media database. The existing conflicts of table names are to be addressed with a “music_”, “video_” or “program_” prefix were appropriate. The version table will be consolidated to a single table.

I haven't looked at the schemas yet, so i can be wrong, but this duplication is really necessary(if it is duplication)? can't this be solved whit an intertable (if necessary)?

wstewart Wrote:As I mentioned in the other thread (http://forum.xbmc.org/showthread.php?tid=37449), there is already an abstraction layer in XBMC, so I am not sure that OpenDBX is really necessary. The existing code has hooks to pass the ip address and user id/password to the database wrapper layer (currently sqlite).

You are right there are two ways this could be approached, either add the backend support to the existing dbiplus implementation, or use a new one which in this case is OpenDBX.

I'm still leaning towards OpenDBX because of its large backend support and that it is still in active development. If we find the cost of replacing dbiplus is greater than adding MySQL and PostgreSQL support then of course this will be reconsidered.

I see the final implementation will be laid down in phases to allow a smooth-ish transition from what we have now in multiple local databases for media, to a single local database for media, to a single database that can be stored locally or remotely. So the library replacement is actually in the final phase (albeit the larger one).

wstewart Wrote:My plans are/were to expand this current layer to include mysql, however, conversion of my system from mythtv to XBMC has taken much longer than expected, so I haven't had time to work further on it. So if you beat me to it, that would be great.

Hurry up then ... This won't happen overnight and the more people working on it the more robust it will be

Short form for those that can't be arsed reading it all: How paths are defined per client is the key piece of information that needs to be determined.

Long form for those with time on their hands:

1. Consolidation is really just for the benefit of not having multiple databases on the server. There is very little need to cross-link the 3 databases we have at the moment, and the small amount of cross linking that might be wanted can easily be done externally. Music videos naturally belong in the music database rather than the video database, though there are ofcourse technical issues in getting them in there (lack of tagging a major one, and also having to make sure the player system supports them well.)

2. The big problem is dealing with paths and profiles as already discussed above. This is a non-trivial problem. Ideally, we want to be able to just point XBMC at a database and we're done. We don't want to have to care that the paths may be different and so on. Thus, translation of paths needs to be either done in the database or in the client. It seems to me that doing this in the client makes the most sense, though I currently have no solid ideas on how to handle this. It's not as simple as a copy and paste for several reasons - rar/zip/stacks/multipaths etc. all need handling, and these are not naturally hierarchical. Multipaths are easy (just store multiple rows to define a 'node'), and stacks are probably doable as well. Rars and Zips are tricky, as theoretically you can have arbitrary numbers of movies within rars which may be within a zip or whatever. This is something that needs solving prior to finalizing how paths and urls are stored. Perhaps the solution is to ignore it and force everything to be on a network drive that allows the same URL format for everything, or perhaps we just tag those paths that will be the same for all machines (i.e. network) - I don't know the best way to handle it.

3. Metadata has to be completely separate from file data. The link has to be very carefully setup. My thinking for the video library is as follows:

a. A "video" table that basically contains a filtered version of the filesystem. Each entry has a URL to the item on the filesystem that this entry covers (could be a folder or a file, a stack of files, or part of a file), along with the video URL that should be played when the item is clicked on, along with some other pertinent info, such as pathid (path that this item resides in), nodeid ("node" or source that this item resides in), a hash to enable rescraping and so on. Each entry represents a playable item.

b. A "filedetails" table would have information on the underlying file(s) for that item, such as the streamdetails and other file-based information (size, date, etc.) This is stuff that can be removed when the "video" item is removed, as it's easily generated from the file.

c. A "metadata" table handles all the metadata for this item. This is scraped info and local info. It'll be fixed field and will also have key/value pairs. This is information that may be scraped from online sources or local sources, and thus should not be removed if possible. It may be split into two, as I see a use for a single-row of information that gets key details needed for sorting (eg title, sorttitle, originaltitle, rating, runtime - one off stuff) and additional information used for grouping (genre, group, actor, director, studio, year, <insert_user_definable_grouping> etc.)

d. Multiple "video" items may link to a "metadata" item, so a video->metadata link can be made. It's a n:1 relationship though I think, so a foreign key in the video table to the metadata table will suffice.

The idea is that the video table allows for fast listing based on path or fast listing based on node or source (eg "all movies"). Once we have the listing, we then background load the rest of the information (one query per entry).

firnsy Wrote:Hurry up then ... This won't happen overnight and the more people working on it the more robust it will be

I think you are much further head at this point and I have much to do before I can look at this more. I had hoped to do the coding last month, but getting XBMC up&running and my video collection moved over has taken so much longer than anticipated.

I have too many jobs on the go, I have to complete the transition to XBMC for the waf, install a new intake on my mustang again for waf since the current one is cracked, work on finishing the basement before year end and spend time with my daughter before she grows up too fast.....

So please don't wait for me, I will jump in and help when I can. I will certainly be on board to test the new code, I have a mysql backend running previously used by mythtv that I plan to re-use for XBMC.

Short form for those that can't be arsed reading it all: How paths are defined per client is the key piece of information that needs to be determined.

Long form for those with time on their hands:

1. Consolidation is really just for the benefit of not having multiple databases on the server. There is very little need to cross-link the 3 databases we have at the moment, and the small amount of cross linking that might be wanted can easily be done externally. Music videos naturally belong in the music database rather than the video database, though there are ofcourse technical issues in getting them in there (lack of tagging a major one, and also having to make sure the player system supports them well.)

2. The big problem is dealing with paths and profiles as already discussed above. This is a non-trivial problem. Ideally, we want to be able to just point XBMC at a database and we're done. We don't want to have to care that the paths may be different and so on. Thus, translation of paths needs to be either done in the database or in the client. It seems to me that doing this in the client makes the most sense, though I currently have no solid ideas on how to handle this. It's not as simple as a copy and paste for several reasons - rar/zip/stacks/multipaths etc. all need handling, and these are not naturally hierarchical. Multipaths are easy (just store multiple rows to define a 'node'), and stacks are probably doable as well. Rars and Zips are tricky, as theoretically you can have arbitrary numbers of movies within rars which may be within a zip or whatever. This is something that needs solving prior to finalizing how paths and urls are stored. Perhaps the solution is to ignore it and force everything to be on a network drive that allows the same URL format for everything, or perhaps we just tag those paths that will be the same for all machines (i.e. network) - I don't know the best way to handle it.

3. Metadata has to be completely separate from file data. The link has to be very carefully setup. My thinking for the video library is as follows:

a. A "video" table that basically contains a filtered version of the filesystem. Each entry has a URL to the item on the filesystem that this entry covers (could be a folder or a file, a stack of files, or part of a file), along with the video URL that should be played when the item is clicked on, along with some other pertinent info, such as pathid (path that this item resides in), nodeid ("node" or source that this item resides in), a hash to enable rescraping and so on. Each entry represents a playable item.

b. A "filedetails" table would have information on the underlying file(s) for that item, such as the streamdetails and other file-based information (size, date, etc.) This is stuff that can be removed when the "video" item is removed, as it's easily generated from the file.

c. A "metadata" table handles all the metadata for this item. This is scraped info and local info. It'll be fixed field and will also have key/value pairs. This is information that may be scraped from online sources or local sources, and thus should not be removed if possible. It may be split into two, as I see a use for a single-row of information that gets key details needed for sorting (eg title, sorttitle, originaltitle, rating, runtime - one off stuff) and additional information used for grouping (genre, group, actor, director, studio, year, <insert_user_definable_grouping> etc.)

d. Multiple "video" items may link to a "metadata" item, so a video->metadata link can be made. It's a n:1 relationship though I think, so a foreign key in the video table to the metadata table will suffice.

The idea is that the video table allows for fast listing based on path or fast listing based on node or source (eg "all movies"). Once we have the listing, we then background load the rest of the information (one query per entry).

Cheers,
Jonathan

1. Having to deal with only one db makes things easier, i think. I was just thinking ( maybe i'm going a little too far or OT on this, but it seems to me a good idea so i want to share it ) Why don't implement a mechanism similar to the one implemented for strings handling, but for queryes? this way we can modify the db schemas without touching the source code (which would use an id to identify the query to use) and also we can easily mantain the queryes (they would be in a separate xml file and not in the source code). plus, maybe in a distant future, we can extend this to skinners so they can implement user defined queryes all across the db and show the result wherever they want ( no more dirty hacks to retrieve information).

2. I think we should go down the uri road and define (if not already defined) an uri to identify the local resources (eg: file://<source_name>) if i'm not wrong we already have something similar for smb:// and upnp.
For rar and zip handling maybe a function that recursively check if the resource is a media playable and if so start playing, otherwise relaunch itself on the new object collection.

3. i think we can extend the same concept also for music and program. I was thinking to try and merge (and normalize) the three dbs. As soon as i've something presentable i will post it (just give me some time... lots of things to do :-D).