Moodle custom fields

At the recent Hactoberfest Moodle developer meeting, one of the themes was custom fields/metadata. What’s the difference? I hear you ask… Well, metadata might be the same property on multiple things e.g licence, learning outcome, whereas a custom field is on one objecte.g. course duration. Some properties might be both/either metadata & custom field e.g. language.

My interest is for OpenLearn which has a range of custom fields on its courses e.g. seo url, educational level, language, duration. I’m also interested in extending licences to apply to more objects and be more useful, and maybe extending tagging which, I believe, is just another sort of metadata. Other places already in Moodle where metadata exists could be rationalised.

The meeting also discussed improvements to custom profile fields to use the same system. There is a desire to offer alternate name field and modern IM ids. The new system should have an improved UI and offer better controls like if you’re designing your own moodleform in code e.g. to add help popups. None of this is controversial or difficult to code.

The big decision is on the database backend. There are two options for implementation. Option A: flat tables called _extend for every current table (e.g. user_extend, course_extend) with dynamic creation of new columns in the _extend when new fields are added. Or Option B: generic data tables (one for each data type) with property/value/object columns. Option A matches how OpenLearn currently does its metadata, Option B is used by Remote Learner Moodle partners.

Whichever is the chosen method, we would need to extend the current DB API function calls so that if you get records from user, for example, any user custom fields would automatically be included in your result object.

One suggestion is that option A works for custom fields, and option B works for metadata. Do they really need different treatment? Well, it seems more elegant not to, in my opinion. If the reasons why an implementation is inappropriate for a new metadata/custom field thing is long, then the design is wrong.

Pros and cons

Option B means controlled DB structure changes, but the number of joins required to get single user object back could be huge if there are multiple custom user fields of different types. Some database platforms have limits to the number of joins they allow (apparently) – could we ever reach this? Multiple joins would also be a slower query, and it would be harder to apply indexes on these tables to support search queries. These tables could grow massive over time, and experience problems similar to current log table. Can this solution ever perform well enough?

The flat tables of option A are easier to join and index, so result in quicker queries, but dynamic database change is bad & maybe insecure, and could grow like topsy. Also this method doesn’t support cross-object metadata at all. And we could still have problems with joins if there are multiple values for a metadata field e.g. authors.

Next steps?

A detailed SQL investigation is needed to analyse the performance implications of the two options, and wider developer discussion.

Either way should be ok for OpenLearn, so on one level I don’t really care what is decided. But Option B just ‘feels’ more elegant to me if we can only get the queries to perform well enough.

The point of having a table for each datatype is so that the fields can be indexed properly, instead of treating everything as text. This allows you to, for example, search for all courses whose duration value is bigger than 5.

Brian said: October 30, 201211:07 am

Is there an online discussion / wiki / tracker issue related to this? I’d like to see the technical details of the two proposed solutions and know what kind of performance problems there are.

The main documentation for our discussions at the conference can be found here http://docs.moodle.org/dev/Perth_Hackfest_October_2012 including a link to the custom fields session. There seem to be a number of wiki and forum entries on the same subject so I’m not sure which would be the best place to start a discussion – probably best to start a new one in fact.

To be clear though, there are concerns that there may be risks to performance with either of these options. Further investigation will be undertaken to see whether these concerns are valid or not. No feature will be implemented unless it can be proved to perform well on a large site.