
TL;DR

Nordvision agreed to fund a programmers’ exchange between DR in Denmark, RÚV in Iceland and Yle in Finland. The exchange started off with me going to Iceland for two weeks. We collaborated on a metadata project for Drupal called Yild, by preparing for migration of RÚV’s terms to a Yild-compatible format and by writing a new Spotify provider module for it. We feel a lot of good came out of the project and are eager to continue it when someone from Iceland comes to Yle for a similar exchange in Q1/2017.

The project

Three Nordic broadcasting companies – Svenska Yle, DR and RÚV – have received funding from Nordvision to undertake a two-week developers’ exchange between the companies. This text documents my experiences and findings from working with RÚV for two weeks.

Where it all started

The project first started in Amsterdam in 2014, when we met Helgi Páll Þórisson from RÚV at Drupalcon, a big convention for enthusiasts and professionals of the content management system Drupal. Both Icelandic RÚV and Finnish Yle were already using Drupal, and we shared the common ground of being publicly funded public service companies with the same challenges and demands on the web platform.

We discussed the merits of Drupal and the unique problems we were facing. Quite soon the idea of collaborating was born. Coincidentally, there is already an organisation for such collaboration: Nordvision, which gathers representatives from all Nordic public service companies around the same table in a partnership.

Later, we would run into Helgi several times at various Drupal events and on our visits to other Nordic public service companies. The idea of working together started to grow, and pretty soon a developer exchange was proposed: one developer from Yle would work at RÚV in Iceland for a couple of weeks, and later on someone from RÚV would come work with Yle in Finland. Danish DR also expressed interest in being part of this program and, timetables and other circumstances permitting, a similar exchange involving DR is planned for the future.

The exchange

After the summer of 2016 things started coming together: Nordvision approved the request for funding, and pretty soon a timetable was drafted: I would go to RÚV for two weeks in November 2016. At Yle and RÚV we started planning the details of the exchange.

We decided the developer exchange would focus on metadata, since Yle already uses a tagging system for Drupal that connects tags to an official source, such as Wikidata. This Drupal module is an open source project called ”Yild” (short for Yle integrator for linked data). We agreed that our main focus would be to determine whether the same system could be used at RÚV and, if so, to start implementing it.

The team

RÚV has a team of highly skilled web developers wrestling with a multitude of technologies: Swift, Python and PHP, to name a few. The team is cooped up in a faraway corner of the building, fairly well isolated from the rest of the workforce. This is typical, and unsurprisingly the same arrangement we use at Yle: developers need to be shielded from outside disturbances, and any interaction should ideally be channeled through one intermediary, a team leader or a scrum master.

What’s impressive is that two of the developers, Hannes and Jón, had only been at RÚV for a month when the exchange started, and they were already highly productive. Hannes showed me the Apple TV app he was working on and told me the first version was already in production! What stayed with me most was how fast RÚV had managed to make their new employees productive – something many other companies could take note of.

Volunteer work

One thing that sets our organisations apart is that many developers at RÚV seem to be involved in extracurricular and, in many cases, volunteer activities:

Jón runs a non-profit initiative at the web address Koder.is to teach young kids programming. This is done with a Raspberry Pi computer that runs the popular game Minecraft in such a way that it can be programmatically manipulated using Python. This means the kids can learn to build houses by programming logic instead of the traditional approach of walking (in the game) to a spot and placing a building block there. I was impressed by how visual this process of learning the basic tools of programming is.

Gunnar is deeply involved in an initiative, first introduced by the BBC, to give 9000 Icelandic children their own single-board computer, the so-called Micro:bit. This collaboration between Microsoft and the BBC gives the children an app that runs on their mobile. Using Bluetooth, they can first design the program in the app and then transfer it to the Micro:bit to run it. I started googling the Micro:bit, and apparently there have been some workshops introducing it to the public at Kiasma, the contemporary art museum in Helsinki. In Iceland, this project has already evolved into an Icelandic web page, developed by RÚV, that contains dozens of exercises for those who want to learn how to program. I’m hoping to see similar initiatives in Finland as well. Maybe Yle could look into something similar or even collaborate with RÚV?

Micro:bit. The price is as tiny as the device itself: 25€.

Team management at RÚV

The first thing that catches your eye at the web developers’ office at RÚV is the gigantic kanban board in the middle of the room. Almost all the sticky notes are in the “ready for testing” or “done” column.

Yes, most of the issues are in the ”done” column.

Like at most software companies these days, the methodology of choice is a customised agile model with daily meetings and a desire to achieve synergy through shared problems and tasks. Whenever I told the team about a problem I was facing during a daily meeting, someone was always eager to offer a solution: ”Icelandic always has problems with character encoding because of our many special characters”, Gísli Þórmar Snæbjörnsson offered helpfully.

Whenever one team member recognised that another member was working on something they had previous knowledge about, they were quick to offer assistance. I believe one big reason, perhaps the biggest, for implementing agile in a team is to move from working as individuals to working as a team, and the Icelandic developers’ team was clearly a team. I often saw three or four people flock around the same screen, speaking enthusiastically about some new idea or problem they were having. In contrast, at Yle I usually see people keep to their own screen, and I think this is something we could learn from the Icelandic team: to be braver in seeking out other people’s ideas and thoughts.

Kick-off

We started the project with a kick-off meeting on the very first day. Everyone on the web team attended, and it was decided I would primarily work with Jón, Hilmar and Helgi – people who either had prior Drupal experience or, in Jón’s case, needed to familiarise themselves with the platform as a new employee.

As part of the kick-off meeting I demonstrated the Yild module to the whole team. Yle has been using the module for tagging content for several years, and it is this experience and expertise that puts us in a position to help RÚV take their tagging to the next level.

Real structured data opens up exciting possibilities: For one, Google will start to know what the content is about, which will help the search engine ranking. Using a common vocabulary, there is also the possibility of linking content from different sites. If Svenska Yle writes about the Eurovision Song Contest, we could link to RÚV’s content about the same subject.

Code review

The next day the Drupal developers gathered for a more technical meeting: The code review and presentation of the Yild module. I explained the structure of the project and walked the team through the programming patterns in use.

Yild is a multi-part module where a core module provides the basic functionality. The core then calls upon so-called provider modules to fetch tag suggestions and details from various external sources. Each external service has its own provider: Wikidata, Geonames, Wikipedia, Musicbrainz, etc.

Writing your own provider module is a relatively trivial task if the external service has a well-documented API to fetch data from. As a result of the code review, Hilmar became enthusiastic about writing a provider module for Spotify, since that was something he needed for another project. By making Hilmar a co-maintainer of the Yild project, it would also be possible for him to contribute a new component that added functionality: the ability to tag articles and other content with any artist, song or album on Spotify, complete with descriptions and album covers.
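To make this concrete, here is a minimal sketch of what a provider lookup can look like in Drupal 7 style PHP. The function name, the result format and the ”spotify:” id prefix are invented for illustration – Yild’s real provider interface and Hilmar’s module differ in detail, and newer versions of Spotify’s search endpoint require an OAuth token.

```php
<?php
/**
 * Hypothetical provider callback: search Spotify for the given text
 * and return suggestions as id => details pairs.
 */
function yild_spotify_search_suggestions($text) {
  $url = 'https://api.spotify.com/v1/search?' . drupal_http_build_query(array(
    'q' => $text,
    'type' => 'artist,album,track',
    'limit' => 10,
  ));
  $response = drupal_http_request($url);
  if ($response->code != 200) {
    return array();
  }
  $data = drupal_json_decode($response->data);
  $results = array();
  // For brevity, only artists are handled here; albums and tracks
  // would be collected the same way.
  foreach ($data['artists']['items'] as $artist) {
    // Keeping the provider name and the external id with each
    // suggestion is what links a term permanently to its source.
    $results['spotify:' . $artist['id']] = array(
      'name' => $artist['name'],
      'provider' => 'spotify',
    );
  }
  return $results;
}
```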

Jón would in turn work on a provider for tagging with Icelandic politicians (an idea that was later scrapped and replaced by a provider for looking up Icelandic companies).

Helgi and I would work on researching how to best migrate from the current tag system on RÚV’s Drupal site. It was suggested that a migration tool for trying to match existing tags against a preferred provider would be useful for the whole Drupal community, so that would be my main focus for the coming weeks, along with trying to match RÚV’s tags against Wikidata to find out how many I could actually match.

The Spotify Provider

Hilmar was quick to finish his first draft of the Spotify provider module. After committing it to the version control repository at drupal.org, I read the code, tested the module and offered my comments and insights.

In just a few days, Hilmar went from not having seen the project before to contributing a provider module that was added to drupal.org for the benefit of the whole world. This felt like a big win for the project. Two Nordic public service companies had actually worked together on the same code and then released it as open source.

Looking up the Icelandic artist Mugison using the Spotify provider finds all albums and songs.

In the meantime, Jón discovered a bug that broke an important part of the functionality of Yild on cleanly installed systems. He quickly fixed the bug, made a patch, which was added to drupal.org and credited to him. We now had two Icelandic web developers from RÚV with official contributions to a Drupal project started by Finnish Yle.

Greynir

One thing to keep in mind is that the quality of your structured data can only be as good as the data you get from the provider. There was always a concern that Wikidata wouldn’t be good enough in Icelandic to actually be used with RÚV’s content.

During one of our first meetings, Jón mentioned there was a very exciting open source initiative for cataloging and structuring content on Icelandic news sites called Greynir. Greynir is an NLP (Natural Language Processing) parser for the Icelandic language. It indexes several sites and breaks the content into a tree structure, distilling the content into specific terms, such as names of people, places, job titles etc.

Greynir is developed by Vilhjálmur Þorsteinsson, a veteran programmer and enthusiast of deep learning and artificial intelligence. Jón set up a meeting with Mr Þorsteinsson to discuss the possibility of turning Greynir into an API for use with Yild.

One way we use Yild at Yle is to fetch term suggestions based on the content. This requires some kind of analysis service that reads the text and offers suggestions on terms that are relevant for the content. There is currently no such service for Icelandic that we know of, but Greynir is very close and with a small amount of work might become one. This would help immensely with the tagging of content at RÚV.

During the meeting Mr Þorsteinsson said it would be quite easy to give Greynir an API structure, and it’s an angle we are excited to move forward with.

Migration tools

The biggest obstacle for implementing Yild on RÚV’s site was always migrating from the current, unverified tags to something actually connected to Wikidata.

I set out to write a migration tool that would go through all 2000 tags used on RÚV’s website and try to match them with their Wikidata counterparts. After some challenges with character encoding – of which I had been duly warned – and a lot of translation help from my fellow team members and Google Translate, the tool was finally ready. Out of more than 2000 terms, approximately half could be matched against Wikidata. The rest of the terms were saved in a report delivered to Helgi for further inspection.
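The heart of the matching can be sketched in a few lines against Wikidata’s public wbsearchentities endpoint. This is only the idea, not the actual tool – the helper name is made up, and the real script also handled the encoding quirks and wrote unmatched terms to the report:

```php
<?php
/**
 * Illustrative helper: try to match one Icelandic term name against
 * Wikidata and return its Q-id, or FALSE when there is no match.
 */
function ruv_wikidata_match($term_name) {
  $url = 'https://www.wikidata.org/w/api.php?' . drupal_http_build_query(array(
    'action' => 'wbsearchentities',
    'search' => $term_name,
    'language' => 'is', // Match against Icelandic labels and aliases.
    'format' => 'json',
  ));
  $response = drupal_http_request($url);
  if ($response->code != 200) {
    return FALSE;
  }
  $data = drupal_json_decode($response->data);
  // Take the best hit; an empty result means the term goes to the
  // report for manual inspection.
  return empty($data['search']) ? FALSE : $data['search'][0]['id'];
}
```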

Matching tags in Icelandic against terms found on Wikidata.

The way to move forward is now to analyse the migration data and see if it’s good enough to work with. The plan is to run the migration script on RÚV’s test servers and then on the production servers, after which moving to use Yild is simply a matter of enabling the module and teaching the editors to use it.

One particularly amusing problem was when I tried to match the Icelandic word for heavy rock, Þungarokk, and kept getting a totally unrelated word from Wikidata that means ”roofing sheet” (the sheets of corrugated steel used as roofing on some houses). The word was bárujárn. After a while I asked a team member if he had any idea why this might happen, and he laughingly told me it is actually a term metalheads use as a synonym for heavy rock: back in the beginning of the genre, people thought the music sounded like hitting sheets of corrugated steel with a hammer. Sometimes the code works perfectly and it’s just the interpretation of the data that is wrong.

The way forward

Having completed the first part of the collaboration project, I feel it’s important not to lose momentum. It’s easy to pat ourselves on the back and congratulate each other on a job well done, but the reality is we’re not done yet.

In the near future, possibly in January, a developer from RÚV is scheduled to join our developers at Yle for two weeks. During that time we will work more on the metadata issues and hopefully come a step closer to implementing Yild on RÚV’s website. We will also likely work together on other issues to benefit from the knowledge of the Icelandic web developers.

In February RÚV is hosting Drupalcamp Northern Lights 2017 in Reykjavik. This will be another opportunity to meet and discuss the project. Hopefully by then, we will be on track to enable the Yild module at RÚV and start tagging in a common way.

There is another Nordvision project focused on API collaboration between the companies. During my stay I demonstrated Yle’s many APIs and walked some RÚV developers through the documentation. As it turns out, RÚV is eager to bring their content to a similar API structure, and I feel there is a great opportunity for collaboration around this issue as well – both in imparting our knowledge of API design and in ensuring our APIs are built with a similar structure, so that we can share information and data more easily in the future.

Above all else though, I feel it’s important to keep the wheels in motion. There is a lot of potential and we’ve come a long way already. By producing something tangible, we stand a chance of bringing even more Nordic public service companies and specifically the developers together in collaboration.

To face the challenges media and journalism face in our new global and digital age, we need to solve the problems of packaging, distribution and relevance. For a large public broadcasting company this means there is a new need for streamlined workflows, efficient content handling and decentralised ways to aggregate content. We at the Finnish Broadcasting Company (Yle) believe metadata is the key to accomplishing this.

On our site, we have annotated all articles with terms from the Finnish thesaurus and ontology service Finto (previously named Onki). Last fall we also started using Freebase for people, organisations, places, events and media. Currently we have used over 10 000 general terms from the KOKO ontology and over 5000 terms from Freebase. This whole period has been a time of investment, and now it’s time to cash in on it.

Getting the metadata sorted out

There is a project within Yle called Metatiedot kuntoon (link in Finnish) – Getting the metadata sorted out. The objective of the project is to unify the metadata used within Yle, to bridge the silos created by the many systems, organisational units and languages used in our company, and to open up our content as data.

We at the Swedish-speaking department at Yle are in a way a miniature of the whole organisation, and this has meant an opportunity, as a smaller and more agile unit, to experiment and try new paths. We were the first to use the above-mentioned semantic annotation compliant with the principles of the semantic web, and we have now continued with the practical job of linking the data.

At the time of writing (June 17th 2014), we have linked together Yle content created in different parts of Yle’s organisation, using different publishing platforms and different languages. We have also indirectly been able to form connections between the written articles and the as yet untagged video and audio content on our on-demand streaming service, Arenan. This lets us enrich our knowledge of our own video and audio content through graph relations.

An API-driven approach

The ”traditional” way of building the semantic web and doing linked data has been through RDF and SPARQL. Our work has taken a bit of a different path, mostly due to limited resources and the application of lean development. For example, the decision to use Freebase over, say, DBpedia was mostly dictated by the fact that they had a better API for our purposes. We have also not been interested in building (read: have not had the resources to build) our own schemas and ontologies and charting the world according to Yle. So we have chosen existing ontologies and metadata repositories for annotating our content.

Against that background, RDF and SPARQL seemed too great a learning curve and initial investment to make. But we haven’t wanted to depart from the track set out by the semantic web effort, so we have taken great care to ensure the quality of the semantics, and have safeguarded machine readability through RDFa, Schema.org, Dublin Core and JSON-LD.

At the same time there is a large ongoing effort within Yle to make all our content available through APIs (link in Finnish). We have APIs for programmes, articles, images, user statistics and so forth. And now we also have an API for metadata.

We send our articles from Drupal to the articles-API. From there, their metadata is also sent to a Neo4j graph database, which lets us make relational queries over the meta-API. We have at the moment four different queries for our front end utilising the meta-API (a sketch of one follows the list):

Linking content tagged with the same tag in different languages (all the metadata repositories we use are multilingual, supporting at least Finnish, Swedish and English).

Giving recommendations for content with similar tagging.

Giving recommendations for content with similar tagging, but in a different language.

Giving indirect, graph-based recommendations of audio and video content from our on-demand service, Arenan, based on article-tag relations.
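As an illustration of the third query, here is a rough sketch that posts a Cypher statement to Neo4j’s transactional HTTP endpoint. The node labels, relationship type and property names are assumptions made for the example – our actual graph model and the meta-API’s internals differ in detail:

```php
<?php
/**
 * Sketch: recommend articles that share tags with the given article
 * but are written in a different language. The count of shared tags
 * serves as the ranking score.
 */
function meta_api_recommend_other_language($article_id) {
  $cypher = 'MATCH (a:Article {id: {id}})-[:TAGGED_WITH]->(t:Term)'
    . '<-[:TAGGED_WITH]-(other:Article) '
    . 'WHERE other.language <> a.language '
    . 'RETURN other.id AS id, count(t) AS shared '
    . 'ORDER BY shared DESC LIMIT 5';
  $response = drupal_http_request('http://localhost:7474/db/data/transaction/commit', array(
    'method' => 'POST',
    'headers' => array('Content-Type' => 'application/json'),
    'data' => drupal_json_encode(array(
      'statements' => array(
        array('statement' => $cypher, 'parameters' => array('id' => $article_id)),
      ),
    )),
  ));
  return drupal_json_decode($response->data);
}
```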

These are now working on our development platforms and are at a proof-of-concept level. They will gradually be put into production as soon as we get some queries optimised for speed and our AD and front-end development team (of one 🙂) gets the rough edges sorted out.

Recommendations are the key to relevance

Overlapping the silos of organisation, systems, language and media type, using metadata and APIs

In the image above you see an article about flooding in Serbia. In the right-hand column there are two clips that have been embedded in other articles with similar metadata, showing related flooding footage. In the subject listing below the videos is a list of further reading on the same subject(s). And furthest down to the right is an article from our Finnish-speaking colleagues that also relates to the flooding.

Above: the drawing I left our developer, along with a note – ”Can you build this?” – for the article-programme connection.

Future proofing media publishing on the web

As more and more traffic to our website comes through search engines and recommendations directly to specific content, we strive to build our service along the credo that ’every page is a front page’. This means that wherever you land on our site, you should get the full public service spectrum of content and find relevant stories to follow up on.

At the same time the editorial resources are highly limited, and we want our journalists to make content, not edit subject pages. So all this will have to be as highly automated as possible.

Additionally, we are preparing to open up our APIs for third parties to build services upon, for users to fine-tune their own content flows (a first product built on Yle’s APIs is Uutisvahti / The News guard – a personalised news app) and for free aggregation over the internet, for humans as well as machines.

In this way we hope to maintain our relevance and presence for our audience through high-quality public service journalism in today’s fragmented media reality.

So we recently found out we’ve been a bit boneheaded with the pathauto aliases for our Freebase taxonomy pages. The pattern was http://svenska.yle.fi/term/freebase/politics-1234, where 1234 was the term ID (tid) in Drupal.

This is a stupid alias pattern, since it makes our URLs obfuscated – impossible to determine from the outside without access to our Drupal database. It is also bad semantic form, because the URL contains something that’s meaningless to most people. So we wanted to change the pathauto alias to use the unique Freebase ID instead.

Here it is important to remember that the best-looking URL would naturally be just http://svenska.yle.fi/term/freebase/politics, but because of disambiguation (dealing with the case where a word can have many different meanings, such as ”mercury”, which can be an element, a Roman god, a planet and many other things) we want to guarantee that a URL is unique.

If we look up the word politics on Freebase, we find that its unique Freebase ID is /m/05qt0, and so we would like our URL to have the form http://svenska.yle.fi/term/freebase/politics-m05qt0.

Our own Freebase module (which may be released to Drupal.org at a later time) has added a field called ”field_freebase_freebaseid” to a taxonomy vocabulary called ”freebase”. This means we have access to the token [term:field-freebase-freebaseid], which makes the whole pattern for Freebase taxonomy term listings the following:

term/freebase/[term:name]-[term:field-freebase-freebaseid]

The problem

The problem is that when we change the url alias pattern we want to leave the old alias intact and redirect from it to the new one. This functionality is built into the pathauto module: you can open up a taxonomy term for editing, save it and the new alias will be generated and the old one made into a redirect.

However, we have 6 000 Freebase terms, and it would take a day to open them all up and save them to get the new alias with a redirect. It seems fortunate, then, that the pathauto module has a bulk update feature, which generates aliases for all entities in a specific content type. Unfortunately, bulk update only works on entities – in our case taxonomy terms – that don’t yet have a pathauto alias. What you have to do is delete all current aliases and then start the bulk update, which will generate new aliases using the new pattern. But if we start by deleting all current aliases, no redirects can be created! Here are some articles and threads discussing this very issue. Apparently it’s been a problem for around four years:

Basically, if you’ve created thousands of pathauto aliases that have been indexed by Google and need to exist as redirects to the new alias, you’re out of luck! This seems like an incomprehensible oversight and part of me thinks I must’ve missed something, because this isn’t acceptable.

The solution

Searching the web has given us several ideas about how to deal with this issue, but most require some kind of manual hacking of the database, which doesn’t really sound like something we want to do.

Instead, we ended up writing a simple drush script that just loads all terms in a taxonomy vocabulary (”freebase” in our example, but the script could easily be modified to take a command line parameter) and re-saves them. Writing the script took about a third of the time it took to write this blog text, so hopefully at least two other Drupal users will find this beneficial.

I am assuming you are familiar with drush scripts, but to briefly explain: assuming your module is named ”freebase”, you can just create a file called ”freebase.drush.inc” in the same folder, and when you activate your module, the drush.inc file will be autoloaded as an available drush script.
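The whole script boils down to something like the sketch below (Drupal 7 APIs; a production version would want error handling and progress output). Re-saving each term makes pathauto generate the new alias while the old one becomes a redirect, exactly as when saving through the UI:

```php
<?php
/**
 * Implements hook_drush_command().
 */
function freebase_drush_command() {
  $items['freebase-realias'] = array(
    'description' => 'Re-save all freebase terms so pathauto regenerates aliases and creates redirects.',
  );
  return $items;
}

/**
 * Command callback: load and re-save every term in the vocabulary.
 */
function drush_freebase_realias() {
  $vocabulary = taxonomy_vocabulary_machine_name_load('freebase');
  foreach (taxonomy_get_tree($vocabulary->vid) as $item) {
    $term = taxonomy_term_load($item->tid);
    // Saving triggers pathauto just like saving in the UI: a new
    // alias is generated and the old one is turned into a redirect.
    taxonomy_term_save($term);
  }
  drush_log(dt('All freebase terms re-aliased.'), 'ok');
}
```

With the module enabled, the whole operation is then just one command: drush freebase-realias.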

Yle’s internal Image Management System (IMS) was recently renewed. It was a big leap forward for the site svenska.yle.fi to move all its images to the cloud, not only for the storage but for all the transformations as well.

Background
IMS is a custom, PHP-based solution for storing images in a central place. It supports uploading and cropping images, as well as managing image information such as tags, alt text and copyright. Images may be searched by tag or upload date. The system is multilingual, currently supporting English, Finnish and Swedish.

IMS was born about five years ago, in December 2008, when its first version was launched. It was a quite simple-looking tool for uploading and searching images. The workflow was to upload an image, enter its information such as tags and copyright, select its crop area(s) and save it. An editor would then select the image while writing an article in SYND, FYND or any of the older, now migrated sites. The image id is saved in the article, and the image is displayed via IMS. This way the same image may be reused in multiple articles.

Different image sizes are needed for each image depending on where the images are displayed on the site. IMS had a custom-made, JavaScript-based cropping tool for selecting the crop area of the image. The alternatives were to use the same crop area for all different image sizes, or to crop each size separately. The result was that we had 10 image files stored per uploaded image: the original plus cropped versions in nine different sizes, ranging between 640x360px and 60x60px. All of these were in 16:9 ratio, with the exception of the last one, which was 1:1.

Cloudification
Along with Yle’s new Images API, the new version of IMS serves images from a cloud service. All transformations are done on the fly by specifying parameters in the image URL. Therefore, no actual image crops are performed on our servers anymore; we only save the crop coordinates (as a string in our database) and relay them to the cloud.
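To make ”parameters in the image URL” concrete, here is a hedged sketch using cloudinary_php, the PHP library for the cloud service mentioned later in this post. The public id and coordinates are placeholders:

```php
<?php
// Build a URL that crops on the fly from saved coordinates; nothing
// is processed or stored on our own servers. Adjust the include path
// to wherever cloudinary_php lives in your project.
require 'Cloudinary.php';

Cloudinary::config(array('cloud_name' => 'demo'));

// Crop coordinates as saved in our database: x, y, width, height.
echo cloudinary_url('sample.jpg', array(
  'crop' => 'crop',
  'x' => 120, 'y' => 80, 'width' => 1600, 'height' => 900,
));
// Prints something along the lines of:
// http://res.cloudinary.com/demo/image/upload/c_crop,h_900,w_1600,x_120,y_80/sample.jpg
```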

IMS also supports choosing crop areas for different image ratios now, instead of for different sizes. Available ratios to choose from are 16:9, 2:3 and 1:1.

When uploading an image, it is first uploaded locally to our server. It is given a public id (used as a resource identifier by the cloud service), which, along with other information related to the image, is saved to our database. After that we tell Images API where the image is located and what public id it has. The image is fetched from its location and pushed to the cloud service. Now we can start using our image directly from the cloud, and that is exactly what we do next.
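Reduced to the cloud service’s own PHP helper (in reality Images API sits in between, and the credentials, file name and id here are placeholders), the final push looks roughly like this:

```php
<?php
require 'Cloudinary.php';
require 'Uploader.php';

Cloudinary::config(array(
  'cloud_name' => 'demo',
  'api_key' => 'KEY',
  'api_secret' => 'SECRET',
));

// The public id was generated by IMS and saved to our database earlier.
$public_id = 'ims/abc123';
$result = \Cloudinary\Uploader::upload('/tmp/uploaded_image.jpg', array(
  'public_id' => $public_id,
));

// The upload response includes a version number, which we also save.
$version = $result['version'];
```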

Once an image has been uploaded, the user is redirected to the image editor view. Already here, the image shown is served from the cloud and scaled down to an appropriate size just by adding a scale parameter in the image URL. The user may now define how the image should be cropped in each available ratio, enter some tags, alt-text etc. For increased SEO, we actually save the given tags and copyright text into the image file itself, in its IPTC data. This means, however, that each time the values are changed, the image has to be sent to the cloud again, replacing the old one.

Drupal integration
We have a Drupal module that integrates with IMS in order to fetch images from it. In the Drupal frontend we initially always render a 300px wide image in order to show the user some image almost instantly, even though it may be very blurry if it’s scaled up. When the page load is ready, a JavaScript routine goes through all images and swaps them for a bigger version.

In the old days, when we had those nine different sizes available, it was hardcoded in the script which size should be used where on the site.

With the cloud service in use, we are able to utilise its on-the-fly image manipulations. Our script now looks up the size of the image’s containing element (e.g. the parent of the img) and renders an image in exactly that size, simply by changing the size parameters in the image URL. This enables us to control how large the images we serve are just by changing the element size in the stylesheets.

The difference for tablet/mobile when we can select the best possible size for any resolution (click to view bigger version)

Challenges
One of the most challenging things we encountered was the fact that many images are 640x360px in size. That is the original image size! So how do we show images that small in articles where we want an 880px wide image? We add an upscale effect.

Using the cloud service’s image manipulations, we take the original image, scale it up to the desired size and blur it as much as possible. Let’s call this our canvas. Then we put the original image in its original size on the canvas that we just made. The result is that it looks like our image got blurred borders. The same kind of technique is used on TV when showing old, 4:3 clips in 16:9 format.
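With chained transformations, the whole effect can be described in a single URL. A sketch, with placeholder sizes and public id – our production parameters differ:

```php
<?php
require 'Cloudinary.php';

Cloudinary::config(array('cloud_name' => 'demo'));

// The first step builds the canvas: upscale to the target size and
// blur heavily. The second overlays the same image at its original
// 640px width, centered on the canvas.
echo cloudinary_url('sample.jpg', array(
  'transformation' => array(
    array('width' => 880, 'height' => 495, 'crop' => 'scale', 'effect' => 'blur:2000'),
    array('overlay' => 'sample', 'width' => 640, 'gravity' => 'center'),
  ),
));
```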

We ran into a few bugs in the open-source libraries we used. We decided to ditch the custom crop tool and use the open-source Jcrop library instead. There was an issue when using a fixed aspect ratio together with a minimum or maximum allowed height of the crop area. We fixed the bug in our GitHub fork and created a pull request to get the fix contributed.

Also, when using cloudinary_php, the PHP library for the cloud service, we noticed a flaw in the logic. When specifying an image to be cropped according to specific coordinates, zero values were not allowed. This prevented any crops from being made from, for example, the top left corner of an image (where both X and Y are 0). The bug was fixed in our fork and merged via our pull request into the library.

Migration
Another challenge was that we had over 160 000 images with a total file size of around 400 GB. For all of these, we needed to a) generate a public id, b) upload the image to the cloud and c) save the image version number, returned by the cloud in response to the upload, in our database.

Of course we had to do this programmatically. With a quite simple script we uploaded a sample batch of images. The script read a number of rows from the database and looped through them, processing one image at a time. The idea was good, but according to our calculations the migration would have taken about 29 days to finish.

We then thought of having multiple instances of the script running simultaneously to speed things up. Again, the idea was good, but we would have run into conflicts when all the scripts tried to read and write against the same database, let alone the same table in the database.

Our final solution was to utilize a message queue for the migration. We chose to use RabbitMQ as our queue, and implemented the Pekkis Queue library as an abstraction layer between the queue and our scripts.

This way we could enqueue all the images to be processed and simultaneously run multiple instances of our script and be sure that each image was processed only once. The migration took all in all about 20 hours.
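A rough sketch of the pattern, here with php-amqplib directly rather than the Pekkis Queue abstraction we actually used (whose API differs); get_all_image_ids() and migrate_image_to_cloud() are hypothetical helpers standing in for steps a–c above:

```php
<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();
$channel->queue_declare('ims_migration', false, true, false, false);

// Producer: enqueue every image id exactly once.
foreach (get_all_image_ids() as $id) {
  $channel->basic_publish(new AMQPMessage($id), '', 'ims_migration');
}

// Worker: run this part in several parallel processes. The broker
// delivers each message to exactly one worker, so no image is
// migrated twice and the workers never fight over the same row.
$channel->basic_consume('ims_migration', '', false, false, false, false,
  function (AMQPMessage $msg) {
    migrate_image_to_cloud($msg->body);
    $msg->delivery_info['channel']->basic_ack($msg->delivery_info['delivery_tag']);
  });

while (count($channel->callbacks)) {
  $channel->wait();
}
```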

Written by Rasmus Werling
Rasmus “Rade” Werling has worked with Drupal development for 5 years. His specialities are backend coding and coming up with creative solutions to problems. He has contributed Drupal modules of his own and loves to take on challenges.

On Monday we will have our first public test of our Voice Operated eXchange, our switcher.

What we have done is split the microphone feeds in a radio studio and connect the signals to a Shure automixer that recognises when there is sound in a mic. The automixer sends a GPIO signal to an Arduino which, using SKAARHOJ’s libraries, gives a command to an ATEM TVS to cut to a camera. As cameras we use GoPros (3 Black Edition and 3+). When the signal light in the studio goes off, the switcher cuts to an input running CasparCG (server 2.0.7 beta), where we show an HTML page with a long-shot picture over it.

The resolution for the system is set to 720p, and the list of equipment used is:

User story: Help editors pick the correct representation of an article when connecting it to be displayed in a new department.

Background: We noticed that editors were picking the wrong representation of an article, as there can be many of them. The reason for this is that we let editors customise the representation of their article depending on the department where it is being used.

Step by step: 1. Create a new view, add an Entity Reference display, and add the fields (filters etc., just like a regular view) you want to display to the editors. If you select more than one field, you will need to specify the ”Search fields” (Format -> Settings). In our case we added the date and department title. We are also thinking of setting a DESC sort criterion, as it is likely most editors are looking for recently published content.

2. Go to your content type and edit the field that is an Entity Reference. Change the reference selection to a view, then pick the view you created. In ”View used to select the entities”, select the display you made.

3. This is the end result with some additional CSS work:

Just adding the fields would have improved the editorial workflow, but just a little bit of CSS helped to make it even more usable. I removed the default wrappers and added some custom CSS classes for the fields. This way I was able to adjust the styling of the title, date and department.

I decided to style all Entity References in the admin theme with a bit more padding and a border-bottom line, as it improves readability.

When trying to inspect the autocomplete div I noticed it was quite difficult to grab it via the inspect element function, but grabbing it via ”Copy as HTML” worked. This is what the basic markup looks like on a regular Entity Reference Simple field.

There has been a need to replace Drupal’s core search with Apache SOLR at Svenska YLE for quite some time. Before I could begin implementation, we needed to decide which Drupal modules we would use. There were really only two options: the Apache SOLR and Search API modules. Search API was already familiar to us and had better Views support for our purposes, making it the obvious choice from the very beginning. At this point, we still haven’t done any actual comparison between Search API and Apache SOLR.

We already had an Apache SOLR test environment on YLE’s internal network, so we only needed to discuss how to work with the Apache SOLR service in the developers’ local environments. We could either use a local virtual SOLR environment (e.g. in VirtualBox) or an external service that could be accessed from anywhere. Using a SOLR service within YLE’s internal network was out of the question, because the development environment needs to be functional outside of YLE’s network.

We investigated some of the external SOLR services available, but finally chose to use local virtual SOLR environments. The main problem with this was how to ensure that all developers would have exactly the same development environment, and that it would be similar to the production environment. After a few trials and errors, a Vagrant box gave us the solution. I will not go any further into Vagrant at this point, except to say that it is the perfect tool for managing environments.

Once the modules and environments were selected, the actual implementation work could begin. We were using SOLR 3.x in both the production and test environments, so I needed to get a similar environment set up locally. I found a ready-made vagrant-solr-box on GitHub, so I decided to try that first. The environment worked just fine, so I continued the implementation using it.

I installed the Search API and Search API SOLR modules and also the Search API SOLR Overrides module for overriding SOLR server settings in different environments. Configuring Search API in Drupal was already a familiar procedure to me, and everything proceeded very smoothly. I began by configuring the Search API SOLR server and index. I replaced the content listing pages with the help of the Search API Views module, and everything seemed to work nicely on my local environment. We were now ready to move everything to the test environment, where a “real” Apache SOLR environment was waiting for us. All we needed was a new SOLR core for our site.

As I mentioned, everything had proceeded reasonably well so far, but in the test environment we started to run into problems. First, Drupal wasn’t able to connect to the Apache SOLR server. By adjusting the proxy settings we were able to resolve this issue, but Search API still wasn’t working with the multicore Apache SOLR in the test environment. Indexing was successful in our local virtual environments, but these had a single-core SOLR server. The configuration that had worked just fine locally didn’t work at all in the test environment, even though both were using the same version of Apache SOLR.

To solve the problem, we started by installing vanilla Drupal in the test environment with the same modules as the actual site. By doing this, we were able to rule out any problems caused by our own installation profile and features. Search API was not indexing content on this new test site either, so we decided to try upgrading SOLR. We upgraded SOLR from version 3.6 to 4.4, and at the same time updated schema.xml to support the latest Search API and Apache SOLR modules. This resolved the problem: the test site was able to index content to SOLR, so we configured the actual site, and indexing started working there as well.

We were very relieved when this adventure was finally over. A task that initially had seemed easy didn’t turn out to be quite so easy after all, as these things usually go, but there is no greater joy than when everything works out in the end.

With the SOLR index we have been able to replace most of the taxonomy listing pages, and this has meant a reduction in the processor load (on the database server) – especially in views that have depth enabled. The next thing to look into is removing the standard Drupal search index, to get a smaller database.

Written by Ari Ruuska
Ari has worked with Drupal development for about seven years, most of that time as a consultant Drupal developer and architect at YLE. He has also managed Drupal projects and developer teams.

As Mårten explained in an earlier post, we decided to redo our installation profile into a more diversified model, with a common core set of modules and a differentiated set of modules for every site that uses the common core.

Splitting up the previous Yle profile that we had used as a base for our installation from the beginning was easy enough: create two new installation profiles, Syndprofile and Fyndprofile, that use most parts of the earlier Yleprofile, and move the parts that are specific to each installation into their own sets of repositories, i.e. synd_modules, fynd_modules and so on. Development on the fynd platform also got off to a very good start and has since launched two successful projects: Kuningaskuluttaja and MOT.

However, on the svenska.yle.fi platform the change posed a lot of challenges, since we had to change installation profiles on a running site. Because many of the paths to modules and themes are written to the database at installation time, a lot of file paths would have to be changed. When we first tried the easiest approach – just replacing the profiles/yleprofile installation folder with profiles/syndprofile – we were left with a severely broken site that couldn’t find any modules or themes, no matter how many times we cleared the cache. So it became clear that we had to make the changes directly in the production database.

So what we did was dump the database into an SQL file and do a search and replace on every occurrence of yleprofile, turning it into the new name synprofile. We deliberately chose synprofile over syndprofile because the database also contains a lot of serialized data with these paths in it: serialized PHP strings record their exact length, so replacing a ten-character name with another ten-character name keeps that data valid.

We have around 500 000 nodes in our database at the moment, and a pretty large index, but using sed for the search and replace (in essence sed -i 's/yleprofile/synprofile/g' over the dump file) the operation took only a few minutes, even though the SQL file itself is getting close to 5 gigabytes in size.

This, however, was not enough in our case. We also had a couple of modules, plus our own theme settings and features, that were not going to be used in the other distributions, so we had to change their locations manually. Fortunately, most of the file path settings are stored in just a few tables:

system
registry
registry_file

The paths are also stored in the cache tables, so it is advisable to truncate those at the same time, especially:

cache, cache_bootstrap and cache_path

It took a few tries to get every step in this workflow to work without problems, but once we had figured it out, the migration went pretty smoothly. Just one final hurdle gave us a bit of a cold sweat, when the site wouldn’t bootstrap even though the migration had otherwise gone well. A manual flush of the memcache server solved that too.