Scoop! The inside story of the news website that saved the BBC

How BBC News Online was created from scratch

How the tech team was put together, the software designed – and its mastermind found writing code at a dialup ISP

In late 1994, philosophy graduate Matthew Karas was finishing a master's degree at Cambridge in computer language processing. In Manchester in the 1980s Karas helped put pirate radio stations on air in Hulme and Moss Side, and he was a musician: his band Soil had earned handwritten fan mail from Morrissey. Unable to find a job in language processing, he found himself working shifts at a dialup ISP called Delphi for £7 an hour.

The internet provider had set up a consulting business, which was building Sky’s first websites. Delphi wanted to employ twenty staff to hand-code HTML and paste in content for the broadcaster's news, weather and sports websites. “Why not design a system to do it?” asked an astonished Karas, who then handed them what became one of the world’s first substantial web content management systems (CMS). In the end, the software saved Sky having to employ 16 HTML grunts.

Rupert Murdoch's News International acquired Delphi outright, and cast the team into a joint venture called LineOne with BT and United News & Media. News Intl withdrew from the venture in 1999, and LineOne was acquired in 2001 by the Italian telco Tiscali. Today, it’s better known as TalkTalk, and is owned by The Carphone Warehouse.

Delphi would be a hub for London’s first generation of web coders, many of whom socialised on a mailing list called Haddock, including Matt Jones, Stef Magdelenski, Yoz Grahame, Tom Broxton and Karas’ friend from his philosophy studies, Stuart Tily. As the software boss, Karas had little time for the Haddock banter and in-jokes. What intrigued him more was building a better CMS.

Then came the call from Eggington.

Karas liked the can-do spirit of Smartt and Eggington, and the independence they had carved out in the vast BBC bureaucracy. They also discovered they had something else in common.

“We believed very strongly that the internet was for everyone,” Karas told The Reg. The BBC had launched a hobbyist ISP in 1994, the BBC Networking Club, for computer enthusiasts. “The BBC Networking Club was fine for its time, but appealed to people who shop at Maplin.”

The technical brief of BBC News Online was daunting: the site would need to be constantly updated and offer stories in a variety of languages, taking feeds from the BBC’s World Service operation.

“I told Bob [Eggington] I thought it could be done within two years with 12 programmers. Bob said it had to be done in 16 weeks with 5 programmers,” said Karas. A date was set to launch in November 1997.

Karas handed in his notice at LineOne and took up his new job at the BBC on 3 June, 1997, returning to the ISP briefly over the summer to tempt across a dozen of the team: they were mostly software developers, but there were a few designers too.

Designer Matt Jones, who had already left Delphi, was one of the first to join the new operation, and was put in charge of prototyping how the site would look and behave.

Karas wasn't impressed by what he saw across the Atlantic. A CMS at ABC required journalists to construct sophisticated database queries to build articles, while another, at MSNBC, looked smart but wasn’t built for the intense workflow the BBC anticipated.

Karas also examined expensive packaged software options, including Vignette’s StoryServer, marketed by CNet. Vignette had rapidly acquired big-name publishing customers with big budgets, including the Guardian newspaper. But these packaged options still required a lot of integration work.

A risky architecture

If hiring chattering London techies was a gamble by Eggington, then the architecture Karas wanted – “a CMS done properly” – was also risky. And it was politically charged, too.

The BBC had a contract with ICL Fujitsu and every new project was obliged to use Fujitsu staff, and wherever possible, technology.

“I knew what we wanted to do – from learning what worked and didn’t with my CMS at Sky,” said Karas. “People were dazzled by the internet and interactivity, but underneath, it was a publishing system.”

Non-trivial technical issues, such as managing queues of articles awaiting publication, needed to be resolved. The system would need to cope with 60 in-house staff hammering it at once, publishing stories, with many more feeds coming in from other places. Karas devised a demanding specification that he knew ICL Fujitsu could not match. The News Online system needed to be able to publish an article from an editor’s desk within thirty seconds. The specification ensured that Fujitsu’s offering would fail: it couldn’t process the feeds and resolve the links in the turnaround time required. Fujitsu withdrew.

Smartt was adamant the site wouldn't lazily publish stories direct from the wire feeds. Instead, he and Eggington wanted to tap into the vast amount of high-quality output generated by the BBC, particularly its World Service division, which broadcast to more than 40 countries. The BBC’s radio journalists used a VAX minicomputer with a system called BASYS (which DEC had acquired in 1992) that became Avid iNews – and also a Unix system called Edit. If the radio news scripts were in a computer somewhere, why couldn’t News Online use them? It wasn’t going to be easy.

Help was around, but Karas felt it wasn’t the right help. He wanted his team to focus on the overriding requirement of achieving rapid publishing, something that minicomputer sysadmins might not intuitively understand. This meant politely rejecting help from the BBC’s legendary technical expert Brandon Butterworth, who Karas had known from his pirate radio days: both had been in Manchester in the mid-1980s.

“Brandon was a visionary in a lot of ways: he foresaw the use of the internet for time-based media – such as radio and TV – before anyone else did. He knew the protocols were up to it, and what needed to be done,” said Karas.

“But the system really had to work for editorial – it was a publishing system. Brandon’s view was that of a system architect, a Unix guy.”

The architecture Karas devised in 1997 was a database-driven system although “the public were miles from a database”, he said: the website would publish mostly pre-built pages rather than generate them entirely on the fly from the database every time a visitor requested an article.
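To make the pre-built page idea concrete, here is a minimal sketch in Python. It is an illustration only, not the BBC's code, which was built on WebObjects. The function names, the article fields (`id`, `headline`, `body`) and the file layout are all assumptions made for the example. The point it shows is that the work of rendering happens once, at publish time, and a reader's request only ever touches a finished file.

```python
import tempfile
from html import escape
from pathlib import Path


def render_article(article: dict) -> str:
    """Turn one stored article into a complete HTML page (hypothetical template)."""
    headline = escape(article["headline"])
    body = escape(article["body"])
    return (
        f"<html><head><title>{headline}</title></head>"
        f"<body><h1>{headline}</h1><p>{body}</p></body></html>"
    )


def publish(article: dict, web_root: Path) -> Path:
    """Render the page once, at publish time, and write it under the web root."""
    target = web_root / f"{article['id']}.html"
    target.write_text(render_article(article), encoding="utf-8")
    return target


def serve(article_id: str, web_root: Path) -> str:
    """Handle a reader request by reading the pre-built file; no database is involved."""
    return (web_root / f"{article_id}.html").read_text(encoding="utf-8")


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    publish({"id": "a1", "headline": "Budget & tax", "body": "Story text"}, root)
    print(serve("a1", root))
```

The trade-off is the usual one for this style of publishing: a page has to be re-rendered whenever its article changes, but heavy reader traffic never reaches the data store.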

The application servers were split into two groups: one set handled the content queues to produce pages for the public-facing web servers; the others ran the site logic. It was a loosely coupled architecture with several advantages: the interfaces between them were well designed, and it also meant one part of the system could be upgraded or, heaven forbid, break without significantly derailing the other parts.

The 1998 CMS architecture for BBC News Online
SQL Server was later replaced by an Oracle database, but the architecture remains

The design gave the BBC great flexibility, too: new output formats, such as WAP, an Indian language edition, a simple version for blind users, or Netscape and Microsoft push channels, could be added very quickly and easily.
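One common way to get that kind of flexibility is to keep each output format as a separate renderer working from the same stored story. The sketch below assumes that approach. The registry, the format names and the story fields are hypothetical, and nothing here is taken from the original system.

```python
from typing import Callable, Dict

Renderer = Callable[[dict], str]

# Registry of output formats; each entry turns the same story into one format.
RENDERERS: Dict[str, Renderer] = {}


def renderer(name: str) -> Callable[[Renderer], Renderer]:
    """Decorator that registers a rendering function under a format name."""
    def register(fn: Renderer) -> Renderer:
        RENDERERS[name] = fn
        return fn
    return register


@renderer("html")
def render_html(story: dict) -> str:
    return f"<h1>{story['headline']}</h1><p>{story['body']}</p>"


@renderer("text_only")
def render_text_only(story: dict) -> str:
    # A plain version of the kind a simple, accessible edition might use.
    return f"{story['headline'].upper()}\n\n{story['body']}"


def render_all(story: dict) -> Dict[str, str]:
    """Produce every registered format from one stored story."""
    return {name: fn(story) for name, fn in RENDERERS.items()}
```

Adding a new edition under this scheme means writing one more decorated function; the stored story and the existing renderers are left alone.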

At the centre of the design was the database, initially Microsoft’s SQL Server – which was fast and cheap – although this was later replaced by an Oracle product. The database accepted submissions from journalists, using a custom-built client, and the text scraped from legacy systems within the BBC empire.

Karas ensured the programmers were kept well away from the data store.

“Programmers don’t have a clue how databases work, particularly the security side,” he said. “I hired a database administrator just to keep them off the system.”

For the application engines, Karas made an unusual choice: WebObjects. Steve Jobs’ NeXT demonstrated the web programming framework in 1995; Apple acquired it along with Jobs in December the following year. At that point, WebObjects required a programmer to learn Objective-C, an esoteric programming language only really used at the time to write code for NeXT machines.

Karas felt this was not a challenge for a bright programmer, who should already understand object-oriented programming, and it was less confusing than C++. The Apple framework was ideal for web publishing, but the problem was that hardly anyone used it. Fortunately Karas was able to find two well-known reference customers who were using the software kit: NASA and IBM. Although neither used WebObjects in the demanding way Karas envisaged, it was enough to get him the green light to proceed.

“We were doing agile development before there was an agile. It was extensible, it was loosely coupled, and it was database-driven but not at the outward layer. The core of it is still there,” he said.

Tapping into existing internal BBC systems to scrape content was an inspired move, and a typically British piece of improvisation. Many radio production staff were located overseas, and there was no budget to fly them to London to teach them coding. The web news team wondered how to lift the journalists' work with minimal disruption.

The developers decided to add three simple instructions to the existing workflow: the correspondents or producer must add a headline to the radio script, they must spell correctly, and they must not leave in cueing information – such as advice for a continuity announcer on how to pronounce a name, or when to discard a given script.
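A rough idea of how such rules might be checked automatically is sketched below in Python. It is purely illustrative. The cue markers (`CUE:`, `PRON:`, `KILL:`), the 12-word headline limit and the convention that the first non-blank line is the headline are assumptions invented for the example, not details of the BBC's scripts. Spelling is not checked here.

```python
import re
from typing import List

# Hypothetical markers for cueing information left in a radio script.
CUE_PATTERN = re.compile(r"^\s*(CUE|PRON|KILL):", re.IGNORECASE)
MAX_HEADLINE_WORDS = 12  # assumed limit; a whole paragraph is not a headline


def check_script(script: str) -> List[str]:
    """Return a list of problems found in a script; an empty list means it passes."""
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    if not lines:
        return ["script is empty"]

    problems = []
    headline_words = len(lines[0].split())
    if headline_words > MAX_HEADLINE_WORDS:
        problems.append(f"headline is too long ({headline_words} words)")
    if len(lines) < 2:
        problems.append("script has no body after the headline")
    for line in lines[1:]:
        if CUE_PATTERN.match(line):
            problems.append(f"cue line left in: {line}")
    return problems
```

A script that passes returns an empty list, so a converter could refuse to publish anything that comes back with problems and send it back to the producer instead.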

“They were bad at headlines at first, and they would write a whole paragraph for them, but they got the hang of it,” said Karas. “But it worked. An editor sitting in Moscow who had never seen a web page was producing web pages.”

For the website design, the team took a leaf from The Register, according to Karas. Although El Reg was not yet a full-time operation – that came in April 1998 – it was one of the first popular UK news sites. The Reg had a rolling front page with older stories scrolling off the bottom. The BBC designers devised an area in which major stories could be manually positioned, ideal for busy news days, and all other articles were simply listed in order of publication time and date.
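That front-page rule is simple enough to sketch in a few lines of Python. The version below is an assumption-laden illustration, not the BBC's implementation. The field names, the `limit` parameter and the choice to list the remaining stories newest first (so that older ones fall off the bottom) are all choices made for the example.

```python
from datetime import datetime
from typing import Dict, List


def build_front_page(stories: List[Dict], pinned_ids: List[str], limit: int = 10) -> List[str]:
    """Return story ids in front-page order: pinned stories first, then the rest by time."""
    by_id = {story["id"]: story for story in stories}
    # Editors' manual picks keep the order the editors gave them.
    top = [sid for sid in pinned_ids if sid in by_id]
    # Everything else is listed newest first (an assumption for this sketch).
    rest = sorted(
        (story for story in stories if story["id"] not in top),
        key=lambda story: story["published"],
        reverse=True,
    )
    # Older stories drop off the bottom once the page is full.
    return (top + [story["id"] for story in rest])[:limit]


if __name__ == "__main__":
    stories = [
        {"id": "a", "published": datetime(1997, 11, 4, 9, 0)},
        {"id": "b", "published": datetime(1997, 11, 4, 10, 0)},
        {"id": "c", "published": datetime(1997, 11, 4, 11, 0)},
    ]
    print(build_front_page(stories, pinned_ids=["a"], limit=3))
```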