CCK and Field API

CCK (Content Construction Kit) is one of the most useful contributed modules for Drupal. It adds custom fields and content types to Drupal. Drupal 7 incorporates the community's work on CCK into core as the Field API.

The CCK module allows a content type to have multiple fields with various field types and different field widgets and formatters. A field must be assigned a widget to define its input style and at least one formatter to define its display style. The UML diagram above describes the relationship between content types, fields, field widgets, and field formatters in creating a CCK type.

Reuse Fields

When a CCK field is added to a content type for the first time, the field is created in Drupal as a class, and an instance of it is assigned to that content type. The field's configuration parameters for that content type are stored in the instance rather than in the class. Rather than creating new fields for a content type, it is better to add existing fields: this reduces the system's complexity and improves scalability.

Adding an existing field only requires the administrator to choose a field from a list of fields defined in other content types and select a widget to define the input style. The most recently created field instance brings in the default parameters that can be changed later.

Reuse Fields with the API

CCK allows modules to add customized fields, widgets, and formatters. Many third-party modules (see the Drupal CCK modules list) already exist for different tasks, including images, videos, and internal and external references. CCK for Drupal 6 provides an API for module developers (see the CCK Developer Documentation for Drupal 6). Drupal 7 provides the native Field API for module developers.

Performance Issues

CCK (or the Field API in Drupal 7) adds extra complexity to a Drupal system. When a new field is created, its definition is added to the field class table and its configuration is added to the field instance table. A new table is also added to the Drupal database to store the field's data. Every extra table adds complexity to the system. In addition, node queries must JOIN these field data tables. Many JOINs hurt database performance, because MySQL handles queries that join many tables poorly unless it is properly configured.
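To make the JOIN cost concrete, here is a minimal sketch of the kind of query a node listing needs. It assumes one data table per field named `field_data_<field name>` and joined on `entity_type` and `entity_id` (Drupal 7's SQL storage convention). The helper name `node_query_for_fields` is hypothetical, and the query Drupal actually builds differs in detail.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print SQL that loads nodes together with the given
# fields, assuming one data table per field named field_data_<field_name>.
# Every field adds one more LEFT JOIN to the query.
node_query_for_fields() {
  local sql="SELECT n.nid FROM node n" i=0 field alias
  for field in "$@"; do
    i=$(( i + 1 ))
    alias="f$i"
    sql+=" LEFT JOIN field_data_${field} ${alias}"
    sql+=" ON ${alias}.entity_type = 'node' AND ${alias}.entity_id = n.nid"
  done
  echo "$sql"
}

# Three fields mean three JOINs on every node listing.
node_query_for_fields field_image field_image_thumbnail field_tags
```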

Reusing fields reduces the number of tables in the Drupal database. For example, adding 10 image fields (field_image_a, field_image_b, …, field_image_j) adds 10 tables to the database. If each content type uses at most two image fields, one thumbnail and one image, we can redefine them as field_image_thumbnail and field_image. This configuration introduces only two tables.
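The table count above can be checked with a small sketch. It assumes, as the text does, one storage table per distinct field, so reusing a field across content types adds no table. The helper `count_field_tables` and its "content_type field_name" input format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "content_type field_name" pairs on stdin and print
# how many field storage tables they need (one per distinct field name).
count_field_tables() {
  awk '{ print $2 }' | sort -u | wc -l | tr -d ' '
}

# Ten separately named image fields on one content type: prints 10.
printf 'gallery field_image_%s\n' a b c d e f g h i j | count_field_tables
# Two fields reused by three content types: prints 2.
printf '%s\n' \
  'article field_image' 'article field_image_thumbnail' \
  'gallery field_image' 'gallery field_image_thumbnail' \
  'page field_image' | count_field_tables
```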

Reusing fields also reduces the system's complexity. Instead of creating and maintaining 10 different fields, Drupal admins maintain only two fields and their documentation, and database administrators only need to tune two extra tables. KISS (keep it simple, stupid) is always a good principle.

Creating Drupal sites is easy and requires no special skills. Installing Drupal takes one click, setting up a module takes one click, and creating a new content type takes a few mouse clicks. Unfortunately, this apparent simplicity often leads people to misuse the control Drupal gives them. When building a hobby site, people create Drupal entities such as blocks, content types, views, and URLs haphazardly, without deliberate consideration. Experienced Drupal developers instead plan names carefully ahead of time.

Naming conventions

For most casual hobby use of Drupal, naming conventions are overhead. A hobbyist installing Drupal for the first time gains nothing from them. Once that hobbyist becomes a Drupal professional setting up a tenth Drupal installation for a client, they will likely want naming conventions for the blocks, fields, content types, and views they create, so they can easily maintain the other nine websites.

The Drupal community has defined coding standards for naming functions and variables, roughly based on the PEAR Coding Standards. Unfortunately, the PHP versions Drupal targeted at the time (before PHP 5.3) supported neither dot-separated packages nor namespaces. Drupal does not support namespaces either, whether for variable names or for the machine names of fields, content types, views, and other Drupal entities.

Human-readable Names and Machine-readable Names

Drupal demands two different types of names: a human-readable name and a machine-readable name.

The human-readable name is a text field that can contain any character. Drupal stores it in the database as plain text and displays it as plain text. Drupal recommends that the human-readable name contain only alphanumerics and spaces, but this is not a strict restriction. The human-readable name can therefore also carry naming conventions.

The machine-readable name is a text field containing only lowercase letters, numbers, and underscores. Drupal usually uses the machine-readable name directly as a PHP variable name, a database table name, or a database column name, so the name must obey these strict character restrictions. The underscore is therefore the only option for separating namespaces in machine names.
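A proposed machine name can be checked against the character rule above before it is typed into Drupal. This is a minimal sketch of that check only. Drupal may enforce further limits (for example on length or on the first character) that are not modelled here, and the function name `is_machine_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the argument uses lowercase letters, digits, and
# underscores. The letters are spelled out so the check does not depend on
# locale collation rules.
is_machine_name() {
  local re='^[abcdefghijklmnopqrstuvwxyz0123456789_]+$'
  [[ $1 =~ $re ]]
}

for name in webinit_acad_journal 'Journal Issue' field-image; do
  if is_machine_name "$name"; then
    echo "ok:       $name"
  else
    echo "rejected: $name"
  fi
done
```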

Building Conventions

Naming conventions are to Drupal developers what coding standards are to programmers. Building conventions means reaching consensus within a team of developers, and the technical leader is responsible for establishing them in the team.

Consumer Consensus. Although consumers may not see machine-readable names explicitly on web pages, they do see human-readable names in many menu items. Developers must realize that end users consume human-readable names, so display names must be meaningful to end users as well as to developers. End users also care about entity descriptions.

Developer Consensus. Developers may reach an agreement about naming conventions.

The above example about academic journals and articles is defined by use case. All journal-related items belong to the journal subsystem of the acad scope because journals and articles are designed within the journal subsystem. Along with the content types, developers can create journal related blocks and views following the namespace webinit_acad_journal_. For example,

The functionality domain is more useful for items such as node reference views and other helper views. For example, the webinit_acad_journal_issue content type has a node reference field that gets its journals from a view dedicated to listing journals. That view follows the pattern,
> * Name: Node reference view of journals
> * Machine: noderef_journal

However, this view can also be put into the namespace specified above,
> * Name: Node reference view of journals
> * Machine: webinit_acad_journal_noderef_journal
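Because the underscore is the only available separator, namespaced machine names can be assembled mechanically from their segments. This is a small sketch; the helper `machine_name` and the `ns` array are hypothetical, and the segments are the ones used in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join namespace segments and a local name with
# underscores. "$*" joins the arguments with the first character of IFS.
machine_name() {
  local IFS=_
  echo "$*"
}

ns=(webinit acad journal)            # scope: webinit, subsystem: acad, journal
machine_name "${ns[@]}" issue            # content type name
machine_name "${ns[@]}" noderef_journal  # node reference view name
```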

By packaging content types and views into the same namespace, users can focus on the problem scope and on the set of Drupal features the developers provide. During maintenance, developers can easily locate bugs within that scope.

Old design

New design

nginx

Recent nginx releases support the try_files directive and named (internal) locations. These features make nginx a more flexible web server for Drupal.

try_files checks for the existence of files in the given order and serves the first one it finds. If none exists, it falls back to its last parameter, such as a named location. For Drupal, try_files lets the server check the Boost-generated cache, then imagecache images, and finally the Drupal installation, in that order.
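The lookup order can be illustrated with a small simulation. This is a sketch of the semantics only, not nginx's implementation. A candidate ending in `/` matches only a directory, any other candidate must be a regular file, and the last argument is the fallback. The function name `try_files_sim` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of try_files' lookup order: test each candidate under the document
# root in order, print the first that exists, else print the fallback.
try_files_sim() {
  local root=$1; shift
  local fallback=${!#}              # last argument, e.g. @drupal
  local candidate
  while (( $# > 1 )); do
    candidate=$1; shift
    if [[ $candidate == */ ]]; then
      [[ -d $root$candidate ]] && { echo "$candidate"; return; }
    else
      [[ -f $root$candidate ]] && { echo "$candidate"; return; }
    fi
  done
  echo "$fallback"
}

# A static file is served directly; a clean URL falls through to Drupal.
docroot=$(mktemp -d)
mkdir -p "$docroot/files" && touch "$docroot/files/logo.png"
try_files_sim "$docroot" /files/logo.png /files/logo.png/ @drupal   # /files/logo.png
try_files_sim "$docroot" /node/1 /node/1/ @drupal                   # @drupal
rm -rf "$docroot"
```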

The @location syntax defines named (internal) locations. Clients cannot request internal locations directly through nginx; they are reachable only through try_files, custom 40x error pages, and rewrites.

Drupal

Using try_files and @location syntax together provides an easier way to run Drupal.

```nginx
location / {
    # Serve the file or directory if it exists; otherwise hand off to Drupal.
    try_files $uri $uri/ @drupal;
}

location @drupal {
    # Turn clean URLs such as /node/1 into index.php?q=node/1.
    rewrite ^/(.*)$ /index.php?q=$1 last;
}

location ~ \.php$ {
    fastcgi_pass 127.0.0.1:3456;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}
```

Most FastCGI parameters come from the fastcgi_params file included in the default nginx installation.
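To see what the `@drupal` rewrite hands to PHP, here is a sketch that applies the same pattern to a request URI. nginx appends the original query string after the new arguments when the replacement already contains `?`, and the helper mimics that when a query string is passed. The function name `drupal_rewrite` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: mimic "rewrite ^/(.*)$ /index.php?q=$1 last;" for one request URI,
# optionally followed by the original query string.
drupal_rewrite() {
  local uri=$1 args=${2-}
  local re='^/(.*)$'
  [[ $uri =~ $re ]] || return 1
  local target="/index.php?q=${BASH_REMATCH[1]}"
  [[ -n $args ]] && target+="&$args"   # original arguments go after q=
  echo "$target"
}

drupal_rewrite /node/1              # /index.php?q=node/1
drupal_rewrite /search/node page=2  # /index.php?q=search/node&page=2
```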

At UPEI, our web pages are powered by the open-source web platform Drupal but served as static pages. HTTrack mirrors these pages (in our terminology, scrapes them) to a front-end server. Most components of our web pages are static, except for emergency messages, contact forms, and some media files. All external access goes to the front-end server, and only a few requests reach our back-end server through the university firewall.

INFRASTRUCTURE

Our system consists of five pieces: a front-end web server (which is also a reverse proxy), a back-end web service and HTTP media server, a back-end production server, a development server, and a database server. The front-end web server, the back-end production server, and the development server all run Debian Linux with an old but very stable Apache 1.3. The web service and media server is based on Nginx, a very fast and reliable HTTP server. Our database server runs MySQL 5.1.

CHALLENGES

The original infrastructure had only the front-end static HTTP server and the back-end Drupal HTTP server. Although most content on our website is static, we still need some dynamic content for feeds, emergency messages, and forms. The back-end Drupal server was handling too many PHP requests and was close to collapse.

My major concerns were:

Performance. Our infrastructure must handle all traffic during emergencies. In other words, external access to Drupal must not rely on Apache.

Security. All external inputs must be filtered, monitored, and isolated from the production server.

Reliability. Production server downtime must not affect public access.

Scalability. The infrastructure must be open to future expansion.

The bottleneck of our system was in the dynamic part.

HTTP SERVERS

The front-end server is a stable Debian Linux installation that serves all static pages and acts as a reverse proxy to web services and legacy systems. Since we serve well under 1 million page views per day, Apache 1.3 handles the load comfortably as a static server. Small media files are reverse-proxied from the back-end media servers and cached by Apache.

The back-end production server gives all content managers in the university access to Drupal. The development server is a sandbox for theme and module development. Both servers run Debian Linux and Apache 1.3 and connect to separate database servers.

The media and forms server runs Nginx to provide media file downloading/streaming and non-cacheable AJAX responses. It has restricted access to the production database server, and most POST requests are filtered and monitored. Nginx is well known for its performance and scalability; WordPress.com runs Nginx as a load balancer.

OPTIMIZATION

Compression. All text content, including HTML files, JavaScript files, and CSS stylesheets, is compressed with mod_gzip on the front-end server.
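To estimate the savings on a given asset, one can compare its size with its gzip output. This is a rough sketch under the assumption that `gzip -9` approximates mod_gzip's result. The real on-the-wire numbers depend on mod_gzip's settings and headers, and the function name `gzip_sizes` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print "<original bytes> <gzipped bytes>" for one file.
gzip_sizes() {
  local file=$1 orig packed
  orig=$(wc -c < "$file" | tr -d ' ')
  packed=$(gzip -9 -c "$file" | wc -c | tr -d ' ')
  echo "$orig $packed"
}

# Example with a throwaway, highly repetitive stylesheet.
css=$(mktemp)
for i in $(seq 1 200); do echo ".item-$i { margin: 0; padding: 4px; }"; done > "$css"
read -r orig packed < <(gzip_sizes "$css")
echo "original: $orig bytes, gzipped: $packed bytes"
rm -f "$css"
```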

Cache on the client side. All images and fonts are cached in the browser for at least 45 days through the Expires and Cache-Control headers. ETags are disabled for binary content. This optimization significantly speeds up repeat visits. Our home page is large (very graphics-oriented, for marketing purposes), so the first visit may be slow (2.58 MB). With the client-side cache, however, a second visit transfers only about 30 KB to 50 KB. Large images are also loaded in the background.
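The two headers express the same 45-day lifetime in different forms. Cache-Control's max-age is relative, in seconds (45 × 86400 = 3888000), while Expires needs an absolute HTTP date. On Apache this is usually generated by mod_expires from a directive such as `ExpiresDefault "access plus 45 days"`. This sketch only shows the arithmetic. It assumes GNU `date`, and the function name `cache_headers` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print Cache-Control and Expires headers for a lifetime in days.
cache_headers() {
  local days=$1
  local max_age=$(( days * 86400 ))
  echo "Cache-Control: max-age=$max_age"
  # HTTP dates use English day/month names, hence LC_ALL=C (GNU date syntax).
  echo "Expires: $(LC_ALL=C date -u -d "@$(( $(date +%s) + max_age ))" '+%a, %d %b %Y %H:%M:%S GMT')"
}

cache_headers 45
```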

Cache on the server side. Small media files are cached on the front-end server so that requests for them are not proxied to the back end.

Home page CSS refresh issue. The front-end server sends HTTP Cache-Control and Expires headers that make client browsers reload the home page on every visit.

I have a fresh website based on Apache+PHP5 to be converted into Nginx and PHP5-FastCGI. What can I do?

Stage 1 CGI version of PHP5

Nginx cannot run PHP5 as an embedded module the way Apache does; it needs the CGI version of PHP5. In FastCGI mode, PHP5 runs like a server that forks a number of children to handle incoming requests. This number is set in the start-up script and can be whatever the load requires. To avoid exhausting the server's memory, however, memory_limit × number of PHP children must stay below the available memory.
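The sizing rule can be turned into a quick calculation: the largest N with N × memory_limit strictly below the memory available to PHP. This is a sketch under the assumption that both values are given in MB and that "available memory" already excludes what the OS, MySQL, and nginx need. The function name `max_php_children` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: largest number of PHP children N such that N * limit < available.
max_php_children() {
  local available_mb=$1 limit_mb=$2
  echo $(( (available_mb - 1) / limit_mb ))
}

max_php_children 1024 64   # prints 15 (16 * 64 would use all 1024 MB)
max_php_children 1100 64   # prints 17 (17 * 64 = 1088 MB)
```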

In Debian/Ubuntu systems, we can simply install php5-cgi in one line:

```console
root@domU:~# apt-get install php5-cgi
```

This installs the CGI version of PHP5, which includes FastCGI support. Most modern Linux distributions provide a similar package through their package managers. After installation, confirm that PHP has FastCGI enabled.
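The original verification command is not preserved here. One common check: the version banner of a FastCGI-capable PHP CGI binary names its SAPI as `cgi-fcgi`. This sketch tests such a banner. The helper `has_fastcgi` is hypothetical, and `php5-cgi` is the Debian binary name, which may differ on other systems.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a PHP version banner reports the cgi-fcgi SAPI.
has_fastcgi() {
  [[ $1 == *'(cgi-fcgi)'* ]]
}

if command -v php5-cgi >/dev/null 2>&1; then
  if has_fastcgi "$(php5-cgi -v)"; then
    echo "FastCGI enabled"
  else
    echo "FastCGI missing"
  fi
fi
```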

Stage 2 Spawn the FastCGI server

The PHP5-CGI binary can itself serve as a FastCGI server, but setting up its environment is complicated. Instead, we can use spawn-fcgi, a general-purpose FastCGI spawner from Lighttpd, to create the service. Download the latest version of Lighttpd, extract the package, run the configure script and make, and copy the spawn-fcgi binary to /usr/bin.

```console
root@domU:~/lighttpd-1.4.20# ./configure
root@domU:~/lighttpd-1.4.20# make
root@domU:~/lighttpd-1.4.20# cp src/spawn-fcgi /usr/bin
```

Then we can spawn the PHP5-FastCGI like this:

```console
root@domU:~# /usr/bin/spawn-fcgi -f /usr/bin/php5-cgi \
    -a 127.0.0.1 -p 16000 -C 5 -F 2 \
    -P /var/run/fastcgi-php.pid \
    -u www-data -g www-data
```

This command starts two PHP5 FastCGI processes (-F 2), each with 5 children (-C 5), bound to 127.0.0.1 (localhost) on port 16000. That gives us ten child processes handling PHP requests, all running as the www-data user.

Stage 3 Build Nginx

Can one man beat the world? Nginx (Engine X) is a blazingly fast HTTP server written by Igor Sysoev. According to Netcraft's December 2008 survey, Nginx served or proxied 3.5 million virtual hosts, ranking third in the market. Two of the Alexa Top 100 sites use Nginx.

Download Nginx from its official site and extract the tarball, then run:

```console
root@domU:~/nginx-0.7.34# ./configure --with-http_ssl_module \
    --with-http_realip_module --with-http_addition_module \
    --with-http_sub_module --with-http_dav_module \
    --with-http_flv_module --with-http_stub_status_module \
    --with-mail --with-mail_ssl_module \
    --http-log-path=/var/log/nginx/access.log \
    --http-client-body-temp-path=/mnt/nginx/client \
    --http-proxy-temp-path=/mnt/nginx/proxy \
    --http-fastcgi-temp-path=/mnt/nginx/fastcgi \
    --pid-path=/var/run/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --sbin-path=/usr/sbin/nginx \
    --error-log-path=/var/log/nginx/error.log \
    --conf-path=/database/configuration/nginx/nginx.conf \
    --user=www-data --group=www-data --with-sha1=/usr/lib
root@domU:~/nginx-0.7.34# make && make install
```

This builds Nginx with the most useful modules. Note that --http-client-body-temp-path, --http-proxy-temp-path, and --http-fastcgi-temp-path set the temporary directories Nginx uses to buffer client request bodies, proxied responses, and FastCGI responses. The default user and group are set to www-data, the system's default account for HTTP services, instead of nobody. They can also be set at runtime with the user directive in nginx.conf.
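Before the first start it can help to create the three temporary directories and hand them to the worker account. This is a sketch, assuming the /mnt/nginx paths from the configure line above. The function name `prepare_nginx_temp_dirs` is hypothetical, and the base directory and owner are parameters so the function can be tried elsewhere.

```bash
#!/usr/bin/env bash
# Sketch: create client/proxy/fastcgi temp directories under a base path and,
# if an owner is given (requires root), chown them to that user:group.
prepare_nginx_temp_dirs() {
  local base=${1:-/mnt/nginx} owner=${2-}
  mkdir -p "$base/client" "$base/proxy" "$base/fastcgi" || return 1
  if [[ -n $owner ]]; then
    chown "$owner" "$base/client" "$base/proxy" "$base/fastcgi"
  fi
}

# On the server, as root:
#   prepare_nginx_temp_dirs /mnt/nginx www-data:www-data
```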

Stage 4 Run Nginx

Starting Nginx is simple and straightforward: once your Nginx settings are properly configured, type nginx and press Enter. I also provide a set of Nginx configuration files to simplify the process. That configuration contains several important pieces that make Drupal work under Nginx.

First, we have websites with RSS output, such as UPEI's website, so Drupal can aggregate news and information from them. The mobile site does not generate content of its own; it serves only as an aggregator. Drupal's cron job updates feed items automatically. UPEI's mobile website aggregates feeds from UPEI websites, including media releases, department notices, and other information available as feeds.

Second, we use a mobile theme for Drupal as the base theme for mobile browsers. This theme stacks blocks from top to bottom: left sidebar, content top, and right sidebar. The navigation menu can be placed in the left sidebar. We also need to modify the template file page.tpl.php to suit our needs, such as the header, the footer, and other signatures. We have to change

Third, we use an override stylesheet to provide extra styles for WebKit-based browsers, such as Mobile Safari on the iPhone and Android's browser. This stylesheet overrides font sizes, element sizes, and word-break settings.

Amazon EC2 is an excellent service for those who want stability, scalability, and extensibility. Technically speaking, EC2 is an on-demand VPS (virtual private server) that you pay for only when you need it. One advantage is that “purchasing” a server involves no customer-service contact and no separate payment transaction. EC2 is billed by instance-hour: if an instance is not running, you do not pay for it. EC2 instances offer up to 8 cores and 17 GB of memory. Its Elastic Block Store (EBS) provides storage that grows as needed and is billed by usage.
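Because billing is per instance-hour, the monthly cost is simply hours running × hourly rate, and stopping an instance directly cuts the bill. The rate in this sketch is a placeholder, not an actual Amazon price, and the function name `ec2_cost` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: cost = instance-hours * hourly rate, printed with two decimals.
ec2_cost() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

ec2_cost 720 0.10   # running all 30 days at a hypothetical $0.10/hour: 72.00
ec2_cost 200 0.10   # same instance, stopped outside working hours: 20.00
```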

Given how unstable the MediaTemple (gs) service I am using is, EC2 is my next move. EC2 is better supported, more stable, more flexible, and more robust than any of its VPS competitors on the market, if you are geeky enough to use it.

Ragel is a state machine compiler that generates code from machines written in Ragel's regular-expression language. Ragel can generate C, C++, Objective-C, D, Java, and Ruby. Regular expressions and finite automata are useful for protocol analysis, data parsing, lexical analysis, and input validation. Using Ragel's generated C code is very easy. Here is an atoi implementation for C's standard library. It is several times faster than the C standard library's implementation.
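The Ragel atoi source is not preserved in this copy. As a stand-in, here is a hand-written sketch of the same idea. It is not Ragel output, and it is written in shell rather than C. The parser is an explicit finite-state machine with three states (start, sign, digits) that follows atoi's rules: skip leading whitespace, accept an optional sign, and read digits until the first non-digit. Overflow handling is omitted, and the function name `fsm_atoi` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: atoi-style parsing with an explicit state machine.
fsm_atoi() {
  local input=$1 state=start sign=1 value=0 ch i
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}
    case $state in
      start)
        case $ch in
          ' '|$'\t'|$'\n') ;;                  # skip leading whitespace
          '+') state=sign ;;
          '-') sign=-1; state=sign ;;
          [[:digit:]]) value=$ch; state=digits ;;
          *) break ;;                           # no number: result stays 0
        esac ;;
      sign)
        case $ch in
          [[:digit:]]) value=$ch; state=digits ;;
          *) break ;;                           # sign without digits
        esac ;;
      digits)
        case $ch in
          [[:digit:]]) value=$(( value * 10 + ch )) ;;
          *) break ;;                           # first non-digit ends the number
        esac ;;
    esac
  done
  echo $(( sign * value ))
}

fsm_atoi "  -123abc"   # prints -123
```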