Memory

Modern computers have enormous amounts of processing power compared to computers from five years ago. Computer memory and storage are cheap, but that is no excuse to design and develop bloated web pages and apps. Consumers are impatient, and plenty of statistics show users abandoning an app or website that takes more than three seconds to respond or load.


You can control the speed of software running on your home computer by upgrading it, but you cannot guarantee the performance of apps that run on shared hosting platforms or web hosts. You can buy a cPanel-based web host or a dedicated server for as little as $5 a month, so how do providers make money? They do it by virtually hosting your service (web server etc.) alongside other customers, running multiple services on a single processor core. Shared servers are very economical, but you are sharing the resources with other users.

If you want maximum performance you can always buy a dedicated server from a cloud server provider, but each provider may quietly share that server's resources (more information here: http://blog.cloudharmony.com/2014/07/comparing-cloud-compute-services.html ) and performance may be impacted. Dedicated servers can be very expensive, running into thousands of dollars per month.

So what can I control?

Writing (or installing) good code is essential: try to optimize everything, and know your server's limitations and bottlenecks. To understand bottlenecks you need to know about computer hardware. A few lines of code can trigger millions or billions of actions inside a processor.
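To get a feel for how "a few lines" fan out into huge numbers of operations, here is a minimal sketch: a tiny Python loop that, behind the scenes, executes hundreds of millions of low-level increments, comparisons, jumps and memory accesses. The iteration count is just an illustrative choice.

```python
import time

# A tiny loop: under the hood each iteration costs many low-level
# operations (bytecode dispatch, integer add, bounds check, jump).
start = time.perf_counter()
total = 0
for i in range(10_000_000):  # ten million iterations
    total += i
elapsed = time.perf_counter() - start

print(f"summed to {total} in {elapsed:.3f}s")
```

Timing simple loops like this is a quick way to discover what your own hardware (and language runtime) actually costs you per operation.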

A computer has the following major components:

Hard drive (HDD/SSD): This is where your operating system, software and files are stored when the computer is turned off. Hard drives store bits (0's and 1's) as magnetic polarity on spinning metal platters: one polarity represents a 0 and the opposite represents a 1. Platters spin at 5,400–15,000 RPM, and data is read and written by a head that has to be positioned over the right spot on the platter. Hard drives are slow but reliable, and each bit can be rewritten a very large number of times. Faster solid state drives don't use spinning metal platters and work a bit like memory (see below), but solid state drives have limited writes per sector. Read More: https://en.wikipedia.org/wiki/Hard_disk_drive

Memory (RAM): Computer memory is basically a large array of very fast storage that the processor reads data (0's and 1's) from and writes data to. Memory is like a massive spreadsheet grid, and accessing data from memory is on the order of 1,000x faster than accessing data from a hard drive. Memory stores data as electrical charges in silicon microchips, and each storage bit can be changed millions of times. When the computer is turned off, memory is wiped. Read More: https://en.wikipedia.org/wiki/Computer_memory

Processor (CPU): This is the chip that does the primary calculations and controls just about everything. A processor can perform various predetermined instructions, read and write memory and hard drives, and send data over a USB cable or network connection. Processors are quite dumb: they have to keep queues (pipelines) of things to do in their internal cache (memory) between cycles. A clock cycle is a single step in which the processor (and all of its cores) does one thing and gets ready for the next clock cycle. All clock cycles in a software routine are linked, and if one instruction fails, all following linked instructions have to be cleared and dealt with, or errors and blue screens can happen. A processor's speed is the total number of clock cycles it can perform in a second; a modern computer can process 3,500,000,000 cycles a second (3.5 GHz). A processor can calculate one complex instruction, or multiple simple instructions, in one cycle, and most processors have multiple cores that can each perform calculations in a clock cycle. But don't be fooled: many clock cycles are spent waiting for data to be read or written from memory, the hard drive or the processor's cache. A processor's execution pipeline has four main stages for each instruction: "Fetch", "Decode", "Execute" and "Write back". For example, to add variable1+variable2 the processor fetches the instruction, decodes it to work out what operation to perform and where the values are, executes the addition, and writes the result back to memory. Read More: https://en.wikipedia.org/wiki/Central_processing_unit
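The four pipeline stages can be sketched as a toy interpreter. This is an illustration of the Fetch/Decode/Execute/Write-back idea only, not a real CPU model; the "memory" dict, the instruction tuple format and the variable names are all invented for the sketch.

```python
# Toy model of the four pipeline stages for an ADD instruction.
memory = {"variable1": 2, "variable2": 3, "result": None}
program = [("ADD", "variable1", "variable2", "result")]

def run(program, memory):
    for instruction in program:              # Fetch: pull the next instruction
        op, src1, src2, dest = instruction   # Decode: work out what to do
        if op == "ADD":
            value = memory[src1] + memory[src2]  # Execute: do the arithmetic
        memory[dest] = value                 # Write back: store the result
    return memory

run(program, memory)
print(memory["result"])
```

A real processor does the same four steps, but overlapped: while one instruction is executing, the next is already being decoded and the one after that fetched.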

Processors spend most of their time waiting for data to be fetched before it can be processed. There is no such thing as 100% efficient code.

If software needs to read a file from a spinning hard drive, there is a mandatory latency period (https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics ) while the drive's read head moves in or out, reads the data from the right sectors and returns it. A 3.5 GHz processor has to wait roughly 19,460,000 clock cycles for a sector on a hard drive to rotate under the read head, and the data still has to be moved from the hard drive through the processor and into memory. Luckily, processors have excellent branch prediction abilities ( https://en.wikipedia.org/wiki/Branch_predictor ), so even though the software has asked for a file to be read, the processor can work on 19 million other cycles before checking whether the data has returned from the hard drive.
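The ~19 million figure comes from simple arithmetic: the average rotational latency of a 5,400 RPM drive is half a rotation, and at 3.5 GHz every millisecond of waiting burns millions of cycles. The sketch below reproduces that back-of-the-envelope calculation (drive RPM and clock speed are the values assumed above).

```python
# Back-of-the-envelope: clock cycles wasted waiting on disk rotation.
cpu_hz = 3.5e9                             # 3.5 GHz processor
rpm = 5400                                 # consumer hard drive
avg_rotational_latency = (60 / rpm) / 2    # half a rotation, in seconds

wasted_cycles = cpu_hz * avg_rotational_latency
print(f"{avg_rotational_latency * 1000:.2f} ms -> {wasted_cycles:,.0f} cycles")
```

That works out to about 5.6 ms, or roughly 19.4 million cycles per seek, before seek time and transfer time are even counted.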

Caching content

One solution is to have software or servers cache certain files in memory to speed up delivery. The latest DDR4 computer memory runs at blistering speeds of 2,400 MHz (2,400,000,000 cycles a second), so it should keep up with a 2.4 GHz processor, right? Memory is cheap and fast, but it has a huge limitation: you can't just ask memory for the value of a cell and expect it back in a few cycles. The processor essentially has to guide the memory module to activate the required electrical columns and rows before that value can be read and returned. This is like giving directions to a driver over the phone: it takes time for the driver to listen, turn a corner, drive down a street and turn another corner just to reach the destination. The processor has to manage millions of memory reads and writes a second; memory can't direct itself to a value, the processor has to do that.

Memory timings are called RAM timings, and they are explained well here ( http://www.hardwaresecrets.com/understanding-ram-timings/ ). It takes a modern DDR4 memory module about 15 clock cycles just to enable the column circuit for a memory cell, another 15 clock cycles to activate the row, and a whole load of other cycles to read the data. Reading a 1 MB file from memory may take around 100,000,000 clock cycles, and that is not factoring in the other tasks the processor is working on.

A computer process is the name given to software code that has been handed over to the processor. Software code is loaded into the processor and memory as instructions and, depending on the code and user interactions, different parts of the software's instructions are loaded into the processor. In any given second a computer program may enter and leave a processor over 1,000 times, and a processor's internal memory is quite small.

Benchmarking

Choosing a good host for your website, mobile app or APIs is very important; sometimes the biggest provider is not the fastest. You should benchmark how long actions take on your site and what the theoretical maximum limit is. Do you need more memory or cores? Hosts will always sell you more resources for money.

http://www.webpagetest.org/ is a great site for benchmarking how long your website takes to deliver each part of a page to customers around the world. You can minify (shrink) your code and compress your images to reduce the processing time per page load.

Place your website and application databases close to your customers. In Australia, it takes a minimum of 1/5 of a second for a server outside Australia to respond, so a website that loads 30 resources also adds those delays between your server and customers (30 × 1/5 of a second adds up).

Consider merging and minifying website resources ( http://www.minifyweb.com/ ) to lower the number and size of the files you deliver to users. Most importantly, monitor your website 24/7 to see if it is slowing down. I use http://monitis.com to monitor server performance remotely.
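Why does shrinking resources help so much? Text formats like HTML, CSS and JavaScript are extremely repetitive, so they compress very well before crossing the wire. A quick sketch using Python's standard gzip module (the sample HTML string is invented for the demo):

```python
import gzip

# Repetitive markup, like a real page full of similar tags.
html = b"<html><body>\n  <p>Hello world</p>\n</body></html>" * 100

compressed = gzip.compress(html)
ratio = len(compressed) / len(html)
print(f"{len(html)} -> {len(compressed)} bytes ({ratio:.1%} of original)")
```

Minifying first (stripping whitespace and comments) and then letting the web server gzip the result gives the smallest transfer, and fewer, smaller files mean fewer of those 1/5-of-a-second round trips.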

Summary

I hope I have not confused you too much. Try some videos below to learn more.

PHP OPcache is a good caching plugin for PHP, but what if you want a quicker, in-code way of selectively caching MySQL results in memory? Here is more information on the underlying memcached software.
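The pattern memcached enables is "cache-aside": hash the query, look the result up in the cache, and only hit the database on a miss. Below is a minimal sketch of that pattern in Python; a dict with expiry times stands in for a real memcached client (real clients expose the same get/set calls), and run_query is an invented placeholder rather than a real MySQL API.

```python
import hashlib
import time

cache = {}   # stand-in for memcached: key -> (expires_at, value)
TTL = 60     # keep cached results for 60 seconds

def run_query(sql):
    # Placeholder for a real, slow database call.
    return [("row1",), ("row2",)]

def cached_query(sql):
    key = hashlib.md5(sql.encode()).hexdigest()  # stable cache key per query
    entry = cache.get(key)
    if entry and entry[0] > time.time():         # hit, and not yet expired
        return entry[1]
    result = run_query(sql)                      # miss: hit the database
    cache[key] = (time.time() + TTL, result)     # store for next time
    return result

rows = cached_query("SELECT * FROM posts")
assert cached_query("SELECT * FROM posts") is rows  # second call: cache hit
```

The TTL matters: cache too long and users see stale data, too short and the database load comes back. Remember to invalidate or expire keys when the underlying rows change.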


Official Description:

A high-performance memory object caching system. Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.

Memcached optimizes specific high-load serving applications that are designed to take advantage of its versatile no-locking memory access system. Clients are available in several different programming languages, to suit the needs of the specific application. Traditionally this has been used in mod_perl apps to avoid storing large chunks of data in Apache memory, and to share this burden across several machines. Caching other content is a good idea too.