This is a guest post by Arnaud Granal, CTO at Adcash.
Adcash is a worldwide advertising platform. It belongs to a category called DSP (demand-side platform). A DSP is a platform where anyone can buy traffic from many different ad networks.
The advertising ecosystem is very fragmented behind the two leaders (Google ... Continue Reading »

Hey, it's HighScalability time:
Ever feel like howling at the universe? (Greg Rakozy)
If you like this sort of Stuff then please support me on Patreon.
10 billion: API calls made every second in Google datacenters; $767,758,000,000: collected by Apple on iPhones sold to the end of June; 20: watts of power consumed by ... Continue Reading »

Yandex.Metrica is the world's second largest web analytics system. Metrica takes in a stream of data representing events that took place on sites or on apps. Our task is to process this data and present it in an analyzable form.
Processing the data in itself is not a problem. The real ... Continue Reading »

Hey, it's HighScalability time:
Earth received Cassini’s final signal at 7:55am ET. Let's bid a fond farewell. After a 13-year tour of duty, job well done!
12.9 million: DynamoDB requests per second on Prime Day; 4 billion: transistors on Apple's ... Continue Reading »

Thanks to zero marginal cost digital production methods, we're seeing content markets—for the first time—develop in conditions free from supply and price constraints.
In the process we've learned something: consumers have an unquenchable thirst for new content; content creators are willing to oblige with an equally prodigious stream of new content; ... Continue Reading »

The encoding of x86 and x86-64 instructions is well documented in Intel’s and AMD’s manuals. However, the manuals are not easy for beginners to start with when learning x86-64 instruction encoding. In this post, I will give a list of useful manuals for understanding and studying the x86-64 ... Continue Reading »

The metadata checkpointing in HDFS is done by the Secondary NameNode, which periodically merges the fsimage and edits log files and keeps the edits log size within a limit. For various reasons, checkpointing by the Secondary NameNode may fail. For example, the HDFS SecondaryNameNode log shows errors in its ... Continue Reading »

Hey, it's HighScalability time:
May you live in interesting times. China games swarming drone attacks. Portable EMP anyone? (Tech in Asia)
100GB: entire corpus of articles written at the NY Times; 80GB: data for one human genome; 3%: Linux desktop market share; ... Continue Reading »

Introduction
In general, if we want to debug the Linux kernel, there are many tools available, such as Linux Perf, Kprobe, BCC, and Ktap, and we can also write kernel modules, proc subsystems, or system calls for specific debugging aims. However, if we have to instrument the kernel to achieve our goals, ... Continue Reading »

Introduction
As we know, network subsystems are important in computer systems because they are I/O systems and need to be optimized with many algorithms and techniques. This article introduces how the network part of QEMU/KVM [2] works. To keep everything simple and easy to understand, we will begin with several ... Continue Reading »

Abstract
Most popular task monitoring systems (such as top, iotop, proc, etc.) can only get tasks’ disk I/O information, such as tasks’ I/O utilization percentage, every second, due to the kernel timer/tick frequency and the high time cost of system interfaces. This article presents I/O Microscopy, a new way to get tasks’ disk I/O ... Continue Reading »

Motivation
Recently, I found it hard to know the percentage of time that a process spends waiting for synchronous I/O (e.g., read). One way is to use the taskstats API provided by the Linux kernel [1]. However, with this approach, precision may be a problem. With this problem, ... Continue Reading »

Amazon S3 is a widely used public cloud storage system. S3 allows an object/file to be up to 5TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file that is ... Continue Reading »

Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discounts, recommendations, and stock levels that can be solved using data analysis methods. The rise of omni-channel retail that integrates marketing, ... Continue Reading »

Hadoop 2, or YARN, is the new version of Hadoop. It adds the YARN resource manager to the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of the MapReduce model designed and implemented by Google initially for ... Continue Reading »

Benchmarks are important for understanding the performance of different systems and for comparing them quantitatively and qualitatively. Many analytic frameworks, such as Hive, Impala and Shark, have been designed and implemented in recent years and have become fundamental software for processing big data. How to benchmark these big data analytic systems is an interesting problem.
The ... Continue Reading »

2 comments:

Note for blog authors: if you do not want your articles to appear here (we post only an excerpt, not the full content), please send me a message and I will delete them. If you have good suggestions for blogs/sites (with an RSS feed) to add to this list, please also let me know.

Yeah, the poll() function is broken on macOS and therefore is not supported in Python for the Mac. The select library supports other polling mechanisms; it essentially exposes whatever the OS supports. Let me look into an update to the code that will use kevent on Macs.
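
A minimal sketch of what such a fallback might look like, not the actual update to the code under discussion: it checks what Python's `select` module exposes on the current OS, uses `kqueue`/`kevent` when present (as on Macs), and otherwise falls back to plain `select.select`. The helper names `pick_mechanism` and `wait_readable` are hypothetical and exist only for this illustration.

```python
import select
import socket


def pick_mechanism():
    """Return the name of the polling mechanism this sketch will use.

    The select module only exposes what the OS supports, so we probe it
    with hasattr instead of checking the platform name.
    """
    return "kqueue" if hasattr(select, "kqueue") else "select"


def wait_readable(sock, timeout=1.0):
    """Return True if sock has data to read within timeout seconds."""
    if pick_mechanism() == "kqueue":
        kq = select.kqueue()
        try:
            # Register interest in read events on this socket.
            event = select.kevent(
                sock.fileno(),
                filter=select.KQ_FILTER_READ,
                flags=select.KQ_EV_ADD,
            )
            # Submit the registration and wait for at most one event.
            events = kq.control([event], 1, timeout)
            return bool(events)
        finally:
            kq.close()
    # Portable fallback: select() is available on every platform.
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(pick_mechanism(), wait_readable(a, 0.1))  # nothing sent yet
    b.send(b"x")
    print(pick_mechanism(), wait_readable(a, 1.0))  # one byte pending
    a.close()
    b.close()
```

For new code, the standard library's `selectors.DefaultSelector` makes a similar choice automatically and may be simpler than hand-rolling the check.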