I gave a lecture at Drexel this week on non-relational databases and “big data”. The slides are up. They are all new since last time; the world of NoSQL and Big Data has changed a whole lot in 2 years :)

The number of Java workloads running on virtualized infrastructure has been increasing exponentially over the last few years. Advancements in processors and hypervisor technology now make virtualizing Java a compelling proposition. However, there are still best practice provisos and considerations, particularly in the area of JVM memory management.

This talk will present much of the innovation, practical insight, and lessons learned over the last year by a senior engineer from VMware who recently developed a Java “ballooning” solution called Elastic Memory for Java (EM4J).

I really enjoy reverse engineering stuff. I also really like playing video
games. Sometimes, I get bored and start wondering how the video game I’m
playing works internally. Last year, this led me to analyze Tales of Symphonia
2, a Wii RPG. This game uses a custom virtual machine with some really
interesting features (including cooperative multithreading) in order to
describe cutscenes, maps, etc. I started to be very interested in how this
virtual machine worked, and wrote a (mostly) complete implementation of this
virtual machine in C++.

However, I recently discovered that some other games are also using this same
virtual machine for their own scripts. I was quite interested by that fact and
started analyzing scripts for these games and trying to find all the
improvements between versions of the virtual machine. Three days ago, I started
working on Tales of Vesperia (PS3) scripts, which seem to be compiled in the
same format as I analyzed before. Unfortunately, every single file in the
scripts directory seemed to be compressed using an unknown compression format,
using the magic number “TLZC”.
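
A minimal sketch of how one might confirm which files carry that magic number. It assumes the four bytes “TLZC” sit at the very start of each file, which the text above does not state explicitly; the function names and the directory argument are hypothetical and not taken from the game or from any existing tool.

```python
from pathlib import Path

MAGIC = b"TLZC"  # magic number quoted in the text


def has_tlzc_magic(data):
    """Return True if the buffer starts with the 4-byte magic.

    Assumption: the magic is at offset 0 of the file.
    """
    return data[:4] == MAGIC


def find_tlzc_files(directory):
    """Return the files under `directory` whose first four bytes are the magic."""
    matches = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                if has_tlzc_magic(fh.read(4)):
                    matches.append(path)
    return matches
```

Calling `find_tlzc_files` on the scripts directory would list every file that starts with the magic, which is a quick way to check whether really every script is compressed this way.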

Continuing the Chrome extension hacking (see part 1 and 2), this time I’d like to draw your attention to the oh-so-popular AdBlock extension. It has over a million users, is actively maintained, and is a great piece of software (heck, even I use it!). However, because of how Chrome extensions work in general, it is still relatively easy to bypass it and display some ads. Let me describe two distinct vulnerabilities I’ve discovered. Both are exploitable in the newest version, 2.5.22.

In Maryland, job seekers applying to the state’s Department of Corrections have been asked during interviews to log into their accounts and let an interviewer watch while the potential employee clicks through wall posts, friends, photos and anything else that might be found behind the privacy wall.

Here’s an experiment anyone can do: Go get your Apple IR
remote. The LED emits at 980nm, or about 306THz, in the
near-IR spectrum. Relatively speaking, this is just outside
of the visible range. Take the remote into the basement, or
the darkest room in your house, in the middle of the night,
with the lights off. Let your eyes adjust to the
blackness.

Above: Apple IR remote photographed using a digital
camera. Though the emitter is quite bright and the
frequency emitted is not far past the red portion of
the visible spectrum, it’s completely invisible to the
eye.

Can you see the LED flash when you press a button
[4]? No? Not even the tiniest amount?
Try a few other IR remotes; most emit even closer to the
visible band, around 310-320THz. You won’t be
able to see them either, even though they would be
blindingly, painfully bright if they were in the visible
spectrum.
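
The wavelength and frequency figures quoted above are related by f = c / λ. The short sketch below reproduces the arithmetic; the function names are illustrative only, and the single physical input is the speed of light in vacuum.

```python
C = 299_792_458  # speed of light in vacuum, in m/s


def wavelength_nm_to_thz(wavelength_nm):
    """Convert a wavelength in nanometres to a frequency in terahertz."""
    return C / (wavelength_nm * 1e-9) / 1e12


def thz_to_wavelength_nm(freq_thz):
    """Convert a frequency in terahertz to a wavelength in nanometres."""
    return C / (freq_thz * 1e12) * 1e9


# The 980nm Apple remote LED
print(round(wavelength_nm_to_thz(980)))  # 306

# The 310-320THz range quoted for other remotes
print(round(thz_to_wavelength_nm(310)), round(thz_to_wavelength_nm(320)))  # 967 937
```

So 310-320THz corresponds to roughly 967-937nm, a somewhat shorter wavelength than 980nm and therefore a little closer to the visible band.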

Above top: Frequency of an Apple IR remote emitter relative to the full visible spectrum.

These near-IR LEDs emit at about 20% beyond the visible
frequency limit. 192kHz audio extends to 400% of the
audible limit. Lest I be accused of comparing apples and
oranges, auditory and visual perception drop off similarly
toward the edges.

01192012 - The graphical models tab has links to video lectures on
tutorials on the subject (this is mainly for students who didn’t
get to attend the class by Mike Jordan and Martin Wainwright).

01182012 - The systems slides are available now (follow the systems link)

01182012 - Updated project guidelines

Overview

Scalable Machine Learning occurs when Statistics, Systems, Machine
Learning and Data Mining are combined into flexible, often
nonparametric, and scalable techniques for analyzing large amounts of
data at internet scale. This class aims to teach methods which
are going to power the next generation of internet applications.

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well studied in both practice and theory, because specific non-functional properties are often the main justification for using NoSQL, and fundamental results on distributed systems such as the CAP theorem apply well to NoSQL systems. At the same time, NoSQL data modeling is not so well studied and lacks the kind of systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

To explore data modeling techniques, we have to start with some more or less systematic view of NoSQL data models that preferably reveals trends and interconnections. The following figure depicts an imaginary “evolution” of the major NoSQL system families, namely, Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases: