AT&T Researchers — Inventing the Science Behind the Service

You are welcome to download and use the software tools appearing on this page that have been developed by AT&T Labs researchers. Please refer to the individual project web pages for specific license agreements. If an available license agreement does not meet your needs, please contact attip@att.com for assistance with a customized license.

AST (AT&T Software Technology) OpenSource software is a collection of libraries and commands for UNIX and Windows. Included are re-implementations of many POSIX and X/Open APIs and utilities. It provides a portable and efficient environment that behaves consistently across a range of operating system and hardware implementations. It is portable because one collection of source builds unattended on all target architectures, and efficient because the underlying algorithms are continually updated to match best-in-class performance.

Some of the more popular components include: cdt, dss, ksh93, nmake, pax, sfio, vcodex, and vmalloc. AST has been used internally in AT&T since the mid-1980s, and was released as OpenSource in 1999. Documentation, source and binaries are currently available under the OpenSource Eclipse Public License 1.0 at AT&T AST/UWIN OpenSource downloads.

cdt: A Container Data Type library. This provides all common containers such as list, stack, queue, ordered set/bag, unordered set/bag, etc. based on a uniform API. All containers can be used in a concurrent environment with either multiple threads or multiple processes (using shared memory). There is also a hash table data structure that provides lock-less reads and less-lock updates using atomic scalar operations.

ECharts is a state machine-based programming language for event-driven systems derived from the standardized UML Statecharts language. ECharts distinguishes itself from other Statecharts dialects by focusing on implementation issues such as determinism and code reuse. Like Statecharts, ECharts supports hierarchical state machines, concurrent machines and a graphical syntax. Unlike Statecharts, ECharts supports a simple textual syntax, machine reuse, multiple transition priority levels to minimize non-determinism, machine arrays, and a new approach to inter- and intra-machine communication. ECharts is a hosted language, which means that it depends on an underlying programming language such as Java. ECharts has a proven track record in a large-scale commercial deployment.

FastRWeb: FastRWeb is an infrastructure for web-based reporting, data analysis and visualization using R.

It leverages the capabilities of R such as graphics, statistical models, data access and manipulation to make them easily available on the Web. It can also be used to quickly create REST services based on data and/or analyses.

Several internal and external projects are built using FastRWeb, including the MyVerse Recommender, Mobility forecasting and EPI at Yale.

GGobi is an open source visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots. Plots are interactive and linked with brushing and identification.

Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains. Graphviz is open source graph visualization software. It has several main graph layout programs.
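As a rough illustration of how structural information becomes input for a layout program, the sketch below emits a directed graph in the DOT text format that Graphviz reads. The helper name to_dot and the example node names are hypothetical and are not part of Graphviz.

```python
def to_dot(edges, name="G"):
    """Render (source, target) pairs as DOT text for a directed graph."""
    def quote(label):
        # Quote node names so labels with spaces or punctuation stay valid.
        return '"' + str(label).replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-stage pipeline used only as sample data.
    print(to_dot([("parse", "plan"), ("plan", "execute")]))
```

Saving the output to a file such as graph.dot and running `dot -Tsvg graph.dot -o graph.svg` should produce a drawing, assuming the Graphviz dot program is installed.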

Nanocubes are a fast data structure for in-memory data cubes developed at the Information Visualization department at AT&T Labs – Research. Nanocubes can be used to explore datasets with billions of elements at interactive rates in a web browser, and in some cases they use sufficiently little memory that you can run a nanocube on a modern laptop.
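To make the data cube idea concrete, here is a minimal sketch that precomputes event counts per spatial cell and category, plus roll-ups over each dimension, so that queries become dictionary lookups. It is a flat count table, not the nanocube structure itself; the function name build_cube, the cell_size parameter, and the sample events are assumptions made for illustration.

```python
from collections import Counter


def build_cube(events, cell_size=1.0):
    """Count events per (cell, category), with None meaning 'all values'."""
    cube = Counter()
    for x, y, category in events:
        cell = (int(x // cell_size), int(y // cell_size))
        cube[(cell, category)] += 1   # finest level: one cell, one category
        cube[(cell, None)] += 1       # roll-up over category
        cube[(None, category)] += 1   # roll-up over space
        cube[(None, None)] += 1       # grand total
    return cube


if __name__ == "__main__":
    # Hypothetical (x, y, category) events.
    events = [(0.2, 0.7, "ios"), (0.9, 0.1, "android"), (3.5, 2.2, "ios")]
    cube = build_cube(events)
    print(cube[((0, 0), None)], cube[(None, "ios")], cube[(None, None)])
```

The trade-off is the usual one for data cubes: more memory spent on precomputed aggregates in exchange for constant-time queries.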

PADS is a system that simplifies processing ad hoc data sources. Its users can declaratively describe data sources and then use generated tools to understand, parse, translate, and format data.
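The idea of describing a data source declaratively and deriving a parser from the description can be sketched as follows. This does not use the PADS description language or its generated tools; the field list, the make_parser helper, and the pipe-separated record format are all hypothetical.

```python
def make_parser(fields, sep="|"):
    """Build a line parser from a declarative list of (name, converter) pairs."""
    def parse(line):
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
        # Apply each declared converter to the matching column.
        return {name: convert(text) for (name, convert), text in zip(fields, parts)}
    return parse


# Hypothetical description of an ad hoc log format: host|status|bytes
LOG_FIELDS = [("host", str), ("status", int), ("bytes", int)]

if __name__ == "__main__":
    parse = make_parser(LOG_FIELDS)
    print(parse("example.org|200|5120"))
```

Keeping the description separate from the parsing code means the same field list could also drive formatting or validation tools.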

Rserve: Rserve is a client/server infrastructure allowing the use of R from a large number of languages and environments.

Clients are available for C++, Java, R, Python, Ruby and other languages. Rserve also supports a variety of protocols such as HTTP, WebSockets, QAP, TLS, ...

Rserve is used as a backbone for many projects including FastRWeb and RCloud. It can also be used for high-performance distributed computing (see Rserve.cluster) as well as computation on distributed storage (see RCassandra).

vcodex: A data transformation platform. This can be used to compress, encrypt, checksum and transcode data. The platform is structured in three layers:

The Vczip command processes data by composing different transforms to suit a particular need.

A C library of data transforms. General transformations include encryption based on AES (sanctioned by NIST for top-secret data), compression of relational data files (records and fields) and tables (2-dimensional arrays), delta compression, etc. Various special-purpose compressors are available for compressing common formats of AT&T Billing data, Netflow, etc.
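The notion of composing transforms can be sketched with the Python standard library. This is not the Vcodex API or the Vczip command line: compose, delta_encode, and delta_decode are hypothetical helpers, and the byte-wise difference coding here is a much simpler stand-in for Vcodex's delta compression.

```python
import hashlib
import zlib


def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out, prev = bytearray(), 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences."""
    out, prev = bytearray(), 0
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


def compose(*transforms):
    """Chain transforms left to right into a single callable."""
    def run(data: bytes) -> bytes:
        for transform in transforms:
            data = transform(data)
        return data
    return run


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Decoding applies the inverse transforms in the reverse order.
encode = compose(delta_encode, zlib.compress)
decode = compose(zlib.decompress, delta_decode)

if __name__ == "__main__":
    original = bytes(range(200)) * 50
    packed = encode(original)
    assert decode(packed) == original
    print(len(original), len(packed), checksum(original)[:16])
```

The point of the design is that each transform stays small and invertible, and the pipeline is chosen per data set rather than fixed in advance.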

Programmers often need to use information on Web pages as input to other programs. This is done by Web Scraping: writing a program to simulate a person viewing a Web site with a browser. It is often hard to write these programs because it is difficult to determine the Web requests necessary to do the simulation. The Web Scraping Proxy (WSP) solves this problem by monitoring the flow of information between the browser and the Web site and emitting Perl code fragments that can be used to write the Web Scraping program. A developer would use the WSP by browsing the site once with a browser that accesses the WSP as a proxy server. The developer then uses the emitted code as a template to build a Perl program that accesses the site.

The Yoix scripting language is a general-purpose programming language that uses syntax and functions familiar to users of C and Java. It is not an object-oriented language, but makes use of over 150 object types that provide access to most of the standard Java classes.

dss: dss (data stream scan) is a framework for describing, transforming, reading, querying, and writing streams of record oriented data.

It is implemented as a command and library API. The API is extended by plugins (DLLs / shared libraries) that define data domain specific I/O, type and query functions. dss compares favorably against Perl, the typical recourse in the networking community, and against customized C/C++ code written to deal with single domain datasets. Supported data includes Netflow, BGP, HTTP proxy and server logs, OSPF LSA, HTML, JSON, and generic flat file formats via XML schemas. Type support includes IPv4 and IPv6 address formatting and longest prefix matching (using the iv library), AS path regular expression matching, second and subsecond time querying and formatting, and numeric types including BCD and IBM floating point. (download)

fastshp: Tools for manipulation of shapefiles, optimized for speed in order to handle very large shapefiles (such as complete TIGER/Line databases).

iPlots provides highly interactive graphics (brushing, linking, direct manipulation, ...) for data analysis. It leverages GPU acceleration to support visualization of large data. It can be used in conjunction with R or as stand-alone software.

iPlots is a package for the R statistical environment which provides high interaction statistical graphics, written in Java. It offers a wide variety of plots, including histograms, barcharts, scatterplots, boxplots, fluctuation diagrams, parallel coordinates plots and spineplots. All plots support interactive features, such as querying, linked highlighting, color brushing, and interactive changing of parameters.

The iv algorithm uses interval matching, which is about 2x slower than the lpm retrie algorithm for IPv4 addresses. However, for applications that must do both IPv4 and IPv6 LPM, iv may be the better choice because the same iv API may be used for matching addresses of any length (including but not limited to IPv4 and IPv6 addresses). (download)
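A minimal sketch of longest prefix matching by interval lookup follows. It is not the iv library or its API: build_intervals and lookup are hypothetical names, and the table is built with a simple quadratic scan for clarity. Each prefix becomes an integer range, the ranges are cut into elementary segments, and each segment records the narrowest covering prefix, so a lookup is a single binary search regardless of address length. The sketch assumes one table per address family.

```python
import bisect
import ipaddress


def build_intervals(prefixes):
    """Turn {CIDR: value} into sorted segment starts and their best match."""
    ranges = []
    for cidr, value in prefixes.items():
        net = ipaddress.ip_network(cidr)
        ranges.append((int(net.network_address), int(net.broadcast_address), value))

    # The best match can only change where some range starts or just after it ends.
    points = sorted({lo for lo, _, _ in ranges} | {hi + 1 for _, hi, _ in ranges})

    starts, values = [], []
    for p in points:
        covering = [(hi - lo, value) for lo, hi, value in ranges if lo <= p <= hi]
        # The narrowest covering range corresponds to the longest prefix.
        best = min(covering, key=lambda c: c[0])[1] if covering else None
        starts.append(p)
        values.append(best)
    return starts, values


def lookup(table, address):
    """Return the value of the longest matching prefix, or None."""
    starts, values = table
    key = int(ipaddress.ip_address(address))
    i = bisect.bisect_right(starts, key) - 1
    return values[i] if i >= 0 else None


if __name__ == "__main__":
    v4 = build_intervals({"10.0.0.0/8": "A", "10.1.0.0/16": "B"})
    v6 = build_intervals({"2001:db8::/32": "doc", "2001:db8:1::/48": "site"})
    print(lookup(v4, "10.1.2.3"), lookup(v6, "2001:db8:1::5"))
```

Because the lookup only compares integers, the same code handles 32-bit and 128-bit addresses without change.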

To download, click here. Uncompress. Read the included readme. Email us with any questions or suggestions. Check back for updates!

This version of SPaRKy is different from the first one in several ways.

More rules -- this version of SPaRKy comes with a large set of rules automatically extracted from the Penn Treebank. This set of rules may be extended by hand.

Different ways to apply the rules -- a user of this version of SPaRKy may use rules that do all three sentence planning tasks at once (content ordering, sentence aggregation, and discourse cue insertion), or may use rules that do each task in turn.

Uses SimpleNLG -- this version of SPaRKy uses the SimpleNLG surface realizer rather than RealPro. SimpleNLG is available from here.

It is available as both a library and standalone command. BGP (Border Gateway Protocol) routers use LPM lookup on a table of IP address prefixes to determine the next hop address for each incoming packet. Routers implement LPM in silicon, but software implementations are still useful for offline analysis. Most published software approaches attempt to minimize memory size and accesses, but often at the expense of complexity. The lpm algorithm uses the AT&T retrie (radix encoded trie, recursive trie) data structure. A retrie has a simple layout and a simple search inner loop. Our timings and memory requirements match or beat the best published algorithms; we also feel that retries have the edge on simplicity. For IPv6 LPM see the iv (interval) library. (download)
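For readers unfamiliar with LPM, the sketch below shows the lookup on a plain binary trie that consumes one address bit per level. This is not the retrie data structure or the lpm library API; PrefixTrie and its methods are hypothetical, the example is limited to IPv4, and the routes are sample data.

```python
import ipaddress


class PrefixTrie:
    """One-bit-per-level trie mapping IPv4 prefixes to values."""

    def __init__(self):
        # Each node is a dict: keys 0 and 1 are children, "value" is the payload.
        self.root = {}

    def insert(self, prefix, value):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        width = net.max_prefixlen
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        bits = int(addr)
        width = addr.max_prefixlen
        node = self.root
        best = node.get("value")  # a /0 default route, if present
        for i in range(width):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            if "value" in node:
                best = node["value"]  # a deeper node is a longer prefix
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "hop-A")
    table.insert("10.1.0.0/16", "hop-B")
    print(table.lookup("10.1.2.3"), table.lookup("192.168.1.1"))
```

A one-bit trie may visit up to 32 nodes per IPv4 lookup; multi-bit designs reduce the number of levels at the cost of larger nodes.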

vmalloc: A memory allocation platform. This provides a uniform interface for memory allocation that extends the well-used malloc/free/realloc interface. Memory allocation is done via general "regions" that can be built from heap memory, shared memory or from other regions. Concurrent accesses from both parallel threads and parallel processes are handled transparently. The data structure and algorithm for doing this are both faster and more space efficient than other known memory allocators.

The AT&T Statistical Dialog Toolkit (ASDT) enables developers to build spoken dialog systems that track a distribution over multiple dialog states. The engine provided by the toolkit updates this distribution efficiently, in real time, during the dialog. The engine is implemented in Python.
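Tracking a distribution over dialog states can be illustrated with a generic Bayesian update. This is a sketch under stated assumptions and not the ASDT engine or its API: update_belief is a hypothetical function, the states are sample city names, and the likelihoods stand in for whatever a speech recognizer would report.

```python
def update_belief(belief, likelihood):
    """Return the posterior over states given P(observation | state).

    belief:     dict mapping state -> prior probability
    likelihood: dict mapping state -> probability of the observation in that state
    """
    unnormalized = {s: p * likelihood.get(s, 0.0) for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every state; keep the prior.
        return dict(belief)
    return {s: p / total for s, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical example: which city did the caller ask for?
    belief = {"boston": 0.5, "austin": 0.5}
    # Assumed recognizer scores for the utterance heard this turn.
    belief = update_belief(belief, {"boston": 0.8, "austin": 0.2})
    print(belief)
```

Keeping the whole distribution, rather than only the top hypothesis, lets a later turn recover from an early recognition error.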