Data term glossary

Active data

Active data is information residing on the direct access storage media of computer systems, which is readily visible to the operating system and/or application software with which it was created and immediately accessible to users without undeletion, modification or reconstruction (e.g., word processing and spreadsheet files, programs and files used by the computer’s operating system).

Active records

Active records are records related to current, ongoing or in process activities and are referred to on a regular basis to respond to day-to-day operational requirements. An active record resides in native application format and is accessible for purposes of business processing with no restrictions on alteration beyond normal business rules.

Application

An application is a collection of one or more related software programs that enables a user to enter, store, view, modify or extract information from files or databases. The term is commonly used in place of "program" or "software." Applications may include word processors, Internet browsing tools and spreadsheets.

Archival data

Archival data is information that is not directly accessible to the user of a computer system but that the organisation maintains for long-term storage and record keeping purposes. Archival data may be written to removable media such as a CD, magneto-optical media, tape or other electronic storage device, or may be maintained on system hard drives in compressed formats (e.g., data stored on backup tapes or disks, usually for disaster recovery purposes).

Archive/Electronic archive

Archives are long term repositories for the storage of records. Electronic archives preserve the content, prevent or track alterations and control access to electronic records.

Attachment

An attachment is a record or file associated with another record for the purpose of storage or transfer. There may be multiple attachments associated with a single "parent" or "master" record. The attachments and associated record may be managed and processed as a single unit. In common use, this term refers to a file (or files) associated with an e-mail for transfer and storage as a single message unit. Because in certain circumstances the context of the attachment—for example, the parent e-mail and its associated metadata—can be important, an organisation should consider whether its policy should authorise or restrict the disassociation of attachments from their parent records.

Attribute

An attribute is a characteristic of data that sets it apart from other data, such as location, length, or type. The term attribute is sometimes used synonymously with "data element" or "property."

ASCII

(American Standard Code for Information Interchange) A character encoding that assigns a number from 0 to 127 to each letter, digit, punctuation mark and control character. ASCII text does not include special formatting features and therefore can be exchanged and read by most computer systems.
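
As a small illustration, Python's built-in `ord` and `chr` functions map between characters and their ASCII code numbers, and encoding a string as ASCII yields exactly one byte per character. The sample text is arbitrary; this is a sketch using only the standard library.

```python
# Map characters to their ASCII code numbers and back again.
text = "Aa1&"

codes = [ord(ch) for ch in text]           # code number for each character
restored = "".join(chr(c) for c in codes)  # convert the numbers back to text

# ASCII encoding stores each character in a single byte (values 0 to 127).
encoded = text.encode("ascii")

print(codes)          # [65, 97, 49, 38]
print(restored)       # Aa1&
print(list(encoded))  # [65, 97, 49, 38]
```

Characters outside the 0 to 127 range (such as "é") cannot be encoded as ASCII, and `text.encode("ascii")` raises `UnicodeEncodeError` for them.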

Author/Originator

The author of a document is the person, office or designated position responsible for its creation or issuance. In some cases, the software application producing the document may capture the author's identity and associate it with the document.

Backup

To create a copy of data as a precaution against the loss or damage of the original data. Most users back up some of their files, and many computer networks utilise automatic backup software to make regular copies of some or all of the data on the network. Some backup systems use digital audio tape (DAT) as a storage medium.

Backup data

Backup data is information that is not presently in use by an organisation and is routinely stored separately upon portable media, to free up space and permit data recovery in the event of disaster.

Backup tape

Backup tapes are portable media used to store data that is not presently in use by an organisation, freeing up space on active systems while still permitting recovery in the event of a disaster.

Backup tape recycling

Backup tape recycling is the process whereby an organisation’s backup tapes are overwritten with new backup data, usually on a fixed schedule (e.g., the use of nightly backup tapes for each day of the week, with the daily backup tape for a particular day being overwritten on the same day the following week; weekly and monthly backups being stored offsite for a specified period of time before being placed back in the rotation).

Bandwidth

The amount of information or data that can be sent over a network connection in a given period of time. Bandwidth is usually stated in bits per second (bps), kilobits per second (kbps) or megabits per second (Mbps).
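
Because bandwidth is quoted in bits while file sizes are quoted in bytes, estimating a transfer time means multiplying by eight before dividing. The sketch below, with a hypothetical `transfer_seconds` helper, gives the ideal figure only; it assumes the full bandwidth is available and ignores protocol overhead, latency and congestion, so real transfers take longer.

```python
def transfer_seconds(size_bytes: int, bandwidth_bps: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second.

    Ignores protocol overhead, latency and congestion.
    """
    bits = size_bytes * 8  # 8 bits per byte
    return bits / bandwidth_bps


# A 100,000,000-byte file over a 10 Mbps (10,000,000 bits per second) link:
print(transfer_seconds(100_000_000, 10_000_000))  # 80.0
```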

Bates Production Number

A Bates production number is a tracking number assigned to each page of each document in the production set.

Binary

Mathematical base two, or numbers composed of a series of zeros and ones. Since zeros and ones can be easily represented by two voltage levels on an electronic device, the binary number system is widely used in digital computing.

Bit

The smallest unit of data. A bit is either the "1" or "0" component of the binary code. A group of eight bits forms a byte.

Blog

A blog is a web site with frequent postings displayed in reverse chronological order, so that the most recent posting appears at the top of the page.

Burn

Slang for writing ("burning") a copy of data, whether music, software or other data, onto a recordable CD.

Byte

A byte consists of eight bits. The byte is a collection of bits used by computers to represent a character (e.g., "a," "1" or "&"). A "megabyte" is roughly one million bytes (1,048,576 actual bytes) and a "gigabyte" is roughly one billion bytes (1,073,741,824 actual bytes).
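
The "actual bytes" figures used throughout this glossary are powers of 1,024 (that is, of 2 to the 10th), while the rounded figures are powers of 1,000. The short sketch below, using a hypothetical `sizes` helper, prints both for each unit. (Standards bodies now use separate names such as kibibyte, KiB, and mebibyte, MiB, for the 1,024-based units, but the looser usage shown here remains common.)

```python
# Compare the rounded (decimal) and "actual" (binary) byte counts quoted in this glossary.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]


def sizes(power: int) -> tuple[int, int]:
    """Return (decimal, binary) byte counts: 1000**power and 1024**power."""
    return 1000 ** power, 1024 ** power


for power, name in enumerate(UNITS, start=1):
    decimal, binary = sizes(power)
    print(f"{name}: roughly {decimal:,} bytes ({binary:,} actual bytes)")
```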

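Coding

Coding is the process of reviewing documents and recording information about them, usually in a review database. The information may be objective (such as author, date and document type) or subjective (such as relevance or privilege).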

Compression

A technology that reduces the size of a file. Compression programs are valuable to network users because they help save both time and bandwidth.
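
As an illustration of lossless compression, Python's standard `zlib` module implements the DEFLATE algorithm, which is also the method most commonly used inside ZIP archives. The sample text is arbitrary; repetitive data such as this compresses well, while already-compressed data (e.g., JPEG images) generally does not.

```python
import zlib

# Highly repetitive text compresses well.
original = b"Please preserve all documents relating to the project. " * 200

compressed = zlib.compress(original, 9)  # level 9 = maximum compression
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
# Lossless: decompression returns exactly the original bytes.
print(restored == original)  # True
```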

Computer forensics

Computer forensics is the use of specialised techniques for recovery, authentication, and analysis of electronic data when a case involves issues relating to reconstruction of computer usage, examination of residual data, authentication of data by technical analysis or explanation of technical features of data and computer usage. Computer forensics requires specialised expertise that goes beyond normal data collection and preservation techniques available to end-users or system support personnel.

Cookie

Small data files written to a user's hard drive by a web browser at the request of a web server. These files can contain specific information that identifies users and their activity (e.g., session identifiers, passwords and lists of pages visited).

DAT

(Digital Audio Tape) Used as a storage medium in some backup systems.

Data

Information stored on the computer system and used by applications to accomplish tasks.

Data compilation

Information in a format that cannot be read without first being converted or extracted. Data compilations are expressly included as ESI under Fed. R. Civ. P. 34(a) and discussed in Rule 34 Advisory Committee notes.

De-duplication

De-duplication ("de-duping") is the process of comparing electronic records based on their characteristics and removing duplicate records from the data set. This can be done in one of two ways. First, in universal or case-level de-duplication, only a single copy of each document is retained across the whole case. Second, in custodian-level de-duplication, a single copy of each document is retained for each custodian. Custodian-level de-duplication leaves duplicates across the case as a whole, but preserves visibility into which custodians had possession of a copy of a particular document. De-duplication may be done by comparing documents' hash values, which identifies exact copies, or by technology that identifies duplicates with only minor, non-substantive differences.
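
The two approaches can be sketched as follows, assuming exact-copy de-duplication by hash value. SHA-256 is chosen here purely as an assumption (the definition does not name an algorithm), and the function names and sample collection are hypothetical. Near-duplicate detection is not shown.

```python
import hashlib
from collections import defaultdict


def digest(content: bytes) -> str:
    """Hash value used to recognise exact copies."""
    return hashlib.sha256(content).hexdigest()


def dedupe_case(docs):
    """Case level: keep the first copy of each document across the whole case."""
    seen, kept = set(), []
    for custodian, name, content in docs:
        h = digest(content)
        if h not in seen:
            seen.add(h)
            kept.append((custodian, name))
    return kept


def dedupe_custodian(docs):
    """Custodian level: keep one copy per custodian, so duplicates remain
    across custodians but each custodian's possession stays visible."""
    seen = defaultdict(set)  # custodian -> hashes already kept
    kept = []
    for custodian, name, content in docs:
        h = digest(content)
        if h not in seen[custodian]:
            seen[custodian].add(h)
            kept.append((custodian, name))
    return kept


# Hypothetical collection: (custodian, file name, content)
docs = [
    ("alice", "memo.docx", b"Q3 forecast"),
    ("alice", "memo_copy.docx", b"Q3 forecast"),
    ("bob", "memo.docx", b"Q3 forecast"),
    ("bob", "notes.txt", b"Meeting notes"),
]

print(dedupe_case(docs))       # [('alice', 'memo.docx'), ('bob', 'notes.txt')]
print(dedupe_custodian(docs))  # [('alice', 'memo.docx'), ('bob', 'memo.docx'), ('bob', 'notes.txt')]
```

Note that the file names play no part in the comparison: "memo_copy.docx" is removed because its content hashes to the same value as "memo.docx".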

Deleted data

Deleted data is data that, in the past, existed on the computer as live data and which has been deleted by the computer system or end-user activity. Deleted data remains on storage media in whole or in part until it is overwritten by ongoing usage or "wiped" with a software program specifically designed to remove deleted data. Even after the data itself has been wiped, directory entries, pointers, or other metadata relating to the deleted data may remain on the computer.

Deleted file

A file with disk space that has been designated as available for reuse. The deleted file remains intact until it has been overwritten with a new file.

Deletion

Deletion is the process whereby data is removed from active files and other data storage structures on computers and rendered inaccessible except using special data recovery tools designed to recover deleted data.

Desktop

Usually refers to an individual PC – a user's desktop computer.

Digital

Storing information as a string of digits – namely "1"s and "0"s.

Disc/Disk

A floppy disk or a hard disk. Both types have a magnetic storage medium on which data is digitally stored. A disc may also refer to a CD-ROM.

Distributed data

Distributed data is an organisation's information that resides on portable media and non-local devices such as home computers, laptop computers, floppy disks, CD-ROMs, personal digital assistants ("PDAs"), wireless communication devices (e.g., Blackberry), zip drives, Internet repositories such as e-mail hosted by Internet service providers or portals, web pages and the like. Distributed data also includes data held by third parties such as application service providers and business partners.

Document

Fed. R. Civ. P. 34(a) defines a document as "including writings, drawings, graphs, charts, photographs, phonorecords, and other data compilations." In the electronic discovery world, a document also refers to a collection of pages representing an electronic file. E-mails, attachments, databases, word documents, spreadsheets and graphic files are all examples of electronic documents.

Document retention

The preservation of documents and data, including hard copy and electronic documents, databases and e-mails, that are created, sent and received in an organisation’s ordinary course of business.

Document retention policy

A systematic plan for reviewing, maintaining and destroying documents and data, including hard copy and electronic documents, databases and e-mails, that are created, sent and received in an organisation’s ordinary course of business.

Electronic discovery

The discovery of electronic documents and data including e-mail, web pages, word processing files, computer databases and virtually anything that is stored on a computer. Technically, documents and data are “electronic” if they exist in a medium that can only be read through the use of computers. Such media include cache memory, magnetic disks (such as computer hard drives or floppy disks), optical disks (such as DVDs or CDs) and magnetic tapes.

Electronic mail message

A message created or received via an electronic mail system (commonly referred to as “e-mail”), including brief notes, formal or substantive narrative documents and any attachments, such as word processing and other electronic documents, that may be transmitted with the message.

Electronic record

Information recorded in a form that requires a computer or other machine to process it and that otherwise satisfies the definition of a record.

Email message store

The top-most e-mail message store is the location in which an e-mail system stores its data. For instance, an Outlook PST (personal storage folder) is a type of top-most file that is created when a user’s Microsoft Outlook mail account is set up. Additional Outlook PST files for that user can be created for backing up and archiving Outlook folders, messages, forms and files. Just as a filing cabinet is not considered part of the paper documents it contains, a top-most store generally is not considered part of a document family.

Encryption

A procedure/technology that renders the contents of a message or file unintelligible to anyone not authorised to read it.

ERP (Enterprise Resource Planning)

A way to integrate the data and processes of an organisation into a single system. ERP systems usually have many components, including hardware and software, in order to achieve integration. Most ERP systems use a unified database to store data for the various functions found throughout the organisation.

Ethernet

A common way of networking PCs to create a LAN.

Extranet

An Internet-based method of accessing a corporate intranet site through a security firewall. This type of access is typically used when two or more businesses want a common place to share electronic documents on an ongoing basis.

Family range

A description of the range of documents in a family, from the first Bates production number assigned to the first page of the top-most parent document through the last Bates production number assigned to the last page of the last child document.
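
A family range can be computed from the page numbers assigned to each member of the family, assuming that pages are numbered consecutively and that the parent is listed first and the last child last. The Bates prefix, the six-digit padding and the helper names below are hypothetical; production protocols define their own formats.

```python
def bates(prefix: str, number: int, width: int = 6) -> str:
    """Format a Bates number such as ABC000123 (prefix and padding are assumptions)."""
    return f"{prefix}{number:0{width}d}"


def family_range(family):
    """family: list of (first_page, page_count), parent first, then children in order."""
    first_page = family[0][0]
    last_first, last_count = family[-1]
    return first_page, last_first + last_count - 1


# An e-mail (2 pages) followed by two attachments (3 pages and 1 page).
family = [(101, 2), (103, 3), (106, 1)]
start, end = family_range(family)
print(f"{bates('ABC', start)} - {bates('ABC', end)}")  # ABC000101 - ABC000106
```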

Family relationship

Two or more documents that have a connection or relatedness because of some common characteristics.

File

A collection of data or information stored under a specified name on a disk.

File extension

A tag, usually of three or four characters and preceded by a period, which identifies a data file's format or the application used to create the file. File extensions can streamline the process of locating data. For example, if one is looking for incriminating pictures stored on a computer, one might begin with the .gif and .jpg files.
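
Following the example above, a first pass over a list of file names might filter by extension using Python's `pathlib`. The file names and the helper name are hypothetical. Because users can rename files, extension matching is only a starting point: a renamed picture would be missed, which is why forensic tools also examine the file's internal signature.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg"}


def files_with_extensions(paths, extensions):
    """Return paths whose final extension (compared case-insensitively) is in `extensions`."""
    return [p for p in paths if Path(p).suffix.lower() in extensions]


paths = ["report.docx", "photo.JPG", "logo.gif", "budget.xls", "archive.tar.gz"]
print(files_with_extensions(paths, IMAGE_EXTENSIONS))  # ['photo.JPG', 'logo.gif']
```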

File server

A computer that is utilised as a storage location for files that are accessible to many computers networked together in a LAN. File servers may be employed to store e-mail, financial data and word processing information, or to back up the network.

File sharing

The ability to share files stored on the server among several users. File sharing is a key benefit of a network.

Firewall

A set of related programs that protect the resources of a private network from users of other networks.

Floppy

An increasingly rare storage medium consisting of a thin magnetic film disk housed in a protective sleeve.

Format

The internal structure of a file, which defines the way it is stored and used. Specific applications may define unique formats for their data (e.g., “MS Word document file format”). Many files may only be viewed or printed using their originating application or an application designed to work with compatible formats. Computer storage systems commonly identify files by a naming convention that denotes the format, and therefore the probable originating application (e.g., “DOC” for Microsoft Word document files; “XLS” for Microsoft Excel spreadsheet files; “TXT” for text files; and “HTML” for Hypertext Markup Language files such as web pages). Users may choose alternate naming conventions, but this may affect how the files are treated by applications.

Fragmented data

Fragmented data is live data that has been broken up and stored in various locations on a single hard drive or disk.

FRCP

(Federal Rules of Civil Procedure) The rules that govern procedure in civil actions in the United States federal district courts.

FTP

(File Transfer Protocol) An Internet protocol that enables you to transfer files between computers on the Internet.

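GIF

(Graphics Interchange Format) A bitmap image format that supports up to 256 colours and uses lossless compression. It is commonly used for simple graphics and animations on web pages. The standard file extension for these image files is .gif.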

Gigabyte (GB)

A gigabyte is a measure of computer data storage capacity and is roughly a billion (1,000,000,000) bytes (1,073,741,824 actual bytes).

GUI

(Graphical User Interface) A set of screen presentations and metaphors that utilise graphic elements such as icons in an attempt to make an operating system easier to use.

Hard drive

The primary storage unit on PCs, consisting of one or more magnetic media platters on which digital data can be written and erased magnetically.

Hash

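A numerical value, derived from a document's contents, that can be used to validate whether a copy is an exact replica. The hash value is computed by applying a mathematical algorithm (such as MD5 or SHA-1) to the document's data. If any character is changed, the resulting hash value changes, indicating that the document has been modified. For practical purposes, different documents produce different hash values, which is why a hash value is often treated as a document's unique fingerprint.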
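
Python's standard `hashlib` module can illustrate the point: an exact copy produces the same hash value, while changing a single character produces a completely different one. SHA-256 is used here as an example; `hashlib.md5` and `hashlib.sha1` work the same way. The sample sentences and the helper name are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of `data` as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


original = b"The contract was signed on 1 March."
exact_copy = b"The contract was signed on 1 March."
altered = b"The contract was signed on 7 March."  # one character changed

print(sha256_hex(original) == sha256_hex(exact_copy))  # True
print(sha256_hex(original) == sha256_hex(altered))     # False
```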

HRIS

(Human Resources Information System) A software or online solution for the data entry, data tracking and data information needs of the human resources, payroll, management and accounting functions within a business. Normally packaged as a database, hundreds of companies sell some form of HRIS and every HRIS has different capabilities.

HTML

(Hypertext Markup Language) The tag-based ASCII language used to create pages on the web.

Image

An exact copy of a storage device’s contents at a point in time.

Inactive record

Inactive records are those records related to closed, completed or concluded activities. Inactive records are no longer routinely referenced, but must be retained in order to fulfil reporting requirements or for purposes of audit or analysis. Inactive records generally reside in a long-term storage format remaining accessible for purposes of business processing only with restrictions on alteration. In some business circumstances, inactive records may be reactivated.

Instant messaging (“IM”)

Instant messaging is a form of electronic communication which involves immediate correspondence between two or more users who are all online simultaneously.

Internet

The global public network formed by interconnecting many smaller public and private networks. Often described as the worldwide "network of networks," the Internet uses the TCP/IP protocol suite to facilitate information exchange.

Intranet

A network of interconnecting smaller private networks that are isolated from the public Internet.

IP address

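A numerical identifier assigned to each device participating in a computer network that uses the Internet Protocol. An IPv4 address is written as four numbers from 0 to 255, separated by periods (e.g., 192.168.1.10); newer IPv6 addresses are written as eight groups of hexadecimal digits separated by colons.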
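
Python's standard `ipaddress` module parses both address forms and shows that the dotted IPv4 notation is simply a readable way of writing a single 32-bit number. The addresses below are examples only (2001:db8::/32 is reserved for documentation).

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.1.10")
v6 = ipaddress.ip_address("2001:db8::1")

print(v4.version, int(v4))  # 4 3232235786
print(v6.version)           # 6
print(v4.is_private)        # True (192.168.0.0/16 is a private range)

try:
    ipaddress.ip_address("300.1.1.1")  # 300 is outside the 0 to 255 range
except ValueError as err:
    print("invalid:", err)
```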

IS/IT

(Information Systems or Information Technology) Usually refers to the team or people in an enterprise responsible for computers and making computer systems run.

ISP

(Internet Service Provider) A business that delivers access to the Internet.

JPEG

(Joint Photographic Experts Group) An image compression standard for photographs. The standard file extension for these image files is .jpg or .jpeg.

Keyword search

A search for documents containing one or more words that are specified by a user.
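
A minimal keyword search can be sketched with Python's `re` module. This version matches whole words case-insensitively and can require any or all of the keywords; the function name and sample documents are hypothetical, and review platforms typically add features (stemming, proximity and Boolean operators) that this omits.

```python
import re


def keyword_search(docs, keywords, match_all=False):
    """Return names of documents containing any (or all) keywords as whole words."""
    patterns = [re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords]
    combine = all if match_all else any
    return [name for name, text in docs.items()
            if combine(p.search(text) for p in patterns)]


docs = {
    "email1.txt": "Please review the Merger files before Friday.",
    "email2.txt": "Lunch on Friday?",
    "memo.txt": "Merger timetable attached.",
}
print(keyword_search(docs, ["merger"]))                            # ['email1.txt', 'memo.txt']
print(keyword_search(docs, ["merger", "friday"], match_all=True))  # ['email1.txt']
```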

Kilobyte (KB)

A kilobyte is a measure of computer data storage capacity and is roughly a thousand (1,000) bytes (1,024 actual bytes).

LAN

(Local Area Network) Usually refers to a network of computers in a single building or other discrete location.

Legacy data

Important electronic information created by software and/or hardware that is outmoded or obsolete.

Legacy system

Software and/or hardware that has been rendered outmoded or obsolete. A legacy system is often retained despite being decommissioned "in case" its information is needed in the future.

Legal or litigation hold

A legal hold is a communication issued as a result of current or anticipated litigation, audit, government investigation or other such matter that suspends the normal disposition or processing of records. The specific communication to business or IT organisations may also be called a “hold,” “preservation order,” “suspension order,” “freeze notice,” “hold order” or “hold notice.”

Logical failure

A logical failure occurs when the hard drive is in working physical condition but something has gone wrong with the logical storage system (the file system) on the drive. Logical failures have many causes; some of the most common are:

Virus: Malicious software can corrupt or delete files and file system structures.

Formatted drive or disk partition: Formatting a drive or creating a new disk partition resets the hard drive to a factory-like state with a brand new file system, ready for a new operating system to be installed.

File deletion: Deleting a file does not erase it. The area of the drive that stores the file is merely marked as available to be overwritten with new data, whereas erasing (wiping) a file overwrites the area of the drive where it was stored, returning that area to a factory-like state.

Megabyte (MB)

A megabyte is a measure of computer data storage capacity and is roughly a million (1,000,000) bytes (1,048,576 actual bytes).

Metadata

Metadata is information about a particular data set which may describe, for example, how, when and by whom it was received, created, accessed and/or modified and how it is formatted. Some metadata, such as file dates and sizes, can easily be seen by users; other metadata can be hidden or embedded and unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed. (Typically referred to by the less informative shorthand phrase “data about data,” it describes the content, quality, condition, history and other characteristics of the data.) Types of metadata include file system metadata, document metadata and e-mail metadata. In addition, e-discovery vendors often create and maintain vendor-added metadata as a result of processing a document.
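
File system metadata such as size and last-modified time is kept by the operating system rather than inside the file, and Python's `os.stat` can read it. The sketch creates a throwaway file so that it is self-contained. Note that on some systems merely opening a file can update its last-accessed timestamp, which is one reason forensic collection avoids working directly on original files.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("draft agreement")
    path = f.name

info = os.stat(path)  # file system metadata kept by the OS, not the file's contents
size = info.st_size
modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)

print(size)  # 15
print(modified.isoformat())
os.remove(path)
```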

Migrated data

Migrated data is information that has been moved from one database or format to another, usually as a result of a change from one hardware or software technology to another.

Mirror image

Used in computer forensic investigations and some electronic discovery investigations, a mirror image is a bit-by-bit copy of a computer hard drive. Because the examination is performed on the copy, the original drive, including its operating system and data, is not altered. May also be referred to as “disc mirroring” or as a “forensic copy.”
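
The core idea, copying every byte and then proving the copy matches the source by comparing hash values, can be sketched in Python. The function name is hypothetical and an ordinary file stands in for a drive. Real forensic imaging reads the drive through a write blocker using dedicated tools, and also handles unreadable sectors, which this sketch does not.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # read 1 MiB at a time


def image_and_verify(source: str, destination: str) -> str:
    """Copy `source` byte for byte to `destination` and confirm both hashes match."""
    src_hash, dst_hash = hashlib.sha256(), hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:  # source opened read-only
        while chunk := src.read(CHUNK):
            src_hash.update(chunk)
            dst.write(chunk)
    # Re-read the copy independently rather than trusting the write.
    with open(destination, "rb") as dst:
        while chunk := dst.read(CHUNK):
            dst_hash.update(chunk)
    if src_hash.hexdigest() != dst_hash.hexdigest():
        raise ValueError("image does not match source")
    return src_hash.hexdigest()


# Demo on an ordinary file standing in for a drive.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "evidence.bin")
with open(source, "wb") as f:
    f.write(os.urandom(3 * CHUNK + 123))
digest = image_and_verify(source, os.path.join(workdir, "evidence.img"))
print(digest)
```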

MIS

(Management Information Systems) MIS is a planned system of collecting, processing, storing and disseminating data in the form of information needed to carry out the functions of management.

Modem

A piece of hardware that lets a computer talk to another computer over a phone line.

Mount/Mounting

The process of making off-line data available for on-line processing. For example, placing a magnetic tape in a drive and setting up the software to recognise or read that tape. The terms “load” and “loading” are often used in conjunction with, or synonymously with, “mount” and “mounting” (as in “mount and load a tape”). “Load” may also refer to the process of transferring data from mounted media to another media or to an on-line system.

Native format

The source document, as collected from a source computer or server, before any conversion or processing. Electronic documents have an associated file structure defined by the original creating application. This file structure is referred to as the “native format” of the document. Because viewing or searching documents in the native format may require the original application (for example, viewing a Microsoft Word document may require the Microsoft Word application), documents are often converted to a standard file format (e.g., TIFF or PDF) as part of electronic document processing.

Nesting

When a document or file has been inserted into a document (e.g., an attachment is nested within an email or graphics files are nested within a Microsoft Word document).

Network

A group of computers or devices that is connected together for the exchange of data and sharing of resources.

Node

Any device connected to a network. PCs, servers and printers are all nodes on the network.

OCR

(Optical Character Recognition) Technology that takes data from a paper document and turns it into editable text. The document is first scanned, and the resulting image is then analysed by OCR software, which recognises letters, numbers and other characters.

Offline

Not connected (to a network), or powered off.

Off-line data

The storage of electronic data outside the network in daily use (e.g., on backup tapes) that is only accessible through the off-line storage system, not the network.

On-line storage

The storage of electronic data as fully accessible information in daily use on the network or elsewhere.

Online

Connected (to a network).

Operating System (OS)

The software that the rest of the software depends on to make the computer functional. On most PCs this is Windows or the Macintosh OS. Unix and Linux are other operating systems often found in scientific and technical environments.

Paper discovery

The discovery of writings on paper that can be read without the aid of some device.

Parent-child relationships

A term used in e-discovery to describe a chain of documents that stems from a single e-mail or storage folder. These types of relationships are primarily encountered when a party is faced with a discovery request for e-mail. A “child” (e.g., an attachment) is connected to or embedded in the “parent” (e.g., an e-mail or Zip file) directly above it.

PC

An abbreviation for "personal computer" that generally refers to desktop workstations, but sometimes includes laptops as well.

PDA

(Personal Digital Assistant) Handheld digital organisers.

PDF

(Portable Document Format) An Adobe technology for formatting documents so that they can be viewed and printed using Adobe Acrobat Reader or other compatible viewers. Along with TIFF, this is one of the most commonly used viewing formats in many review tools.

Petabyte (PB)

A petabyte is a measure of computer data storage capacity and is roughly one thousand million million (1,000,000,000,000,000) bytes (1,125,899,906,842,624 actual bytes).

Physical failure

A failure of the internal components or electronics of a hard drive. Causes can include knocks and drops, water damage and power failures. This type of failure falls into three main subcategories:

Physical media damage: Physical damage to the platters where the data is stored, in the form of scratches or dents. This damage occurs when the read/write head comes into contact with the surface of the platters, either while the drive is stopped or while it is in operation.

Electronic failure: This occurs when an anomaly in the power supply or an overload of the electrical circuit causes a power surge that burns out the electronics on the circuit board.

Mechanical failure: A failure of the internal parts of the hard drive. There are many causes of mechanical failure, but the most common is overheating, which causes the platters to expand so that the read/write head can be wrongly positioned.

Plaintext

The least formatted and therefore most portable form of text for computerised documents.

Pointer

A pointer is an index entry in the directory of a disk (or other storage medium) that identifies the space on the disc in which an electronic document or piece of electronic data resides, thereby preventing that space from being overwritten by other data. In most cases, when an electronic document is “deleted,” the pointer is deleted, which allows the document to be overwritten, but the document is not actually erased.
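
The effect of deleting only the pointer can be shown with a deliberately simplified toy model: a list of storage blocks, a directory mapping file names to block numbers, and a set of free blocks. The `ToyDisk` class is hypothetical and does not reflect how any real file system lays out data; it only illustrates why "deleted" data survives until the space is reused.

```python
class ToyDisk:
    """Hypothetical, greatly simplified disk with a directory of pointers."""

    def __init__(self, n_blocks: int):
        self.blocks = [b""] * n_blocks    # raw storage
        self.directory = {}               # file name -> list of block indexes (pointers)
        self.free = set(range(n_blocks))  # blocks available for reuse

    def write(self, name: str, chunks: list[bytes]) -> None:
        for chunk in chunks:
            index = min(self.free)        # use the lowest-numbered free block
            self.free.remove(index)
            self.blocks[index] = chunk
            self.directory.setdefault(name, []).append(index)

    def delete(self, name: str) -> None:
        # Only the pointer is removed; the blocks are marked free but not erased.
        self.free.update(self.directory.pop(name))


disk = ToyDisk(4)
disk.write("letter.txt", [b"Dear ", b"Sir"])
disk.delete("letter.txt")
print("letter.txt" in disk.directory)  # False: no longer visible to the user
print(disk.blocks[:2])                 # [b'Dear ', b'Sir']: data still on the disk
disk.write("new.txt", [b"XXXXX"])      # reusing block 0 overwrites part of the old file
print(disk.blocks[:2])                 # [b'XXXXX', b'Sir']
```

After the final write, block 1 still holds a fragment of the deleted letter, which is the kind of residual data a forensic examiner may recover.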

Preservation notice, preservation order

See Legal Hold.

Private network

A computer network whose resources are isolated from the public Internet, even if the network itself has a connection to the Internet (for example, through a firewall).

PST

(Personal Folder File) The place where Outlook stores its data (when Outlook is used without Microsoft® Exchange Server). A PST file is created when a mail account is set up. Additional PST files can be created for backing up and archiving Outlook folders, messages, forms and files. The file extension given to PST files is .pst.

Public network

A network that is part of the public Internet.

RAM

(Random Access Memory) The working memory of the computer into which application programs can be loaded and executed.

Record

Information, regardless of medium or format, that has value to an organisation. Collectively the term is used to describe both documents and electronically stored information.

Record custodian

A records custodian is an individual responsible for the physical storage and protection of records throughout their retention period. In the context of electronic records, custodianship may not be a direct part of the records management function in all organisations.

Record lifecycle

The time period from when a record is created until it is disposed of.

Records Hold

See Legal Hold.

Records management

Records management is the planning, controlling, directing, organising, training, promoting and other managerial activities involving the lifecycle of information, from creation through maintenance to disposition.

Records retention period, retention period

The length of time a given records series must be kept, expressed as a time period (e.g., four years), an event or action (e.g., audit) or a combination (e.g., six months after audit).
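
Computing when a retention period ends is mostly calendar arithmetic once the triggering date is known. The sketch below uses a hypothetical `add_months` helper and assumes that when the target month is shorter (e.g., six months after 31 August), the date is clamped to that month's last day. That is one possible convention; an organisation's retention schedule may define it differently.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# "Four years" from a record's closing date (hypothetical date):
print(add_months(date(2020, 3, 15), 4 * 12))  # 2024-03-15
# "Six months after audit", once the audit date is known (hypothetical date):
print(add_months(date(2021, 8, 31), 6))       # 2022-02-28
```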

Records retention schedule

A plan for the management of records, listing types of records and how long they should be kept; the purpose is to provide continuing authority to dispose of or transfer records to historical archives.

Repository for electronic records

Repository for electronic records is a direct access device on which the electronic records and associated metadata are stored. Sometimes called a “records store,” “online repository” or “records archive.”

Residual data

Residual data (sometimes referred to as “ambient data”) refers to data that is not active on a computer system. Residual data includes (1) data found on media free space; (2) data found in file slack space; and (3) data within files that has functionally been deleted, in that it is not visible using the application with which the file was created, without use of undelete or special data recovery techniques.

Restore

To transfer data from a backup medium (such as tapes) to an on-line system, often for the purpose of recovery from a problem, failure or disaster. Restoration of archival media is the transfer of data from an archival store to an on-line system for the purposes of processing (such as query, analysis, extraction or disposition of that data). Archival restoration of systems may require not only data restoration but also replication of the original hardware and software operating environment. Restoration of systems is often called “recovery”.

Router

A piece of hardware that forwards data between networks, for example between a local area network (LAN) and the Internet, and directs that data to the connected computers, printers, phones and other devices.

Sampling

Sampling usually refers to the process of statistically testing a data set for the likelihood of relevant information. It can be a useful technique in addressing a number of issues relating to litigation, including decisions as to which repositories of data should be preserved and reviewed in a particular litigation and determinations of the validity and effectiveness of searches or other data extraction procedures. Sampling can also provide the court with information about the relative cost burden versus benefit of requiring a party to review certain electronic records.
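
A basic form of sampling draws a simple random sample, reviews it, and estimates the proportion of relevant documents in the whole set with a margin of error. The sketch below uses a normal-approximation 95% margin and ignores the finite population correction; the function name is hypothetical, and the `is_relevant` test stands in for what would in practice be human review.

```python
import math
import random


def estimate_proportion(population, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents from a simple random sample.

    Returns (estimate, margin), where margin is an approximate 95% margin
    of error (1.96 standard errors, normal approximation).
    """
    rng = random.Random(seed)                     # fixed seed so the sample is reproducible
    sample = rng.sample(population, sample_size)  # sampling without replacement
    hits = sum(1 for doc in sample if is_relevant(doc))
    p = hits / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical population: 10,000 document IDs, of which every 20th (5%) is "relevant".
population = list(range(10_000))
estimate, margin = estimate_proportion(population, lambda d: d % 20 == 0, 400)
print(f"{estimate:.1%} +/- {margin:.1%}")
```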

Sandbox

A network or series of networks that are not connected to other networks.

Scanning

Scanning is the process of converting a hard copy paper document into a digital image for use in a computer system. After a document has been scanned, it can be reviewed using field and full-text searching, instant document retrieval, and a complete range of electronic document review options.

Server

Any computer on a network that contains data or applications shared by users of the network on their client PCs.

Sibling

A sibling is a document that shares a common parent with the document in question (e.g., two attachments to the same parent e-mail, or two files within the same Zip file).

Slack space

A form of residual data, slack space is the amount of on-disk file space from the end of the logical record information to the end of the physical disk record. Slack space can contain information soft-deleted from the record, information from prior records stored at the same physical location as current records, metadata fragments and other information useful for forensic analysis of computer systems.
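
Because storage is allocated in whole clusters, the slack in a file's last cluster is the allocated size minus the file's actual size. The sketch below assumes a 4,096-byte cluster, a common default but not universal; real file systems add complications (some, for instance, store very small files inside their own metadata rather than in a cluster). The helper name is hypothetical.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes between the end of a file's data and the end of its last allocated cluster."""
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up to whole clusters
    return clusters * cluster_size - file_size


print(slack_bytes(10_000))  # 2288 (3 clusters = 12,288 bytes allocated)
print(slack_bytes(8_192))   # 0 (fills exactly 2 clusters)
```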

Spoliation

Spoliation is the destruction of records which may be relevant to ongoing or anticipated litigation, government investigation or audit. Courts differ in their interpretation of the level of intent required before sanctions may be warranted.

Software

Coded instructions (programs) that make a computer do useful work.

Standalone computer

A personal computer that is not connected to any other computer or network, except possibly through a modem.

System administrator

(Sysadmin, Sysop) The person in charge of keeping a network working.

Terabyte (TB)

A terabyte is a measure of computer data storage capacity and is roughly one thousand billion (1,000,000,000,000) bytes (1,099,511,627,776 actual bytes).

TIFF

(Tagged Image File Format) One of the most widely supported file formats for storing bit-mapped images and commonly used as the default viewing format in many review tools. Files in TIFF format often end with a .tif extension.

TCP/IP

(Transmission Control Protocol/Internet Protocol) A collection of protocols that define the basic workings of the features of the Internet.

VPN

(Virtual Private Network) A network of computers that uses a public network, such as the Internet, to connect nodes and uses encryption to secure the transfer of data among them.

Web site

A collection of Uniform Resource Identifiers (URIs, including URLs (Uniform Resource Locators)) in the control of one administrative entity. May include different types of URIs (e.g., file transfer protocol sites, telnet sites, as well as World Wide Web sites).

World wide web

The web is made up of all of the computers on the Internet which use HTML-capable software (Netscape, Explorer, etc.) to exchange data. Data exchange on the web is characterised by easy-to-use graphical interfaces, hypertext links, images, video and sound. Today the web has become synonymous with the Internet, although technically it is really just one component.

ZIP

An open standard for compression and decompression used widely for PC download archives. These archives are not only compressed in size but allow multiple documents to be archived into a single file. ZIP archives can be created with Windows-based programs such as WinZip and Drag and Zip. The file extension given to ZIP files is .zip.
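
Python's standard `zipfile` module can build and inspect a ZIP archive, which also illustrates the parent-child and sibling relationships described above: the archive is the parent, and its members are children (and siblings of each other). The member names and contents are hypothetical; the archive is held in memory so that nothing is written to disk.

```python
import io
import zipfile

# Build a small ZIP archive in memory containing two "child" documents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("contract.txt", "Terms and conditions " * 50)
    archive.writestr("invoice.csv", "item,amount\nwidgets,100\n")

# Reopen the archive and list each member with its original and compressed size.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    for member in archive.infolist():
        print(member.filename, member.file_size, member.compress_size)
    names = archive.namelist()
    contract = archive.read("contract.txt").decode()
```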