Category Archives: Multi-language

In the early days, many questioned whether technology-assisted review (TAR) would work for non-English documents. There were a number of reasons for this, but one fear was that TAR only “understood” the English language.

Ironically, that was true in a way for the early days of e-discovery. At the time, most litigation support systems were built for ASCII text. The indexing and search software didn’t understand Asian character combinations and thus couldn’t recognize which characters should be grouped together in order to index them properly. In English (and most other Western languages) we have spaces between words, but there are no such obvious markers in many Asian languages to denote which characters go together to form useful units of meaning (equivalent to English words). Continue reading →
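The word-boundary problem described above can be sketched in a few lines of Python. This is only an illustration, not the method any particular review platform uses: the function names and sample sentences are made up for the example, and overlapping character bigrams are just one common fallback for indexing text that has no spaces.

```python
def whitespace_tokens(text):
    """Split on whitespace, as an index built for English text would."""
    return text.split()


def char_bigrams(text):
    """Return overlapping two-character units, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


english = "review the contract"
japanese = "契約書を確認する"  # illustrative sentence, roughly "check the contract"

# English splits cleanly into words.
print(whitespace_tokens(english))

# The Japanese sentence has no spaces, so it comes back as one long "word".
print(whitespace_tokens(japanese))

# Bigrams give the index smaller units to work with, such as 契約 and 確認.
print(char_bigrams(japanese))
```

A whitespace-based index would store the whole Japanese sentence as a single term, so a search for 確認 alone would not match it. Bigrams avoid that, at the cost of also producing units that are not real words.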

As the world gets smaller, legal and regulatory compliance matters increasingly encompass documents in multiple languages. Many legal teams involved in cross-border matters, however, still hesitate to use technology-assisted review (TAR), questioning its effectiveness and ability to handle non-English document collections. They perceive TAR as a process that involves “understanding” documents. If the documents are in a language the system does not understand, then TAR cannot be effective, they reason.

The fact is that, done properly, TAR can be just as effective for non-English as it is for English documents. This is true even for the complex Asian languages including Chinese, Japanese and Korean (CJK). Although these languages do not use standard English-language delimiters such as spaces and punctuation, they are nonetheless candidates for the successful use of TAR. Continue reading →

A surge in cross-border litigation and enforcement of antitrust and Foreign Corrupt Practices Act violations is subjecting many Asian-based companies to U.S. discovery obligations. While e-discovery is “business as usual” in the U.S., discovery involving companies in Asia is still relatively new—and rife with potential pitfalls.

When parties involved in cross-border litigation or investigations are faced with multi-language documents subject to discovery, including the challenging Chinese, Japanese and Korean (CJK) languages, they must understand how to accurately process and index CJK documents for proper search, review and analysis. Many Western search and review systems were not designed to capture the nuances of CJK language complexities. As a result, they offer sub-optimal search results, sometimes finding too many documents and sometimes missing important ones. An understanding of CJK differences can help you select the right technology and experts. Continue reading →
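One way a search can find too many documents is easy to reproduce. The sketch below is hypothetical: the two documents, their hand-written segmentations, and the function names are invented for illustration, and a real system would rely on a language-aware tokenizer rather than a hard-coded word list. It shows a plain character match for 京都 (Kyoto) also hitting 東京都 (Tokyo Metropolis), because the second contains the first.

```python
# Two tiny illustrative documents (not from any real collection).
DOCS = {
    1: "東京都で会議を開く",  # "hold a meeting in Tokyo Metropolis"
    2: "京都で会議を開く",    # "hold a meeting in Kyoto"
}

# Hand-segmented versions, standing in for the output of a CJK-aware tokenizer.
SEGMENTED = {
    1: ["東京都", "で", "会議", "を", "開く"],
    2: ["京都", "で", "会議", "を", "開く"],
}


def naive_search(query, docs):
    """Match the query as a raw character sequence anywhere in the text."""
    return sorted(doc_id for doc_id, text in docs.items() if query in text)


def token_search(query, segmented):
    """Match the query only against whole segmented words."""
    return sorted(doc_id for doc_id, tokens in segmented.items() if query in tokens)


print(naive_search("京都", DOCS))       # both documents match
print(token_search("京都", SEGMENTED))  # only the Kyoto document matches
```

The opposite failure, missing important documents, can happen when a tokenizer splits or joins characters differently in the index than in the query, so the two never line up.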

In a recent memorandum, a U.S. Department of Justice attorney questioned the effectiveness of using technology-assisted review with non-English documents. While the DOJ “would be open to discussion” about using TAR in such cases, it is not ready to adopt it as a standard procedure, the memo said.

In an article published Sept. 1 in The National Law Journal, Catalyst founder and CEO John Tredennick responds to that DOJ memo. In the article, “Yes, Predictive Coding Works in Non-Western Languages,” Tredennick explains that TAR, when done properly, can be just as effective for non-English documents as it is for English ones. This is true even for the so-called “CJK languages”: Asian languages including Chinese, Japanese and Korean.

Predictive Ranking, aka predictive coding or technology-assisted review, has revolutionized electronic discovery, at least in mindshare if not in actual use. It has dominated the dais at discovery programs since 2012, when the first judicial decisions approving the process came out. Its promise of dramatically reduced review costs is top of mind today for general counsel. For review companies, the worry is about declining business once these concepts really take hold.

While there are several “Predictive Coding for Dummies” books on the market, I still see a lot of confusion among my colleagues about how this process works. To be sure, the mathematics are complicated, but the techniques and workflow are not that difficult to understand. I write this article with the hope of clarifying some of the more basic questions about TAR methodologies. Continue reading →

As e-discovery reaches across borders into Asia, global companies face new and often unfamiliar challenges. Whatever the nature of the case, if it involves electronic information stored in China, Japan, Korea or elsewhere in Asia, be advised: You’ll be managing case files differently than you would be if you were in the United States.

The challenges presented in managing electronic files in Asia stem from many causes—some geographical, some technical and some cultural.

In Asian countries, the laws governing data and privacy are quite different from those in the U.S. For example, in China, collecting and exporting data involving “state secrets” can get you thrown in jail. In Japan, taking data out and hosting Continue reading →

Specifically, the magazine reported that enforcement actions under the Foreign Corrupt Practices Act (“FCPA”) nearly doubled in 2010, rising to 76 (with complaints against 23 companies and 53 individuals). In 2009, the SEC and Justice Department brought 45 actions (against 12 corporations and 33 individuals). That number was a significant jump again from 2008 when the government brought 37 actions against companies and individuals. Continue reading →

In part one of this post, we reviewed the history of character encoding, from the development of ASCII in the early 1960s to the eventual creation of an array of different code sets to accommodate an array of different languages. The effect of all these different code sets was to create a technological Babel that made it difficult to share and process data across borders.
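The "Babel" effect is easy to see with Python's built-in codecs. In this minimal sketch (the sample text and the `decode_with` helper are illustrative assumptions), the same four bytes are read under several legacy code sets, and only the one that was used to write them gives back the original text.

```python
def decode_with(raw, codec_names):
    """Decode the same bytes under each codec, marking undecodable bytes."""
    return {name: raw.decode(name, errors="replace") for name in codec_names}


# "你好" ("hello") saved in GBK, a legacy Chinese code set.
raw = "你好".encode("gbk")

results = decode_with(raw, ["gbk", "cp1252", "shift_jis"])
for name, text in results.items():
    print(f"{name:10} -> {text!r}")
```

Nothing in the bytes themselves says which code set produced them, which is why documents passed between systems with different defaults so often arrived garbled.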

For a time, all this special encoding worked passably well. In the early days, people were less concerned with passing documents from country to country. E-mail wasn’t anywhere near the universal communications medium it is today. Google hadn’t been invented and facebooks were still something students passed around college dorms.

By the early- to mid-1990s, however, people started feeling the pinch of all this encoding. A group of visionaries realized that the world needed some kind of universal encoding that could go beyond ASCII and embrace all possible languages. That realization was the impetus for the consortium that developed the Unicode Standard, the modern foundation for handling foreign language documents around the world. Continue reading →
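What a universal encoding buys can be shown with a short, hedged example. The mixed-language string and the `can_encode` helper below are made up for illustration; the codecs are standard ones that ship with Python. A single line containing English, Chinese, Japanese and Korean fits in UTF-8, one of the encodings defined for Unicode, but not in ASCII or in a code set built for one language.

```python
def can_encode(text, codec_name):
    """Return True if every character in text exists in the given code set."""
    try:
        text.encode(codec_name)
        return True
    except UnicodeEncodeError:
        return False


# The word "contract" in English, Chinese, Japanese and Korean.
mixed = "contract 合同 契約 계약"

for codec_name in ["ascii", "shift_jis", "utf-8"]:
    print(f"{codec_name:10} can hold the text: {can_encode(mixed, codec_name)}")

# UTF-8 round-trips the whole string without loss.
assert mixed.encode("utf-8").decode("utf-8") == mixed
```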

“There is something wrong with your system,” the angry lawyer on the phone said to Laura, the project consultant working on her case. “I am looking at the screen and all I see are a bunch of question marks and boxes,” she continued, getting more exasperated by the minute. “How am I supposed to review these documents if I can’t read the words?”

“Let me see if I can help,” Laura answered, trying to be as calm as possible. “Perhaps your computer is just using the wrong code page to display the text. If so, we can probably fix the problem with a mouse click,” she offered hopefully. “If not, there could be a problem with how your data was collected or processed.”

“A code page?” responded the caller. “What the heck is a code page?”

Our caller’s confusion was not unusual. After all, most of us went to school to study law rather than technology. Many lawyers still have little interest in knowing more about technology than how to turn on their computers. Continue reading →
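Laura's two possibilities can be sketched in Python. The sample text and variable names are illustrative assumptions, not a reconstruction of the caller's actual data. If the bytes are intact and only the display code page is wrong, choosing the right one restores the text. If the text was converted into a code set that lacks the characters during collection or processing, the originals are replaced with question marks and cannot be recovered.

```python
text = "契約書"  # illustrative Japanese text ("contract document")

# Case 1: the stored bytes are fine, but the viewer assumes the wrong code page.
stored = text.encode("shift_jis")
garbled = stored.decode("cp1252", errors="replace")  # what the reviewer sees
repaired = stored.decode("shift_jis")                 # the "mouse click" fix

# Case 2: processing forced the text into a Western code set.
# Characters with no equivalent become "?" and the originals are gone.
damaged = text.encode("cp1252", errors="replace")
unrecoverable = damaged.decode("cp1252")

print(repr(garbled))
print(repr(repaired))
print(repr(unrecoverable))
```

Boxes, as opposed to question marks, usually point to a third cause that code cannot show here: the text is correct but the reviewer's computer has no font containing those characters.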

About Catalyst

Catalyst designs, builds, hosts and supports the world’s fastest and most powerful e-discovery platform. For 20 years, Catalyst has helped global corporations reduce the total cost of discovery and take control of complex, large-scale discovery and regulatory compliance.