Tagged Questions

UTF-8 (Unicode Transformation Format, 8 bits) is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?
I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs... I don't have that ...

I wonder why most modern solutions built using Perl don't enable UTF-8 by default.
I understand there are many legacy problems for core Perl scripts, where it may break things. But, from my point of ...

I'm trying to read CSV files using Java. Some of the files may have a byte order mark in the beginning, but not all. When present, the byte order gets read along with the rest of the first line, thus ...

I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, ...

I have many "can't encode" and "can't decode" problems with Python when I run my applications from the console. But in the Eclipse PyDev IDE, the default character encoding is set to UTF-8, and I'm ...

What I want to do is to remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". What I tried to do was to utf8_decode the string and then use strtr on it, but ...

I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends up with a ș (which is a UTF-8 character, ...and a strange name, I know).
Even though I save ...

I have some current code and the problem is its creating a 1252 codepage file, i want to force it to create a UTF-8 file
Can anyone help me with this code, as i say it currently works... but i need ...

I'm writing a crawler in Ruby (1.9) that consumes lots of HTML from a lot of random sites.
When trying to extract links, I decided to just use .scan(/href="(.*?)"/i) instead of nokogiri/hpricot (major ...

I have my database properly set to UTF-8 and am dealing with a database containing Japanese characters. If I do SELECT *... from the mysql command line, I properly see the Japanese characters. When ...

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such ...

I'm trying to figure out what collation I should be using for various types of data. 100% of the content I will be storing is user-submitted.
My understanding is that I should be using UTF-8 General ...

I have a String with binary data in it (1110100) I want to get the text out so I can print it (1110100 would print "t"). I tried this, it is similar to what I used to transform my text to binary but ...

For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find -type f |
while read ...

I have this very strange problem. I have a site that contains some German letters and when it's only html without php the symbols are property displayed with encoding when i change it to UTF-8 they ...

I am trying to convert a string encoded in java in UTF-8 to ISO-8859-1. Say for example, in the string 'âabcd' 'â' is represented in ISO-8859-1 as E2. In UTF-8 it is represented as two bytes. C3 A2 I ...

I know this has been asked before!
I have googled on this topic and I have looked at every answer, but I still don't get it.
Basically I need to convert UTF-8 string to ISO-8859-1 and I do it using ...

The csv module in Python doesn't work properly when there's UTF-8/Unicode involved. I have found, in the Python documentation and on other webpages, snippets that work for specific cases but you have ...

My categories need to be named with Greek letters. I am using ggplot2, and it works beautifully with the data. Unfortunately I cannot figure out how to put those greek symbols on the x axis (at the ...