On 23/01/2013 2:53 a.m., Julian Reschke wrote:
> On 2013-01-22 14:40, Nicholas Shanks wrote:
>> On 17 January 2013 09:14, Julian Reschke wrote:
>>> On 2013-01-17 09:59, Roy T. Fielding wrote:
>>>> than there are servers that implement language negotiation and
>>>> actually want to resolve ties at random.
>>>
>>> They do not "want" to resolve at random; they do so because they have
>>> implemented what the spec says. There's no reason to create an ordered
>>> list structure when the spec says that an unordered list is sufficient.
>>
>> I think no implication of randomness should be permitted by the
>> specifications. They should instead require that a deterministic process
>> be used, and that, other than requests to services which explicitly
>> exist to provide random results (e.g. Wikipedia's "Random Page" link),
>> the same request should generate the same result provided nothing
>> pertinent to the resource has changed on the server.
>>
>> Someone, I don't recall who, gave the example of a home page loading
>> blog posts via AJAX, where the blog posts are available in two
>> languages. Random selection between the variants, where (q * qs)
>> values are equal for both languages, or are being ignored, would
That would be me. Take note of the Androids below...
>
> Can you please give an example of clients sending these kind of header
> field values?
>
> Clients that care can provide different qvalues, and as a matter of
> fact, they do.
Um. Let's see... where shall I start?
First, an overview of what happens when agents "care" enough to send
q-values. Then a small sample of the 513 agents I have on record that
send no q-values at all.
Judge for yourself which ones are better interpreted as sorted lists.
To be completely fair, I should say up front that the majority of agents
I have on record (~54% of unique language:agent pair entries) *do* send
q-values properly in accordance with the specification, and that same
54% of unique agent entries are all 'voting' for the list to be ordered.
I am presenting this subset to show the kinds of complexity and
confusion we introduce when we rely solely on q-values to provide
ordering semantics in the list.
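To make the "ties" concrete, here is a minimal Python sketch of what a server that relies solely on q-values has to work with. It assumes a deliberately simple parser: the helper names parse_accept_language and tie_groups are hypothetical, and any weight not written as "q=..." is simply treated as 1.0. It groups the ranges of a header by q-value; every group with more than one member is a tie the server has to break somehow.

```python
from itertools import groupby


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (language-range, q) pairs.

    A member without a q parameter gets q=1.0. A weight that is not a
    parseable "q=..." is also treated as 1.0 (a simplifying assumption).
    """
    pairs = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        tag, _, params = member.partition(";")
        q = 1.0
        params = params.strip()
        if params[:2].lower() == "q=":
            try:
                q = float(params[2:])
            except ValueError:
                pass  # malformed weight: keep the default
        pairs.append((tag.strip().lower(), q))
    return pairs


def tie_groups(header: str) -> list[tuple[float, list[str]]]:
    """Group ranges by q-value, highest first; header order kept inside a group."""
    # sorted() is stable, so members with equal q stay in header order.
    ordered = sorted(parse_accept_language(header), key=lambda p: -p[1])
    return [(q, [tag for tag, _ in grp])
            for q, grp in groupby(ordered, key=lambda p: p[1])]


# The Nokia/Symbian header below: "zh-cn" has no q, so it ties with "en" at 1.0.
print(tie_groups("en;q=1.0, en;q=0.5, zh-cn, zh;q=0.5, en;q=0.5"))
# [(1.0, ['en', 'zh-cn']), (0.5, ['en', 'zh', 'en'])]
```

Note that the stable sort quietly keeps header order inside each group; a server that stores the members in an unordered structure loses even that.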
WebKit ...
cs, en-us; 0.9, de-de; 0.8, ru-ru; 0.7
- Mozilla/5.0 (X11; U; Linux; cs-CZ) AppleWebKit/532.4 (KHTML, like
Gecko) Arora/0.10.1 Safari/532.4
+ do we consider that a list with q-values or not?
+ notice also how it is a much more "up to date" version than the
following...
en;q=1.0, en;q=0.5, zh-cn, zh;q=0.5, en;q=0.5
- Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/300.21.012;
Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like
Gecko) Safari/413
+ Nokia Symbian and SonyEricsson WebKit/4XX-532 derived agents across
the board seem to have one primary language set at q=1.0, followed by a
list of others all sharing q=0.5 or with no q-value at all, as seen above.
cs-CZ, en-US
- Mozilla/5.0 (Linux; U; Android 2.2; cs-cz; HTC Legend Build/FRF91)
AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
+ Starting with WebKit/533, all the mobiles seem to have moved to this
two-language model: one language followed by "en-US".
da-DK, en-US
- Mozilla/5.0 (Linux; U; Android 4.0.4; da-dk; GT-P5110 Build/IMM76D)
AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30
en-us,en
- Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; Valve Steam Client;
) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.444.0 Safari/534.1
th-TH, en-US
- Mozilla/5.0 (Linux; U; Android 4.0.3; th-th; A1 Build/IML74K)
AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
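The Arora header at the top of this WebKit list writes its weights as "; 0.9" instead of ";q=0.9". A strict reading of the RFC 2616 grammar, language-range [ ";" "q" "=" qvalue ], rejects those members outright. A minimal Python sketch of such a check follows; the regex is a transcription made for this sketch, and it assumes two relaxations: digits are allowed in later subtags (as RFC 4647 permits, so "es-419" passes) and whitespace is tolerated around the ";". The helper name invalid_members is hypothetical.

```python
import re

# language-range [ ";" "q" "=" qvalue ], per the RFC 2616 grammar, except that
# later subtags may contain digits (RFC 4647 style) and whitespace is
# tolerated around the ";" (both assumptions of this sketch).
MEMBER = re.compile(
    r"(?:\*|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)"              # language-range
    r"(?:\s*;\s*[qQ]=(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?))?"  # optional weight
)


def invalid_members(header: str) -> list[str]:
    """Return the comma-separated members that do not match the grammar."""
    members = (m.strip() for m in header.split(","))
    return [m for m in members if m and not MEMBER.fullmatch(m)]


print(invalid_members("cs, en-us; 0.9, de-de; 0.8, ru-ru; 0.7"))
# ['en-us; 0.9', 'de-de; 0.8', 'ru-ru; 0.7']
```

A lenient parser would instead have to guess what "; 0.9" means, which is exactly the ambiguity the "+ do we consider that a list with q-values or not?" note is about.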
... and then we have iTunes. A massive "WTF?" going out to the iTunes
developers if anyone is reading.
en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q=0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
- iTunes-iPad/5.1.1 (2; 32GB; dt:74)
en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q=0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
- iTunes-iPhone/5.0 (4; 16GB)
en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q=0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
- iTunes-iPhone/4.3.5 (3; 16GB)
... spiders are mostly doing a remarkably good job. At least it looks
that way until the q-values get involved.
ja-JP,ja
- Baiduspider+(+http://www.baidu.jp/spider/)
ja,en
- Mozilla/5.0 (compatible; Steeler/3.5;
http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)
ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01
- Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
+ q=0.8 - Ukrainian or Belarusian?
en-us,en-gb,en;q=0.99,*;q=0.01
- TosCrawler/Nutch-1.5.1
(http://www.toshiba.co.jp/rdc/about/crawl_info.htm; <dc-crawler at ml
dot toshiba dot co dot jp>)
+ q=1.0 - English US or British? (Not so much trouble for humans, but
for a search engine it might cause indexing trouble.)
I don't know whether you would call some of the major search engine bots
popular, or even a "fixable problem"?
I host a translation server, so it is likely that the headers below are
from actual users working on text translation. You know, the kind of
person who *really* objects to getting a randomly-wrong language
displayed. These people are also highly knowledgeable about language
codes and what they mean, so if they entered these manually it was for a
specific reason, according to how they or their tool's author
interpreted the Accept-Language specs.
Note how the first entries have no q-value and are *sorted* as if they
were q=1.0, which, remember, is what the spec says to do when no q-value
is supplied: treat it as q=1.0.
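The observation that these lists are written in descending preference order can be checked mechanically. A minimal Python sketch, assuming well-formed "q=" weights (the helper names weights and weights_non_increasing are hypothetical): apply the default of q=1.0 to members without a weight, then test that the weights never increase from left to right. A header that passes is consistent with reading it as an ordered list; the check cannot say which of two tied entries the sender meant to come first.

```python
def weights(header: str) -> list[float]:
    """q-value of each member in header order; a missing q counts as 1.0.

    Assumes every explicit weight is a well-formed "q=<number>".
    """
    result = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        _, _, params = member.partition(";")
        params = params.strip()
        result.append(float(params[2:]) if params[:2].lower() == "q=" else 1.0)
    return result


def weights_non_increasing(header: str) -> bool:
    """True when no member carries a higher q than the member before it."""
    qs = weights(header)
    return all(a >= b for a, b in zip(qs, qs[1:]))


print(weights_non_increasing("ca,ca-ES,es-es;q=0.9,es;q=0.9,es-419;q=0.8"))  # True
print(weights_non_increasing("en;q=0.5,fr"))  # False: "fr" defaults to 1.0
```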
ca,ca-ES,es-es;q=0.9,es;q=0.9,en-US;q=0.9,en;q=0.9,es-419;q=0.8,ca-AD;q=0.8,en-gb;q=0.8,de-de;q=0.7,de;q=0.7,ca-CA;q=0.7,cs-CZ;q=0.6,cs;q=0.6,it-it;q=0.6,it;q=0.6,es-CL;q=0.5,en-au;q=0.5,fr-FR;q=0.5,fr;q=0.4,ru-ru;q=0.4,ru;q=0.4,es-x-mtfrom-en;q=0.4,es-ar;q=0.3,ja-JP;q=0.3,ja;q=0.3,pt-PT;q=0.2,pt;q=0.2,do-es;q=0.2,do;q=0.1,es-x-mtfrom-it;q=0.1,nl-nl;q=0.1,nl;q=0.1,en-en;q=0.0
- Mozilla/5.0 (X11; Linux x86_64; rv:10.0.6) Gecko/20100101
Firefox/10.0.6 Iceweasel/10.0.6
+ q=1.0 - Catalan Valencian or Spanish Catalan?
+ q=0.9 - Spanish or English? Generic or nationalized grammar?
+ q=0.8 - Spanish or Andorran Catalan or English or German or Catalan
Valencian?
+ q=0.6 - want to try again with German or Catalan Generic?
+ q=0.5 - Spanish or Australian English or French?
+ q=0.4 - what about French or Russian?
+ q=0.3 - Argentine Spanish or Japanese?
+ q=0.1 - Spanish or Dutch?
de,de-DE,en-US;q=0.9,en;q=0.9,nl-nl;q=0.8,nl;q=0.8,en-gb;q=0.8,ro-RO;q=0.7,ro;q=0.7,fr-FR;q=0.6,fr;q=0.6,de-DE-1901;q=0.5,tr-TR;q=0.5,tr;q=0.5,pl-PL;q=0.4,pl;q=0.4,nl-NL;q=0.3,de-de;q=0.3,de-at;q=0.3,en-us;q=0.2,pl-pl;q=0.2,de;q=0.1,en-us;q=0.1,en;q=0.0
- Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.15)
Gecko/20110303 Firefox/3.6.15
+ q=0.9 - English Generic or US-centric?
+ q=0.8 - Dutch or English?
+ q=0.5 - German or Turkish?
+ q=0.3 - Dutch or German?
+ q=0.2 - English or Polish?
+ q=0.1 - German or English?
+ q=0.1 - oops, cancel that q=0.9 US English option.
+ q=0.0 - oops, cancel that q=0.9 generic English option.
+ I skip q=1.0 (none), q=0.7, q=0.6 and q=0.4 because the alternatives
sharing each of those q-values are, by the ISO definitions, semantically
equivalent aliases for the same language. So any selection algorithm
beyond "if-it-exists" is a waste of CPU cycles, but not a user problem.
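The "cancel that" entries above are repeats of a language range already listed with a different q. Language tags compare case-insensitively, so "de-DE" and "de-de" are the same range. A minimal Python sketch (hypothetical helper name repeated_ranges) that lists every repeated range with its q-values in header order. It deliberately does not decide which occurrence wins: reading the later entry as a cancellation is one interpretation, not a rule this sketch can confirm.

```python
from collections import defaultdict


def repeated_ranges(header: str) -> dict[str, list[float]]:
    """Map each range listed more than once (case-insensitively) to its q-values.

    Assumes explicit weights are well-formed "q=<number>"; a missing q is 1.0.
    """
    seen = defaultdict(list)
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        tag, _, params = member.partition(";")
        params = params.strip()
        q = float(params[2:]) if params[:2].lower() == "q=" else 1.0
        seen[tag.strip().lower()].append(q)
    return {tag: qs for tag, qs in seen.items() if len(qs) > 1}


hdr = ("de,de-DE,en-US;q=0.9,en;q=0.9,de-de;q=0.3,"
       "en-us;q=0.2,de;q=0.1,en-us;q=0.1,en;q=0.0")
print(repeated_ranges(hdr))
# {'de': [1.0, 0.1], 'de-de': [1.0, 0.3], 'en-us': [0.9, 0.2, 0.1], 'en': [0.9, 0.0]}
```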
Only a few agents send "q=1.0". By my interpretation of RFC 2616, these
few are the "correct" users of q-values when q=1:
en;q=1.0
- w3m/0.5.2
Also the YoudaoBot spider, with a mix of language codes. It seems to be
deliberately trying to fetch different translations, for some reason.
en-us;q=1.0, es-ve;q=0.5
- Mozilla/4.1 (U; BREW 3.1.5; en-US; Teleca/Q05A/INT)
- NetFront/3.5.1 (BREW 5.0.1.2; U; en-us; LG; NetFront/3.5.1/AMB)
Sprint LN510 MMP/2.0 Profile/MIDP-2.1 Configuration/CLDC-1.1
There are a few other variations of this "NetFront/" framework from
Samsung and LG mobile devices.
The rest (~50 unique agent:language pairs) using q=1.0 somewhere in the
Accept-Language header are all WebKit-derived agents. We already covered
how well they handle q-values.
There are still a fair few browser agents around with no q-values.
zh-cn,zh-tw
- Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.1)
Gecko/2008070208 Firefox/3.0.1
zh-cn,zh-tw
- Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.3)
Gecko/20100401 Firefox/3.6.3
en,zh,fr,de,it
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.20) Gecko/20081217
Firefox/2.0.0.20 Novarra-Vision/8.0
ru, en-US, en
- Mozilla/5.0 (compatible; Konqueror/4.4; Linux) KHTML/4.4.5 (like Gecko)
ru, uk, en-US, en
- Mozilla/5.0 (compatible; Konqueror/4.4; FreeBSD) KHTML/4.4.3 (like Gecko)
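For headers like these with no q-values at all, the two readings diverge. Under RFC 2616 every member gets q=1.0, so all of them tie and the server may pick any available match; read as an ordered list, the first available match wins. A minimal Python sketch of both, assuming exact (case-insensitive) tag matching only, with no prefix matching and no "*" handling; the function names are hypothetical.

```python
from typing import Optional


def parse(header: str) -> list[tuple[str, float]]:
    """(range, q) pairs in header order; a missing q counts as 1.0."""
    pairs = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        tag, _, params = member.partition(";")
        params = params.strip()
        q = float(params[2:]) if params[:2].lower() == "q=" else 1.0
        pairs.append((tag.strip().lower(), q))
    return pairs


def tied_candidates(header: str, available: list[str]) -> list[str]:
    """Available variants sharing the best q: what an unordered reading allows."""
    by_key = {v.lower(): v for v in available}
    # q=0 means "not acceptable", so such members never match.
    matched = [(tag, q) for tag, q in parse(header) if tag in by_key and q > 0]
    if not matched:
        return []
    best = max(q for _, q in matched)
    result = []
    for tag, q in matched:  # header order is preserved here
        if q == best and by_key[tag] not in result:
            result.append(by_key[tag])
    return result


def pick_ordered(header: str, available: list[str]) -> Optional[str]:
    """Deterministic choice: break ties by position in the header."""
    candidates = tied_candidates(header, available)
    return candidates[0] if candidates else None


print(tied_candidates("ru, uk, en-US, en", ["en", "uk"]))  # ['uk', 'en']
print(pick_ordered("ru, uk, en-US, en", ["en", "uk"]))     # uk
```

The unordered reading permits either "uk" or "en" for the Konqueror user; only the positional tie-break makes the same request produce the same answer every time.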
HTH
Amos