Description

I've noticed that, when using the beta version, the Wikidata description at the top of the page starts with an upper-case letter, even though the description on Wikidata starts with a lower-case one. This is rather ugly in some languages; can this forced capitalization be removed?

Summary of problems

1. Editors expect descriptions to be displayed with the same case they used when editing them, and the mismatch can cause confusion when editing.

2. Descriptions are inconsistent despite the guidelines; Wikipedia clients want to display them consistently.

3. From a design perspective, it makes sense to render sentence case in this context, as otherwise the description looks like an incomplete sentence.

What user value do you propose this change adds? So far I see mostly hyperbole in this task (e.g. "rather ugly in some languages"), rather than a solid rationale for work to be undertaken. We can do better than that.

I think the same can be said about capitalizing it in the first place ;-)

Fair point, so allow me to explain the rationale. This change was made because having a lower case description created a capitalisation inconsistency with the article layout; the article title and first character of the article were capitalised, but the description was not. It looked odd, and created an inconsistent scan line.

Yes, that's more or less what I understood from your previous explanation.

I'd prefer to let the communities decide this rather than force a software solution.

The problem with "just let the contributor decide" is that, in this case, the editors don't seem to have a consistent policy or knowledge of how these will be used. They edit one description at a time, but this inconsistency only becomes problematic in contexts where many descriptions are displayed together along with titles, etc.

To me this is much like page titles, which can be special-cased when needed, but where there is a default that is enforced by software. By saying "just display inconsistent formatting if that's what editors write", we are shifting some of the cognitive work of parsing and reading the descriptions onto our readers. We could, by policy and community engagement I suppose, ask editors to take that work back and use "Wikipedia case" for all descriptions (or whatever the consensus ends up being), or we could use a software solution, in languages where it makes sense, to reduce the burden on both editors and readers.

For an example of why inconsistent casing makes a list of items harder to parse, compare https://en.m.wikipedia.org/wiki/Special:Nearby, which uses the editors' casing and mixes strings like "hospital" with "One of the 50 hills of San Francisco", with the search presentation in either the Android or the iOS app, which normalizes case. I don't have eye-tracking studies or dwell-time data to prove it, but I feel strongly that the latter is easier to read, visually more pleasing, and makes the descriptions look much more intentional.

If we leave it alone, some descriptions will be capitalized and some will not, and there is a cost to inconsistency for the reader. This is a cost we bear throughout the projects, but the top of the page is where users first land, orient themselves, and begin digesting information, so the cost of inconsistency here is much higher than anywhere else. This is one reason why lead images are either on or off rather than opt-in (if there is an eligible photo). For this reason, this is one place where I think order is more important than individual preferences. I also do not like forcing a software solution, but I would like to keep it as is until a community standard evolves (as it has with titles).

The standard for entry of the descriptions is quite well defined indeed. That said, rigorously applying something that's quite literally a guideline outside its original context (i.e. display of the descriptions within Wikidata) seems unwise to me.

Lest I get back on my hobby horse of auto-generated descriptions...
It seems like what we have here is a conflict of use cases. According to the official guidelines, the purpose of the description is to "disambiguate items with the same or similar labels," which is subtly but crucially different from our use case of "a one-line summary of the subject." I fear that if we're not aligned on these use cases at these early stages, then we'll have deeper issues than capitalization later on.

I have a question for, and need help from, language folks (ping @Amire80). Is there any language we can think of where keeping the capitalization will break reading comprehension? Here's an example where you can edit the HTML with other languages and test it with native speakers: http://jsbin.com/xexeqix/edit?html,output

I'm trying to understand if this is really a blocker for showing the descriptions on stable.

I'd also like the perspective/rationale from Design (ping @Nirzar). Design direction and consistency are very important for providing a useful reading experience, provided we don't break other important things like language comprehension.

the first letter being lowercase gives a sense of incompleteness in a sentence < like this.

It's difficult to quantify or rationalize this, but sentence case conveys a stronger sense of human intervention. In branding, by contrast, companies sometimes use all lowercase to suggest the "casualness" of the company; that's why Facebook's "f" is lowercase. Sentence case is the standard in English prose, and our communication throughout the product also follows it.

thoughts on consistently just keeping the user-provided casing?

I strongly believe we should use sentence case for descriptions. But the bigger problem is that CSS doesn't have sentence case as an option. It has "capitalize", which capitalises the first letter of every word; that's just title case.

Overall, this is an obvious choice. The Communications department within the WMF also uses sentence case.

As far as I know, case doesn't exist in other scripts like Devanagari (used for Hindi and Marathi) and, if I'm not wrong, Hebrew, according to Wikipedia.

@Nirzar Just a note that the current CSS implementation actually applies sentence case by using the :first-letter pseudo-element, so it is doing proper sentence casing, not title casing.
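To make the title-case vs. sentence-case distinction concrete, here is a minimal Python sketch (not the actual Minerva CSS; the function names are hypothetical). The first function approximates CSS "text-transform: capitalize", which upper-cases the first letter of every word; the second approximates styling only the :first-letter pseudo-element, which touches just the first character. It also shows that Python's built-in str.capitalize() is not a safe stand-in for sentence case, because it lowercases the rest of the string. Real CSS word boundaries and :first-letter rules (e.g. leading punctuation) are more involved than this approximation.

```python
def title_case_like_css_capitalize(text: str) -> str:
    # Approximates CSS `text-transform: capitalize`: the first letter of each
    # space-separated word is upper-cased; all other letters are untouched.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def sentence_case_first_letter(text: str) -> str:
    # Approximates upper-casing only the :first-letter pseudo-element:
    # only the very first character changes.
    return text[:1].upper() + text[1:]


# Hypothetical lowercase description used for illustration.
description = "one of the 50 hills of San Francisco"
print(title_case_like_css_capitalize(description))  # One Of The 50 Hills Of San Francisco
print(sentence_case_first_letter(description))      # One of the 50 hills of San Francisco
print(description.capitalize())                     # One of the 50 hills of san francisco
```

The last line is the trap: a "capitalize" helper that also lowercases the remainder would corrupt proper nouns later in the description.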

@Esc3300 I believe "mobile edit" and "mobile app edit" are edits from the native Android/iOS apps, and this task is about the mobile web, which doesn't have Wikidata description editing functionality; the feature isn't even rolled out there yet, it's only in beta.

If that's the case: mobile phones have autocorrect enabled by default, so people typing edits on their phones will be submitting sentence-cased descriptions because the operating system corrects the text that way. That is probably why most descriptions from the mobile apps are capitalized.

Hi @Jhernandez, @Esc3300: actually, the keyboard has been made to default to lowercase when adding or editing Wikidata descriptions, specifically to reduce the incidence of incorrect capitalization. In addition, we have included a point in the help text explaining not to capitalize unless the first word is a proper noun.

Currently, the ability to edit Wikidata descriptions is available only in the Android app, and it has been slowly rolled out to all languages except English, with edits being monitored for quality.

1. Editors expect descriptions to be displayed with the same case they used when editing them, and the mismatch can cause confusion when editing.

2. Descriptions are inconsistent despite the guidelines; Wikipedia clients want to display them consistently.

3. From a design perspective, it makes sense to render sentence case in this context, as otherwise the description looks like an incomplete sentence.

I still see this as an editing problem. When I edit a Wikidata description, the interface should either guide me not to use sentence case, if that is indeed the policy, or remove any leading uppercase letter. That solves 1 and 2. I liken this problem to code linting: some developers like to use tabs and some like spaces. The only way to achieve consistency is to flag when the rules are broken and enforce them.
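As a rough illustration of the linting idea, here is a hypothetical edit-time check in Python. It only warns instead of silently rewriting, because the English guideline allows an uppercase start "when uppercase would normally be required or expected" (for example a proper noun), which a simple character check cannot detect. The function name and the warning text are invented for this sketch; nothing in this thread says such a check exists in any client.

```python
from typing import Optional


def lint_description(description: str) -> Optional[str]:
    """Return a warning if the description starts with an uppercase letter.

    Hypothetical edit-time lint. It cannot tell a proper noun ("American
    actor") from an accidental capital ("Hospital"), so it only warns and
    leaves the final decision to the editor.
    """
    stripped = description.lstrip()
    if stripped[:1].isupper():
        return ("Descriptions usually begin with a lowercase letter, "
                "unless the first word is a proper noun.")
    return None


for text in ["hospital", "Hospital", "American actor"]:
    print(repr(text), "->", lint_description(text))
```

Warning rather than auto-fixing keeps the stored data under the editor's control, which matches the "flag when the rules are broken" framing above.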

Wikidata is a data store. Just as we wouldn't expect clients to have to render dates as mm/dd/yy, we shouldn't expect them to have to use a particular case. We should care about the content, not how it's used. I think #3 is up to the client. Rather than saying it's wrong, it would be helpful to point out examples where it doesn't work. Right now these seem to be hypothetical and/or rare.

I'm not sure about (2.): obviously there are descriptions with caps, possibly due to Android auto-completion, but the bulk of descriptions at Wikidata are bot generated and are unlikely to have incorrect caps.

Georgian is a particularly troublesome example. There is a long controversy about the usage of capital letters in its alphabet, and about their technical implementation. Until this is cleared up, let's not mess with user input.

As suggested a few times above, automatic capitalization just shouldn't be applied anywhere. It may make English and Dutch look better, but it isn't necessary. And in some languages it is just harmful. This should be completely removed.

Sigh.

In Georgian, we don't use capitalization. In Georgian grammar there is no such understanding at all.

^ If we were to make this change across the project, this is all that would be needed on the Minerva side. Alternatives would limit this styling to Latin-based languages with a more complicated CSS selector or a translatable LESS variable.

The question remains: do we want to keep capitalisation in languages where it is useful, e.g. Deutsch, English, français?
I'd argue no, as it creates tech debt and confusion, but let's make a decision promptly.

Editing clients shouldn't allow/encourage capitalization (and should in documentation make it clear not to capitalize in languages where that is possible). The normal form should be lowercase for entry and storage. Display is context, language and even user preference dependent, just as with dates. I don't see the need for any further work or decision to be made...

Exposing the real (in some cases “wrong”) case would make complete sense if there were a clear connection to where it can be edited.
Let's not forget that we're automatically uppercasing the first letter of every page title in the article namespace, which has provided some kind of order at the cost of enforcing linguistically incorrect casing.

Pros for uppercasing the first letter in Deutsch, English, français:

We would be in line with article namespace handling

It provides a more orderly interface in some cases

Cons:

There are cases where CSS uppercasing could result in incorrect overwrites, for example in names

Hiding from editors leaves the descriptions “incorrect” in source

How are we dealing with uppercase/lowercase mIXinG in entries? Do we leave all of those to editors rather than trying to correct them in software?

I'm a bit confused because I see people expressing agreement with each other, however it is not clear to me what we're agreeing on. So here is another attempt to clarify things:

Clear

Capitalization does not make sense in certain languages (e.g. Georgian). We should never force capitalization of Wikidata descriptions on the front-end in such languages.

This requires a change to the current functionality (which incidentally, and accidentally, slipped through recently but is not live yet).

Any remaining, or new, code that modifies the capitalization of Wikidata descriptions should be language specific (i.e. no more global rules).
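One way to express "language specific, no more global rules" is an explicit per-language allowlist consulted only at display time, with the stored value left untouched. This Python sketch is hypothetical: the allowlist contents ("de", "en", "fr", mirroring the languages named in this discussion) and the function name are assumptions, and an actual Minerva change would live in CSS/LESS or PHP rather than Python.

```python
# Hypothetical allowlist: languages where the front end would upper-case the
# first letter of a description. Everything else is shown exactly as stored.
CAPITALIZE_FIRST_LETTER = frozenset({"de", "en", "fr"})


def display_description(stored: str, lang: str) -> str:
    """Return the description as it would be displayed for `lang`.

    The stored Wikidata value is never modified; only the rendered string
    changes, and only for languages on the allowlist.
    """
    if lang in CAPITALIZE_FIRST_LETTER:
        return stored[:1].upper() + stored[1:]
    return stored


print(display_description("hospital", "en"))  # Hospital
print(display_description("ქალაქი", "ka"))    # ქალაქი (Georgian is left as stored)
```

An opt-in allowlist means a language without a case distinction, or one where capitalization is contested, is never transformed by default.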

Confusing

We want consistency in how we display Wikidata descriptions. Specifically, we want Wikidata descriptions to be displayed with the first letter capitalized in Latin languages.

The guidelines for Wikidata descriptions in English specify that Wikidata descriptions should start with a lowercase letter — "Descriptions begin with a lowercase letter except when uppercase would normally be required or expected". So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Even if the Wikidata guideline was changed, it seems fair to assume that there would never be perfect consistency among Wikidata descriptions (which is probably okay, but worth noting).

Open questions

Assuming we fix the issue with languages that are being incorrectly capitalized, does anyone have an issue with capitalizing Wikidata descriptions in Latin languages on the front-end of Wikipedia?

Is anyone familiar with Wikidata's general policy in terms of opinionated vs. unopinionated data? Or, said another way, how do we resolve the tension between:

@Jdlrobson, T131013#3544541:
editors expect how descriptions display to match case when they edit them and this can cause edit confusion

and

@Jdlrobson, T131013#3544541:
Wikidata is a data store. Just as we wouldn't expect clients to have to render dates mm/dd/yy we shouldn't expect them to have to use case. We should be caring about the content not how it's used.

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

In conclusion
It seems like we then have two options (I'm assuming we fix the obvious issue discussed above with languages like Georgian):

Continue formatting Wikidata descriptions on the front-end of Wikipedia for Latin languages. The benefit of this option is that we retain the level of consistency we have now. The drawback is that Wikipedia does not mirror what is on Wikidata (although it's unclear that this is even desirable).

Start a conversation with Wikidata to see if they are willing to update their guidelines for item descriptions. If they are willing, we can drop the code that forces capitalization in Latin languages. The benefit of this option is that Wikipedia mirrors exactly what is in Wikidata (again, unclear that this is actually desirable). The drawback is that for an indeterminate amount of time we'd have inconsistency in how Wikidata descriptions look on Wikipedia; however, in theory this would eventually sort itself out.

So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Sorry if it was already mentioned and I missed it, but where is it written that Wikipedias in the Latin alphabet want this?

My apologies for not explaining this statement. The design recommendation is for consistency, specifically capitalization (as stated in the comments). Have there been any concerns raised by people using Wikipedias in the Latin alphabet? I had not seen such concerns (aside from ones that are conflated with the issue of languages that don't have capitalization) so was assuming that what we do currently is agreeable to most folks (with the understanding that there are edge cases).

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

Yes, this is the heart of the remaining disagreement. There is a false assumption that the user entering the data will have presentation control across all contexts (Wikipedia, apps, APIs, etc.); this isn't an accurate intuition and will become increasingly inaccurate. Wikidata being a central data repository does not necessarily imply that it also determines presentation.

So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Sorry if it was already mentioned and I missed it, but where is it written that Wikipedias in the Latin alphabet want this?

My apologies for not explaining this statement. The design recommendation is for consistency, specifically capitalization (as stated in the comments).

Whose design recommendation?

Have there been any concerns raised by people using Wikipedias in the Latin alphabet? I had not seen such concerns (aside from ones that are conflated with the issue of languages that don't have capitalization) so was assuming that what we do currently is agreeable to most folks (with the understanding that there are edge cases).

It's a rather anecdotal statement, but I'll make it anyway: In the larger languages written in the Latin alphabet, there's an overlap between people who complain about bugs and people who strongly prefer to use the desktop site (often even on their smartphones). My guess is that in these larger languages written in the Latin alphabet the experienced editors don't care very much if it's lowercase or uppercase.

I'm quite annoyed by people who use lowercase and uppercase letters incorrectly in sentences and personal names, but in this case, I see no problem with using a small letter. The description is not really a sentence.

Assuming we fix the issue with languages that are being incorrectly capitalized, does anyone have an issue with capitalizing Wikidata descriptions in Latin languages on the front-end of Wikipedia?

Yes, because as things stand, edits made to the descriptions from Wikipedia are wrong because of this. People are under the mistaken assumption (because of how a description is displayed before editing) that they should always capitalize it.

Is anyone familiar with Wikidata's general policy in terms of opinionated vs. unopinionated data? Or, said another way, how do we resolve the tension between:

@Jdlrobson, T131013#3544541:
editors expect how descriptions display to match case when they edit them and this can cause edit confusion

and

@Jdlrobson, T131013#3544541:
Wikidata is a data store. Just as we wouldn't expect clients to have to render dates mm/dd/yy we shouldn't expect them to have to use case. We should be caring about the content not how it's used.

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

There is a difference between storage and display here. Wikidata generally doesn't care how you display the data it provides you. You can do calendar model conversion and more. The issue starts when editing. The current transformation only works in one direction.
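The point that the current transformation "only works in one direction" can be shown in a few lines: capitalizing the first letter maps different stored values to the same displayed string, so someone who only sees the displayed text cannot tell what Wikidata actually stores. This is a hypothetical Python illustration, not code from any Wikimedia client.

```python
def displayed(stored: str) -> str:
    # Display-time transform discussed in this task: upper-case the first letter.
    return stored[:1].upper() + stored[1:]


stored_values = ["hospital", "Hospital"]
shown = {displayed(value) for value in stored_values}
print(shown)  # {'Hospital'}: two different stored values, one displayed string

# An editor who types back what they see would therefore submit "Hospital",
# even if Wikidata currently stores "hospital".
```

Because the mapping is not one-to-one, there is no way to run it backwards at edit time, which is why the display choice leaks into what editors submit.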

In conclusion
It seems like we then have two options (I'm assuming we fix the obvious issue discussed above with languages like Georgian):

Continue formatting Wikidata descriptions on the front-end of Wikipedia for Latin languages. The benefit of this option is that we retain the level of consistency we have now. The drawback is that Wikipedia does not mirror what is on Wikidata (although it's unclear that this is even desirable).

If it was only display we wouldn't care. If we could make users not associate the way it is displayed with the way they put it in we wouldn't care.

Start a conversation with Wikidata to see if they are willing to update their guidelines for item descriptions. If they are willing, we can drop the code that forces capitalization in Latin languages. The benefit of this option is that Wikipedia mirrors exactly what is in Wikidata (again, unclear that this is actually desirable). The drawback is that for an indeterminate amount of time we'd have inconsistency in how Wikidata descriptions look on Wikipedia; however, in theory this would eventually sort itself out.

The policy change is extremely unlikely.

In general: if you're just worried about some mistakes in the data, then please, please don't hide them. Show them. Expose them to people. Otherwise they won't get fixed, and everyone who doesn't implement your workaround is still exposing their users to the mistakes. We need to make the data quality better for everyone.

As one of the people behind capitalising the descriptions in the first place, I (reluctantly) agree that they shouldn't be capitalised any more. I stand by the original decision, and think it was the correct decision at the time, but circumstances have changed.

The statement of the problem is that capitalising descriptions encourages new editors to write descriptions that are capitalised, but Wikidata policy says that descriptions shouldn't be capitalised (with limited exceptions for descriptions beginning with proper nouns, etc.)

Back when descriptions were first added, description editing in the apps wasn't even something that had seriously crossed our minds, so capitalising them kept scan lines consistent, normalised the display, and so on; see T131013#2289870 for @Nirzar's comprehensive explanation of the benefits. I stand by that decision. But now, description editing is a serious proposition. If we're going to take description editing seriously, then having all the descriptions capitalised is going to push people towards capitalising any descriptions they write. As a movement we already struggle with onboarding new editors due to overly complex policies and unrealistically high standards (there's an entire programme in the annual plan dedicated to new editors for a reason), and we're only going to make it worse by intentionally setting people up to fail, subtly suggesting they should do things one way when the policies really say the other. It's also important for the adoption of description editing more generally: Wikidatans are unlikely to seriously accept client-side editing of descriptions if app users keep (unintentionally) violating site policy when writing descriptions.

The way I see it, there are a few ways to solve this problem:

1. Change Wikidata's policy so that descriptions should be capitalised.

Attempts to make even less drastic changes to the description policy have failed in the past, so this is unlikely to happen.

2. Keep the descriptions capitalised, and change the editing experience to tell people that they shouldn't capitalise descriptions.

This gets confusing really fast, and people aren't likely to really understand. ("They're capitalised, but don't capitalise it yourself, we'll capitalise it for you, until you try editing it again in which case we won't, but we'll do it again afterwards...")

3. Stop capitalising the descriptions.

From my perspective, the third solution is really the only viable one.

P.S. The mobile apps were the first product to use Wikidata descriptions below article titles and in search results, and I was the product owner of the apps at the time, so I guess you could say that all the blame for this rests on my shoulders. ;-)

Yes, because as things stand, edits made to the descriptions from Wikipedia are wrong because of this. People are under the mistaken assumption (because of how a description is displayed before editing) that they should always capitalize it.

Is there actual evidence of this? In the past there has been a strong bias against mobile-based editors, and claims of "bad" edits seem to be anecdotal.

Again, I don't agree that data entry and data display must align in order for editors to "do the right thing". There are other solutions besides turning the reader UX, and expectations of how subtitles work in a given language/culture, into a single global requirement based on an anecdotal sense of data purity.

so I guess you could say that all the blame for this rests on my shoulders.

Not even remotely, Dan. Use of descriptions as subtitles is widespread, and subtitles in Latin languages are generally capitalized. I don't think a single designer or product manager (or non-Wikidatan) has ever suggested these be lowercase in EVERY display context globally across all products. Again, there is no need for such an arbitrary policy. So this is on the shoulders of more than a dozen professional user experience designers and product managers.

Yesterday, offline, I mentioned to @Jdlrobson that I think we are discussing too many different things in this task simultaneously, and we should break the conversations out into separate tasks. His response was, ironically, that originally these were separate discussions and we combined them into one. However at this point I wonder if it's worth considering separate tasks again?

Conversation 1) Don't capitalize because some languages have no notion of capitalization (@Amire80 & others)
-thankfully this is somewhat easy to fix

Conversation 2) Don't capitalize because descriptions/subtitles aren't sentences (@Amire80, @Sjoerddebruin)
-this is up for debate. I think of the Wikidata description as a subtitle, and even though subtitles aren't sentences, they are typically capitalized. @Amire80 the opinion to capitalize seems to be supported by some web design and product members: T131013#2244140, T131013#2289870, T131013#4932344 (and myself).

Conversation 3) Don't capitalize because there are some edge-cases for which capitalization might cause awkwardness (@Jdlrobson, @Volker_E )
-this seems rare enough that maybe we can defer it?

Conversation 4) Don't capitalize because Wikipedia editors will get confused and think (incorrectly) that when editing Wikidata (via the Wikipedia apps) they should capitalize the descriptions (and in general will be confused about the relationship between the two) (@Lydia_Pintscher, @JMinor)
-It seems there is consensus that data storage (Wikidata) and presentation of content (Wikipedia) should be de-coupled. However while there continue to be interfaces that allow for editing Wikidata off-Wikidata there needs to be additional work to clarify this relationship. @Deskana, @JMinor and @Jdlrobson have proposed recommendations here.

@Deskana I wonder if there's a variation of option 2 you proposed:

Keep the descriptions capitalised, and change the editing experience to tell people that they shouldn't capitalise descriptions.

This gets confusing really fast, and people aren't likely to really understand. ("They're capitalised, but don't capitalise it yourself, we'll capitalise it for you, until you try editing it again in which case we won't, but we'll do it again afterwards...")

What if we allowed people to edit descriptions, but instead of telling them not to capitalize we just automatically forced the first letter to be lowercase when we stored it in Wikidata? Of course this would be obscuring things from editors, however maybe it is the best compromise. The end result would be what's desired for both services: descriptions wouldn't be capitalized in Wikidata, and they would be in Wikipedia.
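A minimal sketch of that proposal, assuming a hypothetical save hook in the editing client (the name normalize_before_save is invented for illustration). Writing it out also makes its cost visible: unconditionally lowercasing the first character is lossy for descriptions that legitimately start with a capital, such as ones beginning with a nationality or another proper noun.

```python
def normalize_before_save(description: str) -> str:
    # Hypothetical save-time hook: force the first character to lowercase
    # before the description is written to Wikidata.
    return description[:1].lower() + description[1:]


for typed in ["Hospital", "American actor"]:
    print(f"{typed!r} -> {normalize_before_save(typed)!r}")
# Prints:
#   'Hospital' -> 'hospital'
#   'American actor' -> 'american actor'
```

The first result is the intended fix; the second breaks a proper noun, so a blind rule like this would need some way to recognise legitimate capitals before it could be used.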

The problem with this is that there's a sizeable number of descriptions that are capitalised at the start, such as almost any description about a person or collection of people: