From dev-return-130096-apmail-lucene-dev-archive=lucene.apache.org@lucene.apache.org Mon Apr 1 16:35:31 2013
X-Original-To: apmail-lucene-dev-archive@www.apache.org
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
by minotaur.apache.org (Postfix) with SMTP id 123B2FB9A
for ; Mon, 1 Apr 2013 16:35:31 +0000 (UTC)
Received: (qmail 38756 invoked by uid 500); 1 Apr 2013 16:35:19 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 17755 invoked by uid 500); 1 Apr 2013 16:33:22 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 16152 invoked by uid 99); 1 Apr 2013 16:33:15 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Apr 2013 16:33:15 +0000
Date: Mon, 1 Apr 2013 16:33:15 +0000 (UTC)
From: "Steve Rowe (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Resolved] (SOLR-4658) In preparation for dynamic schema
modification via REST API, add a "managed" schema facility
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
[ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe resolved SOLR-4658.
------------------------------
Resolution: Fixed
Fix Version/s: 5.0
Committed to trunk r1463182 and branch_4x r1463193.
Thanks Robert!
> In preparation for dynamic schema modification via REST API, add a "managed" schema facility
> --------------------------------------------------------------------------------------------
>
> Key: SOLR-4658
> URL: https://issues.apache.org/jira/browse/SOLR-4658
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Minor
> Fix For: 4.3, 5.0
>
> Attachments: SOLR-4658.patch, SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
>
> {code}
> It will be a precondition for future dynamic schema modification APIs that {{mutable="true"}}. {{solrconfig.xml}} parsing will fail if {{mutable="true"}} but {{managed="false"}}.
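A minimal sketch of the precondition above, written in Python purely for illustration (Solr itself is Java). The function name `validate_schema_flags` is hypothetical, and making `mutable` a required argument is an assumption; only the `managed="false"` default and the rejected combination come from the description.

```python
def validate_schema_flags(*, mutable: bool, managed: bool = False) -> None:
    """Reject the one combination the description says must fail parsing.

    Hypothetical helper: it models the rule, not Solr's actual parser.
    """
    if mutable and not managed:
        raise ValueError('mutable="true" requires managed="true"')


if __name__ == "__main__":
    validate_schema_flags(mutable=True, managed=True)   # accepted
    validate_schema_flags(mutable=False)                 # accepted, managed defaults to False
    try:
        validate_schema_flags(mutable=True)              # managed defaults to False
    except ValueError as err:
        print("rejected:", err)
```

The other three combinations pass; only mutable-without-managed is an error.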
> When {{managed="true"}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to "managed": the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}.
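The upgrade steps above could be sketched roughly as follows for the local-filesystem case only (ZooKeeper is not modeled). This is an illustration under stated assumptions, not the committed patch: `upgrade_to_managed` is a hypothetical name, and `xml.etree.ElementTree` stands in for Solr's real schema parsing and persistence.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def upgrade_to_managed(conf_dir: Path, managed_name: str,
                       unmanaged_name: str = "schema.xml") -> bool:
    """Upgrade to a managed schema if the managed resource does not exist.

    Returns True if an upgrade happened, False if nothing was done.
    """
    managed_path = conf_dir / managed_name
    if managed_path.exists():
        return False  # already managed; leave everything alone

    unmanaged_path = conf_dir / unmanaged_name
    # Parse the non-managed schema, then persist it under the managed name.
    # (ElementTree drops XML comments by default; a real implementation
    # would use its own serializer.)
    tree = ET.parse(str(unmanaged_path))
    tree.write(str(managed_path), encoding="utf-8", xml_declaration=True)

    # Rename the non-managed resource by appending ".bak".
    unmanaged_path.rename(conf_dir / (unmanaged_name + ".bak"))
    return True
```

Calling it a second time is a no-op, which matches the "doesn't exist" condition in the description.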
> Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API, and can use this as the basis for modifications which can then be used to manually downgrade back to non-managed schema: put the {{schema.xml}} in place, then add {{}} to {{solrconfig.xml}} (or remove the whole {{}} element, since {{managed="false"}} is the default).
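For the export step, a small Python sketch of fetching the full schema through the `/schema?wt=schema.xml` endpoint named above. The base URL in the usage comment (host, port, and core name) is an assumption for illustration, and both function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def schema_export_url(core_url: str) -> str:
    """Build the export URL from a core's base URL (assumed layout)."""
    return core_url.rstrip("/") + "/schema?" + urlencode({"wt": "schema.xml"})


def fetch_schema_xml(core_url: str, timeout: float = 10.0) -> str:
    """Download the schema as text; needs a running Solr at core_url."""
    with urlopen(schema_export_url(core_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, assuming a local instance and a core named "collection1":
#   xml_text = fetch_schema_xml("http://localhost:8983/solr/collection1")
#   open("schema.xml", "w", encoding="utf-8").write(xml_text)
```

The saved file can then be edited by hand and put in place as part of the manual downgrade.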
> If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{}}.
> For a discussion of rationale for this feature, see [~hossman_lucene@fucit.org]'s post to the solr-user mailing list in the thread "Dynamic schema design: feedback requested" [http://markmail.org/message/76zj24dru2gkop7b]:
>
> {quote}
> Ignoring for a moment what format is used to persist schema information, I
> think it's important to have a conceptual distinction between "data" that
> is managed by applications and manipulated by a REST API, and "config"
> that is managed by the user and loaded by solr on init -- or via an
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced
> this opinion: on one hand, it's a place users must specify some config
> information -- so people want to be able to keep it in version control
> with other config files. On the other hand it's a "live" data file that
> is rewritten by solr when cores are added. (God help you if you want to do a
> rolling deploy of a new version of solr.xml where you've edited some of the
> config values while clients are simultaneously creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information
> as "data" that can be manipulated, I anticipate the same types of
> confusion, misunderstanding, and grumblings if we try to use the same
> pattern of treating the existing schema.xml (or some new schema.json) as a
> hybrid configs & data file. "Edit it by hand if you want, the /schema/*
> REST API will too!" ... Even assuming we don't make any of the same
> technical mistakes that have caused problems with solr.xml round tripping
> in the past (ie: losing comments, reading new config options that we
> forget to write back out, etc...) i'm fairly certain there are still going
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and
> "machine readable & writable", but practically speaking it's hard to have a
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of
> implementation but also for end user clarity -- to have some simple,
> straightforward to understand caveats about maintaining schema
> information...
> 1) If you want to keep schema information in an authoritative config file
> that you can manually edit, then the /schema REST API will be read only.
> 2) If you wish to use the /schema REST API for read and write operations,
> then schema information will be persisted under the covers in a data store
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using
> the /schema REST API for managing schema information, there is a
> tool/command/API you can run to do so.
> 4) if you are using the /schema REST API for managing schema information,
> and you wish to switch to using a schema config file, there is a
> tool/command/API you can run to export the schema info in a config file
> format.
> ...whether or not the "under the covers in a data store" used by the REST
> API is JSON, or some binary data, or an XML file that is just schema.xml w/o
> whitespace/comments should be an implementation detail. Likewise for the
> question of whether some new config file formats are added -- it shouldn't
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no
> : different than today when you edit the file and do a Core reload or
> : something. I think we can improve some validation stuff around that, but
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the
> file concurrently, and info that might be lost due to Solr reading the
> file, manipulating internal state, and then writing the file back out.
> Eg: User hand edits may be lost if they happen on disk during Solr's
> internal manipulation of data. API edits may be reflected in the internal
> state, but lost if the User writes the file directly and then does a core
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand
> : modified file. Many people *heavily* modify the example schema to fit
> : their use case. If you have to start doing that by making 50 rest API
> : calls, that's pretty rough. Once you get your schema nice and happy, you
> : might script out those rest calls, but initially, it's much
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that. The ability to say
> "my schema is a config file and i own it" should always exist (remove
> it over my dead body)
> The question is what trade offs to expect/require for people who would
> rather use an API to manipulate these things -- i don't think it's
> unreasonable to say "if you would like to manipulate the schema using an
> API, then you give up the ability to manipulate it as a config file on
> disk"
> ("if you want the /schema API to drive your car, you have to take your
> foot off the pedals and let go of the steering wheel")
> {quote}
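The "multiple actors" risk described in the quoted post is a classic lost update. A toy Python sketch of that sequence follows; the field names and the function name are made up for illustration, and plain lists stand in for the schema file and Solr's internal state.

```python
from typing import List


def simulate_lost_update() -> List[str]:
    """Replay the read / hand-edit / API-edit / write-back sequence."""
    file_on_disk = ["id"]                    # schema fields as stored on disk

    internal_state = list(file_on_disk)      # Solr reads the file
    file_on_disk = file_on_disk + ["title"]  # user hand-edits the file on disk
    internal_state.append("price")           # API edit changes internal state
    file_on_disk = list(internal_state)      # Solr writes its state back out

    return file_on_disk                      # the user's "title" edit is gone


if __name__ == "__main__":
    print(simulate_lost_update())
```

Giving the file a single owner, either the user (config) or the system (data), removes the interleaving that makes this possible.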
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org