Hi Vijay,
The short answer is yes: you can combine almost anything you want into a single collection. But in addition to working out your queries, you might want to work out your data life cycle.
In our application, we have comingled the structured and unstructured documents into a single collection for initial development purposes. The only field they have in common is the unique ID. Works fine.
In production, however, we expect that factors such as query rates, access controls, load balancing, availability, shard keys, overall document counts, and update frequency will drive us to use separate collections. For us, the deciding factor is less about "structured vs. unstructured" and more about "public vs. private". We have built our app so that splitting the collection will have minimal impact: the app will simply execute separate queries, in parallel, at runtime.
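As a rough illustration of the parallel-query idea (a sketch under assumptions, not the production code): the base URL, the collection names `public_docs` and `private_docs`, and the unique key field `id` are all hypothetical. It sends the same query to each collection's standard `/select` handler on its own thread and merges the hits by unique ID.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SOLR_BASE = "http://localhost:8983/solr"       # assumed Solr base URL
COLLECTIONS = ["public_docs", "private_docs"]  # hypothetical collection names


def solr_select(collection, query, rows=10):
    """Run one query against a collection's /select handler, return its docs."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    url = f"{SOLR_BASE}/{collection}/select?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["response"]["docs"]


def search_all(query, collections=COLLECTIONS, fetch=solr_select):
    """Query every collection in parallel and merge the hits by unique ID."""
    with ThreadPoolExecutor(max_workers=len(collections)) as pool:
        # pool.map returns results in the same order as `collections`.
        per_collection = list(pool.map(lambda c: fetch(c, query), collections))

    merged = {}
    for name, docs in zip(collections, per_collection):
        for doc in docs:
            # Keep the first hit seen for an ID and record its source collection.
            # "id" is assumed to be the unique key field in both schemas.
            merged.setdefault(doc["id"], {**doc, "_collection": name})
    return list(merged.values())
```

Calling `search_all("title:report")` would return one list of documents, each tagged with the collection it came from. Note that this simple merge does not re-rank hits by score across collections; whether that matters depends on your queries.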
Of course, your application is different. YMMV, etc.
hth,
Charlie
-----Original Message-----
From: Jack Krupansky [mailto:jack.krupansky@gmail.com]
Sent: Sunday, March 29, 2015 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Structured and Unstructured data indexing in SolrCloud
The first step is to work out the queries that you wish to perform - that will determine how the data should be organized in the Solr schema.
-- Jack Krupansky
On Sun, Mar 29, 2015 at 4:04 PM, Vijay Bhoomireddy <vijaya.bhoomireddy@whishworks.com> wrote:
> Hi,
>
>
>
> We have a requirement where both structured and unstructured data
> comes into the system. We need to index both of them and then enable
> search functionality on it. We are using SolrCloud on Hadoop platform.
> For structured data, we are planning to put the data into HBase and
> for unstructured, directly into HDFS.
>
>
>
> My question is how to index these sources under a single Solr core?
> Would that be possible to index both structured and unstructured data
> under a single core/collection in SolrCloud and then enable search
> functionality over that index?
>
>
>
> Thanks in advance.