Input to Reference Match stage

2012-04-05T16:51:09Z

Hi,
I am building a job with a Reference Match stage. My data source is in a standardised format (standardisation obviously appends many more columns to the input data). My question is: should the reference data source also be standardised? If not, how will the columns in the two sources match?

Ex: I have Name and Address fields.
My data source (standardised) will have other columns like MatchFirstname, MatchPrimaryWordNYSIISS, etc.
My reference source does not have those standardised columns.
If I try to find matches for a record in the reference data source, nothing matches when I select Name and Matchprimaryword_NYSIIS as the match columns; the record is not even flagged as a duplicate.

Re: Input to Reference Match stage

Yes, you want to have your reference source parsed and standardized to the same level as your incoming data.
That statement has a couple of implications:
1) Before bringing in new input data, you will need to build at least a standardization process (and ideally a deduplication process) that will parse the reference data and generate the standardized output you want for subsequent reference matching.
2) You will need to store the parsed/standardized output either in the existing table containing the data you want to match to or in an associated cross-reference table.
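
To make the two implications above concrete, here is a minimal Python sketch (not QualityStage code). It assumes a toy `standardize_name` routine, a made-up nickname table, and illustrative column names; a real standardization rule set does far more. The point is only that the same routine runs over both sources, and that the reference side is standardized once and stored in a cross-reference structure.

```python
import re

# Hypothetical nickname table; a real standardization rule set is far richer.
NICKNAMES = {"BOB": "ROBERT", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}


def standardize_name(raw):
    """Parse a free-form name into the same derived columns for any source."""
    tokens = re.sub(r"[^A-Za-z ]", " ", raw).upper().split()
    if not tokens:
        return {"MatchFirstName": "", "MatchPrimaryWord": ""}
    first = NICKNAMES.get(tokens[0], tokens[0])
    return {"MatchFirstName": first, "MatchPrimaryWord": tokens[-1]}


def build_cross_reference(reference_rows):
    """Standardize the reference source once, keyed by record id."""
    return {rec_id: standardize_name(name) for rec_id, name in reference_rows}


def find_matches(input_name, cross_reference):
    """Compare standardized input columns with standardized reference columns."""
    key = standardize_name(input_name)
    return sorted(rec_id for rec_id, std in cross_reference.items() if std == key)


xref = build_cross_reference(
    [(1, "Robert Smith"), (2, "bob  smith."), (3, "Alice Jones")]
)
print(find_matches("Bob Smith", xref))
```

Because both sides pass through the same function, differences in case, punctuation and nicknames no longer prevent a match.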

Also, from a matching perspective, data comparisons are very sensitive to the presence (or absence) of additional data, which is why parsing and standardization are important pieces of the process. This is what you are seeing in your current process. Be aware as well of the Match Comparison type and associated parameters you are using. A CHAR comparison, for instance, requires two strings to match exactly, a situation you will almost never have between standardized and non-standardized data. If there is a high disparity between the contents of two fields, the best option is to use a MULT_UNCERT comparison to try to find some overlap between the contents regardless of data order.
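
The difference between an exact comparison and an order-independent overlap comparison can be sketched in a few lines of Python. This is a toy stand-in, not the actual scoring used by the CHAR or MULT_UNCERT comparisons; the function names and the overlap formula are assumptions made purely for illustration.

```python
def exact_match(a, b):
    """Character-for-character comparison: any extra token makes it fail."""
    return a == b


def token_overlap(a, b):
    """Share of distinct words in common, ignoring word order and case."""
    ta, tb = set(a.upper().split()), set(b.upper().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


raw_value = "SMITH JOHN A JR"   # non-standardized field with extra tokens
std_value = "JOHN SMITH"        # standardized field

print(exact_match(raw_value, std_value))
print(token_overlap(raw_value, std_value))
```

The exact comparison fails as soon as one side carries extra words or a different word order, while the overlap measure still finds the shared tokens.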

Re: Input to Reference Match stage

Hi Harald,
Thank you very much for the reply. I got my reference match job compiled and running. I now need to replace my reference data source (previously a CSV file) with a web service. From what I have read, it seems we should use the ISD Input stage to send web service input, but I don't see many configuration parameters there for looking up a web service. Basically I want to send a request (as the reference data source) from the portal. Please suggest any good material. Thanks!

Now, if I want my reference source to be a web service request (instead of a CSV file) that comes from the portal, how do I implement it?

Requirement: search for a member in the DB. We will enter the member's name in the portal (only 1 record at a time, which I am calling the reference data source) and click search. This search should invoke a QS job which performs the member match.

Questions:
1. Should I convert the entire job into a service (which I do from the server console) and expose it as a web service so that the portal can invoke it? That would mean the web service accepts 1 input and outputs several matched rows.
2. Is there a way to send the portal input as the reference source to my match job without converting the job into a web service?

Re: Input to Reference Match stage

OK, so that provides more insight into what you want to accomplish (btw, I would generally refer to the name entered in the portal as the input data and the member DB as the reference source, since that is what you are matching against).

There are generally 2 key questions to ask when you are looking at this type of requirement:
1) do you need a response back? (I believe you are indicating that you do.)
2) if yes, do you need the data/response back quickly? (Generally if I think of a portal where you are entering member information, I expect someone wants to see the result fairly soon.)

Sending information from a service into a job which performs a match is going to have 3 requirements:
1) you have to get the data to the job, which means it has to go to a file, queue, or other transient data set
2) you have to start the job (which has overhead), process the data, and complete the job (possibly more overhead)
3) you have to get the data from the job back to the portal, which again means it has to go to something the portal can read
It's a very decoupled process, and will result in more latency than you are likely to want (particularly assuming the answer to question 2 above is yes).

Coming back to your question, while I certainly don't know all specifics, I suspect you want to convert the job into a web service. That has several advantages:
1) the service is always on, so startup cost is paid only once (when the service is started) and latency is minimal.
2) the data can be passed from service to service so it does not require additional landing to a file or queue.
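
The "always on" idea can be illustrated with a small Python sketch. It is only an analogy, not how ISD is implemented: the class name, the substring "match" and the startup counter are all invented for illustration. The reference data is loaded once at startup, each request is handed to a long-running worker in memory, and one input can return several matched rows.

```python
import queue
import threading


class AlwaysOnMatcher:
    """Loads the reference data once, then answers requests from a queue."""

    def __init__(self, reference_names):
        self.startups = 0
        self._requests = queue.Queue()
        self._load(reference_names)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _load(self, reference_names):
        # Stand-in for job startup cost: paid once, when the service starts.
        self.startups += 1
        self._reference = [name.upper() for name in reference_names]

    def _run(self):
        while True:
            name, reply = self._requests.get()
            if name is None:  # shutdown signal
                break
            # Toy "match": substring test against the preloaded reference data.
            reply.put([r for r in self._reference if name.upper() in r])

    def search(self, name, timeout_s=5.0):
        """One input in, possibly several matched rows out; nothing lands on disk."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((name, reply))
        return reply.get(timeout=timeout_s)

    def stop(self):
        self._requests.put((None, None))
        self._worker.join(timeout=5.0)


matcher = AlwaysOnMatcher(["John Smith", "Jane Smith", "Alice Jones"])
print(matcher.search("smith"))
print(matcher.startups)
matcher.stop()
```

A batch job started per request would instead pay the load step on every search and would need a file or queue on both sides of the job.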

Re: Input to Reference Match stage

Hi Harald,
Thanks for the reply. With the material you provided I was finally able to convert my job into a web service. Here are the answers to your questions:

1) do you need a response back? - yes
2) if yes, do you need the data/response back quickly? - yes, it should be a synchronous call. We enter some search criteria on screen and hit enter; I expect this search to invoke my match web service (the one I just converted) and return all matched records on the portal screen immediately.

Now here is what I am doing:

I want to test this service from SOAP UI. I loaded the WSDL and entered some input data. I didn't get a response back in SOAP UI. It says java.net.SocketTimeoutException: Read timed out at the bottom of the screen.

When I looked at the Information director, there were many instances of this job (about 50): some running, some aborted and some in compiled state. I don't know what's happening. Why were so many instances created (I invoked the service only 2 times)? Why didn't I get a response, and where else can I check logs to figure this out? Please help.

Re: Input to Reference Match stage

Hi Harald,
How do we undeploy a deployed QualityStage service? I could undeploy an application but don't see an option to undeploy only a service. Even after I undeploy the application, I can still see that the service is up and running fine. Also, in Director, I see that the instance associated with the job is still running. How do I stop/delete that instance? Though I click on stop (the red button above), it still shows as running. Also a couple more questions (I couldn't get much info from the Redbook/IBM site).

When is an instance of a job created when it is enabled as an information service? Is it as soon as it is deployed, or when it is invoked by a client? And how many instances are created per invocation (I hope it's only 1 per invocation)? And is there any better way to integrate with WebSphere Portal other than exposing this job as a service and invoking this service from the portal?

Re: Input to Reference Match stage

Ernie Ostic
Hi Harald,
How do we undeploy a deployed QualityStage service? I could undeploy an application but don't see an option to undeploy only a service.

Applications are deployed and un-deployed. Services, which "belong" to an application, can be "disabled". Applications and Operations can also be "disabled".... Disable/Enable (done at the Deployed Application Workspace --- look for the edit button on the bottom right and then a new button that appears towards the bottom left) is the preferred way to just "stop" a job instance in QS if you just want to change a transform or something and then re-compile (and then re-enable). It is far quicker. You only need to "Re-deploy" if you change something in the signature (the ultimate input and output from ISDInput and ISDOutput Stages) of the Service.
Even after I undeploy the application, I can still see that the service is up and running fine.

Something might have gone wrong during undeployment. It's better to disable first, but usually, if you "undeploy" it should also shut things down.

Also, in Director, I see that the instance associated with the job is still running. How do I stop/delete that instance? Though I click on stop (the red button above), it still shows as running. Also a couple more questions (I couldn't get much info from the Redbook/IBM site).

The best way to fix it now is probably to try to deploy the application again at the regular Applications Workspace. Hopefully it will cycle things through, though you may need to cycle your Info Server.

When is an instance of a job created when it is enabled as an information service? Is it as soon as it is deployed, or when it is invoked by a client? And how many instances are created per invocation (I hope it's only 1 per invocation)? And is there any better way to integrate with WebSphere Portal other than exposing this job as a service and invoking this service from the portal?

These are more complex questions that depend on the topology of the job and the minimum number of instances. Assuming that you have an ISDInput Stage (we call this an "always on" Job) and the minimum number of instances is at least one, then the QS Job will be started as soon as the Application is fully deployed. Invocations by a client will all flow through that same Job. Things change if you deploy a Job that does NOT have an ISDInput Stage, but that's a whole other subject.

As far as integrating with WebSphere goes, there is no better way! Using SOAP or EJB, there are lots of tools that can (or should be able to) generate an EJB for you that can invoke this service. Alternatively, though, you might consider using MQSeries if you have that tooling. That's not an ISD-based method, but it would work for real-time sharing of functionality.