GeoTools : Operations API - IM

Created by Jody Garnett, last modified by Adrian Custer on Jul 12, 2006

cholmesny: Hey rob.
rob_cto_sco: hi ya!
jodygarnett: I am afraid ext/view is a bit of a mess, I have been hacking hard to try and get my thoughts in order.
jodygarnett: I think I have a bit of a plan though.
rob_cto_sco: Do you want me to try voice?
rob_cto_sco: happy to leave experimentation later tho
cholmesny: I've tried voice with ym and it actually works pretty well.
cholmesny: Though I have no mic at the moment...
rob_cto_sco: ok lets text
jodygarnett: I don't have that set up at work
jodygarnett: Most of the important information in ext/view is in the comments.
rob_cto_sco: anyway, latest geotools and geoserver checkouts actually compiled. so I'm happier this morning
jodygarnett: I have tried to do code example of what the "result" will look and act like.
jodygarnett: Sweet
rob_cto_sco: yeah - read the comments
jodygarnett: The ext/view thing is so good on the expression creation front I have started active development with it to handle catalog queries and metadata.
jodygarnett: so good with respect to what geotools has going right now.
jodygarnett: Chris have you had a chance to look at what is going on?
cholmesny: Awhile ago, I've sorta forgotten though.
jodygarnett: roba the metadata work did sneak in to ext/view - I have not been able to set up the geotools community with a 2.1 branch for us to work on yet (so all my 2.1 development has snuck into that module).
rob_cto_sco: guess the bit that has lost me is the "join" semantics - the requirement we have is very much for complex properties, where just joining scalar attributes isn't enough
jodygarnett: The general idea is to create a unified Expression/Filter/Query/Join tree - called Expr.
jodygarnett: Yes
jodygarnett: I was calling that "AS" expressions - cause that is how SQL does it.
jodygarnett: I was going to do a series of AS expressions defining each attribute in the derived feature type ( attribute AS expression)
jodygarnett: So you could do simple math and so on for creating your derived attributes?
jodygarnett: Was that the correct understanding?
rob_cto_sco: sometimes we want to join master/detail and sometimes we want to join to other features
rob_cto_sco: so the derivation needs query strategies - lazy reads from rowsets for details etc
jodygarnett: I sent out an email recently with a Join example - let me find it.
jodygarnett: (I am not quite sure what you mean by query strategies so I am hoping a code example will help).
jodygarnett: Did we want to interactively compose a code example on the wiki?
jodygarnett: http://docs.codehaus.org/display/GEOTOOLS/Expr%2BExamples
jodygarnett: copied from email
jodygarnett: I am not quite sure what you mean by lazy reads from rowsets for details.
rob_cto_sco: lets say we have a feature type roadjunction, and I want to populate a property which is "roadsConnected" - which is multivalued.
rob_cto_sco: what I will want to do is insert into the output GML a series of properties (using the xlink pattern) to related features which I will need to list in the output as well.
rob_cto_sco: See the Ordnance Survey MasterMap GML for this pattern if you find it difficult to believe...
jodygarnett: Ah I understand - I have read the GML spec many times.
jodygarnett: So yeah I believe.
rob_cto_sco: anyway, we may not want to do a SQL select for every junction, but rather have a list of roads returned as a rowset where they intersect with the junctions of interest
rob_cto_sco: so we may want to lazy-read from this rather than issue many select statements
jodygarnett: That example I have posted is the worst case scenario - where we need to do it all on the client side.
jodygarnett: I would really rather have such things reduce to a single SQL statement and let postgis do the hard work (or oracle spatial)
jodygarnett: It is a difficult problem, chris how well do you know your gml?
cholmesny: But there's the case where those features are across different datastores...
rob_cto_sco: I see. and agree with the last, though in practice SQL often isn't smart enough for real world objects with relationships.
jodygarnett: Would we need to write all the roads out first, and then write the junctions out second (with xlink expression created in terms of feature id?)
cholmesny: I know my gml decently well, but I still haven't gotten all the way through gml 3.
jodygarnett: The Object Oriented Databases that Galdos pushes would though.
rob_cto_sco: it's certainly nice to be able to handle the cross-datastore case as long as we can work out how to push down into SQL where it makes sense.
cholmesny: agreed.
rob_cto_sco: I feel this is a configure-time issue - and the manual joins would be used as a last resort where no supported query strategy makes sense.
jodygarnett: That is what I think we can do from the ext/view work. Need to create a visitor/walker api and then try our hand at making some sql writers.
rob_cto_sco: This pattern is not GML3 specific by the way - OS uses GML 2
jodygarnett: roba have you looked at how the postgis datastore handles the issue within the limited context of a Query/Filter?
jodygarnett: They define an Expression tree, and walk it twice: once for the SQL statement, and once for the post processing operations.
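
A rough sketch of the two-pass idea, not the PostGIS DataStore code itself. It assumes a filter is a flat list of AND-ed clauses, each flagged by whether the database can evaluate it; `Clause`, `sqlWhere` and `postFilter` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split an AND-ed filter into a SQL part and a post-processing part.
final class FilterSplitSketch {

    // One AND-ed condition; sqlCapable says whether the database can evaluate it.
    static final class Clause {
        final String text;
        final boolean sqlCapable;

        Clause(String text, boolean sqlCapable) {
            this.text = text;
            this.sqlCapable = sqlCapable;
        }
    }

    // First walk: collect what can be pushed down into the WHERE clause.
    static String sqlWhere(List<Clause> clauses) {
        StringBuilder where = new StringBuilder();
        for (Clause c : clauses) {
            if (c.sqlCapable) {
                if (where.length() > 0) {
                    where.append(" AND ");
                }
                where.append(c.text);
            }
        }
        return where.toString();
    }

    // Second walk: collect what must be checked in Java on each returned row.
    static List<Clause> postFilter(List<Clause> clauses) {
        List<Clause> rest = new ArrayList<Clause>();
        for (Clause c : clauses) {
            if (!c.sqlCapable) {
                rest.add(c);
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        List<Clause> filter = Arrays.asList(
                new Clause("name = 'murry'", true),
                new Clause("customTest(geom)", false),   // made-up function the database lacks
                new Clause("length > 10", true));
        System.out.println("WHERE " + sqlWhere(filter));
        System.out.println("clauses left for post processing: " + postFilter(filter).size());
    }
}
```
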
jodygarnett: In short I think you have two issues (one is the data access for joins) and the other is GML output.
jodygarnett: I am afraid I have not looked into the GML output, but a co-worker has/is.
jodygarnett: Questions for RobA/Chris: how would you like to proceed?
rob_cto_sco: I have looked into the JDBCDataStore
cholmesny: Rob, do you feel you could code up some of your ideas? I think Jody and I are both having a bit of a hard time really grasping exactly what you want to do, and putting it into code would likely make it a lot more clear.
rob_cto_sco: I agree that there are two issues - certainly wish to keep them separate - I think the Feature implementation is good enough that we can do so - as long as the attributes are Readers we should be able to serialise OK
cholmesny: I imagine the both of us could probably get you commit access tomorrow to work just somewhere in ext/
jodygarnett: I am catching up, but he will be pushing the geotools state of the art (I don't think we have an implementation of nested Features right now)
cholmesny: Yeah, I'm getting more of a grasp of it too, but it'd be nice if we could get even more geotools developers thinking about this, and starting the code's the best way to do it.
jodygarnett: I have been sending out a lot of email - and getting no response. Chris do people realize we are serious about this and are moving ahead quickly?
rob_cto_sco: I think the difficult bit for starting is to decide where the API is going to go - we could start with attributeReader in JDBCDataStore
jodygarnett: And just do these things against a single DataStore for step 1?
rob_cto_sco: I think the Geotools Feature is able to handle nested features - it's just that the input/output bits are not generalised.
jodygarnett: I was thinking of doing the postprocessing join operations, and asking chris to do the sql builder for postgis?
rob_cto_sco: Our project only requires handling a single JDBC data store
cholmesny: I think attributeReader is a good place to start. In geotools stuff gets refactored fairly often, the philosophy is definitely to just start and rework later if it doesn't work quite right.
rob_cto_sco: also, we need to do against Oracle spatial - so I want to promote to JDBC
jodygarnett: The geotools Feature is capable (as an interface), but you will need to make a new implementation. On the bright side jesse made an XPATH class last week to help with such an implementation (and Metadata).
jodygarnett: Does postgis work for you (or are you using oracle?)
rob_cto_sco: I havent had a chance to look into the metadata stuff FYI - what do I need to know.
jodygarnett: Nothing
jodygarnett: It is work I need for WMS and WFS and WRS work
cholmesny: Yeah, that hits on the other thing missing from Feature, is xpath queries - you can't request an attribute like transportation/roads.
jodygarnett: It is what I meant when I said geotools 2.1 work was being dumped into ext/view
cholmesny: So that'll need to be improved to allow filtering against nested features.
rob_cto_sco: My preferred target dev platforms are Oracle (client) and MySQL (to target a large number of people with Observation and Measurement data with no spatial db)
jodygarnett: Jesse has the class now, we just need to do the Feature implementation.
cholmesny: Cool.
jodygarnett: Okay good to know, I have access to the postgis guys hence my preference
rob_cto_sco: good coverage of the bases.
rob_cto_sco: I also need to get up to speed with how to work with the code in ext/view - how do I build and test using this? sorry to be such a newbie on this front.
jodygarnett: No worries there - the developers guide is a great starting point.
jodygarnett: Nobody knows ext/view (it is new and being tested as we write it).
jodygarnett: I actually wanted to try and talk about the breakdown between: Expression / Filter / Query / As / Join with you guys
jodygarnett: Make sure that we have carved out the right abstraction to do what you need.
rob_cto_sco: dev guide allowed me to get core build working, but not easy to see how the ext/ code is handled
jodygarnett: I am totally winging this - I actually expect to work with you two to figure out what we need.
rob_cto_sco: Yes - lets have a shot at that. Do I need to dive into the Postgres implementation or can I work from JDBC and abstract DataStores?
jodygarnett: Right now I have only started implementing a small subset of "the vision" as postprocessing operations.
jodygarnett: So any DataStore will do.
jodygarnett: I would like Chris/Andrea to have a go at writing the SQL Builder (so I know I am not screwing up).
jodygarnett: I had pictured mirroring the existing SQL statements:
cholmesny: Ok, I've been holding off on thinking really about this stuff, but it looks like we could use the extra brainpower. So I'll try to really dive into this stuff this week, and probably push GeoServer 1.2.0 back a bit, as I'm tight on time these days.
jodygarnett: What is your timeline RobA? I hate to mess with GeoServer 1.2.0? But we want to keep you happy?
jodygarnett: Chris I cannot talk a lot about this at tomorrow's IRC (I may miss most of it). But we could do an irc anytime after tuesday.
rob_cto_sco: We need to get an implementation up in a matter of weeks... been hanging off a bit hoping to have some of this abstraction sorted out.
rob_cto_sco: also, IRC time is not friendly to Aussies...
jodygarnett: And I was waiting for feedback (IRC time can be made OZ friendly - chris and I are on the west coast)
jodygarnett: (breakout irc that is)
jodygarnett: rob a - all is not lost. I think I could have the postprocessing version done in a couple of days (like three).
cholmesny: Yeah, the thing that sucks is I'm going to europe from 7th to 17th, and want to do a lot of documentation/testing for 1.2.0, so I may just do rc3 the 6th and then release for real when I get back.
rob_cto_sco: Do we want to get dirty with the DataStore/Query/FeatureType abstraction
jodygarnett: If I had the freedom of having you test/correct this work would go faster.
jodygarnett: Yes, although if you check the javadocs on ExperimentalRepository you should see a bit more of my thinking.
jodygarnett: Actually a good arrangement would be to have RobA make a new FeatureType implementation (using that XPATH class). Because even if I make the post processing implementation I will need a data structure to store the results.
jodygarnett: (and that plan does not even touch writing out nested FeatureTypes nicely as GML)
cholmesny: I think the current gml writer should handle it decently if the structure is right.
jodygarnett: I would think so, but I doubt they would do the xlink thing that RobA described.
cholmesny: It may need a bit of tweaking, but it's written with nested stuff in mind.
cholmesny: Yeah, probably not.
jodygarnett: For junctions that "share" a road as a subelement.
cholmesny: Though that could just be a special attribute - like a 'reference attribute' that is just a type and an fid.
jodygarnett: RobA for that couple of weeks - what is the feature set you have in mind?
cholmesny: And it would know to turn that into an xpath and that the fid should come later.
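
A minimal sketch of the "reference attribute" idea, assuming the referenced feature is written elsewhere in the same GML document and is pointed at with a local `xlink:href`. The class, the `roadsConnected` property name and the `road.12` id are hypothetical, and no XML escaping is attempted.

```java
// Hypothetical sketch of a "reference attribute": just a type name and a feature id.
final class ReferenceAttributeSketch {
    final String typeName; // kept so a writer knows which feature type must also be output
    final String fid;

    ReferenceAttributeSketch(String typeName, String fid) {
        this.typeName = typeName;
        this.fid = fid;
    }

    // Writes the property as an empty element that links to a feature by id.
    // Assumes fid and propertyName need no XML escaping.
    String toGml(String propertyName) {
        return "<" + propertyName + " xlink:href=\"#" + fid + "\"/>";
    }

    public static void main(String[] args) {
        ReferenceAttributeSketch ref = new ReferenceAttributeSketch("road", "road.12");
        System.out.println(ref.toGml("roadsConnected"));
    }
}
```

A writer would still have to make sure each referenced feature is listed once in the output, whether before or after the junctions that point at it.
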
rob_cto_sco: specifically need to be able to handle tables with sampling locations, samples, measurements on that sample and what is being measured.
jodygarnett: Idea: Rob one of the best things you could do is prepare a sample dataset we could use for writing junit tests.
rob_cto_sco: so I can then go and get the sampling locations where, for example, pH > 7
rob_cto_sco: yes - am working on getting sample data and GML schemas from my sponsors. The one saving grace I have is that they are running behind schedule on this
jodygarnett: I would need to see the data model to figure out exactly which capabilities are required to fulfill your needs.
jodygarnett: Even some toy shapefiles would be perfect - we don't need anything real.
rob_cto_sco: shapefiles dont work here - they are flat..
jodygarnett: I was thinking that you would have two shapefiles - and we join them so the result is not flat.
rob_cto_sco: will post samples to the wiki as soon as I can.
jodygarnett: But if needed we can code up an in memory non-flat set of testdata to work against.
jodygarnett: (That is extend MemoryDataStore to handle xpath expressions).
jodygarnett: Sweet.
rob_cto_sco: this would force us to deal with cross datastore joins earlier rather than later - not sure about that.
rob_cto_sco: was thinking of memory data store sample to handle the serialisation side of it.
jodygarnett: I think I can do it - that wiki has code that will do the trick right now (using post processing).
rob_cto_sco: this is the xpath stuff? must go look..
jodygarnett: We have done this a couple times as part of validation tests (basically an inner and outer loop working with FeatureReaders). Not pretty though.
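
For illustration only, a sketch of that inner and outer loop, using plain maps in place of Features and lists in place of FeatureReaders. The `kind` attribute and the class name are hypothetical; the point is that each outer feature yields exactly one result carrying a nested, multi-valued attribute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: client-side nested-loop join, one output per outer feature.
final class NestedLoopJoinSketch {

    static List<Map<String, Object>> join(List<Map<String, Object>> rivers,
                                          List<Map<String, Object>> hazards) {
        List<Map<String, Object>> joined = new ArrayList<Map<String, Object>>();
        for (Map<String, Object> river : rivers) {              // outer loop
            List<Object> matches = new ArrayList<Object>();
            for (Map<String, Object> hazard : hazards) {        // inner loop
                // Join condition: river/name = hazard/river
                if (Objects.equals(river.get("name"), hazard.get("river"))) {
                    matches.add(hazard.get("kind"));
                }
            }
            Map<String, Object> result = new HashMap<String, Object>(river);
            result.put("hazards", matches);                     // nested, multi-valued attribute
            joined.add(result);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, Object> river = new HashMap<String, Object>();
        river.put("name", "murry");
        Map<String, Object> flood = new HashMap<String, Object>();
        flood.put("river", "murry");
        flood.put("kind", "flood");

        List<Map<String, Object>> rivers = new ArrayList<Map<String, Object>>();
        rivers.add(river);
        List<Map<String, Object>> hazards = new ArrayList<Map<String, Object>>();
        hazards.add(flood);

        System.out.println(join(rivers, hazards).get(0).get("hazards"));
    }
}
```

The cost is one full pass over the inner collection per outer feature, which is why this only makes sense as a fallback.
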
jodygarnett: The xpath stuff is used to figure out which attribute expr gets bound to which Feature (from within the inner or outer loop).
rob_cto_sco: where can I find the xpath stuff? it does sound like a good start.
jodygarnett: And we can write out a "real" sql statement for the inner loop.
jodygarnett: in ext/view.
jodygarnett: Did you want to walk through the code example at http://docs.codehaus.org/display/GEOTOOLS/Expr+Examples
jodygarnett: It hinges around two methods - resolve and reduce.
jodygarnett: resolve binds all xpath expressions that start with a given name to a Feature.
jodygarnett: Basically turns them into literal expressions.
jodygarnett: Reduce is used to knock off a xpath name, so we can write out a real geotools Filter for generation into SQL.
jodygarnett: Do you guys have that link on screen?
rob_cto_sco: yeah
jodygarnett: the expression is SQLish: RIVER.name = HAZZARD.river
rob_cto_sco: confused by the fact that inner is declared to be an Expr and then a FeatureReader
jodygarnett: oops
jodygarnett: Any confusion is probably a mistake on my part
jodygarnett: And then confluence went down on me so I cannot fix it.
rob_cto_sco: so talk me through it line by line and correct it as you go...
rob_cto_sco: please!
jodygarnett: Sure I just need to find the original email.
rob_cto_sco: do you want to do this by phone?
jodygarnett: no it is okay
jodygarnett: Plus this is volunteer stuff for me
jodygarnett: (although admittedly I am at work right now)
jodygarnett: (Chris wanna do a conference call?)
jodygarnett: Okay does this make sense:
jodygarnett: FeatureType RIVER = river.getSchema();
FeatureType HAZZARD = hazard.getSchema();
jodygarnett: basically I am grabbing the FeatureTypes (in case I need them to build an SQL expression)
cholmesny: I'm afraid I don't have much to add, as I just haven't thought about this enough yet, and I'm pretty distracted right now...
cholmesny: Hold on, I'll be back in a bit.
jodygarnett: Okay
jodygarnett: Expr joinExpr = Exprs.attribute("river/name").eq( Exprs.attribute("hazzard/river") );
jodygarnett: This is supposed to be RIVER.name = hazzard.river
jodygarnett: check?
rob_cto_sco: yep
jodygarnett: Okay then we have the outer loop:
jodygarnett: FeatureReader outer = river.getFeatures().reader();
while( outer.hasNext() ){
..
rob_cto_sco: got the loop OK - to Expr..
jodygarnett: that outer loop could have been something more fun - with an expr.
jodygarnett: Feature aRiver = outer.next();
Expr innerExpr = joinExpr.resolve( "river", aRiver ).reduce( "hazzard" );
rob_cto_sco: where would that expr have come from?
jodygarnett: crap
jodygarnett: oh wait - it was the joinExpr we define a couple lines back
jodygarnett: This turns RIVER.name = HAZZARD.river into a "murry" = river
jodygarnett: first of all I bind it to the murry river (the first feature in the outer loop).
jodygarnett: And then I reduce "murry" = hazzard.river to "murry" = river.
jodygarnett: check?
jodygarnett: bind is a better name than resolve I think?
rob_cto_sco: thinking...
jodygarnett: reduce is painful - I should probably combine innerExpr.filter( HAZZARD ) and reduce together.
rob_cto_sco: get the first bit - binding the expr to the current "outer" feature.
rob_cto_sco: yep?
jodygarnett: One thing here RobA - I never intend for client code to see this stuff - I want to construct a big Expr that includes joins - this nested for loop is supposed to show how such a join can be implemented as postprocessing.
jodygarnett: okay - so binding is okay.
jodygarnett: Does reduce make sense?
rob_cto_sco: understand - we need this binding to be specified at configure time IMHO and then the client just accesses the features
jodygarnett: What if we did: FeatureReader inner = district.getFeatures( innerExpr.filter( "hazzard", HAZZARD ) );
jodygarnett: well in sql this binding is part of the join syntax....
rob_cto_sco: I'm not yet across what the filter is in "hazzard", HAZZARD
jodygarnett: filter is lame
jodygarnett: filter creates a "real" geotools filter (that all the datastores understand).
jodygarnett: In a perfect world (cough 2.1) we will teach the datastores about Expr and get rid of Filter)
rob_cto_sco: walk me through what "hazzard" as a literal means please
jodygarnett: You can view it as setting up the FeatureReader.
jodygarnett: It was the name of a DataStore.
jodygarnett: So I was turning all the new AttributeExpr( "hazzard/river" ) expressions into new AttributeExpr( "river" )
jodygarnett: and then when I made the filter it binds all AttributeExpr to the provided FeatureType (HAZZARD).
jodygarnett: making the SQL SELECT * FROM HAZZARD WHERE river="murry"
rob_cto_sco: ok - i think i get it
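
A toy version of the resolve and reduce steps walked through above, under loud assumptions: this is not the ext/view Expr class, only an equality between two terms held as strings, with a Feature modelled as a plain map. `JoinExprSketch` and `toSqlWhere` are hypothetical names, and literals are quoted without any SQL escaping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an equality Expr such as river/name = hazzard/river.
final class JoinExprSketch {
    final String left;   // an xpath-like attribute path, or a quoted literal
    final String right;

    JoinExprSketch(String left, String right) {
        this.left = left;
        this.right = right;
    }

    // Bind every path that starts with "prefix/" to the value held by the given feature.
    JoinExprSketch resolve(String prefix, Map<String, Object> feature) {
        return new JoinExprSketch(bind(left, prefix, feature), bind(right, prefix, feature));
    }

    // Knock the leading "prefix/" off each path so it names a plain column.
    JoinExprSketch reduce(String prefix) {
        return new JoinExprSketch(strip(left, prefix), strip(right, prefix));
    }

    private static String bind(String term, String prefix, Map<String, Object> feature) {
        String head = prefix + "/";
        if (term.startsWith(head)) {
            // Turn the attribute path into a literal (no escaping: illustration only).
            return "'" + feature.get(term.substring(head.length())) + "'";
        }
        return term;
    }

    private static String strip(String term, String prefix) {
        String head = prefix + "/";
        return term.startsWith(head) ? term.substring(head.length()) : term;
    }

    String toSqlWhere() {
        return left + " = " + right;
    }

    public static void main(String[] args) {
        Map<String, Object> aRiver = new HashMap<String, Object>();
        aRiver.put("name", "murry");

        JoinExprSketch joinExpr = new JoinExprSketch("river/name", "hazzard/river");
        JoinExprSketch innerExpr = joinExpr.resolve("river", aRiver).reduce("hazzard");

        System.out.println("SELECT * FROM HAZZARD WHERE " + innerExpr.toSqlWhere());
    }
}
```

The literal ends up on the left of the comparison here, which is equivalent to the river = 'murry' form in the SQL above.
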
jodygarnett: this is low level stuff, trying to make joins work using the FeatureReader api geotools already has.
rob_cto_sco: ahhh

rob_cto_sco: whereas I have been thinking of it at the AttributeReader level
jodygarnett: Okay - cool. Now I want to take the extra step and make a FeatureView API (so you can have a FeatureType for your derived data).
rob_cto_sco: gotcha
jodygarnett: So I would like to "declare" what you want as FeatureType changed.
rob_cto_sco: Yes - I would only want the FeatureView exposed to the outside world in Capabilities not the component sub-Features (which may not be real features with identity)
jodygarnett: so define your FeatureView in terms of a bunch of AS expressions in the SQL sense.
rob_cto_sco: And the joinExpr read from the FeatureView definitions configuration
jodygarnett: Yes and ends up with nested loops like the one we have been looking at
jodygarnett: but only as a last resort.
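
One way to picture a FeatureView declared as a bunch of AS definitions, sketched with plain Java maps and functions. No FeatureView API exists at this point in the discussion, so `FeatureViewSketch`, `define` and `derive` are hypothetical, as are the `label` and `lengthKm` attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a derived feature type declared as "attribute AS expression" pairs.
final class FeatureViewSketch {
    private final Map<String, Function<Map<String, Object>, Object>> definitions =
            new LinkedHashMap<String, Function<Map<String, Object>, Object>>();

    // Declare one derived attribute in terms of an expression over the source feature.
    FeatureViewSketch define(String attribute, Function<Map<String, Object>, Object> expression) {
        definitions.put(attribute, expression);
        return this;
    }

    // Evaluate every definition against one source feature, in declaration order.
    Map<String, Object> derive(Map<String, Object> source) {
        Map<String, Object> derived = new LinkedHashMap<String, Object>();
        for (Map.Entry<String, Function<Map<String, Object>, Object>> e : definitions.entrySet()) {
            derived.put(e.getKey(), e.getValue().apply(source));
        }
        return derived;
    }

    public static void main(String[] args) {
        FeatureViewSketch view = new FeatureViewSketch()
                .define("label", f -> f.get("name"))
                .define("lengthKm", f -> ((Number) f.get("lengthM")).doubleValue() / 1000.0);

        Map<String, Object> river = new LinkedHashMap<String, Object>();
        river.put("name", "murry");
        river.put("lengthM", 2500);

        System.out.println(view.derive(river));
    }
}
```

Only the derived attributes are visible to a caller; the source feature stays an implementation detail, which matches the wish to expose just the view.
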
rob_cto_sco: AS because we are packing the Xpath into the fray to define joins to get the attributes from
jodygarnett: (you can write a FeatureReader that generates an SQL statement for Oracle/MySQL)
jodygarnett: the SQL does the same thing:
jodygarnett: SELECT field1, field2, field3
FROM first_table
INNER JOIN second_table
ON first_table.keyfield = second_table.foreign_keyfield
jodygarnett: combined with a bit of SQL aliases:
jodygarnett: SELECT column AS column_alias FROM table
jodygarnett: And we have a winner.
rob_cto_sco: these are different though - SQL returns a denormalised rowset - whereas we want to reconstruct the real objects (1 result per feature)
jodygarnett: this would let you define expressions for new attributes: SELECT area( geometry ) AS area
jodygarnett: yep that is a trouble.
rob_cto_sco: its critical for defining the API in particular...
jodygarnett: They should have the same FeatureIDs?
jodygarnett: It will make our implementation of that inner loop more interesting.
rob_cto_sco: and what if we have 3 attributes with multiple values - you get an exponential increase in the number of results with each complex attribute.
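
To put numbers on that fan-out: in a single flat join the rows returned for one feature are the product of the value counts of its multi-valued attributes, while one select per attribute returns their sum. The sketch below is plain arithmetic with made-up counts, and the class and method names are hypothetical.

```java
// Hypothetical sketch: rows returned for ONE feature under two query strategies.
final class JoinFanOutSketch {

    // Single denormalised join: every combination of values becomes a row.
    // A count of zero is treated as one row, as an outer join would keep the feature.
    static long flatJoinRows(int[] valuesPerAttribute) {
        long rows = 1;
        for (int n : valuesPerAttribute) {
            rows *= Math.max(n, 1);
        }
        return rows;
    }

    // One select per multi-valued attribute: each value is read exactly once.
    static long perAttributeRows(int[] valuesPerAttribute) {
        long rows = 0;
        for (int n : valuesPerAttribute) {
            rows += n;
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] counts = {4, 5, 6}; // three multi-valued attributes on one feature
        System.out.println("flat join rows: " + flatJoinRows(counts));
        System.out.println("per-attribute rows: " + perAttributeRows(counts));
    }
}
```
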
jodygarnett: I would much rather not issue two sql statements and do a client side join (for starters, every time we do, GIS data sizes break everything).
jodygarnett: These issues are why I wanted to do SQL when needed.
jodygarnett: I am actually more worried about inner/right/left join issues.
rob_cto_sco: I'd be inclined to join only where the outermost filter requires you to select across the joins, and the attributes are from the same datastore
rob_cto_sco: After that, be tempted to try this as a default strategy:
jodygarnett: Any solution that restricts us to one datastore won't fly,
jodygarnett: people really want to add data columns to shapefiles (because arcview lets them do it).
rob_cto_sco: query constraints on datastore with root feature
jodygarnett: not sure I understand that last one?
rob_cto_sco: query related data stores on feature attributes where constraints supplied
rob_cto_sco: (arbitrary order)
rob_cto_sco: Reduce original set as a result (inner join outcome)
rob_cto_sco: create selects per attribute
jodygarnett: ah yes - I think that would be an acceptable limitation.
jodygarnett: Now I confess I had not thought through your use case - of roads and junctions.
rob_cto_sco: read features, extracting results for attributes from the rowsets (lazy reading) only when serialising into GML
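
A sketch of what lazy reading from a detail rowset could look like, under stated assumptions: the detail rows arrive as an iterator of {foreignKey, value} pairs sorted by foreignKey (String ordering), and the master features are visited in the same order. The class and the junction and road ids are hypothetical; a real version would wrap a JDBC ResultSet rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: pull detail rows for each master feature only when asked.
final class LazyDetailReaderSketch {
    private final Iterator<String[]> details; // rows of {foreignKey, value}, sorted by foreignKey
    private String[] pending;                 // one-row look-ahead, null when exhausted

    LazyDetailReaderSketch(Iterator<String[]> details) {
        this.details = details;
        advance();
    }

    private void advance() {
        pending = details.hasNext() ? details.next() : null;
    }

    // Reads just the rows belonging to masterId and leaves the rest unread.
    List<String> valuesFor(String masterId) {
        // Skip rows whose key sorts before this master (masters that were never requested).
        while (pending != null && pending[0].compareTo(masterId) < 0) {
            advance();
        }
        List<String> values = new ArrayList<String>();
        while (pending != null && pending[0].equals(masterId)) {
            values.add(pending[1]);
            advance();
        }
        return values;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"j1", "roadA"},
                new String[] {"j1", "roadB"},
                new String[] {"j3", "roadC"});
        LazyDetailReaderSketch roads = new LazyDetailReaderSketch(rows.iterator());
        for (String junction : new String[] {"j1", "j2", "j3"}) {
            System.out.println(junction + " -> " + roads.valuesFor(junction));
        }
    }
}
```

One select returns all the roads for the junctions of interest, and the rowset is consumed in step with the features as they are serialised, so nothing beyond a one-row look-ahead is held in memory.
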
jodygarnett: Ah this is your lazy reading, we usually describe the same thing as "streaming"
rob_cto_sco: Back to your point - yes we do want to join attributes to shapes - but thats not my immediate requirement.
jodygarnett: (because we started with files, not rowsets).
rob_cto_sco: Sorry - I was attempting to use the terms I had seen used in the discussions - I think thats the way chris refers to the issue.
jodygarnett: Okay I am getting a good feel for where you are coming from, and I think you understand what technology I have been trying to build to get you there.
rob_cto_sco: Thanks for your help. A question - what I think I'd like to do is this: