You can get the centroid of the building footprint with arcpy, then implement some logic to move it a bit, over and over, until it meets some criteria. More details about specifics would be helpful.
–
Justin Feb 7 '12 at 19:33

@Justin That sounds like what I'm looking for. Is there a tutorial online for arcpy that you know of? I suppose I'd define the angle, direction, and length that the copy-and-move would repeat for?
–
EightyTwenty Feb 7 '12 at 23:01


I apologize for not responding to your request for an Arc10 solution. The reason is that I am not an Arc programmer anymore: after ESRI unilaterally abandoned years of user contributions in its change to ArcGIS, it became apparent that investing any effort in learning ESRI technology is likely to have only short-term gains and is not a good strategy for the long term. But I do appreciate that there are skilled GIS professionals who must use ESRI products and am happy to help them out. I believe that porting the Avenue code should be easy, because it (purposely) uses very basic geometric ...
–
whuber♦ Feb 9 '12 at 20:17


This is a close call, Matt, due to the clear and close relationship between the two questions. On balance, I think the current structure is best for our site: one question asks for a (generic) method; this one asks for code on a specific platform. The first question has been asked and answered, so formulating this one as a mechanism to encourage replies--which had been requested several months ago--makes sense (although there are ways to do the same without creating a new question, such as by putting a bounty on the old one).
–
whuber♦ Feb 10 '12 at 14:26

3 Answers

As alluded to in the comments on my previous answer (kept intact instead of edited, for comparison purposes), performance could be a lot better with additional optimizations (using the in_memory workspace instead of geometry objects, moving GP calls outside of loops where possible) as well as by utilizing multiprocessing.

Here is another version of the script that runs much faster, especially if you use all of your machine's processors. It uses Python's standard multiprocessing module, with the Pool construct to queue up jobs that are partitioned by row ranges and processed in parallel by a set number of processes. It is configurable to use all, some, or one of your CPU cores, and lets you specify the row partition lengths. See the comments in the "configuration" section for more details.

Basically if you tell it to use multiprocessing, it will partition the input feature class into ranges of OIDs, create multiple intermediate shapefiles, one per range, and finally merge them at the end. It also cleans up after itself by removing the intermediate shapefiles. There is some overhead to this, but the benefit of using all of your CPU power far outweighs the overhead in my testing (getting approximately 80% scaling efficiency with a dataset of 32k features).
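The fan-out/merge shape described above can be sketched in plain Python. The function names here (partition_ranges, process_range, run) are illustrative stand-ins, not the names used in the actual script, and the worker just echoes its range instead of writing an intermediate shapefile:

```python
# Hypothetical sketch of the partition-and-merge strategy: split the input
# into OID ranges, process each range in a worker process, merge at the end.
from multiprocessing import Pool

def partition_ranges(min_oid, max_oid, size):
    """Split [min_oid, max_oid] into (start, end) ranges of at most `size` rows."""
    ranges = []
    start = min_oid
    while start <= max_oid:
        end = min(start + size - 1, max_oid)
        ranges.append((start, end))
        start = end + 1
    return ranges

def process_range(oid_range):
    # In the real script this would select rows by OID, compute shadows into
    # an intermediate shapefile, and return its path; here we just return the
    # OIDs to show the fan-out/merge shape.
    start, end = oid_range
    return list(range(start, end + 1))

def run(min_oid, max_oid, size, processes):
    ranges = partition_ranges(min_oid, max_oid, size)
    with Pool(processes) as pool:
        partial_results = pool.map(process_range, ranges)  # parallel fan-out
    # "Merge" step: concatenate the intermediate outputs in order.
    return [oid for part in partial_results for oid in part]
```

With the default settings described below, `size` would be the row count divided by the number of cores, giving one job per process.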

Notes:

Because it uses the in_memory workspace for the computationally intensive shadow calculations, take care to specify an appropriate partition size (procfeaturelimit) if you are sending in extremely large datasets. With the default settings the partition size will be the count of input rows divided by the number of cores, which is probably the most efficient configuration unless you start running out of memory.

I have had zero luck using this as a script tool in ArcMap/ArcCatalog, even when set to run out of process. It appears that cursors are extremely slow when run out of process and called from a script tool, and subprocesses never seem to get created. So I am just running it from the PyScripter remote engine or at the Windows command line. If you can get it to work as a script tool, please let me know! Edit: It does work fine as a script tool if you configure it to not use multiprocessing (cores = 1, procfeaturelimit = 0) and run it in process.

As for performance, using the same dataset as my previous answer that originally took 1.25 hours, this version completes in 8 minutes, 30 seconds using a single process and no partitioning, and 2 minutes 40 seconds using 4 processes and 4 partitions (jobs) of ~8k features each. RAM usage is also much more reasonable, using around 250MB for a single process, and around 150MB per process for multiprocessing.

For comparison purposes, here is a test dataset we can benchmark against. It's ~22k records from the City of San Francisco's building footprints dataset (~40k polygon parts totaling 1,076,060 vertices, including one duplicate closing vertex for each ring), and I kept only one field with the height in feet of the building (prsizeh). With 4 processes it took 5 minutes and 30 seconds to complete, and with a single process it took 17 minutes, 30 seconds. I think they are fairly complex geometries because they were converted from 3D models rather than digitized from aerial photographs. (6277 features have multiple parts. One has 36 parts, many of them slivers.)

Superb - thanks for posting this. I've learnt a lot from looking at your code, not only relating to using the multiprocessing module, but also in terms of improving my coding style generally. It would be interesting to know how @whuber's Avenue script performs with your test dataset... Cheers.
–
JamesS Feb 15 '12 at 11:23


Avenue code scales primarily according to the number of requests executed (because the overhead per request is huge). This indicates the time will be proportional to the total number of polygon vertices processed (rather than to the number of polygons). Let's look at three metrics for the AV 3 scripts: (1) Max RAM, 47 MB. (2) CPU time (single thread), 7:05 (create project, read data, create shadows, save dataset, display final results). (3) Lines of code, 17 (200+ for the optimized ArcGIS solution). That last one is related to how much of the programmer's time is needed :-).
–
whuber♦ Feb 16 '12 at 19:30


Thanks for the stats! Amazing to me how the Avenue code running single-threaded is almost as fast as the arcpy code running across four processes! All we need now is someone to do something similar with a FOSS solution like GDAL.
–
blah238 Feb 16 '12 at 19:56

I made a few improvements to @Luke's version of the script, mainly for better performance. The GP methods really don't like being called in a tight loop; by removing an unnecessary GP call and moving the necessary one outside of the innermost loop, I sped up performance by at least 10x.

This version also fixes shadows not being created for inner rings, converts degrees to radians, stores the original FID as an attribute of the output feature for joining purposes, and reports progress at 10% increments.

NOTE: Performance is still pretty bad: roughly 1.25 hours for ~34k buildings on a fast machine, and something is leaking memory at an alarming rate (roughly 1 MB/s). It will max out the process's 2 GB RAM limit fairly quickly, at which point performance grinds to a near standstill. Using gc.collect() does not help, and the memory remains allocated even after the script has completed, so I suspect the geometry objects are not being released internally, as in this ESRI forum thread. To confirm this, one could adjust the script to use temporary feature classes on disk instead of geometry objects for the intermediate geometry storage.

I think that this algorithm could also benefit greatly from multiprocessing. Since the algorithm does not require features to be grouped, one could partition by rows instead of by geographic areas.
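The hoisting described above can be illustrated with a toy stand-in for a GP call. Here expensive_merge is hypothetical and only counts its invocations (in the real script the hoisted call was arcpy's Dissolve); the point is that call count, not work per call, dominates runtime:

```python
# Toy illustration of hoisting an expensive call out of a tight loop.
# expensive_merge stands in for a GP tool such as Dissolve: each call
# carries a large fixed overhead, so fewer calls means faster runs.
CALLS = {"count": 0}

def expensive_merge(parts):
    CALLS["count"] += 1  # track how often the "GP tool" is invoked
    return [v for part in parts for v in part]

def per_row(rows):
    """One GP call per row: the slow pattern the original script used."""
    out = []
    for row in rows:
        out.extend(expensive_merge([row]))
    return out

def hoisted(rows):
    """A single GP call over all rows: same result, far fewer calls."""
    return expensive_merge(rows)
```

For three input rows, per_row invokes the stand-in three times while hoisted invokes it once, producing identical output.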

+1 It is wonderful to see how this community can incrementally improve (and critically evaluate) a solution. Keep it up! (BTW, now I'll have to go resurrect that Avenue script and do some timing to see how well 20 year old technology keeps up...)
–
whuber♦ Feb 12 '12 at 0:02

+1, shocking. Hopefully the supposed cursor improvements in 10.1 will help out here.
–
blah238 Feb 12 '12 at 5:05


Is it the Dissolve call that is wrecking performance? The overhead for that call is large, as it is designed for handling larger datasets rather than one-off calls. Perhaps write the parts to scratch (or the in-memory workspace) and then dissolve the whole feature class in one call.
–
Craig Williams Feb 15 '12 at 7:47


@Craig, yes, GP methods being called in a tight loop is probably one of the main culprits. I was calling Dissolve once per row; the new answer I posted calls it much less frequently, depending on how the rows are partitioned. I still think the geometry objects are what is leaking memory, though.
–
blah238 Feb 15 '12 at 9:59

Edited to handle exceptions caused by inner rings. It won't produce nice shadows from the inner "walls" but will just ignore them.
–
Luke Feb 10 '12 at 4:49


The right way is to double-loop: for each shape part, run an outer loop over the rings and keep the inner loop the same as you have it now. This creates a set of shadows for all walls, whether inner or outer.
–
whuber♦ Feb 10 '12 at 14:30


There's a simpler way: use the null point as a sentinel to start over again. But building the set of rings may lead to clearer code (and will more closely parallel the pseudocode).
–
whuber♦ Feb 10 '12 at 22:05
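Both suggestions above can be sketched in plain Python: split a flat, arcpy-style part on null-point (None) sentinels into rings, then double-loop over every ring's walls. The data layout here is assumed for illustration, not taken from the actual script:

```python
# Sketch of the double loop over parts and rings. A "part" is a flat list
# of (x, y) vertices in which None separates the outer ring from any inner
# rings, mirroring arcpy's null-point convention.
def rings_of(part):
    """Split a flat vertex list on None sentinels, yielding one ring at a time."""
    ring = []
    for pt in part:
        if pt is None:
            if ring:
                yield ring
            ring = []  # null point: start the next ring
        else:
            ring.append(pt)
    if ring:
        yield ring

def walls(polygon_parts):
    """Yield every wall segment (v[i], v[i+1]) of every ring of every part,
    so shadows get cast from inner walls as well as outer ones."""
    for part in polygon_parts:
        for ring in rings_of(part):
            for i in range(len(ring) - 1):
                yield ring[i], ring[i + 1]
```

A part with a closed triangle as its outer ring and a closed two-segment inner ring yields all of their walls, inner and outer alike.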