I'm having a memory leak issue. On further investigation, it seems the file-based geodatabase I write to in a loop grows very large, and as it grows it significantly degrades the performance of the scripts I am running.

Any ideas how to optimize the configuration of the fgdb, or how to speed it all up? I am not writing to 'in_memory'; I am using AggregatePoints to create a temp feature class (which I delete), and I buffer this FC, which I keep.

geom is a collection of points; scratch is the scratch area (the local gdb I am using).

I am just looping through a list of files: I call a procedure that creates a list of geoms (and doesn't degrade) and then call this one. Doing that, I see this function, createGeom, degrade significantly, while the previous one doesn't degrade a bit.
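For context, the aggregate-then-buffer workflow described above looks roughly like this. This is a hedged sketch only: the function and parameter names (buffer_aggregated, the "100 Meters" distance, the temp_agg path) are my placeholders, not from the question, and it assumes arcpy 10.x. The import is guarded so the structure can be read outside ArcGIS.

```python
# Sketch of the described loop body: aggregate points into a temp FC,
# buffer it into the output we keep, delete the intermediate.
# All names/distances here are illustrative assumptions.
try:
    import arcpy
except ImportError:
    arcpy = None  # allows the structure to be inspected outside ArcGIS


def buffer_aggregated(points_fc, scratch_gdb, out_fc, distance="100 Meters"):
    if arcpy is None:
        return None  # not running inside ArcGIS
    temp = scratch_gdb + "/temp_agg"
    # Aggregate the points into a temporary polygon feature class...
    arcpy.AggregatePoints_cartography(points_fc, temp, distance)
    # ...buffer that into the feature class we keep...
    arcpy.Buffer_analysis(temp, out_fc, distance)
    # ...and delete the intermediate so it doesn't accumulate in the fgdb.
    arcpy.Delete_management(temp)
    return out_fc
```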

The RAM disk option is low-hanging fruit. Unfortunately, I can't link to whuber's comment on that question, but I think it highlights where the most performance gains can be made (so maybe you should post your code in your question):

whuber: Even a high-end (ordinary) disk drive can make an appreciable
difference. However, by far the most dramatic improvement in
long-running GIS processes is made by improving the algorithm. In
many, many cases, if your computation is taking noticeably longer than
the time required to read all inputs and write all outputs, chances
are you are using an inefficient algorithm.

Yeah, I have been through the code a few times and even had it code-reviewed by ESRI, who said it couldn't really be improved. I think a hardware review is out of the question, but I might push for a solid-state drive lol. I saw those posts before, but thanks for adding them here for completeness. Thanks for the help.
–
Hairy, Aug 9 '11 at 11:46

It's actually a fair point! I put a test harness together for ESRI, which simply looped through a list of text files containing collections of points, which were passed to one procedure to aggregate the points. A single call to arcpy caused the memory to get all beat up.
–
Hairy, Aug 9 '11 at 12:15

When the process starts, it uses very limited memory and performs incredibly quickly, ~2 seconds a polygon. However, this slowly creeps up as data is added to the fgdb, until it's almost 30 seconds a polygon for a unit of more or less the same size.
–
Hairy, Aug 9 '11 at 13:39

Although ArcPy and ArcObjects are syntactically different, semantically they are the same. You asked if the performance of a FileGDB degrades as it fills up. The answer is no, it does not. What degrades performance is when the cursors you are using, whether explicit (i.e. declared by yourself) or implicit (created by a canned arcpy function), are not released properly. There is a price for using the canned functions: a trade-off of simplicity for resource control (you barely have any). This is not inherent to FileGDB, but to the ArcGIS architecture as a whole. The moment you choose to use ArcPy, you are making the statement "I'd rather trade off performance for not having the complexity of a lower-level API." The higher you go, the more you trade. ArcPy is pretty much at the top of it all... one function and you are done!

Nevertheless, that one call still goes through the ArcObjects initialization process, the associated workspace factory allocation, and the associated file handle opened for every scratch workspace (a full gdb!), and until the internal memory-freeing routines kick in (in .NET you can explicitly call the garbage collector, and in C++ it is immediate; I'm honestly not sure about the Python/COM memory model) you will have these gdbs and resources staying resident. So you probably have several gdbs and FC handles in memory every time you loop. That is the price of ArcPy, and it is up to you to decide whether it is worth it.
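The point about explicit cursor release can be sketched as below. This is a hedged illustration only (classic arcpy 10.0 cursor style; count_features and the feature-class argument are my placeholder names), with the import guarded so the pattern is readable outside ArcGIS.

```python
# Illustration: release cursor objects explicitly so the fgdb handles
# they hold are freed each iteration rather than lingering until GC.
# Function and argument names are placeholders, not from the thread.
try:
    import arcpy
except ImportError:
    arcpy = None  # not running inside ArcGIS


def count_features(fc):
    if arcpy is None:
        return -1  # sentinel when arcpy is unavailable
    rows = arcpy.SearchCursor(fc)
    try:
        n = sum(1 for _ in rows)
    finally:
        del rows  # explicit release; otherwise the handle stays resident
    return n
```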

I know they are, which introduces a lot of issues. However, the use of cursors is 'syntactically' different, so I don't have the same options. It also isn't the issue, as I get this by simply calling the AggregatePoints function inside the call, without a cursor.
–
Hairy, Aug 10 '11 at 8:57

There's a memory leak in ArcGIS 10 which is being fixed in SP3 apparently.

Also, I decided I would delete the 'in_memory' data and compact the database on each loop, which actually sped the application up. Then, when I run the script again, I delete the fgdb and recreate it. That has sped it all up by 30%. However, once the memory leak has been fixed, we expect much better gains in performance. Arcpy is a pig in loops...
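The per-loop cleanup described above can be sketched roughly as follows. This assumes arcpy 10.x; the helper name and gdb path are my own placeholders, and the import is guarded so the sketch is readable outside ArcGIS.

```python
# Sketch of the end-of-iteration cleanup: drop 'in_memory' intermediates
# and compact the fgdb to reclaim space. Names are illustrative.
try:
    import arcpy
except ImportError:
    arcpy = None  # not running inside ArcGIS


def end_of_loop_cleanup(fgdb_path):
    if arcpy is None:
        return False
    arcpy.Delete_management("in_memory")  # release in_memory workspaces
    arcpy.Compact_management(fgdb_path)   # compact the file geodatabase
    return True
```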

I know an answer has been accepted already, but I thought I'd add my experience and some code I use to get around it. I have a project that generates a large volume of intermediate data. The size is small, but the number of features/rasters is significant. After benchmarking my program to find where the performance loss was occurring, it was clear it was happening on the FGDB writes. At the start of the program, an FGDB write would take ~2 seconds. After adding about 100-150 features, it would take about 6 seconds, and increase approximately linearly from there.

I solved this by creating a simple gdb class that tracks the number of features and creates a new gdb when we hit the threshold. NOTE that it requires a bit of adaptation because it takes a variable name in another module as a parameter and sets that variable's value to the path to the gdb. In this case the module is "config", as seen in the method switch() below. If you change that to your own module (or just switch it to set a name in the current module), this can be adapted. It also generates names by splitting the FGDB name off the input variable, so you'll need to seed that first. Otherwise, you can adjust the code in init to behave differently.

I've found 150 to be a good tradeoff between complexity and performance for my project. Write times tend to fluctuate between about 3 and 7 seconds with that threshold. Set db_size_threshold to whatever balance you are looking to strike on your own hardware.
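A minimal sketch of the rotation approach described above, for readers without the original code. I've replaced the config-module indirection with an injected create_gdb callable (with arcpy this would be arcpy.CreateFileGDB_management); the class name, method names, and numbering scheme are my assumptions, not the answerer's actual code.

```python
import os


class RotatingGDB:
    """Track writes and switch to a fresh file gdb once a threshold is hit,
    working around write-time degradation as an fgdb fills up. Sketch only;
    all names here are illustrative."""

    def __init__(self, seed_path, threshold=150, create_gdb=None):
        # seed_path: an existing .gdb; rotated gdbs get a numeric suffix.
        folder, name = os.path.split(seed_path)
        self.base = os.path.join(folder, os.path.splitext(name)[0])
        self.threshold = threshold
        self.count = 0
        self.index = 0
        self.current = seed_path
        # create_gdb(folder, name) should create the gdb on disk;
        # with arcpy this would be arcpy.CreateFileGDB_management.
        self.create_gdb = create_gdb

    def switch(self):
        """Start a new gdb and reset the write counter."""
        self.index += 1
        self.count = 0
        self.current = "%s_%d.gdb" % (self.base, self.index)
        if self.create_gdb is not None:
            folder, name = os.path.split(self.current)
            self.create_gdb(folder, name)
        return self.current

    def path_for_next_write(self):
        """Call before each feature-class write; rotates when full."""
        if self.count >= self.threshold:
            self.switch()
        self.count += 1
        return self.current
```

Usage is just `gdb.path_for_next_write()` before each write; with threshold=150 the write target rolls over to `<base>_1.gdb`, `<base>_2.gdb`, and so on before any single fgdb gets slow.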