8 Answers
8

ORMs are not intended to completely take over access to your database. Use them for that 80% of code that's CRUD, the stuff that's too tedious to write on your own. Use stored procedures, dynamic SQL, or whatever you want for the remaining 20% that needs to be carefully optimized.

That would work if database abstraction was not one of the main reason why you decided to use an ORM.
–
user2567Nov 10 '11 at 16:28

@Pierre303, I'm having a difficult time understanding your comment. What do you mean?
–
Mark CanlasNov 10 '11 at 23:54

@MarkCanlas: I think he means "abstracting away the database," in the sense that you could change out the database (e.g. go from SQL Server to MySQL) if you wanted to do so. In practice, this use case hardly ever occurs.
–
Robert HarveyNov 11 '11 at 1:05

1

You can still create abstractions. Most ORMs that actually support multiple providers/dialects have support for provider/dialect-specific code. You can implement operations as bulk insert/array binding/TVP/whatever for specific databases and let it fall back to slow-by-slow for unsupported providers like SQLite. At worst you can break out the might-be-bulk functionality into a separate interface/class and sub in a different implementation based on build or config parameters.
–
AaronaughtNov 11 '11 at 3:34

Yes custom dialects can help, as well as specific code for specific problems. However for this to be viable on a financial point of view, this has to be limited to the strict minimum. Our customization trought custom functions (dialects) represent less than 0.1% of the total data access code base. I would be really concerned if it was more than that.
–
user2567Nov 11 '11 at 13:13

I use an ORM (nHibernate) in an application that requires high performance and handles billions of records. With time we noticed that most significant performance problems were related to our own way of using the ORM rather than due to the ORM alone.

The ORM should not replace your mandatory database knowledge. It's a tool you use to get more productivity and flexibility in your code, but you'll need to know underlying processes to optimize your performance.

You did not specify a specific ORM so here are the things we did to improve performance:

We used an ORM profiler. (we used nhprof)

We used a database profiler. (we used SQL Server Profiler)

We read as many articles as we can on the subject. (Many were available for nHibernate in addition to the entire chapter on the subject in the documentation)

We bought specific books on performance & scalability.

We created benchmarking system to test our own optimisations.

and more importantly, we were able to test our code with real life customers with huge data. That last thing alone helped us spot most problems in our application.

We managed to do it with Entity Framework, but our application did a lot of batch-style operations (we'd write large numbers of records to individual tables), so it was a good fit. I'd definitely see if it would be possible to retain the ORM framework if possible, just to reduce the amount of special-purpose code in your app. Is it possible to buffer writes, then execute them as a group? You lose transaction semantics, but if you're going with bulk operations I assume you've already come to terms with that.

ORMs do nothing magical. They translate object access methods into SQL. The SQL statements they execute are not necessarily slower than the SQL you'd write manually. Having said that, there are a few issues that you might stumble upon:

Transactions: One large bulk operation is almost always faster than many small transactions that together accomplish the same thing. Hence, if your ORM method calls use fine-grained transactions (the active record-style methods in Spring Roo entities for example are annotated as @Transactional by default), bulk operations will be slow. If that is the case in your application, you should look at your transaction logic.

Caching: In Hibernate, a first-level cache allows your entity manager to avoid unneccesary round-trips to the database. Good thing in general, but bad for bulk inserts, where it leads to unneccessary cache-clogging, resulting in degrading application performance. If that's your problem, you should look at the Batching pattern suggested above by ChrisAnnODell. We use it in our importers and it speeds up bulk inserts a lot.

There is nothing wrong with using native SQL to improve performance. But first make sure you understand what is slowing you down.

As mentioned by umlcat, there are some ORMs that will let you use bulk operations.

Even better, many ORMs are extensible, so you can just write your own method for running bulk operations, if not supported already. If the bulk operation in your application is something you can factor out, I'd add it as a layer on the ORM (to do that, you'll probably need to write raw SQL), but then in the application, use the ORM method you've implemented.

This also makes unit testing and debugging easier. Once you have good test coverage for your ORM methods, you're free to use it in your apps. Otherwise, debugging raw SQL (especially big ones with transactions and many JOINs) can be a pain.

It once took me almost a day to spot a bug in a raw SQL call that was almost 100 LOC, and the bug was just a single character! Since then, I try to avoid having raw SQL in the app, and have all SQL procedures separately unit-tested.

Bypass the ORM. Not only that but bypass "regular" sql as well. Use a bulk utility of your database to insert extremely large data sets to a staging table. Then use sql to perform your staging activities.

Well there aren't any design patters that I'm aware of. My guess is that you made the decision for the ORM for a reason, so abandoning the ORM is likely not what you want. However, in these cases I think there is room for mixing both solutions. There's nothing wrong with that, as long as you do it conciously and document why you deviate from the default use of the ORM in you software.
Next to that, some ORM frameworks have some facilities for doing bulk operations. I know nHibernate (ORM for the .NET framework) has socalled StatelessSessions, which have a lot less overhead, but this might still not give you the perfomance boost you're looking for. In that case, just use raw SQL.