A large number of rows are to be copied from one SQL Anywhere 11.0.1.2276 database to another database in the same folder on the same computer via simple INSERT t SELECT * FROM proxy_t statements. Millions of rows, gigabytes of data, hours of processing time using two SQL Anywhere engines.

What are the performance pros and cons of starting both databases in one SQL Anywhere engine?

One major con of using a single server is probably the number of workers available. Since you are using 11.0.1, the number of workers is fixed (i.e. there is no MPL, or multiprogramming level, support prior to SA 12). Now consider the following:

1. You connect to the server and begin the insert-select.

2. The local connection gets a worker assigned to it to process the request, and that worker remains assigned to the connection for the duration of the request (i.e. for the entire insert-select).

3. The remote data access layer now makes a remote connection and executes the select portion.

4. The remote connection gets its own worker assigned whenever it is active (i.e. whenever it returns the next block of rows).

5. If the server decides to execute the "select" portion in parallel, then it could use several workers to fetch the rows from the "remote".

So, for the duration of the insert-select, the "remote query" will use at least 2 workers and possibly many more. Having the remote database on a separate server changes things so that the local server uses only one worker for the entire insert, and the remote server uses 1 or more workers for the select. If the local server is also servicing other connections, then pushing the select to a different server and leaving the additional workers available for the other "local" connections would be very beneficial. At the same time, if the select portion of the query can be executed in parallel, then having a remote server that is less busy than the local server, with a full set of workers available to handle the select, would also be very beneficial.
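
To make the worker arithmetic concrete, here is a rough sketch in Python (not SQL Anywhere code). It only restates the counting argument above: one worker for the insert, plus one or more for the select. The function names and the `pool_size` parameter are hypothetical and exist only for illustration; the real worker pool size depends on how the server was started.

```python
def workers_in_use(select_parallelism, same_server):
    """Workers tied up on each server during the insert-select.

    select_parallelism: how many workers the select uses
                        (1 = serial, >1 = parallel plan).
    same_server:        True if both databases run in one engine.
    """
    if select_parallelism < 1:
        raise ValueError("the select always needs at least one worker")
    if same_server:
        # The insert worker and the select workers share one pool.
        return {"local": 1 + select_parallelism, "remote": 0}
    # The insert worker stays local; the select workers live on the remote.
    return {"local": 1, "remote": select_parallelism}


def spare_workers(pool_size, in_use):
    """Workers left over for other connections (pool_size is an assumed input)."""
    return max(pool_size - in_use, 0)


one_engine = workers_in_use(select_parallelism=3, same_server=True)
two_engines = workers_in_use(select_parallelism=3, same_server=False)
```

With a three-way parallel select, the single engine ties up four workers from one pool, while the two-engine setup ties up one locally and three remotely, so each pool keeps more spare workers.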

So, my opinion is that having two servers, each with its own set of workers (and cache etc.), is probably the better approach here, even if both servers are running on the same machine.

Breck says: A wonderful discussion! Now... what if there are no client-server connections to the local database at all; i.e., what if the entire process is being executed by a DatabaseStart event, and client-server connections are forbidden? (this is a "batch" process that is part of an embedded database upgrade process).

The event would still use a temporary connection locally and a "real" connection would still get created to the remote. With respect to workers, the event connection would still tie up a worker and the remote connection would also tie up one or more workers depending on whether the select could be executed in parallel or not. So I am not sure much changes other than the fact that the connections would both get dropped once the event completes.

My biggest concern with respect to workers is this: because one worker is guaranteed to be tied up by the local connection for the entire duration of the insert-select, the server may decide that it does not have enough workers to run the select in parallel, when in other cases it would run the query in parallel.

By the way, when you say client connections are forbidden, do you mean to both databases? If so, then things won't work at all since, as I said above, the connection to the remote database is and must be a "real" client connection. If that is one of the restrictions, then you have no choice but to go to two different servers. Unless, of course, I am missing the point entirely.

Yes, technically speaking the proxy connection from local to remote is a "client connection" and it is allowed. In fact, the mechanism for preventing client-server connections is [gosh, I can't remember]... but hey, the code works, I'm looking for performance improvements :)

BTW, you have 1000+ reputation points now, you can edit your reply if you want. My edit of your reply is an experiment, to see what an in-place conversation looks like... until now, editing of other people's stuff has been rare on SQLA.

Both DBs in the same folder will probably make IO your main problem. Spreading IO across multiple devices for this task will probably gain you more than thinking about optimisations of the db engine. E.g., put each db file on its own device (SAN, RAID, ...).
And by the way, increase the DB file size of the target before starting the copy ;-)
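
As a rough illustration of the pre-grow suggestion, the following Python sketch builds an `ALTER DBSPACE ... ADD` statement sized from the two file sizes. It assumes the target will end up about as large as the source file, which may not hold after a rebuild. The helper name `pregrow_statement` and the `headroom` parameter are hypothetical, and the statement text should be checked against the SQL Anywhere 11 documentation before use.

```python
import math


def pregrow_statement(source_bytes, target_bytes, headroom=0.0):
    """Return an ALTER DBSPACE statement that grows the target up front.

    Assumption: the target needs roughly source_bytes * (1 + headroom)
    in total. Returns None if the target is already large enough.
    """
    needed = source_bytes * (1 + headroom) - target_bytes
    if needed <= 0:
        return None
    megabytes = math.ceil(needed / (1024 * 1024))
    return f"ALTER DBSPACE system ADD {megabytes} MB"


# Example: a 3 GB source and a 1 GB target that was just created.
statement = pregrow_statement(3 * 1024**3, 1 * 1024**3)
```

Growing the file once, before the copy, avoids repeated small extensions of the file while the INSERT is running.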

@Martin: This is an embedded application where the new database must end up in the same folder as the previous one. There is no "DBA control" over this process, and no knowledge of drives etc. on the host computer; in particular, there is no guarantee that a second physical drive even exists. Plus, I am guessing that the savings gained by using separate physical drives would be lost to the required final move of the new database back to the original folder (or the initial move of the old database to some other drive).