How to design and Create an apex job to process a million records

If you are using an enterprise org and have a requirement to process a million records in an automated fashion, apex job provides the solution to process those records. But at the same time, we need to be careful of the governor limit and use the following considerations to design an apex job which would scale and process million records easily.

1. Designing the query locator
For any reports or any processing, we need a soql query . We need to create a soql query and set it at the query locator. Now
a. If you are having multiple objects which need to be queried, you need to make sure that there is a relationship defined between the objects which can be a lookup or master detailed. This would help to query more than one custom object.
b. There could be scenarios where there are 2 custom objects which may not have relationship defined but still some loop up data exists in common between the objects. For eg if we have to process say a count of applications for each state or county and let us say the application is a custom object and there are some check box fields which are flagged for type like change in address, change in birth date etc on the application object and the application object has a look up code for the county custom object which has a id and name and description. Now in this scenario, we might have to query the counties first and then loop through the application object in the execute method. So the design principle is to choose the lookup custom object in your query locator and use the data entity like application, order in the execute method which would help to get the result done quick.
return Database.getQueryLocator([
SELECT LKUP_CD__c, LKUP_GRP_CD__c, SORT_ORD__c FROM CNFG_SELECT_OPTIONS__c
where LKUP_GRP_CD__c =:COUNTY_GRP_CODE
AND LANG_CD__c =: LANG_CD and
LKUP_CD__c!= 'SEL'
ORDER BY LKUP_CD__c )]

2. Implementing the execute batch and bypass governor limits.
If you use the execute method, you would know that salesforce has the capability to create a separate thread for each execution of the batch for a maximum of 2000 records. So salesforce would break the query into 2000 records and run several batches.
Things to watch out for.
1. If you use soql queries inside the loop in the execute method, you would end up with 200 maximum soql limit exception. So to avoid this, what you need to do is limit the scope to 50 or 100. To calculate this, let us say if you have 10 soqls inside the for loop, then your maximum scope limit would be 20. To do this , you would use the following code snippet while executing the batch.
Database.executeBatch(batch, 9);

2. If you are doing counts or using aggregate functions to do some reporting and let us say there is one custom object where you would have to group all the fields and get a count, you would try to use one soql query to get all the counts.
SELECT Count(ASSET_CHG_SW__c) cnt , Count(ADD_CHG_STAT_SW__c) addchg,
FROM RMC_RQST__c
WHERE application__r.APP_CPLT_TMS__C >=: rptStartDate AND
application__r.APP_CPLT_TMS__C <=: rptEndDate AND
application__r.cnty_num__c =: countyCode and (
ASSET_CHG_SW__c = true OR
ADD_CHG_STAT_SW__c = true)
group by application__r.cnty_num__c, ASSET_CHG_SW__c , ADD_CHG_STAT_SW__c;
Design Principle
If you want to use count fields on a custom object, ensure that the field is not nullable. In other words, checkbox, radio button fields will not work on count(Field) queries. So you need to make a separate call like count(id) to get the count values.
3 . Never use dml statements inside the execute batch as you are going to hit the 150 DML LIMIT exception. Use a list and add all the objects.
4. If you do any delete in the execute batch method which is executed as part of the loop, ensure that the deleted record is not added to the list where you would do the upsert or update at the end in the final method.
5. If you want to have the line numbers printed on your stack trace in the execute method which would fail, then always put a try catch block at the start of the execute method and in the catch exception, assign the exception error message to a string and log it to a custom object or email.
6. Create a custom object called BatchMaster which would have job id, start date, end date, exception message and count of records to indicate the number of records processed in the batch.
catch(Exception e)
{
System.debug('Exception in executing Batch BTCH CHG RPT'+e);
isBatchRunSuccesfully = false;
batchErrMessage = e.getMessage();
}

3. Reporting and debugging at the final method.
The final method is the method which would be executed at the end. So here are things which you want to do on the final method.
1. Create an email method to send an email with batch status, number of records processed, input passed etc which would atleast tell you in a quick glance whether the apex job succeeded.
2. Do the dml statements on the final method by using insert , upsert or update batch methods which would take list as input.
3. Update the batch status at the end of the final method if you are using a custom object.

So to summarize, here is a quick list.
a. Use the scope parameter to limit the number of records in the execute batch to avoid soql query limit.
b. The query locator can scale to 50 million records and would process 2000 records in a batch.
c. Do the dmls in the final method and send an email on the job status.
d. Do a try catch on the execute batch get the exception error message on to a custom object.
So please let me know if these principles helped you to design your batch job better.Do not do list
1. avoid dmls in the execute batch and use a list to populate it.

About The Author

Hi there!
Thanks for visiting my blog. My name is Buyan Thyagarajan (Buyan47) . I am a Salesforce consultant specializing in higher education and helping businesses to measure business results , be proactive with their customers , prevent problems before hand and make the right decisions using Salesforce CRM. This blog would help you in the following way.
a. For Universities, my articles and tips would help you to increase your student recruitment, get better insights on your marketing campaigns and ensure student success by maximizing the features available in your salesforce org, integrate with your student information systems ( Banner, Ancor etc) and create a connected campus on the digital world.
b. For Salesforce admins, my articles would help you to be proactive with your salesforce crm, prevent problems and make the right decisions on different problems with salesforce.
c. For Business executives, my articles would help to measure business results with your salesforce crm, increase adoption and provide solutions for your business problems.
d. For IT executives and developers, my articles would help you on best practices on change management, provide insights on increasing your performance with your current development teams and guide you to make better architectural decisions with salesforce and salesforce1.
I am passionate about the following.
a. Software is meant to automate work but in reality IT is the most manual industry where programmers have to type those 100,000 lines of code everyday to make things work. Programmers get lost in the code and business folks wonder why the programmers are not getting it. I have some insights on this and in the process of publishing a book which should help in the following.
a. Help IT teams to be innovative and offer business value
b. Career pathway for programmers
c. Relevant things a business user can provide to IT to get things done quick.
Please feel free to post your comments on the site on my articles and any questions you have and I would be glad to answer it for you.
To reach me
Ph: 302-438-4097
m: buyan47@gmail.com