100% Fill Factor. When to use it?

I have a table that's heavily queried and updated. Deletes and inserts happen probably less than a half dozen times a year, and the system is locked to users at those times. I have three nonclustered indexes set at 100% fill factor. Would there be any advantage to reducing the fill factor? It is probably a 50/50 split between the number of rows updated and the number of rows queried.

You can reduce the number of page splits by setting an appropriate fill factor to reserve free space on each index page. The fill factor is a value from 1 through 100 that specifies the percentage of each index page to fill, with the remainder left empty.

The fill factor percentage is applied only at the time the index is created (or rebuilt). If the table contains read-only data (or data that very rarely changes), you can set the fill factor to 100.
Keep the trade-off with page splits in mind, though: more free space per page means SQL Server has to traverse more pages to read the same amount of data.
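As a minimal sketch of how the option is set (table and index names here are hypothetical, not from the thread):

```sql
-- FILLFACTOR = 90 fills each leaf page to ~90%, leaving ~10% free
-- for rows added or expanded later, which reduces page splits.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH FILLFACTOR = 90

-- The fill factor is reapplied only when the index is rebuilt
-- (SQL Server 2000 syntax):
DBCC DBREINDEX ('dbo.Orders', 'IX_Orders_CustomerID', 90)
```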

Even though there are inserts into these tables: one, the inserts happen only about six times a year; two, the users are locked out of the database during those loading and maintenance periods; three, during the lockout you can rebuild the indexes to defragment them if need be.

It would only make sense for the fill factor to be 100% for optimal read performance, given that reading is the primary use of these tables and that the tables are offline during load periods, which, as stated above, allows for maintenance on them.

Sometimes it can be really helpful to read closely through the postings. Although INSERTs and DELETEs happen only 6x a year, he writes that the table is heavily UPDATEd. So, I think, before suggesting anything on the fill factor, the more important question is whether there is a clustered index on that table.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)

Monitor the fragmentation to determine the ideal configuration on a per-table, per-index basis; the amount of storage required can differ dramatically between configurations only 5% to 10% apart. The lower the fill factor (e.g., 40% to 50%), the more storage is needed and the more pages an index scan or seek must read to satisfy the same query. With a high fill factor (90% to 100%), less storage is needed and fewer pages are read, but costly page splits can occur, causing performance degradation and index fragmentation.
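A quick way to do that monitoring in SQL Server 2000 (object names here are placeholders):

```sql
-- Scan Density near 100% and low Logical Scan Fragmentation mean few
-- page splits have occurred; heavy fragmentation suggests lowering the
-- fill factor or rebuilding the index more often.
DBCC SHOWCONTIG ('dbo.Orders', 'IX_Orders_CustomerID') WITH FAST
```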

If the columns that are part of a nonclustered index are not updated, you can keep a 100% fill factor. If they are, it is better to have a lower fill factor. To find the optimal value you will have to test; it depends too much on the usage pattern, so nobody can tell you which to use without knowing much more about your database and process.

There used to be a difference depending on whether an update is done "in place" or not. It depended on the presence of varchar/nullable columns. I don't remember how it actually works in SQL Server 2000; can someone find a chapter about the topic? I don't have "Inside SQL Server" or any other SQL Server book with me anymore.

Besides the question of how an update affects data rows, I am not sure whether a delete/insert of a data row, when a clustered index is present, affects the index if no index column is updated.

quote:
Updating in Place
In SQL Server 2000, updating a row in place is the rule rather than the exception. This means that the row stays in exactly the same location on the same page and only the bytes affected are changed. In addition, the log will contain a single record for each such updated row unless the table has an update trigger on it or is marked for replication. In these cases, the update still happens in place, but the log will contain a delete record followed by an insert record.

In cases where a row can't be updated in place, the cost of a not-in-place update is minimal because of the way the nonclustered indexes are stored and because of the use of forwarding pointers. Updates will happen in place if a heap is being updated or if a table with a clustered index is updated without any change to the clustering keys. You can also get an update in place if the clustering key changes, but the row does not need to move. For example, if you have a clustered index on a last name column containing consecutive key values of 'Able', 'Becker', and 'Charlie', you might want to update 'Becker' to 'Baker'. Since the row would stay in the same location even after changing the clustered index key, SQL Server will perform this as an update in place.

Updating Not in Place
If your update can't happen in place because you're updating clustering keys, it will occur as a delete followed by an insert. In some cases, you'll get a hybrid update: Some of the rows will be updated in place and some won't. If you're updating index keys, SQL Server builds a list of all the rows that need to change as both a delete and an insert operation. This list is stored in memory, if it's small enough, and is written to tempdb if necessary. This list is then sorted by key value and operator (delete or insert). If the index whose keys are changing isn't unique, the delete and insert steps are then applied to the table. If the index is unique, an additional step is carried out to collapse delete and insert operations on the same key into a single update operation.

As far as I understand this, "done in place" is meant almost literally and doesn't have anything to do with VARCHAR or NULLs. But I must correct myself. I was thinking of replication (the newest baby I'm playing around with).

quote:As far as I understand this, "done in place" is meant almost literally and doesn't have anything to do with VARCHAR or NULLs.

I'm pretty sure it was like that in SQL Server 6.5, maybe even 7.0. Anyway, for 2000 it looks like that is no longer the case, so updating columns that are not stored in nonclustered index leaf pages should not cause page splits at all.

As you can see, the table isn't very wide, but it is hit a lot during the day. If I speed up the queries to this table, then my whole application speeds up. The DTE, MO, DOM, HR, and MI fields are calculated before insertion to help speed up future selects. I can't really normalize the table any further because only the OFF_COD, VACYEAR, SAVAIL, SFULL and VACDAY fields are the core of the table.

Pretty much all the columns are heavily queried, but the ones that are in EVERY query are VACYEAR, OFF_COD, (VACDAY or DTE), SAVAIL, SFULL. I have done a ton of testing on this table, I just want to know if there is anything obvious that I have missed.

The AvailCal index fully covers the most used query. I would say that the query it covers is probably run about 5000-6000 times a day on average. The data that the query returns must be current at that point in time, so putting the results of the query in another table is not an option. Unfortunately the AvailCal query is also the most complex.

I've got the AvailCal query to run in under a second most of the time, but it is pulling about 3000 rows out of this table, and there is a subquery that hits another table. The subquery is also fully covered, but that table is going to be another thread once I have this table optimized.

The query already runs fast, but I am trying to squeeze every last ounce of speed out of it. Like I said, if I speed up access to this table, my whole application speeds up. Here's the AvailCal query in case anyone's curious:

SELECT DISTINCT O.DTE, O.SAVAIL - O.SFULL AS SLOTS, DATEPART(dw, O.DTE) AS WD,
       O.MO, O.DOM, O.YR,
       RIGHT('0' + LTRIM(RTRIM(CONVERT(CHAR(2), O.HR))), 2) AS H, O.MI,
       (SELECT COUNT(*) FROM VPVAC AS V
        WHERE V.OFF_COD = 'XXXX' AND O.VACDAY >= V.VACSTART AND O.VACDAY <= V.VACSTOP
          AND V.WLIST = 1 AND V.VACYEAR = 2005) AS WL,
       O.HR
FROM VPOFF AS O
WHERE O.OFF_COD = 'XXXX' AND O.YR = 2005 AND O.MO = 5+1 AND O.VACYEAR = 2005
ORDER BY O.MO, O.DTE, O.HR, O.MI

The subquery is only 19% of the query processing and 63% is sorting the data. My first instinct was to put a clustered index on the table to eliminate the sort, but I didn't want to kill the updates and inserts. If I went to a fill factor of 50%, would I be able to put a clustered index on the table?
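A sketch of that idea, with everything hedged: if the clustered index key matches the ORDER BY list (MO, DTE, HR, MI), SQL Server can return the rows presorted and skip the sort step. The fill factor below is an assumption; since the updates reportedly don't touch these columns, rows shouldn't move and 50% is likely far more free space than needed.

```sql
-- Clustered key in ORDER BY order so the sort step can be eliminated.
-- FILLFACTOR = 90 leaves modest room for any row growth from the
-- daytime UPDATEs; tune after monitoring fragmentation.
CREATE CLUSTERED INDEX CIX_VPOFF_Cal
    ON dbo.VPOFF (MO, DTE, HR, MI)
    WITH FILLFACTOR = 90
```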

A few questions without analysing too much info from the last post (I'll need some time for that):

1. Why is DTE char(10) instead of smalldatetime? You may gain performance by changing the data type.
2. What is the meaning of the dates recorded?
3. Why is the date changed? Are new values much different than the old ones?
4. Do the date parts recorded in columns YR, MO, DOM, HR, MI come from DTE or VACDAY?

5. Why do you have both VACYEAR and VACDAY?
6. Why do you order by O.MO when it is fixed by the where condition O.MO = 6? And why 5+1 instead of 6?
7. What is the purpose of returning 3000 rows? If they are to be displayed for user review, it is better to display the first N (e.g. 50) and offer to show more on request. The query runs fast, but the client application needs time to fetch and display them.

1. Why is DTE char(10) instead of smalldatetime? You may gain performance by changing the data type.

DTE is a date in the form YYYY-MM-DD.

2. What is the meaning of the dates recorded?
3. Why is the date changed? Are new values much different than the old ones?

DTE and VACDAY are the calendar day; THEY ARE NOT CHANGED. My mistake!!! I got it confused with the vacation table. So the columns OFF_COD, VACDAY, DTE, VACYEAR, YR, MO, DOM, HR and MI aren't updated. So here is the new list:

DTE stands for DATE; I don't like to use reserved words for column names. The DTE column is populated with CONVERT(CHAR(10), VACDAY, 21). DTE is not a computed column, but it is calculated from VACDAY when the row is initially inserted.

I know that I could eliminate the derived columns, but they never change once inserted, and they speed up the select queries. Since they never change, they serve to preprocess DATEPART(mm, VACDAY) and similar expressions.

Not sure what you guys mean by asking what the table pk is. I don't see anywhere that I talked about a table pk.

quote:DTE stands for DATE; I don't like to use reserved words for column names. The DTE column is populated with CONVERT(CHAR(10), VACDAY, 21). DTE is not a computed column, but it is calculated from VACDAY when the row is initially inserted.

You don't need that column at all.

quote:I know that I could eliminate the derived columns, but they never change once inserted, and they speed up the select queries. Since they never change, they serve to preprocess DATEPART(mm, VACDAY) and similar expressions.

It makes sense if you ever select rows by month without specifying the year, or by day without specifying the year and month. In that case you can create indexes starting with such a column. But you could also make it a computed column and define an index on that. If you always select rows by date range, then you can get rid of the derived columns and you don't need indexes on the date parts.
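A sketch of the computed-column alternative, using the thread's column names (the `MO_calc` name is an assumption). Indexing a computed column works in SQL Server 2000 provided the expression is deterministic and the required SET options (e.g. ANSI_NULLS, QUOTED_IDENTIFIER) are ON:

```sql
-- Derive the month from VACDAY instead of storing MO manually...
ALTER TABLE dbo.VPOFF ADD MO_calc AS DATEPART(mm, VACDAY)

-- ...and index it so month-only filters can still seek.
CREATE NONCLUSTERED INDEX IX_VPOFF_MO_calc ON dbo.VPOFF (MO_calc)
```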

quote:Not sure what you guys are meaning by asking what the table pk is. I don't see anywhere that I talk about table pk.

I guess you haven't created a pk constraint on that table, but there may still exist a subset of columns identifying a single row. Maybe OFF_COD + VACDAY? If so, create a clustered pk on those columns.
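In case it helps, the suggestion would look roughly like this (the constraint name and the assumption that OFF_COD + VACDAY is unique are mine, not confirmed in the thread):

```sql
-- A clustered primary key on the candidate key; fails if duplicates exist,
-- which is itself a useful check.
ALTER TABLE dbo.VPOFF
    ADD CONSTRAINT PK_VPOFF PRIMARY KEY CLUSTERED (OFF_COD, VACDAY)
```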

quote:If I go fill factor 0, doesn't that just mean database default?

It is the default unless you changed it. Fill factor 0 is similar to 100% in that leaf pages are filled completely, but SQL Server still leaves room for a couple of entries in each non-leaf (intermediate) page.
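Related: if you also want the chosen fill factor applied to those intermediate pages, SQL Server 2000 has the PAD_INDEX option (names below are placeholders):

```sql
-- PAD_INDEX applies the FILLFACTOR percentage to intermediate index
-- pages as well as leaf pages.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    WITH PAD_INDEX, FILLFACTOR = 80
```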

DTE column - if I am pulling all records for one day and I filter on the DTE column, it is faster than doing the CONVERT on the VACDAY column. That is the whole reason for keeping the DTE column.

Isn't having an index on a computed column a thing for SQL 2005?

pk - primary key constraint. Gotcha. I thought you guys were referring to something else. If I am not doing a key join in the structure of the database, then why do I need a primary key constraint? You can't have a pk without a clustered index, right? I will look into setting up a primary key constraint if I decide to make the AvailCal index clustered.
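On that last question: a primary key does not actually require a clustered index. PRIMARY KEY defaults to clustered, but you can declare it nonclustered, leaving the single clustered index free for something like the AvailCal sort order. A sketch (constraint name and key columns are assumptions):

```sql
-- Enforce uniqueness with a nonclustered PK, keeping the clustered
-- index available for a different key.
ALTER TABLE dbo.VPOFF
    ADD CONSTRAINT PK_VPOFF PRIMARY KEY NONCLUSTERED (OFF_COD, VACDAY)
```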

I know that this thread isn't resolved yet, but I just wanted to take the chance to say thanks for all of the input that everyone has had on this thread.

quote:DTE column - if I am pulling all records for one day and I filter on the DTE column, it is faster than doing the CONVERT on the VACDAY column. That is the whole reason for keeping the DTE column.

Another correction: you may need a nonclustered index on VACDAY only, on top of what I already proposed. Another possibility is to create the pk on VACDAY and OFF_COD in that order, with a nonclustered index on OFF_COD and another nonclustered index on VACYEAR.

Frank, you are right regarding the reason I mentioned before figuring out it was wrong, so I edited my post. The reason is not having hours and minutes other than 0; it is just the opposite: if I have a row '20050930 0:00:00', then the query will return that row too. If we add:

INSERT INTO #t VALUES('20050930')

the two queries will return different resultsets.

I made too many sloppy mistakes recently.

Will, take a look at the link Frank posted; it's a great guide to datetime queries.

Frank, fortunately the weekend is just a few hours away. I think I'll stop posting and rest a little bit, and it looks like you need a rest too.

Thanks for all the responses. I copied the production table over to another database and I am running tests on different index changes based on this thread. I found some duplicate records, so after I get the application developers to clean them up, I will work on putting a pk on the table.

I'm not sure what I was thinking about when I posted that you couldn't put an index on a computed column. I have done that before.

I will look over all the posts today and do some testing and post my results later.