The BLE stack is already compiled with the best available setting (optimization for size) before the library is built, so there is nothing more to gain there. What you can reduce is the flash used by the rest of your application. If your code is already optimized, I would suggest trying the ARM MDK compiler (you will need to buy a license).

But again, there is only so much code you can optimize. If you anticipate a very large application, I suggest using the PSoC 4 BLE 256K modules, which provide 256K of flash and 32K of SRAM. You can buy samples from http://www.cypress.com/products/psoc-4-ble-bluetooth-smart.