FSMC - Disable "burst" writes

I'm using the FSMC on an STM32L4, but I can't figure out how to disable "burst" writes.
By this I mean that multiple writes happen within a single chip-select (CS) assertion.
When I do several writes in a row to external memory, the writes are sometimes "grouped" under a single chip select. If I add some __NOP()s or enough other instructions between them, each write gets its own CS pulse, which is what I expected.
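To make the two cases concrete, here is a minimal sketch of the kind of write sequence I mean. The helper name ext_write_bytes and its nops_between parameter are made up for illustration, and the __NOP() fallback is only there so the snippet also compiles off-target (on the MCU it comes from the CMSIS headers).

```c
#include <stddef.h>
#include <stdint.h>

/* On the target, __NOP() is provided by the CMSIS core headers.
   This empty fallback is an assumption so the sketch builds on a host. */
#ifndef __NOP
#define __NOP() do { } while (0)
#endif

/* Hypothetical helper: copy `count` bytes to a memory-mapped 8-bit device,
   one byte per store, with `nops_between` NOPs after each store.
   nops_between == 0 gives the back-to-back case. */
static void ext_write_bytes(volatile uint8_t *dst, const uint8_t *src,
                            size_t count, unsigned nops_between)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];            /* volatile: one store per byte */
        for (unsigned n = 0; n < nops_between; n++) {
            __NOP();                /* padding between bus accesses */
        }
    }
}
```

With nops_between set to 0 the stores are issued back to back, which is the case where I see them share one chip select. How many NOPs are needed to separate them is something I only found by trial, so treat any particular value as a guess.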

Here's my configuration, which is taken almost directly from the STM32Cube example code:

hsram.Instance = FMC_NORSRAM_DEVICE;
hsram.Extended = FMC_NORSRAM_EXTENDED_DEVICE;

/* SRAM device configuration */
SRAM_Timing.AddressSetupTime = 1;
SRAM_Timing.AddressHoldTime = 1;
SRAM_Timing.DataSetupTime = 1;
SRAM_Timing.BusTurnAroundDuration = 0;
SRAM_Timing.CLKDivision = 2;
SRAM_Timing.DataLatency = 0;
SRAM_Timing.AccessMode = FMC_ACCESS_MODE_A;

hsram.Init.NSBank = FMC_NORSRAM_BANK1;
hsram.Init.DataAddressMux = FMC_DATA_ADDRESS_MUX_DISABLE;
hsram.Init.MemoryType = FMC_MEMORY_TYPE_SRAM;
hsram.Init.MemoryDataWidth = FMC_NORSRAM_MEM_BUS_WIDTH_8;
hsram.Init.BurstAccessMode = FMC_BURST_ACCESS_MODE_DISABLE;
hsram.Init.WaitSignalPolarity = FMC_WAIT_SIGNAL_POLARITY_LOW;
hsram.Init.WaitSignalActive = FMC_WAIT_TIMING_BEFORE_WS;
hsram.Init.WriteOperation = FMC_WRITE_OPERATION_ENABLE;
hsram.Init.WaitSignal = FMC_WAIT_SIGNAL_DISABLE;
hsram.Init.ExtendedMode = FMC_EXTENDED_MODE_DISABLE;
hsram.Init.AsynchronousWait = FMC_ASYNCHRONOUS_WAIT_DISABLE;
hsram.Init.WriteBurst = FMC_WRITE_BURST_DISABLE;
hsram.Init.ContinuousClock = FMC_CONTINUOUS_CLOCK_SYNC_ASYNC;
hsram.Init.WriteFifo = FMC_WRITE_FIFO_DISABLE;
hsram.Init.PageSize = FMC_PAGE_SIZE_NONE;

When I do a bunch of writes in a row, here's what I get. (I've inserted some __NOP()'s toward the beginning to show you what it should look like.)

Edit: In case it isn't clear, I'm using the FSMC asynchronously, so I expected this sort of bursting to be disabled. On top of that, I have WriteBurst set to FMC_WRITE_BURST_DISABLE.
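One way to rule out a configuration that simply didn't get applied is to read the bank control register back and check the two burst bits. This is only a sketch: the bit positions (BURSTEN at bit 8, CBURSTRW at bit 19 of FMC_BCRx) are what I believe the reference manual gives and should be checked against it, and the helper name bcr_bursts_disabled is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FMC_BCRx bit positions; verify against the reference manual. */
#define BCR_BURSTEN_BIT   (1u << 8)    /* synchronous burst access enable */
#define BCR_CBURSTRW_BIT  (1u << 19)   /* synchronous write burst enable  */

/* Hypothetical helper: true when neither burst bit is set in a BCR value. */
static bool bcr_bursts_disabled(uint32_t bcr)
{
    return (bcr & (BCR_BURSTEN_BIT | BCR_CBURSTRW_BIT)) == 0u;
}
```

On the target, the value to pass in would presumably be hsram.Instance->BTCR[hsram.Init.NSBank] after HAL_SRAM_Init() has run. Note that both bits concern synchronous bursts, so a "disabled" result here only confirms the settings took effect; it says nothing about how CS behaves between consecutive asynchronous writes.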

I slowed the clock down to 1 MHz (it was 48 MHz) and I still see the same issue:
(On the right side, I'm writing out 0, 1, 2, 3, as you can see on the D[3:0] data lines, but all four writes happen within a single chip select. My logic analyzer is sampling at 100 MHz, so that's 100x oversampling in this case.)

Edit: It seems the behavior above is how the peripheral is supposed to work. It just isn't how I had pictured it from the timing diagrams in the datasheet/reference manual.
I think CS staying low across multiple writes should be fine for my application.