I'll look into what I can do with the compression (is it helping much or only a little?) and with splitting the vertexBuffer.
I implemented your suggestion on the textures. I'm not only drawing terrain; I draw a full world with models, etc.
I can see that when I have 20 draws of models with the same texture, setting it only once and then drawing all of them will boost the FPS.
But I have to set the terrain textures every frame, because I need to set the textures for all the models too, right? So I don't really see how I can save calls here.

A question: How do I set an array in my shader, just with the usual constBuffer?
Are the vertices processed by the shader in the order you save them into the vertexBuffer?

Notice how nothing has been sent to DirectX 11 yet. Textures, constant buffers, samplers, etc. only need to be sent when it is actually time to draw. So there should be another function that is called just before an actual render:

It is really not that complicated. All it is doing is checking for the fewest possible calls it can make to PSSetShaderResources() on each render call by comparing the currently active textures with the textures active on the last render. This is the only location where it is valid to call PSSetShaderResources(), so the local record of the last textures sent to DirectX 11 is accurate.
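As a minimal sketch of that filter (pure C++ so it runs anywhere; the type and member names are hypothetical, and a real Apply() would call PSSetShaderResources() for the slots that changed):

```cpp
#include <cassert>
#include <cstddef>

// "Texture" stands in for ID3D11ShaderResourceView*.
using Texture = const void*;

const std::size_t kSlots = 8;

struct TextureStateCache {
    Texture pending[kSlots] = {};  // what the caller wants bound
    Texture active[kSlots]  = {};  // what was last sent to the device
    int     apiCalls        = 0;   // how many device calls Apply() has made

    // Called freely during scene traversal; sends nothing to the device.
    void Set(std::size_t slot, Texture tex) { pending[slot] = tex; }

    // Called once just before a draw; flushes only the slots that changed.
    void Apply() {
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (pending[i] != active[i]) {
                active[i] = pending[i];
                ++apiCalls;  // real code: PSSetShaderResources(i, 1, &active[i]);
            }
        }
    }
};
```

If the same texture stays bound across consecutive draws, Apply() makes no device call at all for that slot.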

You basically need a similar system in place for everything. Samplers, textures, constant buffers, etc. And then you need to make this actually useful by implementing a render queue to maximize the number of times the same texture, shader, etc. are used in repeated render calls.
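A hypothetical render-queue sketch (the struct and helper names are mine): queue draws during traversal, sort by a state key, then submit, so draws sharing a texture end up adjacent and the redundancy filter skips re-binding:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem {
    std::uint32_t textureId;  // stand-in for a sort key built from texture/shader/etc.
    std::uint32_t meshId;
};

// Sorts the queue by texture and returns how many texture changes remain.
inline int SubmitQueue(std::vector<DrawItem>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
        [](const DrawItem& a, const DrawItem& b) { return a.textureId < b.textureId; });
    int textureChanges = 0;
    std::uint32_t last = ~0u;
    for (const DrawItem& item : queue) {
        if (item.textureId != last) {
            last = item.textureId;
            ++textureChanges;  // real code: bind the texture, then draw item.meshId
        }
    }
    return textureChanges;
}
```

Four draws alternating between two textures would normally cost four bindings; sorted, they cost two.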

I would like to ask again, what hardware are you running on?
If this is integrated laptop graphics or similar that explains the low FPS, and vertex count can become more important.

I certainly don't mean to argue against the good points about state changes, but considering the code posted they will probably have close to zero impact in this case. All chunks in one buffer is probably better, but if you get bad FPS from drawing 9 chunks of 256x256, and the terrain drawing is actually the bottleneck, then you'd need like a thousand state changes per chunk to notice a major difference.

Just to set a performance baseline, create a small test program that draws a 1024x1024 or similar terrain with a single vertexbuffer and a single draw call and absolutely nothing else, to determine what your computer is capable of.

I'm using my gaming notebook which can run even BF3. So that shouldn't be the problem.

I'm very new to DirectX, and all I know I learned from books and tutorials; wrappers were never mentioned anywhere. I simply create them with D3DX11CreateShaderResourceViewFromFile( d3d11Device, "tt1.jpg", NULL, NULL, &slopeTexture, NULL ); and set them later as in my code above.

EDIT: I think I have an idea where the low FPS could come from: the function which is called every frame to check where new chunks are needed, terrain::update. Let me post it; I'm sure you will find tons of things to be fixed:

What it does is: if the player is inside the bounds of the terrain, check each chunk position around him to see whether a chunk already exists there. If not, create a new one and pass it the corresponding part of the height and shadow map. Then check whether there are more chunks than I want to hold in memory and erase the first ones that are not visible. Could it be that the checking is time-consuming and therefore slowing everything down? (I'm only using one thread at the moment!)

But Erik Rufelt is also correct. These redundancy optimizations are important, especially for the long run, but right now it seems clear that you have bigger issues at hand.

Put all of those chunks into one buffer and draw with one call. If the FPS remains mostly similar, it means you have a bandwidth problem, and the 16-bit vertex data optimization should be a large help.

Also, draw your terrain normally, but move the camera out so that it is only a small part of the screen.
If the FPS increases dramatically, you have a fill-rate problem related to your pixel shader. You could then start examining that.

When I move the camera out, FPS remain the same.
The FPS are about the same on a single vertexBuffer-drawCall.
16-bit optimization could really help, but I don't have the slightest idea how to do that.

According to your blog, I have to find the size of my vertexBuffer elements and then pad them to 16, 32, or 64 bytes, but how do I pad a vertexBuffer?
The size of each element of the vertex structure (3x sizeof(float) + 2x sizeof(float) ... + padding) = 32, e.g.?

To be clear, are you saying that zooming out so far that the terrain occupies only about 100-500 total pixels on the screen results in a similar framerate?
And you are sure that the rest of whatever you are drawing is not causing this slow framerate?

If so, you definitely have a bus-transfer problem, and the 2 main optimizations would be to use 16-bit vertices and a second stream for the Y, and compressed textures.
Compressed textures are the easiest to implement so start there.
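For scale: block-compressed formats store 4x4 texel blocks, with BC2/BC3 at 16 bytes per block (1 byte per texel) versus 4 bytes per texel for uncompressed R8G8B8A8, a 4:1 saving. A small arithmetic sketch (mip chains ignored; helper names are mine):

```cpp
#include <cassert>
#include <cstddef>

// Upload size for an uncompressed 32-bit RGBA texture.
inline std::size_t BytesUncompressedRGBA8(std::size_t w, std::size_t h) {
    return w * h * 4;
}

// Upload size for BC2/BC3: 16 bytes per 4x4 block = 1 byte per texel.
inline std::size_t BytesBC2(std::size_t w, std::size_t h) {
    return (w / 4) * (h / 4) * 16;
}
```

For a 1024x1024 texture that is 4 MiB uncompressed versus 1 MiB in BC2; BC1 (no fine alpha) halves it again.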

When my site talks about padding, yes, it is as in your example. Add some fake bytes so that the next element in the buffer is 32 bytes after the previous element.
But while this will help, it is not going to give you results you will find acceptable.
This and redundancy checks should be put on hold while you address your most major issue: bandwidth.
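The padding itself is just extra bytes in the vertex structure so the stride lands on the target size. A hypothetical sketch (member names are mine, not from the posted code):

```cpp
#include <cstddef>

// A layout that already sums to 32 bytes needs no padding.
struct PaddedVertex {
    float pos[3];       // 12 bytes
    float normal[3];    // 12 bytes
    float texcoord[2];  //  8 bytes -> 32 total
};

// A 20-byte layout is padded with 12 fake bytes to reach a 32-byte stride.
struct PaddedVertex2 {
    float pos[3];       // 12 bytes
    float texcoord[2];  //  8 bytes -> 20 so far
    float _pad[3];      // 12 fake bytes of padding
};

static_assert(sizeof(PaddedVertex)  == 32, "stride should be 32 bytes");
static_assert(sizeof(PaddedVertex2) == 32, "stride should be 32 bytes");
```

The stride passed to IASetVertexBuffers is then simply sizeof(PaddedVertex2).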

Yes, when I zoom out very far, I still only get ~10 FPS. I commented out everything but the terrain, so it's the only thing in play right now.

By bandwidth problem you mean that the data I pass from CPU to GPU is much too large?
So I split the vertexBuffer and update only the Y (using a constant buffer? I'm not sure what you mean by a stream).
I'll start with the compressed textures right now.
EDIT: Compressing the textures (using BC2) gave me a slight FPS increase of 5-10.
I added a screenshot of it to my first post!

Bandwidth = the transfer from the CPU to the GPU.
It means the total amount you send to the GPU is too large. This includes textures, index buffers, vertex buffers, etc.
That is why using compressed textures can help.

How many elements are in your vertex buffer?
How many bits in your index buffer?

// vertexBuffer element:
struct TerrainVertex {
    D3DXVECTOR3 pos;
    D3DXVECTOR3 normal;
    D3DXVECTOR2 texcoord;
    D3DXVECTOR4 color;
    D3DXVECTOR4 shadowColor;
};
// 257*257 vertices per chunk
// index buffer: unsigned long, 256*256*6 indices
// as optimised as I was able to get it
Your vertex buffer is huge. Why do you need so much data? Why shadow color?
Why is your index buffer * 6? You should be using a triangle strip, not a triangle list.
Since you are bandwidth limited, this is one of the major issues you need to handle.
If you switch to a triangle strip (stitching the rows together with degenerate triangles), your index buffer shrinks to roughly 2 * 257 * 256 indices, about a third of the triangle-list count.
That in itself is much smaller, and restricting your index buffers to 16 bits (while increasing the number of draw calls) often proves worth the extra draw calls.
But before reducing your index buffer to 16 bits, start by using a triangle strip. You should see a noticeable gain in performance.
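A strip over a full (w x h)-vertex grid can be built by walking row pairs and repeating two indices between rows as a degenerate stitch; a hypothetical sketch (the exact index count depends on the stitching scheme):

```cpp
#include <cstdint>
#include <vector>

// Builds one continuous triangle strip covering a w*h vertex height-field grid,
// stitching rows with degenerate triangles so the grid draws with one call.
inline std::vector<std::uint32_t> BuildStripIndices(std::uint32_t w, std::uint32_t h) {
    std::vector<std::uint32_t> idx;
    for (std::uint32_t row = 0; row + 1 < h; ++row) {
        if (row > 0) {
            // Degenerate stitch: repeat the previous index, then the next row's first.
            idx.push_back(row * w + (w - 1));
            idx.push_back(row * w);
        }
        for (std::uint32_t col = 0; col < w; ++col) {
            idx.push_back(row * w + col);        // top vertex of the quad column
            idx.push_back((row + 1) * w + col);  // bottom vertex
        }
    }
    return idx;
}
```

For 257x257 vertices this yields 132,094 indices, versus 393,216 for the triangle list above.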

I can't really see how to cut the vertexBuffer down. I could cut the shadowColor and just use a bool for shadowed/not shadowed, but that would be all. I'm using a triangle list because I was told that doing so reduces the number of vertices while increasing the number of indices, which they said is a good thing; I didn't know that a triangle strip is more performant.

A 16-bit index buffer can simply be created by using unsigned short instead of unsigned long, and calling IASetIndexBuffer( indexBuffer, DXGI_FORMAT_R16_UINT, 0 )? To go with a triangle strip I'd have to rebuild the entire terrain code, which could take a while. But if you say that it is worth it, I'll go for that.
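That is the right idea, with one caveat: 16-bit indices can only address 65,536 vertices, and 257*257 = 66,049, so each chunk would have to be split into smaller patches (e.g. 129x129 vertices, 16,641 each) before the switch pays off. A hypothetical sketch of building the 16-bit triangle-list indices for such a patch:

```cpp
#include <cstdint>
#include <vector>

// Builds a 16-bit triangle-list index buffer for a w*h vertex patch.
// Real code then binds it with IASetIndexBuffer(buf, DXGI_FORMAT_R16_UINT, 0)
// and uses one draw call per patch.
inline std::vector<std::uint16_t> BuildPatchListIndices16(std::uint16_t w, std::uint16_t h) {
    std::vector<std::uint16_t> idx;
    for (std::uint32_t row = 0; row + 1 < h; ++row) {
        for (std::uint32_t col = 0; col + 1 < w; ++col) {
            std::uint32_t i = row * w + col;  // two triangles per grid quad
            idx.push_back(static_cast<std::uint16_t>(i));
            idx.push_back(static_cast<std::uint16_t>(i + w));
            idx.push_back(static_cast<std::uint16_t>(i + 1));
            idx.push_back(static_cast<std::uint16_t>(i + 1));
            idx.push_back(static_cast<std::uint16_t>(i + w));
            idx.push_back(static_cast<std::uint16_t>(i + w + 1));
        }
    }
    return idx;
}
```

Halving the index size halves the index-buffer bandwidth on its own, independent of the strip change.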

Cache-friendly vertex buffers is another issue worth investigating.
By the size of your vertex buffer it appears you have repeating vertex data. If you are using an index buffer, your vertex buffer should be much smaller (257 × 257).
Are you eliminating duplicate vertices? If not, this would be the first thing to do.

You got me wrong, I think. My vertex buffer is only 257*257; there are no duplicated vertices. Because I'm using a triangle list, I need more indices. Most tutorials and posts said that would be worth it by far.

The problem I'm currently facing is that I can't figure out how to stream the Y values to the shader properly. Another thing I just noticed: when I pass the Y values per chunk to the shader, I have to do the same for shadows and normals. Will this still be a performance increase?
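On what a "stream" means here: it is a second vertex buffer bound to another input slot, not a constant buffer. A hypothetical split (struct names are mine): slot 0 holds the static XZ grid data uploaded once, slot 1 holds only the per-chunk values that change; in D3D11 both are bound with one IASetVertexBuffers(0, 2, buffers, strides, offsets) call and the input layout routes each element from its slot. Normals and shadow data would go into the per-chunk stream the same way.

```cpp
#include <cstddef>

struct StreamXZ {       // input slot 0: static, shared by every chunk
    float xz[2];
    float texcoord[2];
};
struct StreamY {        // input slot 1: re-uploaded when a chunk changes
    float y;
};

const std::size_t verticesPerChunk = 257 * 257;
const std::size_t staticBytes  = verticesPerChunk * sizeof(StreamXZ);  // uploaded once
const std::size_t dynamicBytes = verticesPerChunk * sizeof(StreamY);   // ~258 KiB per update
```

Even after adding normals and shadow values to the dynamic stream, each chunk update moves well under 1 MiB instead of the full ~4 MiB interleaved buffer, so it should still be a clear win.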