High concurrency for master and high reading from slave slowing down insert

Hi All,
I am having a system which receive data from around 30K gps . Each unit will be inserting concurrently around 1 minute.
1.What tweak should I add to cater for this? I target my ram to be 48Gb with 24 cpus.

Once the data grow should I do partition or leave it . I am worried as the data grow the insert will be slow?

What should I anticipate and handling for heavy traffic?

I plan to have replication data so that the reading can be done from the slave? But what I am worried is that if I delete data from the master then it will effect slave too right?

Is possible to delete the partition from master but not from the slave?

I wonder if by gps you meant GPS? If so I can share a little but not all about a fleet tracking system that I was a team member on. The GPS data would only go down to about 5 second intervals on the product so this was more of … Read More

@newbie. We never saw it. I didn't want to bring this up but how is your internet connection concurrent? That is, at the data center we used the packets would arrive one after another. Yes, to a human it seems concurrent but it really isn't.
So to test load you … Read More

I wonder if by gps you meant GPS? If so I can share a little but not all about a fleet tracking system that I was a team member on. The GPS data would only go down to about 5 second intervals on the product so this was more of a bottleneck in comms (communication) than the SQL and app that filed the data.

As to heavy traffic this system still runs on a virtual host and you are 100X more than what we use on a fleet of about 1K vehicles.

My view is you are overdesigning. Or maybe you didn't try it on a simple server yet.

Yes I mean gps device. Our interval will be one minute data. The comm server will be java. I plan to have multiple comm server. I have tried so far with around 2K devices. So i want to prepare now for bigger traffic. What I am worry can mysql handle this traffic?

I'm going with yes. Why? On a virtual host with MySQL and some JAVA code filing data away, 1K vehicles seem fine. The fleet uses cellular modems to send in location and the virtual host is on some hosting company.

I can only share so much about this but you with dedicated machine and 100 times the machine we use the only reason I could see it fail is a coding/config error. Or "Pilot error."

@rprofitt. What I mean is my how I collect the gps data is via a java application. In the java application I do connection pooling and prepared statement. That is what I mean so for no clarification. Should I put my code snippet?

Sorry newbie14 but your questions are unclear. For example while I worked on a team that setup and have a live GPS tracker with so many features your questions are unclear and maybe you are not an English speaker. You wrote "What java tios ..." and didn't explain what tios means. It's not a word I encountered in Java or during our development cycle.

So back to the top. Your hardware specification of " ram to be 48Gb with 24 cpus. " sounds like sheer overkill to me. The system we ended up with is a virtually hosted app and handled thousands of vehicles. I can't imagine how many your server will handle but it must close to over a million.

@rproffitt sorry for my earlier post I got some typo error. Let me share below is how my tcp communicator snippet code looks like. Tcp communicator decodes the GPS location and inserts that data into the database and before inserting we do some selects and updates but all I do it using prepared statements. Right now, one thread of TCP communicator serves one device request.Immediately after creating the thread we get one connection from the pool. After decoding the GPS data is where we perform the multiple select, update and insert for each data. As number of devices are increasing, the number of concurrent connections to our Mysql database are also increasing. We are now anticipating 30 to 50 thousand devices pumping data every minute.Currently below is how the snippet of the whole tcp communicator looks like. I know eventually we will facing both insert bottleneck into the database. What will the best solution to over come this scenario? Will Java also be able to handle this many concurrency ?

@N so you need to do load testing. My bet is you won't find this to bottleneck to RAM or CPU. And if we had a few thousand with a virtual host, and I'm repeating, the hardware you listed looks good from a million clients if the internet and disk bandwidth holds up.

@rprofitt I wonder how to do a parallel 50 thousand load testing ? What tool is available but if I write my own socket client that will be sequential not parallel. If you notice at the tcp communicator I will be doing a number select and updates and finally only do the insert. So in one minute can it handle up to nearly 50 thousand of these type of select,update and insert queries? How many thousands you had and what was your hardware specs? I am sure I will need to do some tweak at the db level like innodb tweak and query indexing ? Over time the db will increase its size and insert will slow down right?

Slow downs in most MySQL systems I've dealt with are usually caused by simple things like SELECT * statements.

I'm a little amazed you are approaching this scale of a system without a programming team. Here we had about a dozen working on the product at one point. I guess there are companies that would hope you as the lone programmer will do it all.

@rprofitt did you face slow down while insert multiple concurrent data which already have hundred of millions of data. Any idea or experience handling those traffic ?We are a team not just alone programmer. The issue now we are looking into solution for this as we anticipate the growth in near future we dont want to be caught later and not knowing that to do when we are facing the traffic so we are doing research and study from now onwards. I dont get you when you said you crushed the ram and cpu you mean its too much for the ram?

@cereal thank you I have been reading about it. The problem is that I cant send the same data for the 50K connection which I need to make I need to simulate as different device id for that purpose can mysqlslap handle this?

@newbie. We never saw it. I didn't want to bring this up but how is your internet connection concurrent? That is, at the data center we used the packets would arrive one after another. Yes, to a human it seems concurrent but it really isn't.

So to test load you just load up the input and see how many packets your system can process per second. Bingo, you have your throughput numbers. My bet is your internet feed or drives will saturate before the 24 CPUs do.

@rproffitt ok I will go with your advice. So in other words everything is buffered at the socket level is that what your trying to explain to me? So how to test for this "How many requests per second can my system process?" What about the code snippet which I have put some to do the stress test? So if its trully not parallel then is all sequential why do we need multithread just would like to understand moer details?

@N, I am not trying that hard to get you unstuck from the usual new to this view or parallel processing. You can imagine what would happen to the internet if there was no buffer in systems today. I've given you a view to where your monster machine is overkill by sharing a little about the system the company here went with.

I've seen this before and usually the persons that go down this path burn out as they pile a million bucks into servers and run out of cash before the company lifted off.

Here the company went with a cloud so it could scale later. The group is a lot more seasoned and we knew to not overdesign this. We went with a simple desktop server during demos, later it fit on a laptop and finally went into a virtual host in a cloud system. It runs fine with thousands of clients.

PS. Get a measure of throughput. How many transactions per second can you achieve. Then think where the bottleneck is. With 24 CPUs it won't be the CPUs.

@rprofitt actually I am trying to get unstuck from the parallel processing view. Let me understand at the lowest level say now I got 10 devices trying to make connection to my socket server. So what is happening is that all the data is line up in the buffer right. So then how is the socket start here.

Can I say that is this done in sequential method? So is it safe for the device to keep holding the socket and not close for a long time or is better to close?

Actually the monster machine we have not setup we are planning to setup not yet on the cloud too.

Can you give me more tips on this " Get a measure of throughput. How many transactions per second can you achieve. " How get this metrics any method or can I work based on my stress test script ? Please give more hints as I am willing to learn and change accordingly.

AHH, now that's where I can see a design blow up. We went with UDP rather than TCP to avoid a pile of open sockets. Using UDP meant we developed our own protocol and you may not ask me for that. But if you were to try to open thousands of TCP sockets I can see where you would blow up.

@rproffitt ok I got it unfortunately the device itself support tcp only so not much of choice left here. No problem I wont ask you about your udp. I want to first understand so you said every tcp which comes in is buffered rite. Then it pick up the socket.accept and create a new socket thread is that correct? There after the device is connected and send the data over the socket which have been build ? Am I right here? So thereafter the socket is build is it still sequentially done right. So how I do a stress test on how many thousand of socket it can handle in one second or one minute to know the throughput?

@N, TCP only? You're headed towards a small disaster. I can't know what product you are using and if your device's coder learned about this and took steps to mitigate it. We went UDP and our own protocol to competely avoid that issue plus drop communication costs.

Look at the byte count on a simple (?) open close of a TCP socket and then how may byte (cost) of the UDP send. Since our system is cost averse we needed UDP and then our own protocol. There are companies that go TCP and you see it affect the monthly fees.

-> About the testing. I see you have replies above and it appears you are in the market for more team members. In short (again) start testing. You have no CPU limitation from what I see from your top post.

@rpfroffitt yes I know tcp is kind of more costly cause we need to setup the communication etc. Before going further on testing I want know more the sequential buffering for tcp I am not so clear on that? Can you bried me more on that kind of confuse?

I know about testing need tips what and how to test? Kind of very lost here?

I see the testing tips were given so my advice is to craft up your tests now as best you can.

-> I want to broach another subject as well. We found that no single client went over a few thousand vehicles. This meant we could spread the load to other virtual servers and databases. Let me be clear.

NO CUSTOMER wanted their data mixed into a "central" database. They wanted their data to be their data. This is another factor in why you'll never get over a few thousand clients per database.

@rproffitt seriously I am still lost on the understanding of the sequential part when you say that at the socket.accept and how they get it running in multithreaded. The issue I am still not clear is the testing how to craft up the test still lost at these 2 points.

I agree this not a single client exactly is more like a semi govt application which has many mini customer and later we need to do some big data stuff on this data. That is the reason we need one single central database.

@n. There are so many priors about handling multiple connections that I won't broach that here. For such a big design, I think you need to do more research, code more, test more and talk to my contact on the system I worked on.

You could have left out a big secret like this was your homework, thesis or something else. For such a big system you never find a team of one.

@rprofitt yes I know there many prior to handling multiple connections e.g. bandwidth, servers etc. Yes I will be willing to talk to your contact or to you on personal email. For now I just want to do a load test and throughput test to find what will be exact server requirements when we reach that traffic and scale. So that is what to be exact I am trying to learn and understand.

I told you this not a team of one. We have designer, programmer, map guys, server guys etc. So definitely not a single person running is crazy no way to achieve it either.

@n, thanks for telling me about the team. I do think you may run into some awful issue with TCP/IP only. My contact went pretty deep why but for the most part it was all about cost. I saw issues with holding a TCP connection open and then the mobile client moving from tower to tower. I never heard if the OSes and carriers found a way to fix it so your apps don't have to handle that.