Hi all,

An outline of what I have: an RF network of RFM12 nodes with 1-wire temperature sensors, plus an RFM12-enabled 4-output mains power switch (for lights or whatever). These communicate with a 'gateway' that collects the info from all the RF nodes and handles commands and responses for the power-switch box. The gateway links to an Ethermega by RS485.

On the Ethermega, I run a webserver that displays some of the temperatures and lets you turn the 4 power outputs on and off. Also on the Ethermega, a client logs the temperatures to both Nimbits and Thingspeak (every 60 seconds for Thingspeak, every 90 seconds for Nimbits). The time is retrieved from an NTP timeserver on startup and at each midnight thereafter.
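The two logging intervals can be driven from a single millisecond counter. The following is only a minimal sketch, written as plain C++ so it can be checked off the board. `IntervalTimer` and `due` are hypothetical names, not taken from the actual sketch, and `nowMs` stands in for the value Arduino's `millis()` would return.

```cpp
#include <cstdint>

// Hypothetical helper: fires once every `intervalMs` milliseconds.
// `nowMs` stands in for the value returned by Arduino's millis().
struct IntervalTimer {
    uint32_t intervalMs;  // e.g. 60000 for Thingspeak, 90000 for Nimbits
    uint32_t lastMs;      // time of the last firing

    // Returns true when the interval has elapsed and restarts the timer.
    // Unsigned subtraction keeps the comparison correct when the 32-bit
    // millisecond counter wraps around (roughly every 49.7 days).
    bool due(uint32_t nowMs) {
        if (static_cast<uint32_t>(nowMs - lastMs) >= intervalMs) {
            lastMs = nowMs;
            return true;
        }
        return false;
    }
};
```

Comparing the elapsed time (`nowMs - lastMs`) rather than an absolute deadline is what makes the check survive the counter rollover on a board that stays up for weeks.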

What I notice is that when the client connects using a name lookup (i.e. cloud.nimbits.com, NOT a fixed IP), the client side of the Ethermega eventually hangs. This can take anywhere from 20 minutes to 6 hours, and the server side continues to work after the client side stops.

I am currently trying the client side (logging) with fixed IP addresses, which is not a long-term solution. At the same time the Ethermega server is being exercised by an auto-refresh request from Firefox every 10 seconds. Since moving to fixed IP addresses for the logging, the system has been stable for 24 hours (longest ever!), logging to two sites and serving up a webpage every 10 seconds without a hitch (over 8600 page refreshes).

While investigating this I came across a Wiznet errata document (W3150A+/W5100 Errata Sheet) describing possible errors in the W5100 chip relating to ARP requests. I am very inexperienced at C, though, and I lack the understanding to figure out whether the fix for the errata has been included in the Ethernet library, or even whether the client failure when using lookups could be caused by this. So I'm hoping that someone with a better understanding of C/C++ has looked into this and may be willing to share their experience.

I include the logging code here so perhaps someone may spot something I'm doing wrong. It is only a section of the complete code.

When the client side accumulates 5 successive failures, the whole system is reset by the WDT. I have never had a problem with the ethernet not coming back after a reboot.
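The "five successive failures" rule can be sketched as a small counter that any success clears. This is an illustration only: `FailureTracker` and `record` are hypothetical names, and the real sketch may track this differently.

```cpp
// Hypothetical helper that tracks successive logging failures.
struct FailureTracker {
    unsigned limit;      // failures in a row that trigger a reset (5 here)
    unsigned failCount;  // length of the current run of failures

    // Record the outcome of one post. Returns true when the run of
    // failures has reached the limit, i.e. when the caller should stop
    // servicing the watchdog and let it reset the board.
    bool record(bool postSucceeded) {
        if (postSucceeded) {
            failCount = 0;  // any success ends the run
            return false;
        }
        ++failCount;
        return failCount >= limit;
    }
};
```

Because a single good post clears the count, occasional missed posts at a busy logging site never add up to a reboot on their own.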

Is there any way to check how many W5100 sockets are currently available?
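One possible approach, offered as an assumption rather than something confirmed in this thread: the W5100 has four hardware sockets, and each has a status register (Sn_SR) that reads 0x00 when the socket is closed. Counting the closed ones gives the number still available. In the sketch below `readStatus` is a stand-in for the register read. On the board this is assumed to be something like `W5100.readSnSR(i)` from the Ethernet library's `utility/w5100.h`, which should be checked against the library version in use.

```cpp
#include <cstdint>

// W5100 facts: four hardware sockets; a socket whose status register
// (Sn_SR) reads 0x00 is closed and therefore free for reuse.
const uint8_t kMaxSockets = 4;
const uint8_t kSockClosed = 0x00;

// `readStatus` is a stand-in for whatever reads Sn_SR for socket i.
// It is passed in so the counting logic can be tried without hardware.
template <typename ReadStatus>
uint8_t countFreeSockets(ReadStatus readStatus) {
    uint8_t freeCount = 0;
    for (uint8_t i = 0; i < kMaxSockets; ++i) {
        if (readStatus(i) == kSockClosed) {
            ++freeCount;
        }
    }
    return freeCount;
}
```

Printing this count before each client connect would show whether sockets are being left open over time.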

/* Uses the PString.h library. Instead of sending the request piecemeal with the usual Arduino Ethernet functions like 'client.print' (every statement is a separate transaction), all the data to send in the POST transaction is 'assembled' into one large string using the PString library, which then gets sent in one hit. This reduces the IP traffic considerably. */

str.begin();  // reset the into-string pointer. This is the 'main' string being assembled.
cont.begin(); // and for the actual content, i.e. the payload string

// Create the 'content' of the string to send. It's assembled from the user details, then the sensor data.
// 1st part of content is access details
cont.print("email=XXXXXXXXXXXXXXXX&key=YYYYYYYYYYYYYY"); // store in the string 'cont'
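For readers without the full sketch, the "assemble everything, then send once" idea can be illustrated with a plain buffer and `snprintf` instead of PString. This is a sketch under assumptions: `buildPost`, the header set and the HTTP version are illustrative choices, not the headers the logging services actually require.

```cpp
#include <cstdio>
#include <cstring>

// Builds a complete HTTP POST request in `out` so it can be sent with a
// single write. `host`, `path` and `content` are illustrative parameters.
// Returns the request length, or -1 if the buffer is too small.
int buildPost(char* out, std::size_t outSize,
              const char* host, const char* path, const char* content) {
    int n = std::snprintf(out, outSize,
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %u\r\n"
        "Connection: close\r\n"
        "\r\n"
        "%s",
        path, host, static_cast<unsigned>(std::strlen(content)), content);
    if (n < 0 || static_cast<std::size_t>(n) >= outSize) {
        return -1;  // encoding error or request truncated
    }
    return n;
}
```

The payload has to be complete before the header is written, because Content-Length depends on it. That is the reason for keeping a separate content string alongside the main one.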

You're referring to errata 2 and 3, which deal with ARP traffic. As ARP traffic is also used in non-DNS requests, I don't think this is the source of your problems. But erratum 1 is a possible source. It describes a race condition where the reception of a UDP datagram occurs almost simultaneously with the sending of a packet. The code of the Ethernet library only checks for the SEND_OK bit in the interrupt register, which may never be set (at least the erratum says that). The recommendation in the errata document is NOT in the Ethernet library yet.
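To make the failure mode concrete: a loop that waits only for SEND_OK spins forever if that bit never appears. The sketch below shows a bounded wait as one defensive pattern. It is not the workaround from the errata sheet and not the Ethernet library's code. `waitForSend`, `SendResult` and `maxPolls` are hypothetical names, and `readIr` stands in for reading the socket interrupt register (Sn_IR). The bit values are those given in the W5100 datasheet.

```cpp
#include <cstdint>

// Sn_IR bit values from the W5100 datasheet.
const uint8_t kIrSendOk  = 0x10;
const uint8_t kIrTimeout = 0x08;

enum class SendResult { Ok, Timeout, GaveUp };

// Polls the socket interrupt register until SEND_OK or TIMEOUT is set,
// but gives up after `maxPolls` reads instead of looping forever.
// `readIr` is a stand-in for reading Sn_IR for one socket.
template <typename ReadIr>
SendResult waitForSend(ReadIr readIr, uint32_t maxPolls) {
    for (uint32_t i = 0; i < maxPolls; ++i) {
        uint8_t ir = readIr();
        if (ir & kIrSendOk)  return SendResult::Ok;
        if (ir & kIrTimeout) return SendResult::Timeout;
    }
    return SendResult::GaveUp;  // neither bit ever appeared
}
```

A `GaveUp` result would let the caller close the socket and retry rather than hang, which matches the symptom of the client side stopping while the server side carries on.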

I'm not sure if your problem is related to that erratum, because unfortunately the erratum says nothing about the TIMEOUT interrupt. If the TIMEOUT occurs as specified in the datasheet, the code in the Ethernet library is correct.

Thanks for your input. I have tweaked and adjusted and tried many things, but the end result is the same: after one or two days, it locks up. To keep things running, the watchdog reboots the board and it all seems happy again. It's transparent to the user; the only way I know it has happened is an uptime timer on the web page.

Perhaps I'm just trying to do too much on the one board: NTP lookup, web server for control/status, and client for posting logging data. I just don't understand C or C++ well enough to dig deep down. In time perhaps...

Meantime, I have written a server and client for the Wiznet in assembly language. It's a fraction of the size, but not as comprehensive. DNS and NTP are still to do. Driving the W5100 at register level has been quite educational, and I certainly learned a lot about the structure of web transactions. With the low-level access, it's easier to see what's going wrong. This highlights the lack of useful diagnostic and status info in the Ethernet library, which is basically a go/no-go affair with nothing available to show what's going on in the background, or at what point something failed.

Now that I know a lot more about the chip, I can have another try at the C-code and see if it makes more sense.

Your problem seems to be a bit like mine, but I'm not too sure about the race condition being the cause. I'd like to know how you get on with coding for the W5100 chip.

My setup is an Uno and an Ethernet shield with SD card and relay outputs. The sketch converts the resistance of analogue PT1000 sensors into temperatures, which then set the relays. Every 10 minutes all the data is dumped to the SD card for record purposes, and a web server section allows this data to be viewed over TCP/IP on another computer. It is also possible to alter the thermostat settings from the PC. The clock is set by NTP via UDP at startup and then, supposedly, once a day thereafter. All the bits of the programme work on their own (thermostat settings, A/D conversion, data to SD card, SD card to web page, NTP update), but when they are all together something causes a crash after a few hours.
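The post does not say how the resistance is turned into a temperature. One standard way, shown here purely as an assumption about what such a conversion could look like, is to invert the IEC 60751 relation R = R0 (1 + A·T + B·T²), which holds for temperatures at or above 0 °C. `pt1000ToCelsius` is a hypothetical name.

```cpp
#include <cmath>

// IEC 60751 coefficients for platinum sensors (valid for T >= 0 degC).
const double kA  = 3.9083e-3;
const double kB  = -5.775e-7;
const double kR0 = 1000.0;  // PT1000: 1000 ohms at 0 degC

// Inverts R = R0 * (1 + A*T + B*T^2) for T, given resistance in ohms.
double pt1000ToCelsius(double resistanceOhms) {
    double ratio = resistanceOhms / kR0;
    return (-kA + std::sqrt(kA * kA - 4.0 * kB * (1.0 - ratio))) / (2.0 * kB);
}
```

Floating-point `sqrt` is relatively expensive on an Uno, so a lookup table or linear approximation may be preferable on the board itself. The closed form is shown because it is easy to verify.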

I've looked at power supplies, SD card formatting, the SD card malloc bug, EthernetUDP memory creep, and FreeRam(). Nothing seems to make any difference, and it's been driving me crazy for 6 months.

Hi book_woorm,

I have never been able to find a definitive answer, but I have noticed that the time before a lockup or crash is maximised by not using DNS lookups, so I use hard-coded IP addresses. It's not good as a long-term fix, but it turns a crash rate of hours into days. The only place I still use a lookup is once every 24 hours, for a timeserver at oceania.pool.ntp.org. If my home router had a timeserver function I could hard-code that IP address as well.

I have read many theories about the reasons for this; lack of memory often comes up as a possibility. (Have you seen the Goldilocks board? 16K RAM!)

To get around my lockups, I use the watchdog timer to reboot the board. It always seems to come back up OK. On my Arduino server I have an uptime timer (days/hrs/min/sec), so when I look at the page I know how long it has been since the last reset. What I don't differentiate (yet) is whether a reboot is due to an Ethernet lockup or a string of five consecutive logging-service post failures. I will remedy that soon. It seems that the logging sites can miss quite a few posts in busy times; at least Thingspeak can, Nimbits seems more reliable in that sense. This is only my personal experience.

One thing you are doing that I don't is writing to the SD card. I just read from one. I'm not sure if writing uses more RAM.

Good luck with your coding, I hope you find some answers.

Stewie

  // if more than 10000 milliseconds since the last packet
  if(connectLoop > 10000)
  {
    // then close the connection from this end.
    Serial.println();
    Serial.println(F("Timeout"));
    client.stop();
  }
  // this is a delay for the connectLoop timing
  delay(1);
}
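Only the tail of the timeout loop appears above. The following is a guess at the surrounding structure, written as plain C++ so it can be tried without hardware. `readWithTimeout` is a hypothetical name, `Client` is any type with `connected()`, `available()`, `read()` and `stop()` (the same method names as the Arduino EthernetClient), and `waitOneMs` stands in for `delay(1)`. Unlike the fragment, this version returns as soon as it has closed the connection.

```cpp
#include <cstdint>

// Reads a response until the server disconnects, or closes the
// connection locally if no byte arrives for `timeoutTicks` iterations.
// Returns true if the loop ended through the timeout path.
template <typename Client, typename WaitOneMs>
bool readWithTimeout(Client& client, uint32_t timeoutTicks, WaitOneMs waitOneMs) {
    uint32_t connectLoop = 0;
    while (client.connected()) {
        while (client.available()) {
            client.read();     // consume one byte
            connectLoop = 0;   // data arrived, restart the idle count
        }
        ++connectLoop;
        if (connectLoop > timeoutTicks) {
            client.stop();     // close the connection from this end
            return true;
        }
        waitOneMs();           // one tick of the idle timer
    }
    return false;              // server closed the connection normally
}
```

The counter is reset whenever data arrives, so the limit applies to idle time since the last byte rather than to the total length of the transfer.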

Hi Stewie,

I was going to go down the 'watchdog timer' route, thinking that the one-second drumbeat on the interrupt would stop when the programme hangs. Using the drumbeat to continuously retrigger a 555 monostable is simple enough, though it would involve a new master PCB for the system. But I've discovered the web I/O can hang by itself while the drumbeat carries on dutifully measuring temperatures and recording data. Other times the data recording stops but the web server functions don't. It all seems to vary with how many Serial.print statements I've put in a particular version while trying to track the problem. That smacks of memory overload, but FreeRam() is returning between 630 and 750 bytes depending on where in the programme I ask the question.

Thanks to SurferTim for the 'time out' code. I'll try that when the current test falls over.

Here is the original test of the timeout, almost a year ago. I did not find the bug, just provided the fix after it was pointed out to me.

http://arduino.cc/forum/index.php/topic,102879

It may not be your problem today, but it isn't really a matter of "if", only "when". The fails that happen once every couple of weeks or months are the tough ones to find.

Thanks Tim,

I have incorporated your timeout into my code and will see if it makes a difference. To date, about 3.5 days of uptime is my best. I'll be interested to see if this now changes (my fingers are crossed...)

Stewie

sketch_mar25a.ino: In function 'void do_weblog()':
sketch_mar25a:17: error: 'lastCloudTime' was not declared in this scope
sketch_mar25a:17: error: 'postingInterval' was not declared in this scope
sketch_mar25a:18: error: 'line' was not declared in this scope
sketch_mar25a:19: error: 'showTimeDate' was not declared in this scope
sketch_mar25a:20: error: 'showRunTime' was not declared in this scope
sketch_mar25a.ino: In function 'void sendData()':
sketch_mar25a:34: error: 'str' was not declared in this scope
sketch_mar25a:35: error: 'cont' was not declared in this scope
sketch_mar25a:44: error: 'GetTemperature' was not declared in this scope
sketch_mar25a:45: error: 'temptemp' was not declared in this scope
sketch_mar25a:64: error: 'EthernetClient' was not declared in this scope
sketch_mar25a:64: error: expected `;' before 'Logging_client'
sketch_mar25a:65: error: 'Serial2' was not declared in this scope
sketch_mar25a:67: error: 'Logging_client' was not declared in this scope
sketch_mar25a:88: error: 'bufindex' was not declared in this scope
sketch_mar25a:97: error: 'content' was not declared in this scope
sketch_mar25a:101: error: 'content' was not declared in this scope
sketch_mar25a:104: error: 'content' was not declared in this scope
sketch_mar25a:105: error: 'failcount' was not declared in this scope
sketch_mar25a:109: error: 'failcount' was not declared in this scope
sketch_mar25a:119: error: 'failcount' was not declared in this scope
sketch_mar25a:125: error: 'Logging_client' was not declared in this scope
sketch_mar25a:129: error: 'failcount' was not declared in this scope
sketch_mar25a:132: error: 'WDTreboot' was not declared in this scope
sketch_mar25a:135: error: 'lastCloudTime' was not declared in this scope


It's static IP.

Also, in my attempt to reduce the potential for memory leaks and allocation problems, I made almost everything global-scope variables and arrays while monitoring free memory to see if it was getting eaten up by something. Bad practice from what I understand, but if it nails everything down and removes a possibility then I can live with it for now. Currently 4473 bytes free. Strings are in Flash, and IP addresses are static except for an NTP access on boot and every 24 hours.

- I just noticed that your timeout code just did its thing on a Nimbits post. I'm watching the status as I tinker...