robillard - it's a misunderstanding; DanKegel was actually responding to me... and i erroneously stated that i had taken the http submit code from that python project for my script, when in fact i pulled it from enphase-output.pl.

Comment

BTW, you probably know this already, but in my travels on this project I discovered that DateTime is _super_ slow at constructing instances if you pass in the string "local" for the timezone (which I do to convert from UTC time to local). My script significantly sped up (by an order of magnitude or more!) once I cached the DateTime::TimeZone object corresponding to "local", and re-used that cached value. Granted, I'm running on weak hardware (an rPi), but even on a studlier system, I've measured the cost of that code (without caching), and it is significant. Since we're doing lots of these conversions (or at least I am), the cost can quickly dwarf some of the other processing. Once I started using the cached timezone instance, I'm back to a very quick turn-around for my polling code.
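A minimal sketch of that caching approach, for anyone hitting the same slowdown (the helper name `utc_epoch_to_local` is mine, not from the original script):

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Detecting the "local" timezone is the expensive part, so do it once
# at startup and reuse the object for every conversion.
my $local_tz = DateTime::TimeZone->new( name => 'local' );

# Hypothetical helper: convert a UTC epoch timestamp to a DateTime
# in the cached local timezone.
sub utc_epoch_to_local {
    my ($epoch) = @_;
    my $dt = DateTime->from_epoch( epoch => $epoch );  # UTC by default
    $dt->set_time_zone($local_tz);  # cheap: pass the object, not 'local'
    return $dt;
}
```

Passing the cached `DateTime::TimeZone` object skips the per-call system timezone detection, which is what dominates when the string `"local"` is passed instead.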

Also, I'm curious: are you seeing any kernel buffer overflows when using Pcap in Perl? What happens if the packet callback that Pcap calls does not return in a timely manner? Are packets simply dropped once the buffer overflows?

Comment

robillard - it's a misunderstanding; DanKegel was actually responding to my message and apparently confusing the two of us... and i had erroneously stated that i had taken the http submit code from that python project for my script, when in fact i pulled it from enphase-output.pl.

Sorry for the confusion!

And for the record, I wasn't trying to get robillard to be a github contributor; I was just linking to this thread as documentation.

Comment

BTW, you probably know this already, but in my travels on this project I discovered that DateTime is _super_ slow at constructing instances if you pass in the string "local" for the timezone (which I do to convert from UTC time to local). My script significantly sped up (by an order of magnitude or more!) once I cached the DateTime::TimeZone object corresponding to "local", and re-used that cached value. Granted, I'm running on weak hardware (an rPi), but even on a studlier system, I've measured the cost of that code (without caching), and it is significant. Since we're doing lots of these conversions (or at least I am), the cost can quickly dwarf some of the other processing. Once I started using the cached timezone instance, I'm back to a very quick turn-around for my polling code.

Also, I'm curious: are you seeing any kernel buffer overflows when using Pcap in Perl? What happens if the packet callback that Pcap calls does not return in a timely manner? Are packets simply dropped once the buffer overflows?

well, i haven't profiled anything but nothing runs unless a packet matching the filter is seen, and message 130 only comes every 5 minutes, so... seems like it's not low-hanging fruit for me?

would these buffer overflows show up in dmesg? to my knowledge every time samples are missing in PVOutput, when i look at my log i can see that the supervisor has started replaying 130s for one reason or another, eventually leading to a 130 that's in between fragments, which i then can't parse. i really should fix that.

Comment

well, i haven't profiled anything but nothing runs unless a packet matching the filter is seen, and message 130 only comes every 5 minutes, so... seems like it's not low-hanging fruit for me?

Yeah, my script is actually kinda dumb, in that it first filters out all the 130/140 messages, then converts them to local time, and only then checks to see if they match the desired date-time of the next message... Also, the script is decoupled from the packet sniffer, so if the script dies for a higher-level logic issue (like 130 showing up without a 140, which I've now seen twice, and is resolved by simply waiting for it to be re-sent), when I finally get around to re-spawning it, it has a lot of messages to get through to replay the activity. Combine that with the supervisor's proclivity to re-send previously-sent data out of order, and there is the potential to be doing a lot of these conversions.

But yes, a well-written stateful script should not have to deal with this.

would these buffer overflows show up in dmesg? to my knowledge every time samples are missing in PVOutput, when i look at my log i can see that the supervisor has started replaying 130s for one reason or another, eventually leading to a 130 that's in between fragments, which i then can't parse. i really should fix that.

I don't think they'll show up in dmesg. I think the pcap buffer, once it fills with data passed from the kernel to user space, behaves like a ring buffer, i.e. if the user-space buffer is not serviced in time and becomes full, new data simply overwrites the oldest pre-existing data. This is one of the reasons why I've decoupled the traffic sniffing (which captures to a file, rotated and compressed) from the processing (which then reads the file, and possibly even the compressed archives, although the archives only come into play if the processing script has died and I don't get around to restarting it for a while).

Another reason to decouple them is that the actual pcap process must run sudo (for promiscuous mode), whereas I don't want the entire script running sudo (too many potential points of vulnerability/exploitation).
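One way to sketch that decoupling with Net::Pcap itself (interface name and output filename are assumptions; the author's actual sniffer writes decoded text lines rather than raw pcap dumps):

```perl
use strict;
use warnings;
use Net::Pcap;

my $err;

# Privileged side: capture and write packets straight to a savefile,
# doing no parsing at all. An unprivileged processor reads the file later,
# so a slow processing step can never cause capture-side drops.
my $pcap = Net::Pcap::open_live('eth0', 65535, 1, 1000, \$err)
    or die "open_live failed: $err";
my $dumper = Net::Pcap::dump_open($pcap, 'capture.pcap')
    or die 'dump_open failed: ' . Net::Pcap::geterr($pcap);

Net::Pcap::loop($pcap, -1, sub {
    my ($user, $hdr, $pkt) = @_;
    Net::Pcap::dump($dumper, $hdr, $pkt);
    Net::Pcap::dump_flush($dumper);   # so the reader sees data promptly
}, undef);
```

Only this small process needs root; file rotation and compression can then be layered on top (e.g. by reopening the dumper on a timer), keeping the parsing code entirely out of the privileged path.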

Comment

I don't think they'll show up in dmesg. I think the pcap buffer, once it fills with data passed from the kernel to user space, behaves like a ring buffer, i.e. if the user-space buffer is not serviced in time and becomes full, new data simply overwrites the oldest pre-existing data. This is one of the reasons why I've decoupled the traffic sniffing (which captures to a file, rotated and compressed) from the processing (which then reads the file, and possibly even the compressed archives, although the archives only come into play if the processing script has died and I don't get around to restarting it for a while).

Another reason to decouple them is that the actual pcap process must run sudo (for promiscuous mode), whereas I don't want the entire script running sudo (too many potential points of vulnerability/exploitation).

i see, you mean that all packets are getting dumped into the ring buffer at line rate and so regardless of the rate of the traffic you're interested in, the ring buffer could overflow? i don't think i've ever seen corruption that was not tied to the lack of packet reassembly... my machine is not exactly beefy either.

Comment

I'm now seeing another interesting phenomenon... Every day, somewhere between 9:30-11am, I'll see a single, solitary blip in the production numbers reported with the 130 message. For one interval, the number is lower than expected (based on the previous data points), and for the very next interval (only), the number is higher by the same amount. If I normalize the two, the data points fit perfectly within the curve of the neighboring data points.

At first I thought perhaps there might be something physical, like a reflection off of a window or something, but the blips are not closely enough temporally correlated for that to make sense, i.e. they do not happen at consistent enough times from one day to the next.

Also, if I look at my net metering (140 messages), those do not show a corresponding blip, as one would expect them to. So much so, in fact, that the implied consumption (net + production) sometimes ends up going negative during the down part of the blip, if I trust the production numbers.

Anyone else seeing this? (You can see this just in the SunPower monitoring, you do not need any special monitoring like is described by this thread...)

Yeah, pulling the data from SunPower is easy, if you look earlier in the thread you'll see that that's what we originally did.

However, this approach has some serious drawbacks:
a) there is a very large latency between when the supervisor sends data and when it is available from the server
b) the server rounds all data points significantly, so you end up with a progressive accumulation of error due to this loss of precision; this may not sound significant, but it really is some serious quantization of the data; if you'd like, I can post images of the graphed data from the SunPower server vs. the actual data from the supervisor to demonstrate this...
c) when there is lost data (due to an internet outage, server outage, etc.), the server will not catch up for a very long time, and you'll have "holes" in your data
d) every 6 months you need to refresh your client UUID by manually re-logging into the monitoring server and snarfing the URL they use to present the graph data

By monitoring the traffic directly, I eliminate all of these drawbacks. I've been very happy with the result.

But yes, pulling the data from the SunPower monitoring site is significantly easier, if you're willing to put up with the drawbacks listed above... And for the first 9 months, that's what I did for my PVOutput cross-poster and my local monitoring serverette...

Comment

Hey robillard - any chance you'd be willing to share your code? I'm impressed by what I'm reading you've done in this thread and I'm interested in doing the same. Would save me some time to have a starting place rather than reverse engineering everything like it seems you've already done. Please drop me a line at ehampshire <at> gmail. Thanks!

Comment

Hey robillard - any chance you'd be willing to share your code? I'm impressed by what I'm reading you've done in this thread and I'm interested in doing the same. Would save me some time to have a starting place rather than reverse engineering everything like it seems you've already done. Please drop me a line at ehampshire <at> gmail. Thanks!

The hard part is setting up the hardware for sniffing the traffic, but from a software perspective, it is not all that hard, once you understand how the Net::Pcap stuff works.

I cannot easily extract my processing code from the overall script to give you a ready-made solution, because of how I (mis-)architected my solution: I've de-coupled the traffic sniffing from the data processing. But I'll take a shot at describing the flow below...

Note that you'll have to run the script sudo, since it needs to put the interface into promiscuous mode. This is one of the (several) reasons why I de-coupled the traffic sniffer from the data processing: I didn't want my entire script running sudo...

So once you've got the traffic echoing to a port on your machine, I'm doing something like this to capture the packets (error handling removed for readability):

[CODE]
use Net::Pcap;
use NetPacket::Ethernet;
use NetPacket::IP;
use NetPacket::TCP;
use NetPacket::UDP;
[/CODE]
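Since the body of the capture code didn't survive the forum formatting, here is a hedged sketch of what a Net::Pcap capture loop along these lines might look like (the device name and filter string are assumptions, and error handling is minimal):

```perl
use strict;
use warnings;
use Net::Pcap;
use NetPacket::Ethernet;
use NetPacket::IP;
use NetPacket::TCP;

my $err;
my $dev = 'eth0';   # assumed: the interface the supervisor traffic is mirrored to

# Open the interface in promiscuous mode (this is why it needs root).
my $pcap = Net::Pcap::open_live($dev, 65535, 1, 1000, \$err)
    or die "open_live failed: $err";

# Narrow the capture to the interesting traffic (filter string is an example).
my $filter;
Net::Pcap::compile($pcap, \$filter, 'tcp port 80', 1, 0) == 0
    or die 'filter failed to compile';
Net::Pcap::setfilter($pcap, $filter);

# Hand every matching packet to the callback; -1 means capture forever.
Net::Pcap::loop($pcap, -1, \&process_packet, undef);

sub process_packet {
    my ($user_data, $header, $raw) = @_;
    my $eth = NetPacket::Ethernet->decode($raw);
    my $ip  = NetPacket::IP->decode($eth->{data});
    my $tcp = NetPacket::TCP->decode($ip->{data});
    # process_packet_payload (described below) would append $tcp->{data}
    # to the rotating capture file for the decoupled processing script.
    process_packet_payload($tcp->{data}) if length $tcp->{data};
}
```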

(One of) the (many) trick(s) involved herein is how you deal with calculating consumption from net and production... They do not necessarily arrive in the same packet (although they often do), nor does production (130) always arrive before net (140).

Because my process of analyzing the information is decoupled from the actual collection of the data, all I do in the process_packet_payload function (i.e. at scan time) is append the data to a (rotating) file on disk. Then my decoupled data collection code comes along and scans that data for matching entries. It can thus also calculate deltas from the last collected data sets without having to maintain state in the script (i.e. "$lastprod" and "$lastnet", or more likely a hash of recent prods and nets, in order to deal with discontinuities or out-of-order delivery). Each time my data collection code runs, it determines what the last stable time was (for which we've collected both valid production and net (and consumption) information), and sees if it can find values for the next time slot for both production and net; if so, it records those values, and those become the last stable collected time and values.
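The delta math described above can be sketched as follows (the field names and sample values are illustrative assumptions, not the author's actual data; per the thread, consumption = net + production):

```perl
use strict;
use warnings;

# Given lifetime counters (kWh) at two consecutive stable sample times,
# derive the interval production and consumption.
sub interval_values {
    my ($prev, $curr) = @_;    # hashrefs: { prod => kWh, net => kWh }
    my $prod = $curr->{prod} - $prev->{prod};
    my $net  = $curr->{net}  - $prev->{net};
    # net is negative while exporting, so consumption = production + net
    return { production => $prod, consumption => $prod + $net };
}

my $v = interval_values(
    { prod => 1000.0, net => -500.0 },   # last stable sample
    { prod => 1001.2, net => -500.8 },   # next time slot
);
# $v->{production} is ~1.2 kWh, $v->{consumption} is ~0.4 kWh
```

Working from lifetime counters rather than per-interval values is what makes out-of-order and replayed 130/140 messages harmless: any two stable samples bracket the interval correctly.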

I hope this helps...

(And the formatting for the code does not seem to be coming through properly, despite me using CODE tags, so I apologize for that as well...)

Comment

That's great info! Thanks! Now to brush up on my Perl.... any chance you'd share your processing script too? Also, I see where you would write to a file, but not what you are writing. I might modify to write to a mySQL DB eventually, but I'm fine with flat files initially.

Comment

That's great info! Thanks! Now to brush up on my Perl.... any chance you'd share your processing script too?

Well, not easily, no. But it's pretty straightforward: it's just a while loop that sleeps for 2 minutes. Each time it awakens, it scans the raw sniffed data, filters out only the data from the last collected date/time onwards, processes that data to convert the raw lifetime net and production values into interval production and consumption values (which are what I really want to know), records those in file(s) (see below), then goes back to sleep.

Also, I see where you would write to a file, but not what you are writing. I might modify to write to a mySQL DB eventually, but I'm fine with flat files initially.

During traffic sniffing, I actually just write the raw 140 and 130 lines to a separate set of sniffed-data files as raw text.

During my decoupled post-processing, I just append lines to flat files. Each file is a .csv file with columns for:
- date/time
- consumption (kW)
- production (kW)
- lifetime net (kWh) # this is the raw value reported by the 140 message
- lifetime production (kWh) # likewise, the raw value reported by the 130 message

I keep the .csv files per day, as well as summaries for daily, monthly, and yearly. I keep a directory structure mirroring the dates, i.e.:
2016/12/23.csv
2016/12/daily.csv # contains summary total entries for each day of that month
2016/monthly.csv # contains summary total entries for each month of that year
yearly.csv # contains summary total entries for each year

Because I keep the daily/monthly/yearly, I detect when I pass over midnight, and when I do, I sum up the data for the day and append to the daily. If I'm also passing over a month-end, then I sum up the daily.csv, and append the result to the corresponding monthly.csv. If I'm also passing into a new year, I sum up monthly.csv, and append the result to the corresponding yearly.csv. I could simply calculate all of that on the fly, but hey, I'm running my server on a raspberry pi, so it's better to calculate in advance once, and cache the result, rather than re-calculating each time I need the data.
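A rough sketch of the midnight rollover described above (the file layout follows the column list given earlier; summing the per-interval columns directly is schematic, since real code would also need to account for the sample interval when converting kW figures into daily totals):

```perl
use strict;
use warnings;

# At midnight rollover: sum the finished day's intervals and append
# one summary line to daily.csv.
# Assumed columns: datetime, consumption, production, lifetime net, lifetime prod.
sub roll_up_day {
    my ($day_csv, $daily_csv, $date) = @_;
    my ($cons, $prod) = (0, 0);
    open my $in, '<', $day_csv or die "open $day_csv: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /,/, $line;
        $cons += $f[1];
        $prod += $f[2];
    }
    close $in;
    open my $out, '>>', $daily_csv or die "open $daily_csv: $!";
    print {$out} join(',', $date, $cons, $prod), "\n";
    close $out;
}
```

The monthly and yearly rollovers follow the same pattern, each summarizing the level below it, which is the "calculate once and cache" strategy described above.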

And yes, a db probably is a more efficient way to store the data, but this was easier at the time, and is human-readable, resilient against corruption, and text formats will outlive db formats...

Comment

Well, SunPower finally fixed my PV Monitor, so I could actually start looking at this data. It took them 3 weeks, but I finally got up and running yesterday. So I'm now doing a combination of tcpdump for the raw data and getting your Perl working. Not sure if it was the cutting/pasting or what, but your code needed some tweaking to get running. I'm now re-acquainting myself with Perl and trying to write to a file, after proving I could display things to the screen. I'll be glad to share what I have going so far if people are interested. I'm planning on fleshing things out more and making a github project or something, so people who don't necessarily know how to code can work with it.

One problem I'm seeing at the moment is that I think my PV monitor is using my wireless network instead of the powerline networking adapter I'm set up to sniff the traffic on. =( Also, this is hard to do during winter when the sun isn't shining much, let alone to hack on it after dark.

I had a question about your regular expressions. I'm terrible at regex; it confuses me. I did have to modify your regex to look for 2017 in the timestamp, but I'm wondering if there's a better way to write it so it doesn't have to be modified yearly. My slightly modified section of your code:
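One way to avoid the yearly edit is to match any four-digit year rather than a literal one. A sketch, since the modified snippet itself didn't survive the paste (the exact timestamp format shown here is an assumption):

```perl
use strict;
use warnings;

# Instead of hard-coding the year, e.g. /2017-(\d\d)-(\d\d)/,
# match any four-digit year:
my $ts_re = qr/(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})/;

if ('2018-03-09 14:05:00' =~ $ts_re) {
    my ($y, $mo, $d, $h, $mi, $s) = ($1, $2, $3, $4, $5, $6);
    # $y captures "2018" -- no yearly maintenance needed
}
```

`\d{4}` matches 2017, 2018, and every year after, so the regex never needs touching again.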

Anyways, I could use some help deciphering some of the data I'm seeing from the raw tcpdumps. I would like to keep track of individual panel performance (something the SunPower portal doesn't let end-users like us do). I can see each of my panels reporting on the 130 lines (I have 24 panels).
In terms of looking at the raw data, here's an example:

It looks like each of my panels is reporting on the 130 lines, so I can break those out into my own DB / web portal. I'm having a hard time deciphering the rest of the lines, though. I believe the 1st number (i.e. "18.5466" in the first 130 line) is the watts, and the 2nd number (i.e. "0.0417") is the kWh, but those are mostly guesses. I get lost after that as to what the rest of the values are.