As part of my undergraduate course requirements, I have been developing a toolbox for the Hilbert-Huang Transform (HHT), a powerful tool for nonlinear and nonstationary time-series analysis.

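The crux of the method is the empirical mode decomposition (EMD), a feature extraction algorithm that decomposes any real signal into a finite set of intrinsic mode functions (IMFs). By definition, an IMF has numbers of extrema and zero crossings that differ by at most one, and the mean of its upper and lower envelopes is zero, so each IMF carries well-behaved time-scale information.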
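A minimal sketch of the sifting idea, assuming NumPy and SciPy: each sifting step subtracts the mean of cubic-spline envelopes drawn through the local maxima and minima, and each extracted IMF is removed from the residue before the next one is sought. The names `sift_once` and `emd` are hypothetical, the fixed number of sifts stands in for a proper stopping criterion, and no end-effect handling is attempted.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def sift_once(t, x):
    """One sifting step: subtract the mean of the upper and lower envelopes.

    Returns None when there are too few extrema to build both envelopes.
    """
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return None
    # Cubic-spline envelopes; extrapolation beyond the outermost extrema is
    # left unhandled here, which causes the well-known EMD end effects.
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - 0.5 * (upper + lower)


def emd(t, x, max_imfs=10, n_sifts=10):
    """Decompose x into IMFs plus a residue (fixed number of sifts per IMF)."""
    imfs = []
    residue = np.asarray(x, dtype=float)
    for _ in range(max_imfs):
        h, sifted = residue, False
        for _ in range(n_sifts):
            h_new = sift_once(t, h)
            if h_new is None:
                break
            h, sifted = h_new, True
        if not sifted:  # residue has too few extrema: stop decomposing
            break
        imfs.append(h)
        residue = residue - h
    return imfs, residue


# Usage: two tones at 5 Hz and 40 Hz.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(t, x)
# By construction, the IMFs and the residue add back up to the signal.
print(len(imfs), np.allclose(np.sum(imfs, axis=0) + residue, x))
```

Because every IMF is subtracted from the residue, reconstruction is exact regardless of how good the individual IMFs are; their quality depends on the envelope and stopping choices.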

Unfortunately, the process is almost entirely algorithmic and lacks a rigorous mathematical foundation. Most complex signals break down into several IMFs, and I am particularly interested in investigating their properties and which of them are suitable for specific applications, such as feature extraction for time-series analysis or even machine learning.

I intend to approach the EMD problem from a machine learning perspective and to develop an EMD toolbox covering its variants, stopping conditions, and algorithms, so that users can run different variants on the same data, compare the results, and select the best one.
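One way to make stopping conditions interchangeable, so that the same data can be sifted under different criteria and the results compared, is to pass the criterion in as a function. A minimal sketch, assuming NumPy; `sd_criterion`, `fixed_iterations`, and `sift_until` are hypothetical names. The SD test below is a ratio-of-sums variant of the standard-deviation criterion from Huang et al., not the pointwise original, and the toy `toy_step` in the usage lines only stands in for a real envelope-mean sifting step.

```python
import numpy as np


def sd_criterion(threshold=0.2):
    """Stop when successive sifting results differ little (SD-style test).

    Ratio-of-sums variant: sum((h_prev - h_new)**2) / sum(h_prev**2).
    """
    def stop(h_prev, h_new, n_iter):
        sd = np.sum((h_prev - h_new) ** 2) / max(np.sum(h_prev ** 2), 1e-12)
        return sd < threshold
    return stop


def fixed_iterations(n=10):
    """Stop after exactly n sifting iterations."""
    def stop(h_prev, h_new, n_iter):
        return n_iter >= n
    return stop


def sift_until(h, sift_step, stop, max_iter=100):
    """Apply sift_step repeatedly until stop(...) is true or max_iter is hit.

    Returns the final candidate IMF and the number of iterations used.
    """
    for n_iter in range(1, max_iter + 1):
        h_new = sift_step(h)
        if stop(h, h_new, n_iter):
            return h_new, n_iter
        h = h_new
    return h, max_iter


def toy_step(h):
    """Toy stand-in for sifting: removes half of the current mean."""
    return h - 0.5 * h.mean()


# Usage: run the same data through two different stopping criteria.
x = np.sin(np.linspace(0.0, 8 * np.pi, 400)) + 1.0
for name, criterion in [("SD < 0.01", sd_criterion(0.01)),
                        ("3 iterations", fixed_iterations(3))]:
    _, n_used = sift_until(x, toy_step, criterion)
    print(name, n_used)
```

Keeping the criterion as a plain callable means new stopping rules can be added without touching the sifting loop itself.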

I think this is a good idea, and HHT promises a lot. But be careful: we currently have no support for time-series analysis in Orange. So it is important that we first get some basics in (reading, writing, filtering, some basic spectral/periodic analysis). After that, it would be great to have a tool like the one you describe to extract features that we could then combine with our existing tools.

So your proposal should have two parts: first implement some basic tools, then implement HHT as a feature extraction tool. I do not know how time-consuming you expect all this to be, so decide on that basis how deep you want to go.

For us it is important to get at least some basic support, so that later on we can start adding specialized time-series tools. HHT could be a good first example of such a tool.

I'm quite well versed in Python (especially NumPy and SciPy), so I will focus primarily on the basic tools. Could you tell me more about how sophisticated you want the basic data I/O and filtering operations to be? Once I have an idea, I'll make this my first deliverable.

I've already implemented a number of HHT applications in Python, and my second deliverable will be to optimize them for large datasets and merge them into Orange.

I cannot say exactly what I want, because I do not know how much time it will take you. Some display of the data would be nice. (My background is more in DSP, so I will use that terminology.) For example, visualization as for audio: waveforms you can navigate, select a subset of, and analyze spectrally, including over time.
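For spectral analysis over time, the numbers behind such a view could come from a short-time Fourier transform. A minimal sketch, assuming SciPy's `scipy.signal.spectrogram`; plotting is left out, and the helper name `dominant_frequencies` is hypothetical.

```python
import numpy as np
from scipy.signal import chirp, spectrogram


def dominant_frequencies(x, fs, nperseg=256):
    """Return segment times and the strongest frequency in each segment."""
    freqs, times, power = spectrogram(x, fs=fs, nperseg=nperseg)
    # power has shape (len(freqs), len(times)); pick the peak in each column.
    return times, freqs[np.argmax(power, axis=0)]


# Usage: a linear chirp sweeping from 100 Hz to 2000 Hz over two seconds.
fs = 8000
t = np.arange(2 * fs) / fs
x = chirp(t, f0=100, t1=2.0, f1=2000, method="linear")
times, peaks = dominant_frequencies(x, fs)
print(peaks[:3], peaks[-3:])
```

The same `freqs`, `times`, and `power` arrays are what a time-frequency display would draw; `nperseg` trades time resolution against frequency resolution.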

One important feature of Orange is that it is modular: each widget/component receives some data and sends some data out. So when you read data in, you output data of some type, for example a time-series value or similar (we will still have to think about that), and another widget/component takes it and does something with it. So when I mentioned filtering, I was thinking more of selecting a subset of the data (like a time span of the signal), which that widget/component then outputs, so that other tools operate only on that segment.
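To make the idea concrete, here is a minimal sketch of what a time-series value passed between components, and a span-selection component operating on it, might look like. `TimeSeries` and `select_span` are hypothetical and not part of Orange; written as plain Python objects and functions, such components would also be usable from scripts, not only from the canvas.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class TimeSeries:
    """Hypothetical uniformly sampled time-series value."""
    values: np.ndarray       # shape (n_samples,) or (n_samples, n_channels)
    sampling_rate: float     # samples per second
    start_time: float = 0.0  # time of the first sample, in seconds

    @property
    def times(self):
        return self.start_time + np.arange(len(self.values)) / self.sampling_rate


def select_span(ts, t_from, t_to):
    """Return a new TimeSeries holding the samples with t_from <= t < t_to."""
    i0 = max(int(np.ceil((t_from - ts.start_time) * ts.sampling_rate)), 0)
    i1 = min(int(np.ceil((t_to - ts.start_time) * ts.sampling_rate)), len(ts.values))
    i1 = max(i1, i0)  # an empty span yields an empty series, not an error
    return TimeSeries(ts.values[i0:i1], ts.sampling_rate,
                      ts.start_time + i0 / ts.sampling_rate)


# Usage: 10 s of data at 10 Hz; keep only the segment from 2 s to 3 s.
signal = TimeSeries(np.arange(100.0), sampling_rate=10.0)
segment = select_span(signal, 2.0, 3.0)
print(len(segment.values), segment.start_time)
```

Keeping the sampling rate and start time with the samples means a downstream component can still report absolute times after a span has been cut out.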

Of course, you can also implement filters in general (convolution and so on), so that we could also transform the signal and/or filter it in frequency space. All of this is very interesting and it is hard to choose, so I will let you decide.
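Two common ways to low-pass a signal, sketched under the assumption of NumPy and SciPy: zeroing FFT bins above a cutoff (filtering in frequency space) and a zero-phase Butterworth filter applied in the time domain. The helper names are hypothetical. The FFT version is a brick-wall filter and can ring on real signals, which is one reason an IIR design is often preferred.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def fft_lowpass(x, fs, cutoff):
    """Brick-wall low-pass: zero every FFT bin above cutoff (Hz)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def butter_lowpass(x, fs, cutoff, order=4):
    """Butterworth low-pass run forwards and backwards (zero phase)."""
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


# Usage: a 5 Hz tone contaminated with a 200 Hz tone, sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 5 * t)
x = low + np.sin(2 * np.pi * 200 * t)
y_fft = fft_lowpass(x, fs, cutoff=50)
y_iir = butter_lowpass(x, fs, cutoff=50)
print(np.max(np.abs(y_fft - low)), np.max(np.abs(y_iir[100:-100] - low[100:-100])))
```

In this example both tones fall exactly on FFT bins, so the brick-wall filter recovers the 5 Hz tone almost perfectly; with arbitrary signals that is not the case.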

For now I would not go into optimization; you can do that later, after GSoC. I would first like to have things working, because optimization can take a lot of time, and premature optimization is the root of all evil. Since Orange is written in Python, it is an open question what will really determine its speed or slowness; it may not be the algorithm itself.

I would suggest that you try Orange and see how widgets/components work together on our current example/feature-based datasets. Then it may be easier to understand what we would find interesting for doing something similar for time-series datasets.

And then some translation between the two kinds of data.

Also, do not forget about the scripting part. The Orange canvas/GUI/widgets are just a nice polish over the real machinery, which is the scripting layer, so the components/API should be well defined there too.

So above all I would like a good API with well-designed, reusable components, much more than 1001 features that do not work well together. With a well-defined backend it is easy to add new features later; the other way around, it becomes a problem.

So the questions are: how to integrate time series into Orange, and how to pass data efficiently around in the Orange canvas/GUI. As a thought experiment, or maybe even a real test, I would like Orange to be able to read an audio file, pass it through some widgets that process it in some way, and then play it back. Is this possible? Is the current system of passing data between widgets capable of this, or do we have to extend it?
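Leaving the widget plumbing aside, the read, process, and write-out part of that test is straightforward with SciPy. A minimal sketch, assuming 16-bit PCM WAV input and `scipy.io.wavfile`; `process_wav` is a hypothetical helper, and actual playback would need an additional audio library, so writing a file stands in for it here.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def process_wav(in_path, out_path, process):
    """Read 16-bit PCM WAV, apply process(x, rate) to floats, write 16-bit PCM."""
    rate, data = wavfile.read(in_path)
    if data.dtype != np.int16:
        raise ValueError("this sketch only handles 16-bit PCM")
    x = data.astype(np.float64) / 32768.0  # scale to [-1, 1)
    # Clip so that rescaling cannot overflow the int16 range.
    y = np.clip(process(x, rate), -1.0, 32767 / 32768)
    wavfile.write(out_path, rate, np.round(y * 32768.0).astype(np.int16))


# Usage: synthesize a 440 Hz tone, halve its volume, and read the result back.
rate = 8000
n = np.arange(rate)
tone = np.round(16000 * np.sin(2 * np.pi * 440 * n / rate)).astype(np.int16)
with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "in.wav"), os.path.join(tmp, "out.wav")
    wavfile.write(src, rate, tone)
    process_wav(src, dst, lambda x, r: 0.5 * x)
    out_rate, out = wavfile.read(dst)
print(out_rate, out.dtype, np.abs(out).max())
```

Working on floats between reading and writing keeps the processing step independent of the file's sample format, which is the kind of boundary a widget chain would also need.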

These are all questions you will have to answer, which is why I think basic features will be more than enough.

You can also build tools other than signal processing ones; those are just the ones I am somewhat familiar with. For example, you could add basic statistics (mean, standard deviation and so on; these basics should really be there, although for them we could simply translate time-series data into example-based data and use existing tools), and also some basic statistical prediction. I do not know what exists in that area, but I believe some methods are really basic, and we want those.
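A minimal sketch of such a translation, assuming NumPy 1.20 or later for `sliding_window_view`: each window of the series becomes one example (row) with simple summary statistics as attributes, which existing example-based tools could then consume. It also includes a least-squares autoregressive one-step predictor as one example of basic statistical prediction. All function names are hypothetical, and conversion into an actual Orange data table is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_features(x, width, step):
    """Turn a 1-D series into rows of [mean, std, min, max], one per window."""
    windows = sliding_window_view(np.asarray(x, dtype=float), width)[::step]
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1),
                            windows.min(axis=1), windows.max(axis=1)])


def fit_ar(x, order):
    """Least-squares AR(order) coefficients (no intercept), lag 1 first."""
    x = np.asarray(x, dtype=float)
    # Column k holds lag k+1, so row i is x[i+order-1], ..., x[i]
    # and is used to predict x[i+order].
    lagged = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
    coef, *_ = np.linalg.lstsq(lagged, x[order:], rcond=None)
    return coef


def predict_next(x, coef):
    """One-step-ahead prediction from the last len(coef) values."""
    recent = np.asarray(x[-len(coef):], dtype=float)[::-1]  # most recent first
    return float(recent @ coef)


# Usage: windowed statistics, then an AR(1) fit on a decaying series.
features = window_features(np.arange(10.0), width=4, step=2)
print(features.shape)  # (4, 4): 4 windows, 4 statistics
series = 0.8 ** np.arange(30)
coef = fit_ar(series, order=1)
print(coef, predict_next(series, coef))
```

Once the rows exist, the statistics are ordinary attributes, so the existing example-based tools could work on them unchanged.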

You should definitely go ahead with the application and not wait until the last moment. You can submit now and keep improving your proposal/application until the deadline.

In general, I am against reviewing applications before the deadline, because then I believe I would have to do it for all students, which is quite impossible. I also see writing the proposal as part of the test of whether you are suited for the task, since you have to do your own research and make your own decisions based on it. Furthermore, we need something to base our scoring on, and it would be a bit strange to base it on a proposal already checked and improved with the mentor's help.

But if you have concrete questions, feel free to ask. Questions such as whether the proposal is OK or not are simply too generic.