I recently noticed that the group is working on socket-level
changes to MPTCP.
Can a summary of the issues and how the changes will solve them be
posted to the list? That would help any new participants as well. If
they have already been discussed, please point me to the thread.
It would also help to make clear how these changes will
expedite a basic MPTCP implementation. I believe that
is the current goal, but my understanding may be incorrect.
Thanks,
Shoaib

Hello,
We just had our 4th meeting with Mat and Peter (Intel OTC), Christoph
(Apple) and myself (Tessares).
Thanks again for another good meeting!
Here are the minutes of the meeting:
- discussion of Rao Shoaib's patches:
- Christoph posted some comments
- Peter has comments, but they are mainly similar to Christoph's.
He will try to post them in the coming days.
- Rao was not able to join; we hope to discuss the reviews of
these patches (or a v2) next week.
- Netlink PM (Matthieu's patches)
* Discussion on missed events: they are very rare, a bit like losing
UDP packets between apps on the same host. We can have a PM with
Netlink. It could be interesting to get stats on lost events.
* _CREATED vs. _ESTABLISHED: we would like to have a simple v1,
with only the minimum required. Not having the _CREATED event could be
OK, we would not announce ADD_ADDR ASAP but on the other hand, if the
client receives this while the session is not "ESTABLISHED", it cannot
create a second subflow.
* Goal: simple API that can be extended later.
* (There was a lot of discussion about this, but it was difficult for
me to take notes at the same time.)
- lockless subflows (Christoph's patches)
- one more fix coming
- waiting for review
- Mat and Peter's patches:
- first design is there
- will simplify (a lot) the current implementation, at least on the
client side. The tricky side is the server side. Work will need to be
done on this area. Peter is looking at this.
- Server side: the biggest issue is allocating sockets for new (incoming)
subflows. Subflows need to be attached to the MPTCP connection. One
(dirty) idea from Christoph is to pre-allocate resources.
Current way of doing that
(https://github.com/multipath-tcp/mptcp): resources are allocated at the
3rd ACK. Here we would need to do the kernel's accept() in user context
but also allocate and attach resources to the MPTCP connection. What to
do in the meantime? We cannot drop data. The subflow should be seen as
fully-functional even if it is not properly accepted/attached.
- Note about these patches: this is an RFC; we agreed in the past to
share patches as early as possible to get comments on the idea.
- waiting for review on the ML
- Mat's summary:
- clear
- was added in MPTCP's wiki
- netdevconf:
- Because David will not be there, maybe a tutorial is more
interesting. Goal: why MPTCP is interesting, why the netdev community
should look at it, and what we are doing in the upstreaming process.
- Maybe good to have a dedicated topic in the ML (but already a lot
of other messages there, that can wait)
- Tutorial: some ideas
- Android smartphones with MPTCP (but you will need SIM cards,
etc.)
- Raspberry Pi with MPTCP: they have two ifaces now, but it may be
hard to set up / get something working
- safer to have a VM?
- (last time, there were tutorials where people were only
watching, not participating → we can also show what's happening with some
embedded devices)
- could be good to have new ideas.
- deadline is May, we still have one month to think about that.
- Next steps:
- Our ML is not very visible. Wiki: we can use Github.
- We could move stuff to Github
(https://github.com/multipath-tcp/mptcp), wiki included.
- Could be good to have a wiki because there are already a lot of
discussions in the ML. Mat confirmed he can edit the wiki at
https://github.com/multipath-tcp/mptcp/wiki
- most of the next steps concern the continuation of the
discussions on the different topics already opened on the ML (mostly
linked to big patch sets)
- people will try to find time -- among other priorities -- to
review them.
Next meeting:
- proposition: the 5th of April at 16:00 UTC (9am PDT, 6pm CEST)
- open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20180405
Feel free to comment on these points and propose new ones for the next meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
--
------------------------------
DISCLAIMER.
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system manager.
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. If you are not the intended recipient
you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly
prohibited.

Hello,
(first, sorry for the long e-mail)
I have now started cleaning up the input path to prepare it for
input processing without holding the MPTCP-level lock.
To do this, I go through all the places where we access data structures from
the meta-socket and see if I can move them to mptcp_data_ready or
mptcp_write_space (the callbacks that are called from
sk_data_ready/sk_write_space).
In tcp_check_space() we have the following:
    if (mptcp(tcp_sk(sk)) ||
        (sk->sk_socket &&
         test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))) {
        tcp_new_space(sk);
        if (sk->sk_socket && !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
            tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
    }
We have the if (mptcp(...)) check because a subflow's sk_socket currently
points to the application's struct socket. Thus, we need to skip the check for
SOCK_NOSPACE, as otherwise we might end up not calling sk_write_space.
Other kernel modules that create TCP connections instead use an in-kernel
struct socket. And modules like RDS even force SOCK_NOSPACE to be set, such
that the TCP stack keeps on up-calling. I thought this would be a good thing
to do in MPTCP as well.
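To make the effect of that check concrete, here is a small userspace model (not kernel code). The struct and function names are hypothetical stand-ins for struct sock, struct socket and the SOCK_NOSPACE bit; only the boolean condition is taken from the tcp_check_space() snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's types. */
struct model_socket {
    bool nospace;                 /* models the SOCK_NOSPACE flag bit */
};

struct model_sock {
    bool is_mptcp;                /* models mptcp(tcp_sk(sk)) */
    struct model_socket *socket;  /* models sk->sk_socket, may be NULL */
};

/* Condition as quoted from tcp_check_space(): MPTCP subflows bypass
 * the SOCK_NOSPACE test entirely. */
static bool calls_new_space_current(const struct model_sock *sk)
{
    assert(sk != NULL);
    return sk->is_mptcp || (sk->socket && sk->socket->nospace);
}

/* Same condition without the mptcp() special case, i.e. what a
 * subflow would see if it were treated like any other TCP socket. */
static bool calls_new_space_plain(const struct model_sock *sk)
{
    assert(sk != NULL);
    return sk->socket && sk->socket->nospace;
}
```

In this model, a subflow that borrows the application's socket while SOCK_NOSPACE is clear only reaches the write-space path through the mptcp() special case. A subflow with its own socket whose flag is forced on (the RDS approach) reaches it through the plain condition, so the special case would no longer be needed.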
So, my goal became to have a fully functional struct socket for subflows.
The benefit is also that we can end up using kernel_sendmsg,
kernel_recvmsg,... in the future. It also allows us to do kernel_accept() on
the MPTCP-level socket to receive new subflows (a problem I mentioned in an
earlier mail).
It would also allow us to expose subflows as file descriptors to
user space. That way, user space can do setsockopt, getsockopt,... on the
subflows. This idea came up in the past when we were thinking about how to
expose an MPTCP API that allows apps to control certain things on the
subflows.
To get there, there are a few places where things would need to change:
* mptcp_init4_subsockets - Here, this works perfectly. It also lets us
avoid "faking" the struct socket, as we are currently doing.
* mptcp_alloc_mpcb for the active opener - This is the first problem. mptcp_alloc_mpcb() can be
called with bh disabled. But sock_create_lite() assumes that bh is enabled
as it ends up doing an alloc with GFP_KERNEL.
A few ways this could be solved:
- Schedule a work-queue item in mptcp_alloc_mpcb that creates the struct
socket. This looks a bit racy to me. Not sure what side-effects this
might have.
- Change things entirely, such that the master-sock is allocated
when the connection is created. That way, we allocate all the necessary
struct sockets right away.
In the past, we decided to allocate the master-sk only when receiving
the SYN/ACK. We did that so as to minimize the impact on regular TCP
when the server does not support MPTCP. But, as we are moving towards
explicitly exposing MPTCP at the socket-layer, we can rethink that
decision.
Any thoughts? Is it ok to pay the cost of allocating a master-sk before
we know whether the server supports MPTCP?
I think we should do this and transition to that model.
* mptcp_alloc_mpcb for the passive opener - same problem as above but on the
other side. We could allocate the master's struct socket upon the accept()
call from the application. This again sounds a bit racy to me. The struct
socket will be there for the subflow potentially much later than it has
been established. What happens if the peer sends data or an
MP_FASTCLOSE,... ?
* New subflows on the passive opener side - again, we are receiving those
subflows while bh is disabled. So, we have to schedule a work-queue item to
do a kernel_accept() on the MPTCP-socket.
Again, something that can potentially be racy.
In general, subflow establishment on the passive-opener side again seems to
be a major pain point. I think we really need to redesign that.
Any thoughts, feedback, suggestions?
Or maybe, using real struct socket for subflows is not worth it? :)
Thanks,
Christoph

Hello,
Our public MPTCP upstreaming weekly web conference is scheduled for
tomorrow.
The list of topics is here:
https://annuel2.framapad.org/p/mptcp_upstreaming_20180329
Feel free to add/modify topics!
Meeting link: https://talky.io/mptcp_upstreaming
Speak to you tomorrow!
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium

Hello everyone -
In our recent web conference call, we thought it would be helpful to
summarize what is coming up for MPTCP upstreaming. To start, I'd like to
review work that's already been done and is underway.
Previous work:
* Mat developed an extensible TCP option framework patch, and Christoph
refactored upstream TCP-MD5 and SMC functionality to use it. The idea was
to integrate some extensions into TCP that had utility for existing
functionality, but would also help integrate MPTCP. This RFC patch set was
turned down by Dave Miller, but did give us useful guidance.
* Mat published a proposal for optionally extending sk_buff shared info to
carry MPTCP metadata.
Ongoing work:
* Christoph keeps the multipath-tcp.org kernel merged with recent kernel
releases as part of his maintainership role for the multipath-tcp.org
project, and is refactoring parts of the MPTCP implementation to take
advantage of newer kernel or TCP features and to align with upstreaming
goals.
* Rao has published an RFC patch set based on the multipath-tcp.org MPTCP
implementation and the current upstream kernel, with modifications to make
the TCP implementation extensible in a way that's useful for MPTCP. It is
currently under review by community members.
* Matthieu has sent a generic netlink patch set to mptcp-dev.
* Ossama shared a generic netlink path management API proposal. The Intel
group developed this before we knew Matthieu would be sharing an
implementation.
* Stephan shared a userspace path manager draft implementation at
https://github.com/brenns10/pathmand
* Peter and Mat are preparing a patch set showing an MPTCP architecture
with a separate socket type for the MPTCP connection and using in-kernel
TCP sockets for subflows. The extended sk_buff structure patch is part of
this.
* Ossama has made significant progress on a userspace path manager
implementation, with the goal of open-sourcing it.
That gives us quite a bit to discuss and review on the list in the near
term. The bigger and longer-term challenge is to take these pieces and
develop a strategy and patches for upstream submissions.
There are a few things the MPTCP upstreaming community members on this
list seem aligned on:
* MPTCP belongs in the upstream kernel
* The multipath-tcp.org implementation is not upstream-ready as-is
* Implementing 100% from scratch is not the preferred strategy
How do we get to upstreamable patch sets from here? I think if we can
agree on what an upstreamable MPTCP architecture looks like, we can evaluate
patch sets accordingly. I've proposed these design characteristics before,
but the mailing list has expanded significantly since then:
* MPTCP is used when requested by the application, either through an
IPPROTO_MPTCP parameter to socket() or by using the new ULP (Upper Layer
Protocol) capability.
* Move away from meta-sockets, treating each subflow more like a regular
TCP connection. The overall MPTCP connection is coordinated by an upper
layer socket that is distinct from tcp_sock.
* Move functionality to userspace where possible, like tracking ADD_ADDRs
received, initiating new subflows, or accepting new subflows.
* Avoid adding locks to coordinate access to data that's shared between
subflows. Utilize capabilities like compare-and-swap (cmpxchg), atomics,
and RCU to deal with shared data efficiently.
Is this the right place to start? Does anyone want to expand the list?
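For the first design point (requesting MPTCP explicitly through socket()), an application-side sketch might look like the following. The helper open_stream_socket() is hypothetical. The value 262 for IPPROTO_MPTCP is an assumption for this thread's context: it is the number later mainline Linux kernels and recent libc headers use. On a kernel without MPTCP support, the first socket() call is expected to fail and the code falls back to plain TCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: older headers do not define IPPROTO_MPTCP; 262 is the
 * value used by later mainline Linux kernels. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Hypothetical helper: try to open an MPTCP stream socket and fall
 * back to regular TCP if the kernel refuses the protocol.
 * On return, *is_mptcp is 1 if MPTCP was granted, 0 otherwise.
 * Returns the file descriptor, or -1 if even plain TCP failed.
 */
static int open_stream_socket(int family, int *is_mptcp)
{
    assert(is_mptcp != NULL);

    int fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);
    if (fd >= 0) {
        *is_mptcp = 1;
        return fd;
    }

    /* MPTCP unavailable (e.g. not built in or disabled): use TCP. */
    *is_mptcp = 0;
    return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

An application would call open_stream_socket(AF_INET, &is_mptcp) and then use connect(), send() and recv() on the returned descriptor as usual. The application opts in explicitly, and regular TCP sockets are unaffected.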
Thanks,
--
Mat Martineau
Intel OTC

There has been some misunderstanding about what an RFC patch is and what
standards it needs to meet. The following is the official response from David
Miller.
Based on those guidelines, the RFC patch we submitted meets and exceeds
the requirements. So please review it. Replacement of function pointers
will be addressed later as a separate issue.
In case there are other procedural issues that would prohibit technical
discussion, please cite the written rules or first ask David Miller.
There are a lot of technical issues to discuss, so if possible let's not
get tied up in process and bureaucracy. Looking forward to detailed
technical comments.
Shoaib
-------- Forwarded Message --------
Subject: Re: Few questions about submitting patches
Date: Wed, 21 Mar 2018 12:46:17 -0400 (EDT)
From: David Miller <davem(a)davemloft.net>
To: rao.shoaib(a)oracle.com
CC: netdev(a)vger.kernel.org, eric.dumazet(a)gmail.com
From: Rao Shoaib <rao.shoaib(a)oracle.com>
Date: Wed, 21 Mar 2018 09:41:13 -0700
> I am new to Linux. I would like to understand the rules and etiquettes
> of engaging with the community. I have read the materials that I could
> find. As I work with Linux I come across situations for which I can
> not seem to find any answers. Hopefully folks on the list can answer
> them.
>
> * Submitting an RFC Patch
>
> As I understand, an RFC patch is submitted to solicit comments and is
> not for inclusion. Is it sufficient for an RFC patch to have the
> correct coding style and compile, or does it need more? For example,
> If the patch consists of a series of patches, does each patch have to
> compile independently etc etc.
It should build and function, unless you explicitly state that the
patch is not build nor functionally tested and is intended to show
the design of the change.
> * #ifdef FOO
>
> In a regular patch consisting of a series of patches, can the above
> #ifdef be used in a patch before the patch that allows the selection
> of FOO. That patch is part of the series but comes later.
It is better to introduce them at the same time.
But if it is prohibitively difficult to do so, yet at the same
time properly split up your changes into manageable pieces, it
can be OK.
It is definitely determined on a case by case basis.