tap0 device stopped working in 2.6.36 (ok in 2.6.35)

From: Jim
Date: Saturday, October 23, 2010 - 5:55 am

My tap0 device stopped working with 2.6.36; it seems it can no longer
send any packets (receiving still works fine).

After some checking, I noticed that apparently the link isn't ready:
 modprobe tun
 tunctl -b
 ifconfig tap0 192.168.20.1 up
Gives:
[   26.411932] ADDRCONF(NETDEV_UP): tap0: link is not ready

Bisected it all the way to this commit:

=================

# git bisect good
bee31369ce16fc3898ec9a54161248c9eddb06bc is the first bad commit
commit bee31369ce16fc3898ec9a54161248c9eddb06bc
Author: Nolan Leake <nolan@cumulusnetworks.com>
Date:   Tue Jul 27 13:53:43 2010 +0000

    tun: keep link (carrier) state up to date

    Currently, only ethtool can get accurate link state of a tap device.
    With this patch, IFF_RUNNING and IF_OPER_UP/DOWN are kept up to date as
    well.

    Signed-off-by: Nolan Leake <nolan@cumulusnetworks.com>
0d144f138fe93ffbe3da7ce31951855c60b51510 M      drivers

===================

Applying this patch on top of vanilla 2.6.36 makes the tap device
work again for me (strangely, the change is in the function __tun_detach):
--- tun.c.ORIG  2010-10-21 18:08:12.404276662 +0200
+++ tun.c       2010-10-23 14:22:58.056366365 +0200
@@ -163,7 +163,7 @@
 {
        /* Detach from net device */
        netif_tx_lock_bh(tun->dev);
-       netif_carrier_off(tun->dev);
+//     netif_carrier_off(tun->dev);
        tun->tfile = NULL;
        tun->socket.file = NULL;
        netif_tx_unlock_bh(tun->dev);

====
Strangely, that's in the function __tun_detach. It appears the functions
do the opposite of what is expected: when I delete the tap0 device, it
becomes ready!?

# tunctl -d tap0
Set 'tap0' nonpersistent
[ 1000.945790] ADDRCONF(NETDEV_CHANGE): tap0: link becomes ready

So to me it seems the netif_carrier_on / netif_carrier_off calls from the
commit should be swapped?


Jim
--

From: Nolan Leake
Date: Saturday, October 23, 2010 - 12:39 pm

Hello Jim,

Thank you for the report and the bisect.  Please allow me to explain the
intention of this patch.

Prior to this patch, tun.c only kept the ethtool link state up to
date.  IFF_ and IF_OPER_ state were always RUNNING and UP, respectively.

Ethtool link state was (and is) controlled by mapping "tun device FD
open" to link up, and "tun device FD closed" to link down.  Obviously if
you've just used tunctl to create a tap device, and no process has yet
opened the /dev/net/tun backing FD, then this method reports the link
as down.  Ethtool on a kernel that predates this patch will confirm
this.
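Besides ethtool, the same link state is visible through sysfs. The following is a minimal C sketch, assuming the standard `/sys/class/net/<iface>/operstate` and `/sys/class/net/<iface>/carrier` attributes; the helper name `read_sysfs_attr` is hypothetical and not part of any tool mentioned in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a one-line sysfs attribute such as
 * /sys/class/net/tap0/operstate or /sys/class/net/tap0/carrier
 * into buf, stripping the trailing newline.
 * Returns 0 on success or a negative errno value.
 */
int read_sysfs_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -errno;

    if (fgets(buf, (int)len, f) == NULL) {
        /* e.g. EINVAL when reading "carrier" on a down interface */
        int err = ferror(f) ? -errno : -ENODATA;
        fclose(f);
        return err;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

For example, `read_sysfs_attr("/sys/class/net/tap0/operstate", buf, sizeof buf)` could be compared against what ethtool reports before and after a process attaches to tap0. Reading `carrier` fails with EINVAL while the interface is administratively down, which is why the helper returns an error code instead of assuming success.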

What this patch does is make the IFF_RUNNING/IF_OPER_UP state also
match this interpretation of link state.  Making RUNNING and OPER_UP
consistent with ethtool's concept of link state is, I believe,
consistent with how other ethernet devices work.  The presence of a
process that is sending and receiving packets via the tap device is a
decent analog of link-state for a physical ethernet device.
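The mapping described above can be modelled in a few lines. This is a simplified, hypothetical user-space sketch, not kernel code: the struct and function names are invented for illustration. In the real driver the transitions come from the netif_carrier_on()/netif_carrier_off() calls added by the commit (the netif_carrier_off() in __tun_detach() appears in the diff earlier in the thread), and IFF_RUNNING is derived from the operational state rather than computed exactly like this.

```c
#include <stdbool.h>

/* Hypothetical model: carrier is on exactly while a process holds the FD. */
struct tap_model {
    bool admin_up;   /* set by "ifconfig tap0 ... up" (IFF_UP)          */
    bool attached;   /* a process has /dev/net/tun open for this device */
    bool carrier;    /* what netif_carrier_ok() would report            */
};

/* Opening /dev/net/tun and attaching to the device: link up. */
void tap_attach(struct tap_model *t)
{
    t->attached = true;
    t->carrier = true;
}

/* Closing the FD (as in __tun_detach()): link down. */
void tap_detach(struct tap_model *t)
{
    t->attached = false;
    t->carrier = false;
}

/* Simplified: RUNNING only when administratively up and carrier is on. */
bool tap_is_running(const struct tap_model *t)
{
    return t->admin_up && t->carrier;
}
```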

Could your use-case be solved by a udev rule that assigns the IP address
when the link state changes to UP/RUNNING?
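As an alternative to a udev rule, a daemon could wait for the interface to report IFF_RUNNING via rtnetlink link notifications and only then configure the address. The sketch below swaps in that technique; it uses the standard RTMGRP_LINK multicast group and RTM_NEWLINK messages, while the function names `link_is_running` and `wait_for_running` are hypothetical. It only waits for the state change; assigning the address is left out.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical: 1 if msg is an RTM_NEWLINK for ifindex with UP and RUNNING set. */
int link_is_running(const struct nlmsghdr *msg, int ifindex)
{
    const struct ifinfomsg *ifi;
    const unsigned int want = IFF_UP | IFF_RUNNING;

    if (msg->nlmsg_type != RTM_NEWLINK ||
        msg->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
        return 0;
    ifi = NLMSG_DATA(msg);
    return ifi->ifi_index == ifindex && (ifi->ifi_flags & want) == want;
}

/* Hypothetical: block until ifindex reports UP|RUNNING. 0 or -errno. */
int wait_for_running(int ifindex)
{
    char buf[8192] __attribute__((aligned(__alignof__(struct nlmsghdr))));
    struct sockaddr_nl sa;
    int err;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -errno;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_LINK;        /* link state change notifications */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto fail;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        int len = (int)n;
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len)) {
            if (link_is_running(nh, ifindex)) {
                close(fd);
                return 0;
            }
        }
    }

fail:
    err = -errno;
    close(fd);
    return err;
}
```

The interface index can be obtained with if_nametoindex(3) or from `/sys/class/net/tap0/ifindex`. A robust tool would also query the current state (for example with an RTM_GETLINK request) before blocking, since an event that happened before the socket was bound is not replayed.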

If this is a common way to use tap devices, one possible solution is to
make newly created but unattached tap devices default to UP/RUNNING (and
presumably ethtool link-up, for consistency), and then only begin
accurately reporting link state for subsequent open/closes of
the /dev/net/tun device.
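The proposed default can be expressed as a small state machine. Again this is a hypothetical sketch with invented names: the device reports link-up from creation until the first attach after creation, and from then on carrier follows attach/detach.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed "up until first real attach" policy. */
struct tap_policy {
    bool reporting;  /* true once a post-creation attach has happened */
    bool attached;   /* a process currently holds the FD              */
};

void policy_attach(struct tap_policy *p)
{
    p->attached = true;
    p->reporting = true;   /* from now on, report link state accurately */
}

void policy_detach(struct tap_policy *p)
{
    p->attached = false;
}

/* Link-up by default until accurate reporting starts. */
bool policy_carrier(const struct tap_policy *p)
{
    return !p->reporting || p->attached;
}
```

One subtlety, given that tunctl itself attaches and detaches while creating the device: that creation-time attach would have to be excluded from `policy_attach()`, otherwise the device would drop to link-down as soon as tunctl exits.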

- nolan
--

From: Jim
Date: Sunday, October 24, 2010 - 2:59 am

Nolan,

Thanks for explaining the purpose of the patch.
But it appears something is missing, and I think it breaks current
userspace. I use this tap0 device together with VirtualBox: I have a
virtual machine set up as bridged to tap0, which is not an odd or
unusual setup (this used to be the only method).
On the host side I run dhcpd to hand out IP addresses to the virtual
machine, but despite dhcpd running on the tap0 device, it never became
'ready' in the sense that no IP packets made it out from the host to
the guest.


--

From: Nolan Leake
Date: Tuesday, October 26, 2010 - 6:18 pm

To make sure I understand the situation, is this correct (ignoring the
exact names of the interfaces):
br0 bridges between eth0 and tap0, and you run dhcpd on tap0?

Since tap0 is part of the bridge, I think dhcpd should be running on
br0.  Does that work?
--

From: Jim
Date: Wednesday, October 27, 2010 - 9:09 am

Not exactly, VirtualBox calls it "bridged adapter", it 'bridges' the
guest machine to the tap0 interface on the host for so-called host-only
networking.
See e.g. http://forums.virtualbox.org/viewtopic.php?f=1&t=165

And this sequence now simply fails:
  tunctl -t tap0 -u tuxuser
  ifconfig tap0 10.0.0.1 up

Jim
--

From: Nolan Leake
Date: Wednesday, October 27, 2010 - 10:48 am

OK, so you have the tap0 device, and you assign an IP to it and run

The link is not ready until some process has attached to the tap device.
tunctl simply attaches and then immediately detaches, leaving it
link-down until the virtualbox process starts and attaches.
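What tunctl does here can be approximated with the tun ioctls. A minimal sketch, assuming the TUNSETIFF and TUNSETPERSIST ioctls from <linux/if_tun.h>; it ignores tunctl's `-u` owner option, and the function names are hypothetical. The comments mark where, per this thread, the carrier goes on and off with the patch applied.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Fill in an ifreq requesting a TAP device without packet-info header. */
void init_tap_ifreq(struct ifreq *ifr, const char *name)
{
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = IFF_TAP | IFF_NO_PI;
    strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/*
 * Roughly "tunctl -t <name>": attach, mark persistent, close.
 * Needs CAP_NET_ADMIN. Returns 0 or a negative errno value.
 */
int create_persistent_tap(const char *name)
{
    struct ifreq ifr;
    int err = 0;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -errno;

    init_tap_ifreq(&ifr, name);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0 ||   /* attach: carrier on        */
        ioctl(fd, TUNSETPERSIST, 1) < 0)    /* keep device after close() */
        err = -errno;

    close(fd);                              /* detach: carrier off       */
    return err;
}
```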

But this doesn't cause the problem for me!  I suspect that is because I
am running an ipv4 only kernel; the "ADDRCONF(NETDEV_UP): tap0: link is
not ready" error comes from net/ipv6/addrconf.c.

I have no idea why ipv6 vetoes bringing up a link-down interface, while
ipv4 doesn't care.

If this is all intended behavior, then I guess I'll need to make the old
"tap devices are always link-up" mode the default, and add a way for
newer software to opt in to correct link-state reporting.

David (CC'd), could you comment on this?

Thanks,
Nolan
--

From: David Miller
Date: Wednesday, October 27, 2010 - 10:52 am

From: Nolan Leake <nolan@cumulusnetworks.com>

If ipv6 cannot send multicast packets for neighbour and router
discovery, which it must do in order to function properly over the
device, the interface is unusable.
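A simplified model of the behaviour Nolan and David describe: IPv6 address configuration needs working multicast for neighbour and router discovery, so it is deferred while the link has no carrier and resumes on the carrier change. The enum and function names are hypothetical and the logic is heavily reduced compared to net/ipv6/addrconf.c; only the two log messages are taken from the reports above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced stand-ins for NETDEV_UP and NETDEV_CHANGE. */
enum netdev_event { EV_UP, EV_CHANGE };

/* Returns true if IPv6 would be configured in response to this event. */
bool ipv6_on_event(enum netdev_event ev, bool carrier, const char *ifname)
{
    if (!carrier) {
        if (ev == EV_UP)
            printf("ADDRCONF(NETDEV_UP): %s: link is not ready\n", ifname);
        return false;   /* wait for a later carrier change */
    }
    if (ev == EV_CHANGE)
        printf("ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n", ifname);
    return true;
}
```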
--

From: Nolan Leake
Date: Wednesday, November 3, 2010 - 4:10 pm

Jim,

Could you do me a favor and try this sequence:
  tunctl -t tap0 -u tuxuser
  <run virtualbox such that it attaches to tap0>
  ifconfig tap0 10.0.0.1 up

Thanks,
 - nolan
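The order Nolan asks for, attaching first and bringing the interface up afterwards, can be sketched in C as below. This assumes a single program stands in for VirtualBox as the process attached to tap0 (with only one attached FD per device in this kernel, it could not run alongside VirtualBox); the function name is hypothetical, the tun device path is a parameter only so the error path can be exercised, and assigning the 10.0.0.1 address is omitted.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/*
 * Hypothetical: attach to tap `name` via tun_path (normally
 * "/dev/net/tun"), then set IFF_UP so carrier is already on when the
 * interface comes up. Returns the attached FD (keep it open to stay
 * attached) or a negative errno value.
 */
int attach_then_up(const char *tun_path, const char *name)
{
    struct ifreq ifr;
    int sock = -1, err;
    int fd = open(tun_path, O_RDWR);
    if (fd < 0)
        return -errno;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0)          /* 1. attach: carrier on */
        goto fail;

    sock = socket(AF_INET, SOCK_DGRAM, 0);       /* 2. "ifconfig up"      */
    if (sock < 0 || ioctl(sock, SIOCGIFFLAGS, &ifr) < 0)
        goto fail;
    ifr.ifr_flags |= IFF_UP;
    if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0)
        goto fail;

    close(sock);
    return fd;

fail:
    err = -errno;
    if (sock >= 0)
        close(sock);
    close(fd);
    return err;
}
```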
--

From: Jim
Date: Thursday, November 4, 2010 - 1:17 pm

That doesn't really work: VirtualBox complains loudly that the tap0
device is down. Ignoring that, once the virtual machine is up and
running,
	ifconfig tap0 192.168.20.1 up
still gives: ADDRCONF(NETDEV_UP): tap0: link is not ready

Using Wireshark, I can see the guest machine sending a DHCP request,
but, unlike before, no message is sent out from the host.

--
