Extending netlink
Use of netlink is relatively straightforward, at least for kernel developers who have some familiarity with the networking subsystem. To be able to communicate via netlink, a kernel subsystem must first create an in-kernel socket:
struct sock *netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len));
Here, unit is the netlink protocol number (as defined in <linux/netlink.h>), and input() is a function to be called when data arrives on the given socket. The naming of unit dates back to an early netlink implementation, which worked with virtual devices; unit was the minor number of the relevant device. The input() callback can be NULL, in which case user space will not be able to write to the socket.
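As a concrete (if hypothetical) illustration, a subsystem's initialization code might look like the sketch below. The NETLINK_TEST protocol number and all of the function names are invented for the example; real code would use a protocol number defined in <linux/netlink.h>:

#include <linux/init.h>
#include <linux/netlink.h>
#include <net/sock.h>

static struct sock *test_sock;

/* The input() callback; one possible implementation is sketched below */
static void test_input(struct sock *sk, int len);

static int __init test_netlink_init(void)
{
    /* NETLINK_TEST is a placeholder protocol number */
    test_sock = netlink_kernel_create(NETLINK_TEST, test_input);
    if (test_sock == NULL)
        return -ENODEV;
    return 0;
}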
If there is an input() callback, it will be called whenever data arrives. That data will be represented in one or more sk_buff structures (SKBs) queued to the socket itself. So the core of a typical input() function will look something like:
    struct sk_buff *skb;

    while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
        deal_with_incoming_data(skb);
        kfree_skb(skb);
    }
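What deal_with_incoming_data() does is left to the subsystem, but note that, by netlink convention, data written from user space arrives with a struct nlmsghdr at the front of the buffer. One plausible sketch, with a hypothetical handle_payload() helper, would be:

#include <linux/netlink.h>
#include <linux/skbuff.h>

static void deal_with_incoming_data(struct sk_buff *skb)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;

    /* Validate the header before trusting its length field */
    if (skb->len >= NLMSG_SPACE(0) && NLMSG_OK(nlh, skb->len))
        handle_payload(NLMSG_DATA(nlh),
                       nlh->nlmsg_len - NLMSG_HDRLEN);
}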
Sending data to user space involves allocating an SKB, filling it with the data, and writing it to the netlink socket. Here is how the kernel events mechanism does it:
static int send_uevent(const char *signal, const char *obj,
                       char **envp, int gfp_mask)
{
    struct sk_buff *skb;
    char *pos;
    int len;

    len = strlen(signal) + 1;
    len += strlen(obj) + 1;

    /* allocate buffer with the maximum possible message size */
    skb = alloc_skb(len + BUFFER_SIZE, gfp_mask);

    pos = skb_put(skb, len);
    sprintf(pos, "%s@%s", signal, obj);

    /* copy the environment key by key to our continuous buffer */
    if (envp) {
        int i;

        for (i = 2; envp[i]; i++) {
            len = strlen(envp[i]) + 1;
            pos = skb_put(skb, len);
            strcpy(pos, envp[i]);
        }
    }
    return netlink_broadcast(uevent_sock, skb, 0, 1, gfp_mask);
}
(Some error handling has been removed for brevity; see lib/kernel_uevent.c for the full version). The call to netlink_broadcast() sends the data in the SKB to every user-space process which is currently connected to the netlink socket. There is also netlink_unicast(), which takes a process ID and sends only to that process. Netlink writes can be restricted to specific "groups," allowing user-space processes to sign up for an interesting subset of the data written to a given socket.
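For the unicast case, the sequence is much the same. The sketch below, with an invented send_to_pid() wrapper, assumes the 2.6 prototype int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock):

#include <linux/skbuff.h>
#include <linux/string.h>
#include <net/sock.h>

static int send_to_pid(struct sock *sock, const char *text, u32 pid)
{
    struct sk_buff *skb;
    int len = strlen(text) + 1;

    skb = alloc_skb(len, GFP_KERNEL);
    if (skb == NULL)
        return -ENOMEM;
    memcpy(skb_put(skb, len), text, len);
    /* The final argument selects nonblocking behavior when the
     * receiver's queue is full */
    return netlink_unicast(sock, skb, pid, 0);
}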
There is more to the netlink interface than has been presented here; see <linux/netlink.h> for the rest.
Evgeniy Polyakov thinks that the netlink protocol is too complicated; it should not be necessary to understand the networking layer just to communicate with user space. His response is connector, a layer on top of netlink which is designed to make things simpler.
The connector code multiplexes all possible message types over a single netlink socket number. Individual messages are distinguished by way of a cb_id structure:
struct cb_id {
    __u32 idx;
    __u32 val;
};
idx can be thought of as a protocol type, and val as a message type within the given protocol. A kernel subsystem which is prepared to receive messages of a given type sets up a callback with:
int cn_add_callback(struct cb_id *id, char *name, void (*callback)(void *msg));
That callback will be invoked every time a message with the given id is received from user space. The msg parameter to the callback function, despite its void * type, is always a pointer to a structure of this type:
struct cn_msg {
    struct cb_id id;
    __u32 len;       /* Length of the following data */
    __u8 data[0];
    /* Some fields omitted */
};
The callback can process the given message data and return.
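Putting the pieces together, registration might look like the following sketch; the idx/val values, the names, and the printk() body are invented for illustration:

#include <linux/connector.h>
#include <linux/init.h>
#include <linux/kernel.h>

static struct cb_id my_id = { .idx = 0x123, .val = 0x1 };

static void my_callback(void *data)
{
    struct cn_msg *msg = data;

    printk(KERN_INFO "connector message %u.%u, %u bytes\n",
           msg->id.idx, msg->id.val, msg->len);
}

static int __init my_connector_init(void)
{
    return cn_add_callback(&my_id, "my_subsystem", my_callback);
}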
Writing to a socket via connector is done with:
void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
The msg contains the cb_id structure describing the message; __groups can be used to restrict the list of recipients, and gfp_mask controls how memory allocation is done. This call can fail (netlink is an unreliable service), but it returns no indication of whether it succeeded or not.
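A send could thus be coded along the lines of this sketch; it assumes that cn_netlink_send() copies the message into an SKB of its own, so that the caller's buffer can be freed immediately afterward:

#include <linux/connector.h>
#include <linux/slab.h>
#include <linux/string.h>

static void my_send(struct cb_id *id, const void *data, u32 len)
{
    struct cn_msg *msg;

    msg = kmalloc(sizeof(*msg) + len, GFP_ATOMIC);
    if (msg == NULL)
        return;   /* nobody will know the message was lost */
    memset(msg, 0, sizeof(*msg));
    msg->id = *id;
    msg->len = len;
    memcpy(msg->data, data, len);
    cn_netlink_send(msg, 0, GFP_ATOMIC);  /* 0: no group restriction */
    kfree(msg);
}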
For kernel code which needs to send significant amounts of data to user space, perhaps from hot paths, there is also a "CBUS" layer over the connector. That layer exports one function:
int cbus_insert(struct cn_msg *msg, int gfp_flags);
This function does not send the message immediately; it simply adds it to a per-CPU queue. A separate worker thread will eventually come along, find the message, and send it on to user space.
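Usage from a hot path might then resemble the sketch below; the error-return convention and the ownership of the message after a successful insert are assumptions here:

#include <linux/connector.h>
#include <linux/slab.h>

/* report_event() is a hypothetical hot-path caller */
static void report_event(struct cn_msg *msg)
{
    /* GFP_ATOMIC, since hot paths may not be able to sleep */
    if (cbus_insert(msg, GFP_ATOMIC) != 0)
        kfree(msg);   /* could not be queued; drop the event */
}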
The code seems to work, though some concerns have been raised about the implementation. Not everybody feels that the connector solution is necessary, however. The core netlink API is not all that hard to use, so it is not clear that another layer needs to be wrapped around it. Those who do think that netlink could be made easier do not agree on how it should be done; some developers would like to see the netlink API itself changed rather than having another layer put on top of it.

Various user-space needs (auditing, accounting, desktop functionality, etc.) are all creating pressure for more communication channels with the kernel. Some way of making that communication easier on the kernel side may well get added, eventually, but it is far from clear what form that code will take.
Extending netlink
Posted Apr 14, 2005 16:35 UTC (Thu) by bronson (subscriber, #4806)

Erm... If the netlink interface is too complex, why not make a libnetlink? Keep it in userspace! I haven't followed this on lkml so I may be missing something obvious...

Extending netlink
Posted Apr 14, 2005 17:59 UTC (Thu) by larryr (guest, #4030)

> If the netlink interface is too complex, why not make a libnetlink? Keep it in userspace!

The purpose is to make the kernel side easier to implement.

Larry