[comp.unix.questions] reference about mbufs

david@Neon.Stanford.EDU (David M. Alexander) (07/06/90)

Does anyone know a book or article that discusses the mbuf structure
and the functions and macros to manipulate them?  I am having a hard
time finding something that talks about them.

Thanks,

Dave Alexander
david@neon.stanford.edu

chris@mimsy.umd.edu (Chris Torek) (07/22/90)

In article <1990Jul5.175406.22944@Neon.Stanford.EDU> david@Neon.Stanford.EDU
(David M. Alexander) writes:
>Does anyone know a book or article that discusses the mbuf structure
>and the functions and macros to manipulate them?

Hmm... do you mean `4.1BSD BBNNET mbufs', `4.2BSD mbufs', `4.3BSD mbufs',
`4.3BSD-tahoe mbufs', or `4.3BSD-reno mbufs'?  They are all different
(and probably differ from various Ultrix mbufs and maybe also 4.1a, b,
and c mbufs, but I never saw 4.1[abc]; now that SunOS has STREAMS one
would hope the kernel group settled on one kind of memory allocator as
well...).

4.3-tahoe mbufs are probably the easiest to explain:

struct mbuf {
	struct	mbuf *m_next;		/* next buffer in chain */

This links together mbufs that make up one (packet/group/whatever) so that
the amount of data in a data-chunk can be bigger than the maximum size of
a single mbuf.

	u_long	m_off;			/* offset of data */

This gives the offset from the base of the mbuf (the address of the entire
`struct mbuf') to the data.  For `normal' mbufs the data are somewhere in
m_dat[].  For `big' mbufs (`mclusters') the data are in a separate `page'
(typically 1Kbyte, i.e., not necessarily a hardware page) and the offset
is large (>= sizeof(struct mbuf)).

	short	m_len;			/* amount of data in this mbuf */

Thus the length of a complete packet is the sum of the lengths of all the
mbufs found via m_next's.

	short	m_type;			/* mbuf type (0 == free) */

One of the magic MT_* type codes (MT_DATA, MT_HEADER, and so on).

	u_char	m_dat[MLEN];		/* data storage */

Up to 112 bytes of data.

	struct	mbuf *m_act;		/* link in higher-level mbuf list */

Various uses.  Mainly for datagram protocols: several packets are linked
together via m_act pointers.  Conceptually, following m_next pointers
`assembles' each packet, while following m_act pointers `lists' each
packet.  The m_act pointers are set only in the `top' mbufs:

	--------
  socket buffer: so->so_rcv
	--------
	    | sb_mb
	    v
	+-------+ m_act	+-------+ m_act	+-------+ m_act
	| pkt 1	|------>| pkt 2 |------>| pkt 3 |--->nil
	+-------+	+-------+	+-------+
	    | m_next	    | m_next	    | m_next
	    v		    v		    v
	+-------+	+-------+	+-------+
	|	|	|	|	|	|
	+-------+	+-------+	+-------+
	    | m_next	    | m_next	    | m_next
	+-------+	   nil		+-------+
	|	|			|	|
	+-------+			+-------+
	    | m_next			    | m_next
	   nil				   nil

};
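A small user-space sketch can make the m_next/m_act distinction concrete. It assumes a hypothetical, trimmed-down struct mbuf with only m_next, m_act, m_len and m_dat. The helpers pkt_len, pkt_count and queued_bytes are made-up names, not 4.3BSD kernel routines. Following m_next sums the bytes of one packet, and following m_act walks the list of packets.

```c
#include <assert.h>
#include <stddef.h>

#define MLEN 112                /* data bytes per small mbuf, as in the post */

/* Hypothetical, trimmed-down mbuf: only the fields needed here. */
struct mbuf {
    struct mbuf  *m_next;       /* next mbuf in this packet */
    struct mbuf  *m_act;        /* next packet (set only in top mbufs) */
    short         m_len;        /* bytes of data in this mbuf */
    unsigned char m_dat[MLEN];
};

/* Length of one packet: sum m_len along the m_next chain. */
int pkt_len(const struct mbuf *m)
{
    int len = 0;
    for (; m != NULL; m = m->m_next)
        len += m->m_len;
    return len;
}

/* Number of packets on a list: follow m_act from top mbuf to top mbuf. */
int pkt_count(const struct mbuf *top)
{
    int n = 0;
    for (; top != NULL; top = top->m_act)
        n++;
    return n;
}

/* Total bytes queued: list packets via m_act, assemble each via m_next. */
int queued_bytes(const struct mbuf *top)
{
    int total = 0;
    assert(pkt_count(top) >= 0);
    for (; top != NULL; top = top->m_act)
        total += pkt_len(top);
    return total;
}
```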

functions/macros:

	MGET(m, waitflag, type)
sets `m' to point to a new mbuf of type `type'.  waitflag is either
M_DONTWAIT (if the caller cannot sleep; m may then be set to nil) or
M_WAIT (if the caller can sleep; m will then never be nil).

	M_CLALLOC(m, i)
Gets `i' mclusters (i must be 1).  Never waits; sets m to nil if there are
none.

	M_HASCL(m)
True iff m is an mcluster rather than a regular (tiny) mbuf.

	MTOCL(m)
Gets base of cluster page given an mcluster.

	MCLGET(m)
Changes m from a regular mbuf to an mcluster, if there is space.  If
not, leaves m a regular mbuf.  m->m_len is set to MCLBYTES on success
or MLEN on failure, so either m->m_len or `M_HASCL' will tell whether
it succeeded.
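Here is a rough user-space model of m_off, mtod and M_HASCL. It assumes that the data address is the mbuf address plus m_off, and that any offset of at least MSIZE means the data live in a cluster. The struct layout, m_init and toy_mclget are hypothetical, and malloc() stands in for the kernel's cluster free list; this is not the 4.3BSD code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MLEN     112
#define MCLBYTES 1024   /* stand-in for the 1Kbyte cluster "page" */

/* Hypothetical model: m_off is a byte offset from the mbuf to its data. */
struct mbuf {
    struct mbuf  *m_next;
    uintptr_t     m_off;        /* u_long in 4.3BSD-tahoe */
    short         m_len;
    short         m_type;       /* 0 == free */
    unsigned char m_dat[MLEN];
    struct mbuf  *m_act;
};

#define MSIZE      sizeof(struct mbuf)
/* Data address = mbuf address + offset.  Unsigned arithmetic wraps, so a
 * cluster at a lower address than the mbuf still round-trips. */
#define mtod(m, t) ((t)((uintptr_t)(m) + (m)->m_off))
/* An offset pointing outside the mbuf itself means "cluster". */
#define M_HASCL(m) ((m)->m_off >= MSIZE)

/* Start m out as a small mbuf with its data at m_dat[0]. */
void m_init(struct mbuf *m)
{
    m->m_next = m->m_act = NULL;
    m->m_type = 1;              /* nonzero: in use */
    m->m_off = (uintptr_t)m->m_dat - (uintptr_t)m;
    m->m_len = MLEN;
}

/* Toy MCLGET: attach a cluster if one can be had, else leave m alone. */
void toy_mclget(struct mbuf *m)
{
    unsigned char *p;

    assert(!M_HASCL(m));        /* only small mbufs get converted */
    p = malloc(MCLBYTES);       /* hypothetical cluster source */
    if (p == NULL)
        return;                 /* still a small mbuf; m_len stays MLEN */
    m->m_off = (uintptr_t)p - (uintptr_t)m;
    m->m_len = MCLBYTES;
}
```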

	MCLFREE(m)
Puts m on the mcluster free list.

	MFREE(m, n)
Puts m on the free list; sets n to what m->m_next used to be.  To free
a chain you could use
	while (m) { MFREE(m, n); m = n; }
MFREE automatically uses MCLFREE when m is an mcluster.
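The MGET/MFREE free-list discipline can be sketched with a small static pool. Everything here (mfree_list, toy_get, toy_free, toy_freem, pool_avail and the 4-mbuf pool) is hypothetical. The sketch models only the M_DONTWAIT case and ignores sleeping, interrupt-priority protection and clusters.

```c
#include <assert.h>
#include <stddef.h>

#define MLEN  112
#define NMBUF 4                   /* tiny pool, for illustration only */

struct mbuf {
    struct mbuf  *m_next;
    short         m_len;
    short         m_type;         /* 0 == free */
    unsigned char m_dat[MLEN];
};

static struct mbuf pool[NMBUF];
static struct mbuf *mfree_list;   /* hypothetical free list head */

void pool_init(void)
{
    mfree_list = NULL;
    for (int i = 0; i < NMBUF; i++) {
        pool[i].m_type = 0;
        pool[i].m_next = mfree_list;
        mfree_list = &pool[i];
    }
}

/* Toy MGET(m, M_DONTWAIT, type): pop the free list, or NULL if empty. */
struct mbuf *toy_get(short type)
{
    struct mbuf *m = mfree_list;
    if (m == NULL)
        return NULL;              /* M_DONTWAIT: caller handles failure */
    mfree_list = m->m_next;
    m->m_next = NULL;
    m->m_len = 0;
    m->m_type = type;
    return m;
}

/* Toy MFREE(m, n): push m back; return what m->m_next used to be. */
struct mbuf *toy_free(struct mbuf *m)
{
    struct mbuf *n = m->m_next;
    assert(m->m_type != 0);       /* catch double frees */
    m->m_type = 0;
    m->m_next = mfree_list;
    mfree_list = m;
    return n;
}

/* Free a whole chain: the same loop as in the post. */
void toy_freem(struct mbuf *m)
{
    while (m != NULL)
        m = toy_free(m);
}

/* Count mbufs currently on the free list. */
int pool_avail(void)
{
    int n = 0;
    for (struct mbuf *m = mfree_list; m != NULL; m = m->m_next)
        n++;
    return n;
}
```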

	struct mbuf *m_get(int waitflag, int type);
Returns a new mbuf, exactly like MGET except that it costs a function
call and takes less code space.

	struct mbuf *m_getclr(int waitflag, int type);
Returns a new mbuf like m_get, but zeroes out all the data.

	struct mbuf *m_free(struct mbuf *m);
Puts m on the free list like MFREE; returns the old m->m_next.

	struct mbuf *m_more(int waitflag, int type);
Internal use (for MGET).

	struct mbuf *m_copy(struct mbuf *m, int off, int len);
Copies the data from the mbuf chain headed by `m' into new mbufs.
Data in regular mbufs are copied; mclusters are shared (their reference
counts are bumped) rather than copied, so don't modify cluster data in
the copy.  Skips the first `off' bytes of data; copies at most
`len' bytes.  Thus, to copy no more than 32 bytes from the chain
headed by `m', after skipping over the first 4 bytes, use
	mcopy = m_copy(m, 4, 32);
A `len' of M_COPYALL means `copy until end of chain'.
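The off/len rules of m_copy can be illustrated with a simplified copier. For brevity, the hypothetical chain_copy writes into a flat buffer instead of building new mbufs, always copies bytes outright (it does not model cluster handling), and uses a made-up COPYALL value in place of M_COPYALL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MLEN    112
#define COPYALL (-1)   /* hypothetical stand-in for M_COPYALL */

struct mbuf {
    struct mbuf  *m_next;
    int           m_len;
    unsigned char m_dat[MLEN];  /* data start at m_dat[0] in this toy */
};

/* Skip `off' bytes of the chain, then copy at most `len' bytes (or all
 * remaining bytes for COPYALL) into buf.  Returns the count copied. */
int chain_copy(const struct mbuf *m, int off, int len, unsigned char *buf)
{
    int copied = 0;

    assert(off >= 0);
    /* Skip whole mbufs that lie entirely before `off'. */
    while (m != NULL && off >= m->m_len) {
        off -= m->m_len;
        m = m->m_next;
    }
    for (; m != NULL && (len == COPYALL || copied < len); m = m->m_next) {
        int n = m->m_len - off;
        if (len != COPYALL && n > len - copied)
            n = len - copied;
        memcpy(buf + copied, m->m_dat + off, (size_t)n);
        copied += n;
        off = 0;   /* only the first mbuf copied from is partly skipped */
    }
    return copied;
}
```

For example, with a 10-byte mbuf followed by a 6-byte one, chain_copy(&a, 4, 8, out) takes the last 6 bytes of the first mbuf and the first 2 of the second.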

	struct mbuf *m_pullup(struct mbuf *m, int len);
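`Pulls' a minimum of `len' bytes of data into the first mbuf in
the chain, possibly replacing the chain (as if via m_copy) in the
process.  Used to force entire packet headers into a single mbuf.
If it cannot (for instance, the chain holds fewer than `len' bytes),
it frees the whole chain and returns nil, so always check the result.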
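A toy m_pullup, assuming each mbuf's data start at m_dat[0] and that mbufs come from malloc(). It moves bytes from later mbufs into the first one until `len' bytes are contiguous, and on failure frees the chain and returns NULL. The names toy_pullup and chain_free are hypothetical, and unlike the kernel version this sketch never allocates a fresh first mbuf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MLEN 112

struct mbuf {
    struct mbuf  *m_next;
    int           m_len;
    unsigned char m_dat[MLEN];   /* data always start at m_dat[0] here */
};

static void chain_free(struct mbuf *m)
{
    while (m != NULL) {
        struct mbuf *n = m->m_next;
        free(m);
        m = n;
    }
}

/* Make the first `len' bytes contiguous in the first mbuf.  Returns the
 * chain head, or NULL (after freeing the chain) if that is impossible. */
struct mbuf *toy_pullup(struct mbuf *m, int len)
{
    assert(m != NULL);
    if (len > MLEN) {                   /* can never fit in one mbuf */
        chain_free(m);
        return NULL;
    }
    while (m->m_len < len && m->m_next != NULL) {
        struct mbuf *n = m->m_next;
        int want = len - m->m_len;
        int take = n->m_len < want ? n->m_len : want;

        memcpy(m->m_dat + m->m_len, n->m_dat, (size_t)take);
        m->m_len += take;
        memmove(n->m_dat, n->m_dat + take, (size_t)(n->m_len - take));
        n->m_len -= take;
        if (n->m_len == 0) {            /* drained: unlink and free it */
            m->m_next = n->m_next;
            free(n);
        }
    }
    if (m->m_len < len) {               /* chain was too short */
        chain_free(m);
        return NULL;
    }
    return m;
}
```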

	mtod(m, type)
Gives (as type `type') the address of the first byte of data in m.
Used as, e.g.,
	struct ip *ip;
	if ((m = m_pullup(m, sizeof(struct ip))) == 0)
		return;			/* m_pullup has already freed the chain */
	ip = mtod(m, struct ip *);	/* entire IP header is now in m */

	dtom(pointer)
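Turns an arbitrary data pointer into the corresponding mbuf (via trickery:
it rounds the pointer down to an MSIZE boundary, so it works only for data
inside a regular mbuf, never for data in an mcluster).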
dtom() might someday go away.
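dtom relies on small mbufs being exactly MSIZE bytes long and starting on MSIZE boundaries, so rounding any pointer into one down to an MSIZE boundary yields the mbuf itself; 4.3BSD does this by masking with ~(MSIZE-1). A user-space sketch, assuming C11 aligned_alloc(), a flat address space, and a hypothetical struct padded to 128 bytes (mbuf_alloc is a made-up helper):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MSIZE 128   /* must be a power of two for the mask to work */

/* Hypothetical mbuf padded to exactly MSIZE bytes. */
struct mbuf {
    struct mbuf  *m_next;
    int           m_len;
    unsigned char m_dat[MSIZE - sizeof(struct mbuf *) - sizeof(int)];
};
_Static_assert(sizeof(struct mbuf) == MSIZE, "mbuf must be MSIZE bytes");

/* Round an interior pointer down to its MSIZE boundary.  Valid only for
 * pointers into a small mbuf's own storage, never into a cluster. */
#define dtom(x) ((struct mbuf *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE - 1)))

/* Small mbufs must start on MSIZE boundaries for dtom to work. */
struct mbuf *mbuf_alloc(void)
{
    return aligned_alloc(MSIZE, sizeof(struct mbuf));
}
```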

Something important not mentioned above: packets received from an interface
are put on the appropriate protocol's input queue with the first mbuf
containing a pointer to the `ifnet' structure as its first item.  That
is, after receiving an IP packet, an Ethernet driver puts an mbuf chain
onto `ipintrq' that looks like:

	offset 0: *mtod(m, struct ifnet **) points back to Ethernet I/F
	offset sizeof(struct ifnet *): IP header, followed by data

The `IF_DEQUEUEIF' macro handles this little idiosyncrasy.
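A toy version of this receive-queue convention, assuming a hypothetical ifqueue linked through m_act and data addressed as m_dat[m_off] (unlike the real byte offset from the start of the mbuf). toy_input, toy_enqueue and toy_dequeue_if are made-up names. The dequeue side copies the ifnet pointer out with memcpy() and advances past it, roughly the job the post assigns to IF_DEQUEUEIF; it does not handle a first mbuf that holds nothing but the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MLEN 112

struct ifnet { const char *if_name; };   /* hypothetical stand-in */

struct mbuf {
    struct mbuf  *m_next;
    struct mbuf  *m_act;      /* links packets on a queue */
    int           m_off;      /* data start, as an index into m_dat here */
    int           m_len;
    unsigned char m_dat[MLEN];
};

struct ifqueue { struct mbuf *ifq_head, *ifq_tail; };

/* Driver side: lay out [ifnet pointer][packet bytes] in one mbuf. */
void toy_input(struct mbuf *m, struct ifnet *ifp, const void *pkt, int len)
{
    assert(len >= 0 && (size_t)len + sizeof ifp <= MLEN);
    m->m_next = m->m_act = NULL;
    m->m_off = 0;
    memcpy(m->m_dat, &ifp, sizeof ifp);
    memcpy(m->m_dat + sizeof ifp, pkt, (size_t)len);
    m->m_len = (int)sizeof ifp + len;
}

/* Append a packet to the tail of the queue. */
void toy_enqueue(struct ifqueue *q, struct mbuf *m)
{
    m->m_act = NULL;
    if (q->ifq_tail != NULL)
        q->ifq_tail->m_act = m;
    else
        q->ifq_head = m;
    q->ifq_tail = m;
}

/* Unlink the first packet, pull the ifnet pointer off its front, and
 * leave m_off/m_len describing only the protocol header and data. */
struct mbuf *toy_dequeue_if(struct ifqueue *q, struct ifnet **ifpp)
{
    struct mbuf *m = q->ifq_head;
    if (m == NULL)
        return NULL;
    q->ifq_head = m->m_act;
    if (q->ifq_head == NULL)
        q->ifq_tail = NULL;
    m->m_act = NULL;

    memcpy(ifpp, m->m_dat + m->m_off, sizeof *ifpp);  /* no alignment traps */
    m->m_off += (int)sizeof *ifpp;
    m->m_len -= (int)sizeof *ifpp;
    return m;
}
```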
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

guy@auspex.auspex.com (Guy Harris) (07/22/90)

>Hmm... do you mean `4.1BSD BBNNET mbufs', `4.2BSD mbufs', `4.3BSD mbufs',
>`4.3BSD-tahoe mbufs', or `4.3BSD-reno mbufs'?  They are all different

But 4.3BSD and 4.3-tahoe "mbuf"s don't seem *very* different, at least
not by comparing their <sys/mbuf.h> and their "uipc_mbuf.c"s.  I'm
certainly prepared to believe they changed a fair bit in Reno....

>(and probably differ from various Ultrix mbufs and maybe also 4.1a, b,
>and c mbufs, but I never saw 4.1[abc];

*Never*?  I'm shocked....

>now that SunOS has STREAMS one would hope the kernel group settled on
>one kind of memory allocator as well...).

No, in SunOS 4.x mbufs and streams buffers are separate beasts.  The
networking software uses mbufs, with the exception of NIT and (in 4.1)
the TLI veneer atop the networking code.