Commit Graph

23369 Commits

Author SHA1 Message Date
Andy Grover
2c3a5f9abb RDS: Add flag for silent ops. Do atomic op before RDMA
Add a flag to the API so users can indicate they want
silent operations. This is needed because silent ops
cannot be used with USE_ONCE MRs, so we can't just
assume silent.

Also, change send_xmit to do atomic op before rdma op if
both are present, and centralize the hairy logic to determine if
we want to attempt silent, or not.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:06 -07:00
Andy Grover
7e3bd65ebf RDS: Move some variables around for consistency
Also, add a comment.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:05 -07:00
Andy Grover
940786eb0a RDS: queue failure notifications for dropped atomic ops
When dropping ops in the send queue, we notify the client
of failed rdma ops they asked for notifications on, but not
atomic ops. It should be for both.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:04 -07:00
Andy Grover
ee4c7b47e4 RDS: Add a warning if trying to allocate 0 sgs
rds_message_alloc_sgs() only works when nents is nonzero.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:03 -07:00
Andy Grover
372cd7dedf RDS: Do not set op_active in r_m_copy_from_user().
Do not allocate sgs for data for 0-length datagrams

Set data.op_active in rds_sendmsg() instead of
rds_message_copy_from_user().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:02 -07:00
Andy Grover
5b2366bd28 RDS: Rewrite rds_send_xmit
Simplify rds_send_xmit().

Send a congestion map (via xmit_cong_map) without
decrementing send_quota.

Move resetting of conn xmit variables to end of loop.

Update comments.

Implement a special case to turn off sending an rds header
when there is an atomic op and no other data.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:01 -07:00
Andy Grover
6c7cc6e469 RDS: Rename data op members prefix from m_ to op_
For consistency.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:59 -07:00
Andy Grover
f8b3aaf2ba RDS: Remove struct rds_rdma_op
A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:58 -07:00
Andy Grover
d0ab25a83c RDS: purge atomic resources too in rds_message_purge()
Add atomic_free_op function, analogous to rdma_free_op,
and call it in rds_message_purge().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:57 -07:00
Andy Grover
4324879df0 RDS: Inline rdma_prepare into cmsg_rdma_args
cmsg_rdma_args just calls rdma_prepare and does a little
arg checking -- not quite enough to justify its existence.
Plus, it is the only caller of rdma_prepare().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:56 -07:00
Andy Grover
241eef3e2f RDS: Implement silent atomics
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:55 -07:00
Andy Grover
d37c935905 RDS: Move loop-only function to loop.c
Also, try to better-document the locking around the
rm and its m_inc in loop.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:54 -07:00
Andy Grover
c8de3f1005 RDS/IB: Make all flow control code conditional on i_flowctl
Maybe things worked fine with the flow control code running
even in the non-flow-control case, but making it explicitly
conditional helps the non-fc case be easier to read.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:53 -07:00
Andy Grover
1d34f17571 RDS: Remove unsignaled_bytes sysctl
Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:52 -07:00
Andy Grover
da5a06cef5 RDS: rewrite rds_ib_xmit
Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().

Rename sent to bytes_sent, to differentiate better from other
variable named "send".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:51 -07:00
Andy Grover
919ced4ce7 RDS/IB: Remove ib_[header/data]_sge() functions
These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:50 -07:00
Andy Grover
6f3d05db0d RDS/IB: Remove dead code
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:49 -07:00
Andy Grover
f147dd9eca RDS/IB: Disallow connections less than RDS 3.1
RDS 3.0 connections (in OFED 1.3 and earlier) put the
header at the end. 3.1 connections put it at the head.
The code has significant added complexity in order to
handle both configurations. In OFED 1.6 we can
drop this and simplify the code by only supporting
"header-first" configuration.

This patch checks the protocol version, and if prior
to 3.1, does not complete the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:48 -07:00
Andy Grover
9c030391e8 RDS/IB: eliminate duplicate code
both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:47 -07:00
Andy Grover
809fa148a2 RDS: inc_purge() transport function unused - remove it
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:46 -07:00
Andy Grover
6200ed7799 RDS: Whitespace
Tidy up some whitespace issues.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:44 -07:00
Andy Grover
d22faec22c RDS: Do not mask address when pinning pages
This does not appear to be necessary.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:43 -07:00
Andy Grover
40589e74f7 RDS: Base init_depth and responder_resources on hw values
Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:42 -07:00
Andy Grover
15133f6e67 RDS: Implement atomic operations
Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:41 -07:00
Andy Grover
a63273d499 RDS: Clear up some confusing code in send_remove_from_sock
The previous code was correct, but made the assumption that
if r_notifier was non-NULL then either r_recverr or r_notify
was true. Valid, but fragile. Changed to explicitly check
r_recverr (shows up in greps for recverr now, too.)

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:40 -07:00
Andy Grover
f4dd96f7b2 RDS: make sure all sgs alloced are initialized
rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so calleds need not do this themselves.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:39 -07:00
Andy Grover
ff87e97a9d RDS: make m_rdma_op a member of rds_message
This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:38 -07:00
Andy Grover
21f79afa5f RDS: fold rdma.h into rds.h
RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:37 -07:00
Andy Grover
fc445084f1 RDS: Explicitly allocate rm in sendmsg()
r_m_copy_from_user used to allocate the rm as well as kernel
buffers for the data, and then copy the data in. Now, sendmsg()
allocates the rm, although the data buffer alloc still happens
in r_m_copy_from_user.

SGs are still allocated with rm, but now r_m_alloc_sgs() is
used to reserve them. This allows multiple SG lists to be
allocated from the one rm -- this is important once we also
want to alloc our rdma sgl from this pool.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:36 -07:00
Andy Grover
3ef13f3c22 RDS: cleanup/fix rds_rdma_unuse
First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.

Second, simplify the logic a bit by bailing early (with a warning)
if !mr.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:35 -07:00
Andy Grover
e779137aa7 RDS: break out rdma and data ops into nested structs in rds_message
Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:33 -07:00
Andy Grover
8690bfa17a RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons
Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:32 -07:00
Andy Grover
2dc3935734 RDS: move rds_shutdown_worker impl. to rds_conn_shutdown
This fits better in connection.c, rather than threads.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:10:13 -07:00
Andy Grover
9de0864cf5 RDS: Fix locking in send on m_rs_lock
Do not nest m_rs_lock under c_lock

Disable interrupts in {rdma,atomic}_send_complete

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:32 -07:00
Andy Grover
7c82eaf00e RDS: Rewrite rds_send_drop_to() for clarity
This function has been the source of numerous bugs; it's just
too complicated. Simplified to nest spinlocks cleanly within
the second loop body, and kick out early if there are no
rms to drop.

This will be a little slower because conn lock is grabbed for
each entry instead of "caching" the lock across rms, but this
should be entirely irrelevant to fastpath performance.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:32 -07:00
Tina Yang
35b52c7053 RDS: Fix corrupted rds_mrs
On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion.  A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,

Thread #1
t0:
    sock_release
    rds_release
    rds_send_drop_to /* wait on send completion */
t2:
    rds_rdma_drop_keys()   /* destroy & free all mrs */

Thread #2
t1:
    rds_ib_send_cq_comp_handler
    rds_ib_send_unmap_rm
    rds_message_unmapped   /* wake up #1 @ t0 */
t3:
    rds_message_put
    rds_message_purge
    rds_mr_put   /* memory corruption detected */

The problem with the rds_rdma_drop_keys() is it could
remove a mr's refcount more than its due (i.e. repeatedly
as long as it still remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference - reference
by the tree.

        /* Release any MRs associated with this socket */
        while ((node = rb_first(&rs->rs_rdma_keys))) {
                mr = container_of(node, struct rds_mr, r_rb_node);
                if (mr->r_trans == rs->rs_transport)
                        mr->r_invalidate = 0;
                rds_mr_put(mr);
        }

I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one.  Whichever thread
holds the last reference will free the mr via rds_mr_put().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:31 -07:00
Andy Grover
9e2effba2c RDS: Fix BUG_ONs to not fire when in a tasklet
in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check for if irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:31 -07:00
Jianzhao Wang
ae2688d59b net: blackhole route should always be recalculated
Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
triggered by IKE for example), hence this kind of route is always
temporary and so we should check if a better route exists for next
packets.
Bug has been introduced by commit d11a4dc18b.

Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 14:35:43 -07:00
Jarek Poplawski
f6b085b69d ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
Hi,
Here is one more of these warnings and a patch below:

Sep  5 23:52:33 del kernel: [46044.244833] ===================================================
Sep  5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ]
Sep  5 23:52:33 del kernel: [46044.277000] ---------------------------------------------------
Sep  5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
Sep  5 23:52:33 del kernel: [46044.293627]
Sep  5 23:52:33 del kernel: [46044.293632] other info that might help us debug this:
Sep  5 23:52:33 del kernel: [46044.293634]
Sep  5 23:52:33 del kernel: [46044.325333]
Sep  5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0
Sep  5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717:
Sep  5 23:52:33 del kernel: [46044.357548]  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20
Sep  5 23:52:33 del kernel: [46044.367647]
Sep  5 23:52:33 del kernel: [46044.367652] stack backtrace:
Sep  5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3
Sep  5 23:52:33 del kernel: [46044.398764] Call Trace:
Sep  5 23:52:33 del kernel: [46044.409596]  [<c12f9aba>] ? printk+0x18/0x1e
Sep  5 23:52:33 del kernel: [46044.420761]  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
Sep  5 23:52:33 del kernel: [46044.432229]  [<c12b7235>] trie_firstleaf+0x65/0x70
Sep  5 23:52:33 del kernel: [46044.443941]  [<c12b74d4>] fib_table_flush+0x14/0x170
Sep  5 23:52:33 del kernel: [46044.455823]  [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0
Sep  5 23:52:33 del kernel: [46044.467995]  [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40
Sep  5 23:52:33 del kernel: [46044.480404]  [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180
Sep  5 23:52:33 del kernel: [46044.493025]  [<c12b069d>] fib_flush+0x2d/0x60
Sep  5 23:52:33 del kernel: [46044.505796]  [<c12b06f5>] fib_disable_ip+0x25/0x50
Sep  5 23:52:33 del kernel: [46044.518772]  [<c12b10d3>] fib_netdev_event+0x73/0xd0
Sep  5 23:52:33 del kernel: [46044.531918]  [<c1048dfd>] notifier_call_chain+0x2d/0x70
Sep  5 23:52:33 del kernel: [46044.545358]  [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20
Sep  5 23:52:33 del kernel: [46044.559092]  [<c124f687>] call_netdevice_notifiers+0x27/0x60
Sep  5 23:52:33 del kernel: [46044.573037]  [<c124faec>] __dev_notify_flags+0x5c/0x80
Sep  5 23:52:33 del kernel: [46044.586489]  [<c124fb47>] dev_change_flags+0x37/0x60
Sep  5 23:52:33 del kernel: [46044.599394]  [<c12a8a8d>] devinet_ioctl+0x54d/0x630
Sep  5 23:52:33 del kernel: [46044.612277]  [<c12aabb7>] inet_ioctl+0x97/0xc0
Sep  5 23:52:34 del kernel: [46044.625208]  [<c123f6af>] sock_ioctl+0x6f/0x270
Sep  5 23:52:34 del kernel: [46044.638046]  [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0
Sep  5 23:52:34 del kernel: [46044.650968]  [<c123f640>] ? sock_ioctl+0x0/0x270
Sep  5 23:52:34 del kernel: [46044.663865]  [<c10c3188>] vfs_ioctl+0x28/0xa0
Sep  5 23:52:34 del kernel: [46044.676556]  [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0
Sep  5 23:52:34 del kernel: [46044.688989]  [<c1048676>] ? up_read+0x16/0x30
Sep  5 23:52:34 del kernel: [46044.701411]  [<c1021376>] ? do_page_fault+0x1d6/0x3a0
Sep  5 23:52:34 del kernel: [46044.714223]  [<c10b6588>] ? fget_light+0xf8/0x2f0
Sep  5 23:52:34 del kernel: [46044.726601]  [<c1241f98>] ? sys_socketcall+0x208/0x2c0
Sep  5 23:52:34 del kernel: [46044.739140]  [<c10c3eb3>] sys_ioctl+0x63/0x70
Sep  5 23:52:34 del kernel: [46044.751967]  [<c12fca3d>] syscall_call+0x7/0xb
Sep  5 23:52:34 del kernel: [46044.764734]  [<c12f0000>] ? cookie_v6_check+0x3d0/0x630

-------------->

This patch fixes the warning:
 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 1 lock held by pppd/1717:
  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20

 stack backtrace:
 Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3
 Call Trace:
  [<c12f9aba>] ? printk+0x18/0x1e
  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
  [<c12b7235>] trie_firstleaf+0x65/0x70
  [<c12b74d4>] fib_table_flush+0x14/0x170
  ...

Allow trie_firstleaf() to be called either under rcu_read_lock()
protection or with RTNL held. The same annotation is added to
node_parent_rcu() to prevent a similar warning a bit later.

Followup of commits 634a4b20 and 4eaa0e3c.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 14:14:20 -07:00
Namhyung Kim
fb8621bb6c net: remove address space warnings in net/socket.c
Casts __kernel to __user pointer require __force markup, so add it. Also
sock_get/setsockopt() takes @optval and/or @optlen arguments as user pointers
but were taking kernel pointers, use new variables 'uoptval' and/or 'uoptlen'
to fix it. These remove following warnings from sparse:

 net/socket.c:1922:46: warning: cast adds address space to expression (<asn:1>)
 net/socket.c:3061:61: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3061:61:    expected char [noderef] <asn:1>*optval
 net/socket.c:3061:61:    got char *optval
 net/socket.c:3061:69: warning: incorrect type in argument 5 (different address spaces)
 net/socket.c:3061:69:    expected int [noderef] <asn:1>*optlen
 net/socket.c:3061:69:    got int *optlen
 net/socket.c:3063:67: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3063:67:    expected char [noderef] <asn:1>*optval
 net/socket.c:3063:67:    got char *optval
 net/socket.c:3064:45: warning: incorrect type in argument 5 (different address spaces)
 net/socket.c:3064:45:    expected int [noderef] <asn:1>*optlen
 net/socket.c:3064:45:    got int *optlen
 net/socket.c:3078:61: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3078:61:    expected char [noderef] <asn:1>*optval
 net/socket.c:3078:61:    got char *optval
 net/socket.c:3080:67: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3080:67:    expected char [noderef] <asn:1>*optval
 net/socket.c:3080:67:    got char *optval

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:46:13 -07:00
Changli Gao
6febfca98f net: rps: add the shortcut for one rps_cpus
When there is only one rps_cpus, skb_get_rxhash() can be eliminated.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:10:53 -07:00
Diego Elio 'Flameeyes' Pettenò
65040c33ee sctp: implement SIOCINQ ioctl() (take 3)
This simple patch copies the current approach for SIOCINQ ioctl() from DCCP
into SCTP so that the userland code working with SCTP can use a similar
interface across different protocols to know how much space to allocate for
a buffer.

Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:08:05 -07:00
Julian Anastasov
6523ce1525 ipvs: fix active FTP
- Do not create expectation when forwarding the PORT
  command to avoid blocking the connection. The problem is that
  nf_conntrack_ftp.c:help() tries to create the same expectation later in
  POST_ROUTING and drops the packet with "dropping packet" message after
  failure in nf_ct_expect_related.

- Change ip_vs_update_conntrack to alter the conntrack
  for related connections from real server. If we do not alter the reply in
  this direction the next packet from client sent to vport 20 comes as NEW
  connection. We alter it but may be some collision happens for both
  conntracks and the second conntrack gets destroyed immediately. The
  connection stucks too.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:39:57 -07:00
Jarek Poplawski
64289c8e68 gro: Re-fix different skb headrooms
The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333f

Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:32:15 -07:00
J. Bruce Fields
3211af1119 svcrpc: cache deferral cleanup
Attempt to make obvious the first-try-sleeping-then-try-deferral logic
by putting that logic into a top-level function that calls helpers.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 20:20:31 -04:00
J. Bruce Fields
6610f720e9 svcrpc: minor cache cleanup
Pull out some code into helper functions, fix a typo.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 20:19:12 -04:00
NeilBrown
f16b6e8d83 sunrpc/cache: allow threads to block while waiting for cache update.
The current practice of waiting for cache updates by queueing the
whole request to be retried has (at least) two problems.

1/ With NFSv4, requests can be quite complex and re-trying a whole
  request when a later part fails should only be a last-resort, not a
  normal practice.

2/ Large requests, and in particular any 'write' request, will not be
  queued by the current code and doing so would be undesirable.

In many cases only a very sort wait is needed before the cache gets
valid data.

So, providing the underlying transport permits it by setting
 ->thread_wait,
arrange to wait briefly for an upcall to be completed (as reflected in
the clearing of CACHE_PENDING).
If the short wait was not long enough and CACHE_PENDING is still set,
fall back on the old approach.

The 'thread_wait' value is set to 5 seconds when there are spare
threads, and 1 second when there are no spare threads.

These values are probably much higher than needed, but will ensure
some forward progress.

Note that as we only request an update for a non-valid item, and as
non-valid items are updated in place it is extremely unlikely that
cache_check will return -ETIMEDOUT.  Normally cache_defer_req will
sleep for a short while and then find that the item is_valid.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:22:07 -04:00
NeilBrown
c5b29f885a sunrpc: use seconds since boot in expiry cache
This protects us from confusion when the wallclock time changes.

We convert to and from wallclock when  setting or reading expiry
times.

Also use seconds since boot for last_clost time.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:21:20 -04:00
Linus Torvalds
608307e6de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator
  ipvs: avoid oops for passive FTP
  Revert "sky2: don't do GRO on second port"
  gro: fix different skb headrooms
  bridge: Clear INET control block of SKBs passed into ip_fragment().
  3c59x: Remove incorrect locking; correct documented lock hierarchy
  sky2: don't do GRO on second port
  ipv4: minor fix about RPF in help of Kconfig
  xfrm_user: avoid a warning with some compiler
  net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()
  pxa168_eth: fix a mdiobus leak
  net sched: fix kernel leak in act_police
  vhost: stop worker only if created
  MAINTAINERS: Add ehea driver as Supported
  ath9k_hw: fix parsing of HT40 5 GHz CTLs
  ath9k_hw: Fix EEPROM uncompress block reading on AR9003
  wireless: register wiphy rfkill w/o holding cfg80211_mutex
  netlink: Make NETLINK_USERSOCK work again.
  irda: Correctly clean up self->ias_obj on irda_bind() failure.
  wireless extensions: fix kernel heap content leak
  ...
2010-09-07 14:06:10 -07:00
David S. Miller
6f86b32518 ipv4: Fix reverse path filtering with multipath routing.
Actually iterate over the next-hops to make sure we have
a device match.  Otherwise RP filtering is always elided
when the route matched has multiple next-hops.

Reported-by: Igor M Podlesny <for.poige@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:24 -07:00