[comp.theory] Finding subsets of equations

abbott@AEROSPACE.AERO.ORG ("Russell J. Abbott") (06/01/90)

Does anyone have or know of a solution to the following problem?
Suppose you have n equations in m unknowns, m > n and want to find a
subset of those equations in which the number of unknowns is no greater
than the number of equations.  Is there a way to do that without a brute
force search through all possible combinations of equations with common
unknowns?

rwi@cs.glasgow.ac.uk (Dr Rob Irving) (06/04/90)

In article <74444@aerospace.AERO.ORG>, abbott@aerospace.aero.org (Russell J. Abbott) writes:

> Does anyone have or know of a solution to the following problem?
> Suppose you have n equations in m unknowns, m > n and want to find a
> subset of those equations in which the number of unknowns is no greater
> than the number of equations.  Is there a way to do that without a brute
> force search through all possible combinations of equations with common
> unknowns?

rwi@cs.glasgow.ac.uk (Dr Rob Irving) (06/04/90)

In article <74444@aerospace.AERO.ORG>, abbott@aerospace.aero.org (Russell J. Abbott) writes:

> Does anyone have or know of a solution to the following problem?
> Suppose you have n equations in m unknowns, m > n and want to find a
> subset of those equations in which the number of unknowns is no greater
> than the number of equations.  Is there a way to do that without a brute
> force search through all possible combinations of equations with common
> unknowns?

It can be solved using bipartite matching as follows:

Construct a bipartite graph G with one set of vertices W representing the
equations and the other set U representing the unknowns. Join a vertex w in W to
a vertex u in U if the equation w contains the unknown x.

By matching theory, there is a set of q equations involving <q unknowns if and
only if the maximum cardinality matching in G has <q edges. Applying a bipartite
matching algorithm, such as the standard one based on augmenting paths, will
therefore determine whether such a set of equations exists, and if so, can
easily be extended to find a 'critical' set  -  i.e., a smallest such set
with maximum excess of equations over unknowns (maximum 'deficiency' in the
terminology of matching).

Otherwise, if there is a set of q equations involving exactly q unknowns,
first find a maximum cardinality matching M in G; say it involves vertices
(unknowns) x_1, ..., x_n. Then for each of these vertices in turn, remove it
from G and find a maximum cardinality matching in the resulting graph. As soon
as the resulting matching has size n-1 (rather than n) then the corresponding
critical set of equations is the one (or at least one of those) sought.

Since bipartite matching can be solved in O(|E||V|^0.5) time, where V is the
vertex set and E the edge set, this method is clearly much better than
exhaustive search.

Rob Irving.

tompa@GEODUCK.CS.WASHINGTON.EDU (06/04/90)

Russell J. Abbott asked the following:

  Does anyone have or know of a solution to the following problem?
  Suppose you have n equations in m unknowns, m > n and want to find a
  subset of those equations in which the number of unknowns is no greater
  than the number of equations.  Is there a way to do that without a brute
  force search through all possible combinations of equations with common
  unknowns?

Here is a polynomial time algorithm for this problem, but I'd be interested
to know if there is a more efficient one.  Construct a bipartite graph,
whose left block contains a vertex for each of the n equations, and whose
right block contains a vertex for each of the m unknowns.  Now test if this
graph has a perfect matching.  If it does not, then, by Hall's Theorem,
there is some set of k equations with at most k-1 unknowns among them, and
you are done.  (The standard perfect matching algorithms can be made to
output the set of k equations when they fail to find a perfect matching.)

Now to finish, add a new vertex u to the left block, and invoke  the
perfect matching algorithm m more times as follows:  in the ith invocation,
u is adjacent only to the ith vertex in the right block.  If any of these m
graphs fails to have a perfect matching, it must be because k equations
have exactly k unknowns among them, one of them being the one to which u is
currently adjacent (and again, the algorithm can be made to output the set
of k equations).  If all m have perfect matchings, then every set of k
equations has at least k+1 unknowns among them, for all k.

This still seems like brute force to me: it would be interesting to know
whether the problem stated can be reduced to only one call on perfect
matching.

Martin Tompa
tompa@cs.washington.edu

spencert@turing.cs.rpi.edu (Thomas Spencer) (06/05/90)

In article <9006041600.AA09706@irt.watson.ibm.com> tompa%geoduck.cs.washington.edu@VM1.NoDak.EDU writes:
>Russell J. Abbott asked the following:
>
>  Does anyone have or know of a solution to the following problem?
>  Suppose you have n equations in m unknowns, m > n and want to find a
>  subset of those equations in which the number of unknowns is no greater
>  than the number of equations.  Is there a way to do that without a brute
>  force search through all possible combinations of equations with common
>  unknowns?
>
>Here is a polynomial time algorithm for this problem, but I'd be interested
>to know if there is a more efficient one.  Construct a bipartite graph,
>whose left block contains a vertex for each of the n equations, and whose
>right block contains a vertex for each of the m unknowns.  Now test if this
>graph has a perfect matching.  If it does not, then, by Hall's Theorem,
>there is some set of k equations with at most k-1 unknowns among them, and
>you are done.  (The standard perfect matching algorithms can be made to
>output the set of k equations when they fail to find a perfect matching.)
>
>Now to finish, add a new vertex u to the left block, and invoke  the
>perfect matching algorithm m more times as follows:  in the ith invocation,
>u is adjacent only to the ith vertex in the right block.  If any of these m
>graphs fails to have a perfect matching, it must be because k equations
>have exactly k unknowns among them, one of them being the one to which u is
>currently adjacent (and again, the algorithm can be made to output the set
>of k equations).  If all m have perfect matchings, then every set of k
>equations has at least k+1 unknowns among them, for all k.
>
>This still seems like brute force to me: it would be interesting to know
>whether the problem stated can be reduced to only one call on perfect
>matching.
>
>Martin Tompa
>tompa@cs.washington.edu

Only one call to the maximum cardinality matching algorithm is necessary.
First, find a maximum cardinality matching M.  If it does not match all of 
the equations, we're done, as Martin says.  Next, direct all of the edges
not in M, from the unknown to the equation and all the edges in M from
the equation to the unknown.  Finally, determine if there is a directed
path to each equation from some unmatched unknown.  (This is the same
thing as determining if every equation is reachable via an alternating
path from an unmatched unknown.)  If every equation is reachable, then
no such set exists; otherwise, the unreachable equations are the desired 
set.

To see why this is true, let's consider the case where the set of unreachable
equations is empty, first.  Consider a set S, of equations.  It is adjacent,
via edges in the matching to a set T of unknowns.  It is enough to show
that it is also adjacent to some other unknown.  Consider an equation r
in S.  There is a path P from some unmatched unknown u to r.  Let v be
the first equation in S on this path, and let w be the previous unknown
along this path.  w is either the first vertex of the path P, or the 
edge of P entering w is matched.  In either case w can not be in T, but
w is adjacent to S.  Thus, the equations in S contain more unknowns than
the number of equations in S.

Conversely, the set of unreachable equations might not be empty.  Call 
this set S.  Let T be the set of unknowns adjacent to S.  Suppose that 
T contains an unknown x that is not matched to an equation in S.  Since
x is in T, x is adjacent to some equation y in S.  If x is not matched 
at all, then y is reachable, contrary to the definition of S.  Alternatively, 
x may be matched to some vertex z not in S.  But, z is reachable and the
path can be extended through x to y.  Again this contradicts the definition
of S.  Therefore, T contains only unknowns that are matched to equations
in S, so |T| <= |S|.  QED

I hope that this helps

			-Tom Spencer
			 spencert@turing.cs.rpi.edu
			 uunet!steinmetz!itsgw!spencert
"First figure out what you are trying to do."  -Me.

wilber@homxb.att.COM (Bob Wilber) (06/05/90)

Russell J. Abbott writes:
>Does anyone have or know of a solution to the following problem?
>Suppose you have n equations in m unknowns, m > n and want to find a
>subset of those equations in which the number of unknowns is no greater
>than the number of equations.  Is there a way to do that without a brute
>force search through all possible combinations of equations with common
>unknowns?

Martin Tompa replies:
>Here is a polynomial time algorithm for this problem, but I'd be interested
>to know if there is a more efficient one.  Construct a bipartite graph,
>whose left block contains a vertex for each of the n equations, and whose
>right block contains a vertex for each of the m unknowns.  Now test if this
>graph has a perfect matching.  If it does not, then, by Hall's Theorem,
>there is some set of k equations with at most k-1 unknowns among them, and
>you are done.  (The standard perfect matching algorithms can be made to
>output the set of k equations when they fail to find a perfect matching.)
>
>Now to finish, add a new vertex u to the left block, and invoke  the
>perfect matching algorithm m more times as follows:  in the ith invocation,
>u is adjacent only to the ith vertex in the right block.  If any of these m
>graphs fails to have a perfect matching, it must be because k equations
>have exactly k unknowns among them, one of them being the one to which u is
>currently adjacent (and again, the algorithm can be made to output the set
>nof k equations).  If all m have perfect matchings, then every set of k
>equations has at least k+1 unknowns among them, for all k.
>
>This still seems like brute force to me: it would be interesting to know
>whether the problem stated can be reduced to only one call on perfect
>matching.

The answer is yes.  Construct the bipartite graph G = (Q, V, E) where
Q = the set of equations, V = the set of variables, and E = { (q, v) |
equation q contains variable v }.  Compute a maximal matching.  As above,
if the number of edges in the matching is < n, then by Hall's Theorem
there is a subset of k equations with <= k-1 variables among them, and
we're done.  Suppose we have a matching with n edges.  Call a vertex v in V
exposed if it is not adjacent to any edge in the matching.  Say that a path
is alternating if it is a simple path that starts at an exposed vertex
and alternates between unmatched and matched edges, with the first and last
edges being unmatched.  (So the last vertex in the path is in Q.)  Let Q'
be the set of vertices in Q that are contained in some alternating path.
Note that Q' can be computed in O(E) time by depth first search starting at
the exposed vertices.

If a subset S of k equations contains an equation in Q', then those equations
have at least k+1 variables among them.  For there are the k variables at the
endpoints of the matched edges coming from the k equations, and in addition we
can find one more variable as follows.  Let q_0 be an equation in S that is Q'.
Let q_0, v_0, q_1, v_1, ..., q_i, v_i be an alternating path from q_0 to an
exposed vertex v_i.  Let q_j be the last equation in the path that is in S.
Vertex v_j is either the exposed vertex, v_i, or it is matched up with an
equation q_{j+1} that is not in S, and since v_j is in equation q_j it brings
the total number of variables among the equations in S up to k+1.

So if Q' = Q there is no subset of k equations with <= k variables.
Conversely, if Q'' = Q - Q' is nonempty, then Q'' has as many variables as
equations.  For let V'' be the set of variables matched up with the variables
in Q'' (|V''| = |Q''|).  Every variable in an equation in Q'' must be in V''.
Suppose to the contrary that an equation q_0 in Q'' has a variable v that is
not in V''.  If v is exposed then edge (v, q_0) is an alternating path so q is
in Q', a contradiction.  Otherwise let q_1 be the equation that v is matched
with.  Since v is not in V'', q_1 is in Q'.  But then the alternating path to
q_1 can be extended with edges (q_1, v) and (v, q_0), so q_0 is in Q', again a
contradiction.

The complexity of the algorithm is dominated by the time to compute one maximal
matching, O(|E| sqrt(m)).

Bob Wilber

gilbert@parc.xerox.COM (06/05/90)

Russell J. Abbott asked the following:

>   Does anyone have or know of a solution to the following problem?
>   Suppose you have n equations in m unknowns, m > n and want to find a
>   subset of those equations in which the number of unknowns is no greater
>   than the number of equations.  Is there a way to do that without a brute
>   force search through all possible combinations of equations with common
>   unknowns?

Martin Tompa gave a solution that calls maximum matching m times and asked:

>   This still seems like brute force to me: it would be interesting to know
>   whether the problem stated can be reduced to only one call on perfect
>   matching.


It is indeed possible to solve the problem with one call on maximum
matching.  This was essentially done by Dulmage, Mendelsohn, and
Johnson in the 1950's.  It is now used in sparse matrix computation;
see e.g. Coleman, Gilbert, and Edenbrandt, "Predicting fill for sparse
orthogonal factorization", JACM 33 (1986) pp. 517-532.

Briefly, a matrix (or bipartite graph) with more rows than columns has the
"strong Hall property" (SHP) if every nonempty subset of columns is incident
on a strictly larger set of rows.  Abbott's problem is to decide SHP for
the transpose of the matrix, and to find a certificate of non-SHPness if
one exists.  Call the two sets of vertices in the bipartite graph of the
matrix "rows" and "columns".

Theorem:  For a bipartite graph with more rows than columns,
          the following are equivalent:
(1)  M has SHP.
(2)  There exists a maximum matching on M for which every vertex of M is
     reachable from an unmatched row vertex by an alternating path.
(3)  For every maximum matching on M, every vertex of M is reachable from
     an unmatched row vertex by an alternating path.  (I.e., any maximum
     matching will do in (2).)

Thus one can test SHP in O(e sqrt(v)) time by finding any maximum matching
and then searching along alternating paths from unmatched rows.  This also
gives a witness set if the matrix does not have SHP:  In this case the set
C' of columns not reached during the search is incident only on the set R'
of rows matched to those columns, which cannot be larger.

(Incidentally, a square matrix M is usually defined to have SHP if every
proper subset of columns is incident on a strictly larger set of rows.
Then square M has SHP iff every pair of vertices is joined by an alternating
path in some/every maximum matching.  Any bipartite graph has a canonical
"Dulmage-Mendelsohn decomposition" into subgraphs that are strong Hall or
whose transposes are strong Hall, which generalizes the strongly connected
components of a directed graph; the decomposition can be found by matching
plus depth-first search through alternating paths.)

- John Gilbert
  gilbert@parc.xerox.com

furer@CS.PSU.EDU (Martin Furer) (06/05/90)

Russell J. Abbott has asked:

> Does anyone have or know of a solution to the following problem?
> Suppose you have n equations in m unknowns, m > n and want to find a
> subset of those equations in which the number of unknowns is no greater
> than the number of equations.  Is there a way to do that without a brute
> force search through all possible combinations of equations with common
> unknowns?

Martin Tompa has answered with a polynomial time algorithm calling a maximum
bipartite matching subroutine m+1 times. He has remarked:

> This still seems like brute force to me: it would be interesting to know
> whether the problem stated can be reduced to only one call on perfect
> matching.

In fact Martin Tompa's implicit conjecture is correct, one call to a matching
algorithm is sufficient.

First, we follow Martin Tompa. We form the bipartite graph with equations on
the left and unknowns on the right hand side, and we compute a maximum
matching. If the maximum matching does not involve all equations (i.e., it
is not left-"perfect"), then the standard algorithm gives a subset of vertices
violating Hall's condition and solving the problem.

Now, let us look at the case, where every left vertex (equation) is matched.
We transform the bipartite graph into a directed graph as follows.
We direct all edges from left to right. In addition, the matched edges are
directed from right to left too. Using depth-first search, we compute the
strongly connected components of this directed graph. These components are
partially ordered by:
   component(u) < component(v)  iff  there is a directed path from u to v.

Take any anti-chain (i.e., set of incomparable components) in this partial
order, and add all components which are greater (with respect to < )  than
any component in the anti-chain. We have a perfect matching on this subset,
and there is no edge to a right vertex (an unknown) outside. Hence, this is a
solution to the problem. In fact, it is not hard to see that every solution
is of this form (in the case, where a left-"perfect" matching exists).

Martin Furer
Pennsylvania State University
furer@shire.cs.psu.edu