How did Strassen come up with his matrix multiplication method?

The famous Strassen’s matrix multiplication algorithm is a real treat for us, as it reduces the time complexity from the traditional O(n³) to O(n^log₂7), which is roughly O(n^2.81).

But of all the resources I have gone through, even Cormen’s and Steven Skiena’s books, none clearly states how Strassen thought of it.

What is the rationale of Strassen’s matrix multiplication algorithm? Is this a lucky accident or is there something deeper in it?

Apart from Strassen, nobody can tell you how Strassen got
his idea. Howeber¹, I can tell you how you could have found that
formula yourself, provided that you are interested in algebraic
geometry and representation theory. This also gives you the tools to show that Strassen’s formula is as good as it can be, or more precisely, that there is no formula computing the product of two 2×2 matrices that uses fewer than 7 multiplications.

Since you are interested in matrices, I assume you know basic linear
algebra, and I will be a bit blurry about the more advanced details.

First, let E be the set of all linear maps from a plane to a
plane. This is basically the set of all 2×2 matrices, but we forget
about a particular coordinate system, because if there were a better
coordinate system than the “default one”, we might want to
use it for matrix multiplication. We also denote by E† the dual
space of E and by X = P(E⊗E†⊗E†) the projective space associated
to the tensor product E⊗E†⊗E†.

An element of X = P(E⊗E†⊗E†) of the special form [c⊗α⊗β] can be
interpreted as an elementary operation on matrices, which, in some
appropriate coordinate systems, reads a coefficient of a matrix A and
a coefficient of a matrix B and writes the product of these
coefficients in some matrix C. A general element of X is a combination of
these elementary operations, so the product π of two matrices,
understood as a map from P(E)×P(E) to P(E), is a point in X.
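
To make the notion of an elementary operation concrete, here is a minimal sketch in the default coordinate system. It assumes that a 2×2 matrix is flattened to a list of 4 numbers, that c is a list of 4 coefficients, and that α and β are lists of 4 weights. The names idx, unit, elementary, classical and combine are hypothetical helpers, not notation from the article cited further down.

```python
from itertools import product

def idx(i, j):
    """Flatten the (row, column) position of a 2x2 matrix to 0..3."""
    return 2 * i + j

def unit(n):
    """Coordinate vector with a single 1 at position n."""
    return [1 if m == n else 0 for m in range(4)]

def elementary(c, alpha, beta):
    """Elementary operation c (x) alpha (x) beta as a bilinear map.

    It reads alpha(A) and beta(B), multiplies them once, and writes
    the result into the output matrix with coefficients c.
    """
    def apply(A, B):
        a = sum(w * x for w, x in zip(alpha, A))
        b = sum(w * x for w, x in zip(beta, B))
        p = a * b  # the single "essential" multiplication
        return [ci * p for ci in c]
    return apply

# The classical formula: one elementary operation per triple (i, j, k),
# reading A[i][k] and B[k][j] and writing into C[i][j].
classical = [
    elementary(unit(idx(i, j)), unit(idx(i, k)), unit(idx(k, j)))
    for i, j, k in product(range(2), repeat=3)
]

def combine(ops, A, B):
    """Sum the outputs of a list of elementary operations."""
    out = [0, 0, 0, 0]
    for op in ops:
        out = [s + t for s, t in zip(out, op(A, B))]
    return out

A = [1, 2, 3, 4]  # [[1, 2], [3, 4]]
B = [5, 6, 7, 8]  # [[5, 6], [7, 8]]
print(len(classical), combine(classical, A, B))  # 8 [19, 22, 43, 50]
```

In this picture, the question of how many multiplications are needed becomes the question of how short the list of elementary operations can be made.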

The usual matrix product formula and Strassen’s formula can be
expressed as combinations of these elementary operations, so let me denote
by W₁ the set of these elementary operations [c⊗α⊗β] and let me describe
geometrically their combinations.

Let W₂ be the variety of secants of W₁ in X. It is obtained by
taking the (closure of the) union of all lines going through two
(generic) points of W₁. We can think of it as the set of all
combinations of two elementary operations.

Let W₃ be the variety of secant planes of W₁ in X. It is obtained by
taking the (closure of the) union of all planes going through three
(generic) points of W₁. We can think of it as the set of all
combinations of three elementary operations.

Similarly, we define secant varieties for greater indices. Note that
these varieties grow larger and larger, that is, W₁⊂W₂⊂W₃⊂⋯. The
classical matrix product formula, with its eight multiplications, shows
that the product of matrices is a point of W₈. Actually

PROPOSITION(Strassen) — The product of matrices π lies in W₇.

As far as I know, Strassen did not put things that way. However, this
is a geometric point of view on the question. This point of view is
very useful, because it also lets you prove that Strassen’s formula is
the best, that is, that π does not lie in W₆. The geometric methods
developed here can also be used for a broader range of problems.
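
For reference, here is a sketch of the seven products in their usual textbook form, which is not taken from this answer. Each m below is one elementary operation in the sense above: a linear combination of entries of A times a linear combination of entries of B, written into C with coefficients 0, 1 or −1. The function name strassen_2x2 is a hypothetical choice.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # The seven products (standard textbook form of Strassen's formula).
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Only additions and subtractions are needed to recover C.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The formulas never rely on the entries commuting, so the same scheme still works when the entries are themselves matrix blocks.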

I hope I have piqued your curiosity. You can go further by reading this
article by Landsberg and Manivel:

http://arxiv.org/abs/math/0601097

¹ I will not fix this typo, because I caught a cold.

I’ve just been tasked with doing this for homework, and I thought I had a neat epiphany: Strassen’s algorithm sacrifices the “breadth” of its pre-summation components in order to use fewer operations, in exchange for “deeper” pre-summation components that can still be used to extract the final answer. (This isn’t the best way to say it, but it’s hard for me to explain.)

I’m going to use the example of multiplying two complex numbers together to illustrate the balance of “operations vs. components”:

Notice that we use 4 multiplications, which result in 4 product components:

Note that the 2 final components we want: the real and the imaginary parts of the complex number, are actually linear equations: they are sums of scaled products. So we are dealing with two operations here: addition and multiplication.

The fact is that our 4 product components can represent our 2 final components if we simply add or subtract our components:

But, our final 2 components can be represented as sums of products. Here’s what I came up with:

If you can see, we actually only need 3 distinct product components to make our final two:

But wait! Each of the capital letters is itself a product! The catch is that we know we can generate (A+B+C+D) from (a+b)(c+d), which is only 1 multiplication.

So in the end, our algorithm is optimized to use fewer, but “fatter”, components: we trade some of the multiplications for additional summing operations.
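
As a minimal sketch of this trick, assume the two numbers are a + bi and c + di, and that the capital letters name the four products as A = ac, B = ad, C = bc, D = bd. This naming is an assumption, as is the function name complex_mul_3. The real part is then A − D, and the imaginary part B + C equals (A + B + C + D) − A − D:

```python
def complex_mul_3(a, b, c, d):
    """Compute (a + bi)(c + di) using three real multiplications.

    Assumed naming: A = a*c, B = a*d, C = b*c, D = b*d.
    """
    A = a * c                # first multiplication
    D = b * d                # second multiplication
    fat = (a + b) * (c + d)  # third multiplication, equals A + B + C + D
    real = A - D
    imag = fat - A - D       # B + C, recovered without computing B or C
    return real, imag

print(complex_mul_3(1, 2, 3, 4))  # (-5, 10), i.e. (1 + 2i)(3 + 4i) = -5 + 10i
```

The version with four multiplications needs 2 additions or subtractions, while this one needs 5, which is exactly the trade described above.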

Part of what enables this is the distributive property, which allows A(B+C) to be equivalent to (AB+AC). Notice how the first can be computed using 1 add and 1 multiply operation, while the second requires 2 multiplies and 1 sum.

Strassen’s algorithm is an extension of the optimization we applied to complex number products, except there are more target product terms and possibly more product components we can use to get those terms. For a 2×2 matrix, Strassen’s algorithm turns an algorithm that needs 8 multiplications into one that needs 7. It leverages the distributive property to “merge” two multiplications into one operation, and then subtracts from the new “fatter” component to extract one product term or the other.
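
To see why saving one multiplication out of eight matters, here is a small sketch that counts scalar multiplications. It assumes the 2×2 scheme is applied recursively to blocks of an n×n matrix with n a power of two, which is how the algorithm is usually presented. The function name scalar_multiplications is hypothetical.

```python
import math

def scalar_multiplications(n, products_per_step):
    """Count scalar multiplications for n x n matrices (n a power of two)
    when each 2x2 block step uses `products_per_step` block products."""
    if n == 1:
        return 1
    return products_per_step * scalar_multiplications(n // 2, products_per_step)

for n in (2, 4, 1024):
    classical = scalar_multiplications(n, 8)  # equals n ** 3
    strassen = scalar_multiplications(n, 7)   # equals 7 ** log2(n)
    print(n, classical, strassen)

# The exponent in the running time is log2(7), roughly 2.807.
print(math.log2(7))
```

For n = 1024 this gives 1073741824 multiplications for the classical scheme against 282475249 for the recursive 7-product scheme. The count ignores the extra additions.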

A good example: to get (-1), (2), and (5), you can think of them as just (-1), (2), (5), or you can think of them as (2-3), (2), (2+3). The second view uses fewer distinct numbers, though. The catch is that the number of distinct numbers corresponds to the number of product components you need to compute for matrix multiplication. We simply optimize for this: we look for a view of the underlying operations that produces equivalent outputs from a different combination, using the distributive property.

Perhaps this could be linked to topology in some way? This is just my layman’s way of understanding it.

Edit: Here is a picture of my notes I drew in the process of making the complex number explanation:


Filed under: softwareengineering - @ 06:45

Tags: algorithms, matrix