Most areas of CS you do are unlikely to need advanced calculus or topics that de...

Dn_Ab · on July 8, 2012

I agree. ten's list is excellent but the OP wanted to learn topics that would help with CS and algorithms. To that end graph theory, combinatorics and basic calculus should have been given more weight. But for theoretical CS I can't think of where Linear algebra would be useful. Abstract Algebra yes, but why Linear Algebra ? (p.s. i read your stuff on kelly criterion a long whiles ago, top notch, thanks!)

----

I would also suggest more emphasis on logic, sets, and type theory. Category Theory is also something fairly common in CS. So I would recommend (Free!): www.cs.unibo.it/~asperti/PAPERS/book.pdf

Abstract algebra: http://abstract.ups.edu/

And while I am skeptical as to the need of linear algebra for CS, it is such a key requirement for mathematical maturity I will suggest: http://www.math.miami.edu/~ec/book/. If Linear algebra is a building then abstract algebra is the frame. The two should really be taught at the same time.

btilly · on July 8, 2012

But for theoretical CS I can't think of where Linear algebra would be useful. Abstract Algebra yes, but why Linear Algebra?

Linear algebra is a necessary piece of background for linear programming (including the simplex method) and or standard approximation algorithms for many NP-hard problems.

Strassen's algorithm for fast matrix multiplication is commonly taught in algorithms class. It does not make much sense unless you know what matrix multiplication is.

Also if you want to work in computer graphics (which comes up both in theoretical and applied problems) you will need a solid understanding of linear algebra and matrices to understand the material.

I could list more, but that's enough to demonstrate that linear algebra does come up in a lot of places.

(p.s. i read your stuff on kelly criterion a long whiles ago, top notch, thanks!)

You're welcome, and thanks for the compliment. :-)

Dn_Ab · on July 8, 2012

I thought of games and linear programming but didn't really count them as theoretical CS. The graphics and compilers groups don't really mix but I guess Graphics researchers are doing theoretical CS. But what's more, I didnt consider that there are people working on better algorithms for various integral transforms, on learning theory bounds and algorithms for various matrix operations and decompositions. Doh. For some reason I most strongly associate CS Theory to Languages. Which was silly in hindsight.

btilly · on July 8, 2012

We tend to see the world through a filter of our own experience. :-)

ten_fingers · on July 8, 2012

> Most areas of CS you do are unlikely to need advanced calculus or topics that depend on it (differential equations, topology, measure theory...).

Right. The only advanced calculus text I listed was 'Baby Rudin', and the main contribution there is just to get some of the more important properties of the real numbers, Euclidean n-space, infinite sequences and series, and Riemann integration solid.

If a CS student is to stop before these topics, okay, but if they are going to go on then these topics will be part of what is generally assumed.

For CS, as we know, it is doing what EE did -- moving beyond its core tools and into what to do with those tools. E.g., EE got into nonlinear filtering and stochastic integration. E.g., CS is now getting into both optimization and statistics. Then being handy with that Baby Rudin material, at least the early chapters, will get to be important.

With Baby Rudin already done, there can be a good course in differential equations, and such a course can be a good way to see some of the value of what did in linear algebra and Baby Rudin and to exercise that material. At some point in the future, a CS guy might well get into some work involving differential equations -- viral growth models, flight of airplanes and space vehicles, and much of mechanical engineering.

One of the main themes in the future of computer applications is handling 'randomness', and my view is that serious work in that direction should have the measure theory foundations. I did an A/B on that! Early in my career I tried the easy way. After measure theory, Neveu, Breiman, Loeve, Dynkin, Lipster, Shiryaev, etc., I concluded that the measure theory approach to probability, stochastic processes, and statistics was essential.

In particular, without measure theory, people too easily get totally stuck in the mud on what 'random' means, while with measure theory Kolmogorov has a really nice answer. My view is that people may not like Kolmogorov's answer but that, with some really simple assumptions, we get forced into that answer anyway!

For the rest, the Hilbert and Banach spaces won't go away! E.g., a huge dessert buffet of really finger lick'n good applications of the Hahn-Banach theorem is David G. Luenberger, 'Optimization by Vector Space Methods', and that book will be a nice source of methods for a lot that computing people might encounter.

> You left out graph theory and combinatorics, both of which are extremely important to CS.

For combinatorics, I assumed that one would get enough, in some ways deeper than in Knuth's TACP, from abstract algebra and elementary probability. E.g., combinatorics is a lot about counting, and so is group theory in abstract algebra.

For graph theory, I assumed that one would get enough from optimization on networks. E.g., a 'basis' in the network simplex method on a network is a minimum spanning tree! And the max flow/min cut theorem can follow just from linear programming. Dynamic programming, which I mentioned, can be viewed as graph theory.

I did mention linear programming (and so does CLRS), and one reason is that in the CS study of algorithms the algorithms for linear programming are important, and surprising, benchmarks. Also integer linear programming is one of the more important motivations for the question of P versus NP.

For some of the surprise, the simplex algorithm has low degree polynomial expected performance (K. Borgward) but exponential worst case performance (Klee and Minty), but the polynomial algorithms (Khachiyan) when they are faster than simplex have both of them too slow for practice! So, at one of the first places we looked at computational complexity for problems more challenging than, say, heap sort and AVL trees, an exponential algorithm turned out to be superior in nearly every sense of interest to a polynomial algorithm! There never was a guarantee that the study of computational complexity would be easy!

But there's a lot of overlap: I didn't mention courses in 'finite mathematics', combinatorics, or graph theory, but one way and another what I described should provide enough coverage. I didn't mention the CS book CLRS, but I mentioned good coverage of some of the more advanced topics in that book. Or, there is a lot of blending old wine and pouring it into new bottles.

There is a broad point: A major theme in CS now is to borrow, modify, and apply work done some years ago in applied math, especially from operations research, e.g., linear programming, flows on networks, stochastic point processes, and statistics. While CS has some new applications, typically the material is done more carefully in the old applied math sources. So, I emphasized learning the material as math instead of as CS. Besides, the OP was asking about math for CS and not 'mathematical CS'! Also the title asked for the "hard way"!

There's another broad point: What to learn and why to learn it? Just a first cut view of what O( n ln(n) ) means should take only a little searching on the Internet. Besides, Knuth's TACP is quite clear on such 'asymptotics'. I wish Microsoft's MSDN documentation of .NET was as clear and easy to read as TACP!

One reason not to learn this stuff is to confirm that Knuth, Sedgewick, CLRS, etc. were correct after all!

Generally the reason to learn such stuff is for new applications in the future. For that, my view was just to stay with a relatively traditional course in relatively applied math although I leaned away from classic mathematical physics and more to business applications.

btilly · on July 8, 2012

Meh. If you want to learn hard math to make it harder, and to give you background for stuff you might encounter, why not go whole hog? Go ahead, learn algebraic topology so that you can understand category theory properly, then when you encounter it in CS you'll know what people are talking about.

I think there is a point of diminishing returns.

Baby Rudin I'm dubious about. But Royden and big Rudin (both of which you recommended) I have certainty about. There are good reasons that I never saw CS students in my real analysis classes. I don't think it is particularly valuable for CS either then or now, to acquire a deep understanding of real analysis.

And yes, I know about measure theory. I know how it applies to probability. But I went the other way. I learned measure theory. Then I learned probability. Then I began having to do probability stuff in the real world. And not once has my measure theory background been particularly relevant.

As for Hilbert and Banach spaces, they are key pieces of mathematics. In fields from wavelets to optimization theory, they come up over and over again. But I would wager that most computer science professors do not need to know what Hilbert and Banach spaces are. I'd even bet that most have not heard of the Hahn-Banach theorem. Again, if you find yourself going that way, learn it later.

On combinatorics and graph theory, you claim that people will learn enough of that material elsewhere. Maybe, maybe not. But it is clear that programming problems routinely get turned into graph theory problems, many of the most important programming algorithms are about graph theory (start with the traveling salesman problem and work your way through the list of NP-complete problems), and at its heart, analyzing an algorithm's run-time is a combinatorics problem. Acquiring the necessary concepts and vocabulary for those is necessary, whether you classify the book you're learning from as a math text or a CS text.

ten_fingers · on July 9, 2012

> I think there is a point of diminishing returns.

Yes, there is a big question about what to learn, about how much to invest in such things.

> Baby Rudin I'm dubious about. But Royden and big Rudin (both of which you recommended) I have certainty about.

But Baby Rudin is a prerequisite to Royden and big Rudin.

I'm sorry, but probability, stochastic processes, and mathematical statistics were junk for me until I went at them via measure theory.

I floundered terribly with random variables until I saw the measure theory definition; it's terrific: Go take 10,000 measurements. Now have the values of 10,000 random variables. Any 10,000 measurements at all. So far, no concept of 'randomness' at all. So, random variables are very general things and, e.g., handle even deterministic processes as a special case.

E.g., sufficient statistics is just an application of the Radon-Nikodym theorem, and a total train wreck to do otherwise. Yes, order statistics are always sufficient, maybe nice to know in 'data mining'. That sample mean and sample variance are sufficient in the Gaussian case is mind blowing; nice opportunity for 'data compression'!

E.g., measure theory and the Radon-Nikodym theorem define conditional expectation, that is, under mild assumptions, E[Y|X] = f(X) for some measurable f. Then easily f(X) is the best non-linear least squares approximation of Y. Nice.

Further, if 'cross tabulate' Y on X, then have a discrete approximation to E[Y|X] which shows that cross tabulation is a discrete version of the best non-linear approximation of Y given X.

Measure theory permits working with all the forms of convergence of random variables, especially strong convergence, at least awkward to do otherwise.

Then martingale theory makes little sense without measure theory.

Measure theory, the Kolmogorov extension, shows that we really can have a collection of random variables with desired properties.

Measure theory is crucial in even defining E[Y|U(t), t <- a] since we are conditioning on an uncountably infinite collection of random variables. Measure theory, then, shows that we can replace U(t), t <= a by the sigma algebra they generate and, then, condition on the sigma algebra. Cute.

Constructions such as

E[Y|U(t), t <= a]

are crucial in the nice qualitative, axiomatic definition of the Poisson process.

Similarly for independence of two collections of random variables where each collection has uncountably infinitely many random variables.

Measure theory was crucial in the standard results of ergodic theory.

I wrote a paper on anomaly detection in server farms and networks, and the key idea in the paper was a finite group of measure preserving transformations lifted roughly from ergodic theory.

Via measure theory we can show that the space of real valued L^2 random variables is complete, and, thus, a Hilbert space, which continues to blow my mind that any such thing could be true. I'd also like to have locally compact, but that's a bit much to hope for!

The Doob decomposition shows that every stochastic process is the sum of a martingale and a predictable process, all measure theory!

It's tough enough to believe in probability with the measure theory foundations; otherwise, I couldn't swallow the stuff!

There is a broad point: Maybe OP wants to know what to learn for the applications of the future. Then what current CS profs know is not necessarily very relevant!

btilly · on July 9, 2012

OK, what it sounds like to me is that you found a CS niche that needs a ton of measure theory.

That does not tell me that this will be the future of CS. But there are niches where this stuff is more applicable than I had realized.

ten_fingers · on July 9, 2012

The future of CS will be in several, maybe many, directions. E.g., there will be new struggles keeping the cores busy when we have 1000 cores per processor. My software development is taking too darned long; one direction is to make that faster.

But one direction is handling 'randomness', and for that I recommend a measure theory foundation of probability, stochastic processes, and mathematical statistics. Just a recommendation. It's risky; your mileage may vary. As Yoda said, "Always difficult to see, the future".

Dn_Ab · on July 8, 2012

You argument is solid and it would be great if math was taught more as a connected whole but many do not have that luxury. Key though, is the author never went into much detail as to his or her intentions and motivations so not much can be said if your list is inspiring or too intimidating.

One thing I'd like to point out is that measure theory is not the only and probably the least interesting way to study probability. There is the more elegant (IMO) approach via Nonstandard Analysis. And the fun more practical approach via Game Theory and Markets (which also support "imprecise probabilities").

I also think theres room for different approaches to the same thing, each offering their own unique insight. Many differential equation modelling problems, especially those involving populations could be fruitfully replaced by agent modelling.

pmiller2 · on July 8, 2012

Do you have any book recommendations for a nonstandard analysis treatment of probability? I'm really only familiar with the measure theory approach, myself. (In fact, I've been known to say that probability is the study of measurable functions with finite, nonzero integral over the real line.)

Dn_Ab · on July 9, 2012

The standard is Nelson's Radically elementary probability.

Vovks and Shafer's game-theoretic approach is interesting in that related approaches like bandit models and online learning have recently picked up in popularity.

btilly · on July 8, 2012

Nonstandard Analysis is an interesting side note. But pulling out the axiom of choice to differentiate x^2 is a bit much IMO. (Yes, I'm aware that there are different ways to construct the nonstandard model. But the subtleties needed to really understanding NSA are substantial. I far prefer the little-o approach that Knuth recommends.)

Dn_Ab · on July 9, 2012

Perhaps you are more familiar with Robinson's derivation but Keisler has an axiomatic approach that makes calculus easier for students than regular calculus (besides by the time you are doing continuous differentiation you are already calling on some far out concepts like real numbers and inifinity).

And real understanding takes time no matter what you do, the best one can do is start off in a manner such that the tools required to understand well enough are not more complicated than the subject matter itself.

btilly · on July 9, 2012

Yes, pedagogically you can just use infinitesmals after an appropriate hand wave. Most students will ignore the complicated details and just absorb the infinitesmal picture. If you take the attitude that students just need to learn the formulas, then it doesn't matter what approach you take.

However you've created difficulties for the day that they dive back in and try to really learn the subject. Because as they go to learn what a real number really is, and what its properties are, and about pathological functions, they also have to learn about the hyper-reals and a complex model-theory construction that (in both variations that I am aware of) requires choice.

There are a variety of other pedagogical choices that do not present such barriers to comprehension. (Note, by no means am I a fan of the limit approach. I know full well that it goes over the heads of the students, and I see no point in having the person at the front blather on about stuff that the class is not able to expected to understand.)

In any case my biggest complaint about how Calculus is taught is this. I think people come out of a first Calculus course without understanding the tangent line properly. If you don't understand the tangent line, Calculus is a mass of formulas. Despite how easy it is symbolically to jump directly from the tangent line to the derivative, I think that a solid week should be spent on the tangent line (calculating it for more complicated functions, finding applications, etc) until it is every student understands it well enough that they are ready to take the leap of looking at the slope and building a function out of it.

Dn_Ab · on July 9, 2012

I do not agree. This is because the text treats infinitesimals and epsilon-delta at the same time (or after). By that time the material is familiar enough that epsilon delta is not so difficult. Kind of like learning python before assembly. The student would also get some predicate logic much earlier than normal which is a great boon. As for the reals, how many people end up studying that in depth? All I remember is being proud of doing something related to the awesomely named Dedekind Cuts.

Anyone that can follow the proofs required to construct the reals can easily do so for the hyperreals. I am not an expert in the area and it's been years since I studied at depth but I know there are methods which avoid the need for model theory. I do not recall them invoking the axiom of choice but I could be wrong on that.

But the point is not that the learner gets a Hyperreal only approach but a varied exposure. I think that is the key - much more important than understanding tangents (what happens to visualization at dimensions of 4?) I think that learning all of the disparate parts at once and letting the student have the time to become comfortable (linear maps, derivatives, surfaces, groups) is what would be best. Allowing them to drift and backtrack and then whenever they felt ready would take whatever appropriate exams to show mastery of each area. The exams would also allow for more interesting problems.

It would take longer but the end product would be a far more cohesive understanding than the mishmashed nature of the current historical siloed approach. You've done machine learning right? I think one can take something from there about learning. The brain is just a much more advanced version of those basic objects: more and varied examples is better than curated and small examples. It won't overwhelm anyone unless they're overly impatient and then, math is probably not for that personality type.

btilly · on July 9, 2012

I have been through constructions of the reals, and the proofs for the hyperreals. It has been years, but I am confident that without much difficulty I could probably give you every proof that is necessary from scratch.

The proofs for the hyperreals involve a lot more machinery than the proofs for the standard reals. That is my educated opinion based on knowing both sets of proofs and constructions. (Of course this need not make infinitesmals a pedagogical disaster - very few students actually care much about learning the proofs.)

As for the model theory approach, I am intimately familiar with the ultrafilter construction. It uses choice. I know there is a second construction which I am not familiar with, but from what I've read it also requires choice. Both involve model theory. That's a mighty big sledgehammer for a pretty small fly.

Incidentally Dedekind cuts can be understood as follows. The set of reals can be equated with the set of points where you can cut the rationals into two. More precisely if X is a nonempty subset of the rationals with an upper bound, we get a cut of the rationals into the set of upper bounds of X, and things that are not an upper bound of X. Any two subsets can be considered equivalent if their set of upper bounds is identical. An equivalence class of subsets is a real number.

For any cut you can generate a unique set A of rationals that are not upper bounds, and a set B of rationals that are upper bounds. When you do this, all rationals in A are less than all rationals in B, and A does not contain an upper bound.

Conversely if we have a partition of the rationals into 2 non-empty sets A and B such that all members of A are less than all members of B, and A does not contain an upper bound. Then we have a cut of the rationals.

So there is a 1-1 correspondence of reals to places we can cut the rationals to partitions of rationals with that property.

Those partitions of the rationals are called Dedekind cuts.

(This is one of two constructions of the real numbers. The other, Cauchy sequences, turns out to generalize more usefully in the field of topology.)

Dn_Ab · on July 10, 2012

Hey, you are right. the method I was thinking of does require model theory to justify its axioms (IST). It had been waved away as you put it, so that the core could be focused on. But you don't really need to understand why the axioms are justified any more than most people understand the axioms of ZFC (excepting those like you of course). And if the outcome is a better first intuition of calculus, I don't think it is accurate to label it a fly.

I did find out that there is a constructive approach though, so the axiom of choice is not actually necessary. www.math.ucla.edu/~asl/bsl/0403/0403-001.ps