Okay, now we'll talk about mappings, a combinatorial structure that is
related to many of the things that we've studied.
And it really is an appropriate topic on which to close.
It's kind of a poster child for analytic combinatorics.
So what is a mapping? A mapping is an N-word of length N.
That is, we've got N characters, and every one of the characters can be
anything from 1 to N. So obviously there are N to the N different
N-words of length N, or mappings. That seems simple enough, but
actually there's lots of interesting structure hidden under this.
Because the idea is that every mapping corresponds to a digraph.
That is, for every position in the word you make a node.
This example has 37 positions, so we have 37 nodes.
And for every node, you just draw an arrow from the node to its value.
Since there are N different possible values, there are N different possible places to
point. So every node has out-degree 1;
there's only one place each node can point, namely its value in the mapping.
So 30 points to 18, 29 points to 23, and so forth.
But some nodes can be pointed to a lot; 33, for example, is pointed to by 4, 14, 19,
and 28, because 33 appears four times in the mapping.
So the sequence of numbers kind of masks this interesting combinatorial structure.
And once you see this structure, the digraph breaks up into things that are kind of
like trees. Well, they are trees, trees whose roots eventually lead into a cycle.
So each connected component is a cycle of trees.
But there can be several independent components, too.
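To make that structure concrete, here's a small sketch (my own illustration, not from the lecture) that finds which positions of a mapping lie on cycles; every other node hangs off them in the trees. The 0-indexed list encoding is an assumption for the example.

```python
def cycle_nodes(mapping):
    """mapping[i] is the value that position i points to (0-indexed).
    Nodes on cycles are exactly those that survive repeatedly trimming
    away nodes with no remaining predecessors (the tree nodes)."""
    n = len(mapping)
    indeg = [0] * n
    for v in mapping:
        indeg[v] += 1
    stack = [u for u in range(n) if indeg[u] == 0]  # leaves of the trees
    removed = [False] * n
    while stack:
        u = stack.pop()
        removed[u] = True
        v = mapping[u]
        indeg[v] -= 1
        if indeg[v] == 0 and not removed[v]:
            stack.append(v)
    return {u for u in range(n) if not removed[u]}
```

For instance, `cycle_nodes([0, 0, 1])` (the word 112 shifted to 0-indexing) returns `{0}`: a one-node cycle with a two-node tree hanging off it.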
So natural questions come up. What's the probability that the thing is
connected? If it's not connected, what's the
average number of connected components? How many nodes are on cycles, and how
many are on trees? All these types of questions turn out
to be of great interest for important applications.
But even just as a mathematical curiosity, it's quite a fascinating
structure to get out of a simple idea like an N-word, a function from a set of
integers to itself. So in order to address those kinds of
questions, we'll start with something simpler, called Cayley trees.
So a Cayley tree is a labeled, rooted unordered tree.
So labeled means all the nodes are labeled.
Rooted means that there's one distinguished root.
And unordered means that if there are two children of a root,
we don't consider the order of the two children to be significant.
So there are 9 different Cayley trees of 3 nodes.
These 6 are all different because of the labels, and these 3 are all different
because of the labels: 1, 2, or 3 could be at the root.
But it doesn't matter in what order the subtrees of the root appear.
Those are called Cayley trees. Now, rather than write out all the
possibilities, a short form is to just write the unlabeled trees,
and then write all the N-words that give those same trees.
So for example, 112: that's this first tree here. 1 points to 1 and 2 points to 1,
so that's the 1, 1, and then 3 points to 2, and so forth.
Or 111: that's 1, 2, and 3 all pointing to 1. So that's the N-word and that's the
tree structure that comes out of it, in all those cases.
So that's Cayley trees. Now, what is the number of Cayley
trees? You might have guessed that there's a
power involved; it turns out to be N to the N minus 1.
But that's not at all an elementary result.
We're going to use a technique called Lagrange inversion to get to
this result; it's described on the next slide.
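Before the machinery, the claim is easy to check by brute force for small N. This is my own sketch, under the encoding suggested by the 112 and 111 examples above: a Cayley tree is an N-word whose only fixed point is the root, and every position reaches the root by following pointers.

```python
from itertools import product

def is_cayley_tree(word):
    """word[i] is the parent pointer of node i (0-indexed); a tree has
    exactly one self-loop (the root) and every node must reach it."""
    roots = [i for i in range(len(word)) if word[i] == i]
    if len(roots) != 1:
        return False
    root = roots[0]
    for i in range(len(word)):
        seen = set()
        while i != root:
            if i in seen:          # trapped in a non-root cycle
                return False
            seen.add(i)
            i = word[i]
    return True

for n in range(1, 6):
    count = sum(is_cayley_tree(w) for w in product(range(n), repeat=n))
    assert count == n ** (n - 1)   # 1, 2, 9, 64, 625
```

The n = 3 case gives 9, matching the slide.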
Lagrange inversion is a classic method for computing a functional inverse.
It's a very important and deep theorem,
and I'm not going to go into all of its ramifications,
because we use it in a not quite straightforward way.
So, for example, suppose I have a function z = f(u), like f(u) = u over 1 minus u.
Then the inverse is a function u = g(z):
if I set z equal to u over 1 minus u and solve for u, I get u = z over 1 plus z.
That's the inverse, and Lagrange inversion is a classical method for computing it.
We'll see how it applies for us.
So this is the Lagrange inversion theorem; in our situation it will be a
transfer theorem that gets us from a generating function to coefficients, for
problems where we're counting trees. What it says is this: suppose you have a
generating function g(z), and it satisfies the equation z equals f
of g of z, so g is the functional inverse of f,
where f(0) has to be 0 and f'(0) has to be non-zero.
Then g_n, the coefficient of z^n in g(z), equals 1 over n times the coefficient of
u^(n-1) in the function (u over f(u))^n.
And again, we're just going to look at applications of this theorem.
Just for our example: if f(u) is u over 1 minus u, then u over f(u) is simply
1 minus u, because the u's cancel.
So the theorem says that g_n equals 1 over n times the coefficient of u^(n-1) in
(1 minus u)^n. And just from the binomial theorem, that's
exactly (-1)^(n-1). So g is the sum of (-1)^(n-1) z^n,
which is z over 1 plus z, which is what we got from algebra.
Now, you can't always get it from algebra; that's the point, and then you have to
use the Lagrange inversion theorem. In the analytic combinatorics
context, we're going to apply this as a transfer theorem, and you'll see examples
of it; we'll talk more about it in the
context of the complex plane in Part Two. Actually, we use a more general
formulation, where for any function h applied to the generating function
you can get the coefficient of z^n in h(g(z)); the formula just includes an extra
factor h'(u), and taking h(u) = u gives the basic theorem.
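As a quick numerical sanity check of that example (my own, not from the lecture): the formula g_n = (1/n)[u^(n-1)](1-u)^n should reproduce the coefficients of z/(1+z), which alternate as (-1)^(n-1).

```python
from math import comb

for n in range(1, 10):
    # binomial theorem: [u^(n-1)] (1 - u)^n = (-1)^(n-1) * C(n, n-1)
    coeff = (-1) ** (n - 1) * comb(n, n - 1)
    g_n = coeff // n                       # exact division by n
    assert g_n == (-1) ** (n - 1)          # coefficients of z/(1 + z)
```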
So that's the Lagrange inversion theorem, and that's the technique that we're going
to use to analyze mappings. But before mappings, let's look at how
it works for binary trees. For binary trees we have the standard
construction and the OGF equation. With Lagrange inversion we can extract
coefficients immediately from that equation, because it has the required form:
z equals a function of T(z), namely z = T(z) minus T(z)^2.
So we use f(u) = u minus u^2 for Lagrange inversion.
We want to find the coefficient of z^n in T(z),
and u over f(u) is 1 over 1 minus u.
So, according to the theorem, the coefficient of z^n in T(z) is
1 over n times the coefficient of u^(n-1) in (1 over (1 minus u))^n.
And that is easily shown to be the Catalan number.
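That claim is also easy to verify numerically. This is my sketch, assuming the OGF equation is T(z) = z + T(z)^2, which is z = f(T(z)) with f(u) = u - u^2: the recurrence from the equation should match (1/n)·C(2n-2, n-1), a Catalan number.

```python
from math import comb

N = 12
# T(z) = z + T(z)^2  gives  T_n = sum_k T_k * T_(n-k) for n > 1, T_1 = 1
T = [0] * (N + 1)
T[1] = 1
for n in range(2, N + 1):
    T[n] = sum(T[k] * T[n - k] for k in range(1, n))

for n in range(1, N + 1):
    # Lagrange inversion: [z^n] T = (1/n) [u^(n-1)] (1-u)^(-n)
    #                             = (1/n) C(2n-2, n-1)
    assert T[n] == comb(2 * n - 2, n - 1) // n
```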
So that's one example of the application of Lagrange inversion.
So now let's look at Cayley trees. What we want is the class of labeled,
rooted, unordered trees. In a word, there's just that one
distinguished root, the nodes are labeled, and we don't care about the order of
the subtrees. So with the symbolic method, we just use
the construction that says a tree is a root connected to a set of trees.
The order doesn't matter, so we use SET. That immediately translates to the EGF
equation; these are labeled objects, so we use EGFs.
It's C(z) = z e^(C(z)).
That's called the Cayley function; it's
the one that enumerates Cayley trees. In other words, z equals C(z) over e
to the C(z). So we're using Lagrange inversion with f(u)
equal to u over e^u. To get the coefficient of z^n in C(z) by
the Lagrange inversion theorem, we look at u over f(u), which is just e
to the u. So it's 1 over n times the coefficient of u^(n-1) in e^(un),
which is n^(n-1) over n factorial. These are EGF coefficients, so the number of
Cayley trees is n factorial times that, which is N to the
N minus 1, and that checks with our small values.
So that's Lagrange inversion, used here to enumerate Cayley trees.
Now, that's just trees; we can work up to more complicated structures
like connected components and mappings. Remember, mappings are cycles of trees.
So these aren't all the mappings; these are the ones with just a single
connected component. These are all the possible connected
structures that you can get with 3-words, mappings of size 3.
For example, we saw 111, or 222: that's when they all point to the same
thing. But you can also get a structure like this, where two of them point to
each other, and so forth. So for these words: say 1 points to 2 and
2 points to 1; or 1 points to 2, 2 points to 1, and
3 points to 1, and that corresponds to that structure.
So these are all the ways to label and get that structure, and all of those are
connected. So how many different ways are there to get
cycles of Cayley trees? That's what a connected component is, and the count
turns out to be about n^n times the square root of pi over 2n.
And the analysis of that is again straightforward using the symbolic method and
Lagrange inversion.
So, this is a typical cycle of trees, and clearly the construction we're going to
use for a connected component is a cycle of Cayley trees.
Then from the symbolic method, the EGF is just log of 1 over (1 minus C(z)).
And that is immediately amenable to Lagrange inversion.
Again we've got f(u) = u over e^u, because
that's what the Cayley function satisfies. And our extra function in the Bürmann
form is h(u) = log of 1 over (1 minus u), so h
prime of u is 1 over (1 minus u). And then we still have the e^(un);
that's (u over f(u))^n. So our number of connected mappings, divided by
n factorial, is 1 over n times the coefficient of u^(n-1) in h'(u) e^(un), and
just doing the convolution, that coefficient is the sum of n^k over k factorial
for k from 0 to n minus 1. Change k to n minus 1 minus k, multiply by
n factorial, and we get our familiar Ramanujan Q-function,
which leads to the enumeration result: the number of connected
mappings is about n^n times the square root of pi over 2n.
So the probability that a mapping is
connected is that divided by n^n, which is about the square root of pi over 2n.
So connected components in mappings come directly from the symbolic method
coupled with Lagrange inversion. And then mappings: how many mappings are
there? That's going to cover all the possibilities.
And again, what's a mapping? A mapping is a set of cycles of Cayley
trees. So its EGF is e to the log of 1 over
(1 minus C(z)), which is just 1 over (1 minus C(z)).
And that one we can get with the extended Lagrange inversion, with our h function
equal to 1 over (1 minus u); if you do the math, h'(u) is 1 over (1 minus u) squared.
So the coefficient of z^n is 1 over n times the coefficient of u^(n-1) in
e^(un) over (1 minus u) squared. You do that convolution, and this time the
sums collapse: everything cancels except the last term, n^n over n factorial.
Multiplying by n factorial, the number of mappings is n to the n.
As expected from the trivial argument. So sure there's a trivial argument that
tells us the number of mappings, but the analysis gives the entire structure,
which we can use for analyzing all sorts of properties, and we'll go into the
details later on. Just as an example, an interesting property
of these functions is something called the rho length.
For the rho length of a function, we start at a given point and just apply the
function over and over. Say you have a function like x squared
plus 1 mod 99. Then 3 goes to 10; 10 squared plus 1 is 101, which mod 99
is 2; square that and add 1 to get 5;
and so forth. You eventually get to a point where you
hit a cycle. So that's called the rho length: the
number of times that you iterate the function until a value finally repeats.
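For a small modulus like this, remembering every value works fine; here's a sketch (my own code) computing the rho length of x squared plus 1 mod 99 starting from 3:

```python
def rho_length(f, x0):
    """Iterate f from x0, remembering every value, until one repeats.
    Returns the number of iterations before the first repeated value
    (tail length plus cycle length)."""
    seen = set()
    x = x0
    while x not in seen:
        seen.add(x)
        x = f(x)
    return len(seen)

f = lambda x: (x * x + 1) % 99
print(rho_length(f, 3))   # 3, 10, 2, 5, 26, 83, 59, 17, 92, 50, then
                          # 26 repeats: prints 10 (tail 4 + cycle 6)
```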
Now, in a mapping, wherever you start, you're always going to wind up on a cycle,
so there's a rho length associated with each point in the mapping.
This example is a mapping defined by a formula that tells us every value, but
random mappings exhibit the same kind of behavior. Well, one thing to think about
is: what about an algorithm to compute the rho length?
It's kind of a fun problem. You could say: well, I'll just use a
symbol table. I'll keep track of all the values I've
computed, and that way I'll know when I get a
repeated value. But in practice you can't do that, because,
as we talked about, in real applications we're doing this for huge numbers, like
100-digit numbers, and there's no way you can have enough
memory to keep all the possible values. So what do you do?
Well, there's a famous method due to Floyd, the tortoise-and-hare algorithm,
where you keep two variables: one iterates the function just once
per step, and the other iterates it twice. So, for example, a and b start at the
same place at time t equals zero, and at each step we iterate a by 1 and b by 2.
So b is going to go around
the cycle, and eventually they catch up. And it's not difficult to see that
they're always going to catch up eventually, and when they do, the rho
length starting from that point is between t and 2t, where t is the number of
steps that it took for them to match.
So that's interesting in that it doesn't use any extra memory; it's just comparing
those two variables. So that's computing the rho length.
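A sketch of Floyd's method in code (my illustration), using the x squared plus 1 mod 99 example from before, where the rho length starting at 3 is 10:

```python
def floyd(f, x0):
    """Tortoise-and-hare: advance a one step and b two steps per round
    until they meet; returns the round count t. Uses O(1) memory, and
    the rho length of x0 lies between t and 2t."""
    a, b = f(x0), f(f(x0))
    t = 1
    while a != b:
        a = f(a)
        b = f(f(b))
        t += 1
    return t

f = lambda x: (x * x + 1) % 99
t = floyd(f, 3)           # tail 4, cycle 6: they meet at t = 6
assert t <= 10 <= 2 * t   # the true rho length 10 is between t and 2t
```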
So, with mappings, again, starting at every point you've got a rho length, and
that's a parameter that's interesting to study. There are others, like the tail
length: that's how long until you hit the cycle. And again, number of components,
number of trees, and so forth. Now, using the same style of construction
and using bivariate exponential generating functions, let me
just give an example. For the number of components: a mapping
is a set of cycles of trees, but we mark the cycles, and that will tell us the
number of components, since there's one for each cycle.
So immediately we get the EGF equation, 1 over (1 minus C(z)), raised to the power u.
And then we can work with that equation to answer questions like: what's the average
number of components? Or, for the number of trees, it's again a
set of cycles of trees, but you mark the trees instead, and you get this other
function. So we'll talk about this analysis in more
detail in Part 2. But for now, I just want to motivate the
study of mappings; it works out that, actually without that much work, we
can get all these properties of mappings. For example, the expected rho length in
a random mapping is about the square root of pi N over 2.
You know, so why is all this relevant? Well, here's an actual application
where it matters. This is a famous method for factoring a
huge integer, called Pollard's method,
and what it does is iterate a random quadratic function to find a cycle.
It's pretty much like the function that I just talked about: we take f(x) = x
squared plus c, and iterate until finding a cycle, and we do it as
in Floyd's algorithm, where we take two variables and iterate the function once for
one of the variables, and twice for the other.
And we use a random value of c, and a random starting point,
and keep going until a particular condition is satisfied having to do with
a GCD; and when it's done, you get a factor out.
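Here's a minimal sketch of the method as described (my own code, not Pollard's exact formulation); in practice c and the starting point would be randomized, and the whole thing retried on failure:

```python
from math import gcd

def pollard_rho(n, c=1, x0=2):
    """Iterate f(x) = x^2 + c mod n with tortoise-and-hare variables;
    when gcd(|a - b|, n) is strictly between 1 and n, it's a factor.
    Returns None if this c/x0 pair hits the cycle without success."""
    f = lambda x: (x * x + c) % n
    a, b = f(x0), f(f(x0))
    while a != b:
        d = gcd(abs(a - b), n)
        if 1 < d < n:
            return d
        a, b = f(a), f(f(b))
    return None

d = pollard_rho(8051)      # 8051 = 83 * 97; finds a nontrivial factor
```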
It's quite an amazing algorithm. So this is just an example that does
the same thing as Floyd's algorithm does, and when the GCD of these quantities is
not 1, you've found a factor. So, why does it work?
Well, it's actually not too difficult to see how it works if you know number
theory. Although it's definitely magic if you
don't. But that's not really why I want
to bring it up now. The question is: how many iterations does
Pollard's algorithm take? Now, this function is certainly
not a random mapping.
But since you're taking a random c and a random starting point,
maybe it's got the same kind of characteristics as a random mapping.
And it's conjectured, actually, that a random quadratic function of this type is
asymptotically equivalent to a random mapping.
And in a random mapping, the rho length is about the square root of pi N over 2.
So you can factor a number N
in about square root of N steps, which is a lot faster than trying every factor.
And Pollard actually used this method to factor integers a while ago.
And it's typical of the kinds of things that people are trying to do in
cryptography nowadays: studying properties of these kinds of functions.
Analytic combinatorics and properties of mappings play a role in that kind of
research. So I wanted to end with a real application
like that; mappings are quite fascinating combinatorial
structures.