______________________________________________________________________

DRAFT TRANSCRIPT
Routing SIG
Thursday 2 September 2004
4.00pm
______________________________________________________________________

PHILIP SMITH: Good afternoon, everyone. I think we should make a start to this just to try to keep time for once, I guess. Welcome to the routing SIG. It's the final event of this part - the SIG part - of the conference. Just some administration before we get started with the presentations. Just a reminder, Randy and I chair this special interest group. You can reach us at this address: sig-routing-chair@apnic.net. The mailing list is sig-routing@lists.apnic.net. We don't have any matters arising from the previous routing SIG meeting, so we can pretty much get straight into the presentations. We've got three presentations today, plus I suppose Geoff Huston's 'BGP - The Movie' to finish off the SIG at the end of the afternoon. Gert Doering is not here, obviously, so I'll be doing his presentation, hopefully, on his behalf. Otherwise we will make a start with Tim, who will be talking about BGP wedgies.

TIMOTHY GRIFFIN: OK, thanks. This talk is really an informational talk about a class of BGP anomalies, and it's more of just a description, so that particular operators might recognise the situation if it ever occurs in their network. But in general, it's just - you know, BGP is this beast that has evolved, and we occasionally have to deal with some of the anomalies that are caused by its lack of design. This talk describes one class of anomalies. So the class of anomalies I've given the term BGP wedgie - it describes bad policy interactions that cannot be debugged. That's a little bit strong, because as we'll see, if operators talk to one another they may be able to straighten out the situation. I have a formal definition here of what a wedgie is. First of all, when you look at the BGP policies, they make sense locally. I mean, if you look within any particular autonomous system, talk to the operators, they will explain their policies; they make sense. The interaction of the policies globally, however, may allow for multiple stable routings. Now, it's interesting that quite a few BGP gurus are not aware of the fact that you can actually get very distinct stable routings with the same BGP policies. That is, in general you don't have a unique routing associated with a set of BGP policies. In practice we often do, but the protocol certainly doesn't guarantee it. So you either get one unique routing or you get multiple routings, and which routing is chosen is sort of non-deterministic. You can also have zero solutions or zero routings, and that's when we have protocol divergence. I'm sure some of you have heard of the MED oscillation problem. That arises when there is no stable routing that satisfies the routing policies, so the protocol just exchanges messages forever. So we have policies that make sense locally. They interact globally to give multiple solutions. Then, the important thing is that some of those solutions may be consistent with the intended policies and some are not. The problem is when a stable routing is installed and it's not intended. Well, the only way to kick the system back is through some sort of manual intervention. OK? And just these three conditions I'll call a 3/4 wedgie. This is not quite a full wedgie yet.
What really makes a full wedgie is when the policies are distributed across multiple domains and no one group of network operators has enough information to debug the system when it falls into an unintended solution. So that's a crucial thing. Now, exactly what that means is pretty vague, I'll admit, but you get the idea. Let me give you an example. This example was abstracted away from a much more complicated situation that arose at a large service provider in North America that I used to work for. So I've simplified it here quite dramatically. When you think about this example, you've got to think about a much more complex situation where people are trying to figure out what is going on and trying to sort out, is it this, is it that, is it the other thing. So I've done all that simplification for you. You don't have to worry about that. Here we have four autonomous systems and the customer-provider relationships that I've indicated there. So, for example, autonomous system 3 here is the provider of autonomous system 2, which is the customer of AS 3. The two top-level ASes have a peering relationship. So we have a customer that wants to implement a backup link to autonomous system 2 and a primary link to autonomous system 4. What it's going to do is implement that by sending a depref-me community up to autonomous system 2. These are fairly common now. Autonomous systems usually publish their communities to their customers, and customers can do things like control the scope of their routing announcements or the preferences of their routing by sending the appropriate community value. Many sober network engineers resist the temptation to implement these, but the marketing people insist, so they implement them. So the customer here in AS 1 is going to use that to implement the backup community, and AS 2 is going to implement that in a route map. Essentially, when it gets a route with that community, it's going to assign a local preference that's below that of its upstream providers' routes. So here is the intended routing in that scenario. That is, the AS 2-AS 1 link is intended only for backup purposes, if the AS 4-AS 1 link goes down. The intended routing is that traffic should flow from AS 2 through AS 3 and over to 4 and down to 1. Does that make sense? OK. But there is another solution to that set of routing policies, and I'll call this one the unintended solution - the unintended routing, I'm sorry. That is, AS 3 says, "Hey, I love customer routes. I love them more than peer routes. And if AS 2 gives me a route and I have a choice between AS 2's route and AS 4's route, well, I'll take AS 2's route." Once AS 3 has that route in its hands, it's never going to tell AS 2 about AS 4's route, so AS 2 has a route that has a depref-me community on it, but it's the best route it has, so it's stuck with it. So if you look at the intended solution there, the reason that's not happening is because AS 3 only hears about one route to AS 1. It hears about the one through its peer. It doesn't hear about the route through its customer. So which one of these gets installed? Which solution, which routing, gets installed really depends on the non-deterministic exchange of routing messages and various kinds of things. If you install the intended routing, you can very easily kick it over to the unintended routing just by bouncing the BGP session between AS 1 and AS 4.
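To make the two stable outcomes concrete, here is a minimal sketch - not from the talk - that brute-forces the stable routings of this four-AS example. The topology, candidate paths, preference ranks and the valley-free export rule are all toy assumptions hard-coded in the Gao-Rexford style:

    from itertools import product

    # Provider -> customers in the toy topology (AS 1 originates the prefix).
    CUSTOMERS = {2: [1], 3: [2], 4: [1]}

    # Candidate AS paths to AS 1's prefix, ranked best-first.  AS 2 honours
    # the depref-me community: its direct customer route (2 1) ranks below
    # the route via its provider, AS 3.  AS 3 prefers customer over peer.
    CANDIDATES = {
        2: [(2, 3, 4, 1), (2, 1)],
        3: [(3, 2, 1), (3, 4, 1)],
        4: [(4, 1)],
    }

    def exports(path, frm, to):
        # Valley-free export: customer-learned routes go to everyone;
        # peer- and provider-learned routes go to customers only.
        return path[1] in CUSTOMERS.get(frm, []) or to in CUSTOMERS.get(frm, [])

    def is_stable(choice):
        # A routing is stable when every AS is already using its most
        # preferred route among those its neighbours actually export to it.
        for asn, path in choice.items():
            avail = [(asn,) + choice[n] for n in choice
                     if asn not in choice[n]
                     and (asn,) + choice[n] in CANDIDATES[asn]
                     and exports(choice[n], n, asn)]
            if (asn, 1) in CANDIDATES[asn]:      # direct route from AS 1
                avail.append((asn, 1))
            if not avail or min(avail, key=CANDIDATES[asn].index) != path:
                return False
        return True

    for combo in product(*(CANDIDATES[a] for a in (2, 3, 4))):
        if is_stable(dict(zip((2, 3, 4), combo))):
            print("stable:", dict(zip((2, 3, 4), combo)))
    # Prints exactly two routings: the intended one (AS 2 via its provider,
    # AS 3) and the unintended one (AS 2 stuck on the depreffed direct route).

Under these assumptions the enumeration finds both of the routings Tim describes, and exhaustive search of this kind is also the flavour of the configuration-checking tools he mentions later in the talk.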
So if it just goes down temporarily or goes down as a failure, when it comes back up you'll find yourself in the unintended solution, the unintended routing, and you will stay there until you manually fix the problem. So the unintended routing is reachable just from this failure mode. The other thing here I want to say is that the intended routing would be the unique solution, the unique routing, if AS 2 translated its depref-me communities into depref-me communities for its provider, AS 3. In that case - if we look at the unintended routing - that wouldn't be a stable routing, because AS 3 would say, "Hey, I have this backup route and I have a peer route. I prefer the peer route." But I know of no service provider that does that kind of translation. You can just imagine the mess. For example, AS 2 perhaps has 13 upstream providers, all with different sets of communities. Well, maybe some of them don't even support communities in this way - such as Deutsche Telekom. So this is a real problem. It does happen in practice, and one of the difficulties I think I want to get across here is that people just don't understand what's happening. When I saw this happen at AT&T - whoops! - it was basically, "Well, it's a bug in Cisco, it has to be a bug in Cisco." Right? But it's not. It's a bug, if you will, in the protocol itself. And I think as people start using more and more complex routing policies, with more and more interdomain signalling using communities and more and more fancy routing, I think we're going to see more and more of this. So how do we fix it manually? Well, suppose we're sitting here in the unintended solution. I can bring the session down, or filter out the prefixes involved on that session, whatever, and then bring the session back up, and I kick the system back into the intended solution. Now, you've got to have network operators that understand what's going on here. And usually, if they do realise they have to reset the session again, they're most likely going to think, "Well, I just cleared a bug. There must be some bug, that I reset the session." But it's important to realise this is just inherent in the beast. So it requires manual intervention, which is not good for a dynamic routing protocol. It can be done in AS 1 or AS 2. That's why I'm calling it a 3/4 wedgie. It's not quite a full wedgie, because one group of autonomous system administrators could figure this out. Question back there.

RAM NARULA: I don't understand - in the chart on the previous page, on the left side, if the backup fails it would go into the unintended? If the primary link fails?

TIMOTHY GRIFFIN: If the primary link fails - well, let's go back to the intended solution back here. If the primary link fails, then we'll have traffic here. When it's brought back up, we'll end up in this situation. So the customer will call up the help desk at AS 2 and say, "Hey, I fixed my primary but you're still sending me traffic. Why are you doing this? Stop it."

RAM NARULA: Is it due to the cache?

TIMOTHY GRIFFIN: No. It's due to the fact that routing policies are very expressive. We're not constrained to shortest-path routing. There is no guarantee that there is a stable routing. There's no guarantee it will be unique if there is one.

RAM NARULA: If there is a prepend, won't it come back up?

TIMOTHY GRIFFIN: That works differently. In this case, it won't work at all, because AS 2 is likely to prefer customer routes, no matter how long the prepend is.
That's precisely why they're going to use communities in this case to implement the backup route.

RAM NARULA: Thank you.

TIMOTHY GRIFFIN: It's not good in a dynamic routing protocol to have to actually intervene. The situation could get a lot worse. I'm going to make this example symmetric: instead of backup and primary, I'm going to do load balancing. So AS 1 here has prefixes P1 and P2, and it's going to use this link as a primary for prefix P2 but a backup for prefix P1. Then it's going to do exactly the opposite on that link. It's going to implement this with communities sent up to AS 2 and AS 5. You see the problem. The problem is any kind of kicking the system back for P1 is going to cause you to get wedged for P2, and vice versa. So remember, in that example, if I just bring the primary down and bring it back up, I'm going to get stuck in an unintended solution. This is one of those things where I actually have to bring down both sessions simultaneously and then bring them back up, or filter out both prefixes from those sessions and then bring them back up. I've talked to a lot of network operators. I don't know many who can handle this situation. Imagine in particular that the left-hand link is in New York and the right-hand link is in Tokyo, and you've got to coordinate this between different groups within your network - within your engineering organisation. Randy, question?

RANDY BUSH: It gets even worse. Imagine the engineers actually finding out what was going on, calling the network operations centre and telling them they have to bring down a working link.

GEOFF HUSTON: It doesn't happen!

RANDY BUSH: Yeah, we need a laugh this afternoon.

TIMOTHY GRIFFIN: The case where I actually saw this happen first was exactly in a load-balancing case, where there were many other prefixes coming from the customer, not just the ones that were causing problems. Let's go on to a full wedgie example. What I've done here is just cooked up an example where I've tried to make a little delta change to the example we've already seen, so it doesn't get too complicated. I've just added this autonomous system that has a peering relationship with AS 2. It also is a customer of AS 3. And I have now two backup links and one primary. So it's not that much different than the example that I just showed you. I have just added a little bit of complexity. Again, AS 1 sends up depref-me communities to AS 2 and AS 5. AS 2 implements it in the way that I said before, except now we have to make sure that it prefers that route less than any route from peer 5. Now, here's where things get a bit funny. So AS 5 implements the depref-me community. It says, "Well, I'll put your preference between that of a peer and a provider." That is: when I depref you using this community, I'll prefer peer routes but not provider routes. So it's just a matter of where you are in the pecking order with this depref-me. You see a lot of this going on. There is no consistent implementation of these communities between providers. So that's the way that's implemented. Now, here is the intended routing. Again, it's just like we had before, except this guy's in there going up to the customer. So, boom, unintended routing - again, just like before. This guy's coming down through its customer. This guy is just hopping over to a peer. But it looks a lot like it looked before. Now the problem is - what's different about this example is that the recovery is not going to be simple.
So suppose that I'm in the unintended solution here and I take that link down between AS 2 and AS 1. Well, I'll bounce over to this other routing, where suddenly I have traffic coming in from AS 5. So remember, on the left-hand side I'm a customer; I say to my provider, AS 2, "Hey, you're sending me traffic. My primary came back up. Will you stop it." So we decided to reset the session. Guess what? We bounce over - the other provider is suddenly sending me traffic from upstream. Why are they doing that? And so you bring the link back up. So if you bring the AS 2-AS 1 link up and down, you'll bounce back and forth between these two unintended routings. What you have to do here is reset the AS 2-AS 1 and AS 5-AS 1 sessions simultaneously. So in other words, you've got to get cooperation from two service providers, and one of the service providers - AS 5 in this case - is going to say, "That route isn't even my best route. You're asking me to reset that session, and I'm not even learning a best route on that session. How could that possibly be? Go away. Don't bother me. I have other fires to attend to. Stop annoying me." Suppose you convince them to do the right thing: you can bring both sessions down and then bring them back up, and you're back to your intended solution. Now, I've given this example using communities and customer-provider relationships, but I don't mean to imply that that's the only place these could occur. So I've just sort of invented an ISP here - or it could be a corporate internet, for that matter. By the way, well-kept secret: many large corporations use BGP as an IGP, so this problem may appear there as well. Actually, it's more likely to appear in a situation like that, because you're less constrained by things like customer-provider relationships. Suppose here this is my corporation split into five autonomous systems. So I'm just going to say here they are again. It's the same example, although perhaps now I'm implementing those policies using - maybe I'm using MEDs, maybe I'm using communities, maybe I'm using other sorts of hard-wired preference values. The point here is that the same kind of problem can arise in sort of traffic engineering within one happy family of autonomous systems. So what do we do about this? Well, I think the first thing to do is be aware that there is a problem, that it can be difficult to debug, and that it has the characteristics that I outlined. It's a bit like the MED oscillation problem. It's nice to know it can exist, so when you're seeing strange things in your network you can say, "Hey, perhaps this is a MED oscillation problem." It could be something else, but at least you have a list of things you can check. Here it's a little bit different, because - it's in the very definition of it - it involves knowing things that your neighbours know that you don't know. So it does imply that you have to talk to your neighbours a little more than perhaps is currently happening. The other thing is that interdomain communities need to be thought out very carefully. I actually think it would be a good idea to standardise some of these depreffing communities so we could actually talk about consistently implementing them across autonomous systems. One of the big problems here is that for these very commonly used depref-me communities, there is no common way that they're implemented, and the differences between implementations can cause these problems.
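As a small illustration of that inconsistency - with invented community and local-preference values, not any provider's actual ones - here is how the same depref-me community can land in a different place in each provider's pecking order, AS 2 putting it below providers and AS 5 between peers and providers:

    # Hypothetical depref-me community; real providers each publish their own.
    DEPREF = "65000:80"

    # Each provider's local-preference ladder (higher wins in BGP).
    LADDERS = {
        "AS2": {"customer": 120, "peer": 110, "provider": 100, "depref": 90},
        "AS5": {"customer": 120, "peer": 110, "depref": 105, "provider": 100},
    }

    def local_pref(provider, relationship, communities):
        ladder = LADDERS[provider]
        if relationship == "customer" and DEPREF in communities:
            return ladder["depref"]
        return ladder[relationship]

    for p in sorted(LADDERS):
        lp = local_pref(p, "customer", {DEPREF})
        print(p, "depreffed customer route gets local-pref", lp,
              "- beats peers:", lp > LADDERS[p]["peer"],
              "- beats providers:", lp > LADDERS[p]["provider"])
    # AS2: below both peers and providers.  AS5: below peers, still above
    # providers.  Standardising the relative position, as suggested above,
    # would remove exactly this ambiguity.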
Finally, there may be some way, for a small set of configurations - let's say the configuration files within a small handful of autonomous systems, half a dozen, a dozen - there may be techniques to actually automatically discover the possibility of potential BGP wedgies there, and that's something I've been thinking about and just started working on. In theory it's very difficult, because you essentially have to enumerate all possible paths in the system, but for smallish networks that's actually do-able in a reasonable amount of time. So perhaps there's some help from tools here. The other thing is, I'd be interested to hear from operators that may have encountered problems like this but maybe didn't have a framework to put that problem in. So I'd be happy to hear about comments. Questions?

RANDY BUSH: Geoff, are you considering a task force draft on a single set of communities for pref, so translation upwards is unnecessary?

GEOFF HUSTON: You're asking me this in the context of the GROW working group?

RANDY BUSH: The Internet vendor task force.

GEOFF HUSTON: There is a draft attempting to standardise some number of communities in BGP, but the use of associated values with those communities is not being standardised. So you still get the issue of - well, rather than saying depref me in 10 different ways, you can say depref me one way, but trying to standardise the actual preference values is actually not on the agenda right now, Randy, no, and there are no drafts around in that area. So nothing is being done at the level you were suggesting, Tim, that it would need to be standardised at.

TIMOTHY GRIFFIN: Perhaps what's needed is not so much the values themselves but the relative values of different classes of routes.

RANDY BUSH: No, because then I wouldn't have to translate.

TIMOTHY GRIFFIN: I mean to say that these different - I know at AT&T you can depref yourself below a peer or below a provider, and what are those classes and how are they - you don't have to actually say 100 or 90 or 70 for a preference. You can just say, relative to this other class, I'm lower.

GEOFF HUSTON: And that is not on the standardisation agenda right now, but certainly could be. There's no work at that level yet.

TIMOTHY GRIFFIN: Another question.

SHARIL TARMIZI: It probably is a dumb question - I'm probably the only non-techie in this room. But have you known of any incidents where people have used these kinds of situations as a competitive or anti-competitive measure, because you do not want them to route the traffic in a particular way because you do not want your competitor to get more?

TIMOTHY GRIFFIN: I don't, but I know of providers that provide these communities because they're forced to, because their competitors will provide them. And that's the only competitive situation I can think of here.

GEOFF HUSTON: This is one of these instances of precisely what are you trying to achieve, and how. BGP as it was originally set up had this really coarse metric around AS path lengths, and the system basically preferred the most specific prefix with the shortest AS path length, and that was the end of it. Then folks said: not good enough. I want to bias my incoming traffic based around things other than raw AS connectivity. So we got into AS path prepending wars and AS poisoning wars, where you were trying to manipulate the path for traffic engineering outcomes. Then along came the use of communities to try and increase the language in BGP, to allow you to do a whole bunch of things beyond simply the AS path.
Now you've got this highly expressive language but no standardised way of speaking it, and then you push this out to the community. So where we got to with the depref-me stuff was: normally ISPs want to minimise their spend, so you normally prefer customers over peers over transits. But to attract customer money - in other words, maximise your income - you generally have to give them the greatest amount of flexibility you can as a provider. So all of a sudden you expose your internal widgets to those customers, saying, you can direct me to do anything you want; this is my advantage as an upstream. All of a sudden now you're getting this rich, expressive language being used in a transitive way. You're getting highly complex community mappings coming through, because the communities exposed to you aren't the communities my upstreams may use - and then, you're saying, strange things may happen.

TIMOTHY GRIFFIN: That's right.

GEOFF HUSTON: Cool.

PHILIP SMITH: Thank you, Tim. Next up we have Randy, who will be talking about happy packets and other things.

RANDY BUSH: As you can see, Tim is a co-conspirator on this, and I can't keep my dates straight. So much for that. There's a central question which we're asking, which is: what is the relationship between control plane - that is, routing - instability and the data plane - that is, forwarding of payload packets? And the related question is: is the quantity of BGP updates good or bad? Who wants to see zero BGP updates? That's static routing. We know that's not going to work. Because we frequently hear comments, and read in the press and so on and so forth, that Internet routing is fragile, Internet routing is collapsing, it's going to hell, BGP is broken or is not working well. On Wednesday there was a bad routing day on the Internet - just look at my graph. Or: if we change the routing protocol it will improve routing. And we often measure routing dynamics in some fashion - like the number of updates - and say that some measurement is better or worse than another. And we are also told that a lot of BGP updates is bad, as in instability, and there are too many BGP updates, or BGP must be broken. In fact, I would suggest that BGP announcements are like white blood cells. Their presence signals a problem. When you have a high white blood cell count, it's telling you the body is producing these things to fight an infection. But the white blood cells are not the infection. They're part of the cure, not the problem. So when your routing announcements tell you something's happened, they're just trying to help the packets get around it. This is a log base 10 scale. We see this, and we get told this is a major problem - the Internet had a very bad day - because they saw a lot of prefix announcements. In fact, these people are measuring big events. I'm going after some detailed single-announcement events. This isn't too well connected with this graph. But we're seeing things like that. I would say instead: let's talk about routing quality and what is good routing. How can we say this measurement shows routing is better in A than B unless we have a metric? And it's not the number of prefixes or speed of convergence et cetera that are the measure of routing quality. We contend that the measure of routing quality is how well it controls the network so that the users' packets - the payload - reach the destination. I'm an operator. Did I deliver the packets to the customer or not?
So if the users' packets are happy, the routing system and the other components in the network are doing their job. So we call these happy packets. We have well-known metrics for happy packets - delay, drop, jitter and reordering. We have easy ways to measure, so we set out to measure how the control plane is correlated to events on the data plane. When I say that we don't care if there are a lot of BGP announcements if the data gets there, I would like to qualify that a little: if there were so many announcements that the routers were getting loaded by the control plane, then there's a problem. But as long as the BGP chatter stays below Moore's law as the Internet scales - in other words, if the BGP chitchat does not more than double every 1.5 years - then the hardware is going to keep up with it. So we decided to conduct an experiment. We have a BGP beacon - I'll tell you what it is in a minute - and we stream packets to it from all over the Internet, and we record the packets while BGP changes. Let me give you some details. A BGP beacon is a prefix that is announced and withdrawn at known times. Here's a BGP beacon announcing this prefix to the global Internet. It's a dual-homed beacon. It announces the prefix for two hours, it withdraws it for two hours; it announces, it withdraws. But because it's dual-homed, it's got two providers - provider one and provider two. Here is the announcement to provider 1. At 2:00 in the morning it changes to both. At 4:00 in the morning it goes back to 1, et cetera, et cetera. Think of it as simulating a dual-homed enterprise losing one of its links, and the link comes back up, or the other link goes down, et cetera. So we're simulating real events on the Internet. And we stream data from something called PlanetLab. It's about 370 nodes, and you can run experiments on all these nodes. We stream the data from all those PlanetLab nodes towards the BGP beacon. As the beacon changes its announcements, we capture those data streams and we look to see what happens to the payload data, the actual customer data. Here we have the announcements at time 0. This is delay on this axis, and drop and reordering on this axis. So we have no drop, no need for reordering and no change in delay. Well, when we went from dual-homed and we lost ispA - well, in fact, nothing happened, because this customer always preferred ispB. Since we didn't drop ispB, the packets are travelling the same path. This will be half of all the measured cases on average. Here we have the same thing - we're going from dual-homed, we drop B. So the packets get dropped, the delay goes in here, and I think even some scattered reorders happen. This is for about 30 seconds, I think. We can expand it and see what happens - no, it's about 45 seconds - oh, but it starts 15 seconds after the event. And we see this gap where the packets actually don't get there. So the customer loses a link, and for 30 seconds packets don't get there. We also see some interesting cases where, in the gap, there are these islands where things are working. In fact, if you look here, they're working most of the time, though there is some drop and some jitter. But packets are getting there a lot of the time. Why there are these islands of instability we don't know yet. We suspect intermediate autonomous systems are converging. How's that for smoke! Here's a bunch of them - just looking at the delay for a bunch of different sources.
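As a sketch of what "happy packets" means in practice - my illustration, not the project's actual measurement code - here is how the four metrics named above could be computed from one probe stream towards the beacon:

    def happy_packet_metrics(probes):
        """probes: (seq, sent_at, received_at) tuples in send order;
        received_at is None for a dropped packet, times in seconds."""
        got = [(s, tx, rx) for s, tx, rx in probes if rx is not None]
        drop = 1 - len(got) / len(probes)
        delays = [rx - tx for _, tx, rx in got]
        delay = sum(delays) / len(delays) if delays else None
        # Jitter as mean absolute delay variation between consecutive probes.
        jitter = (sum(abs(a - b) for a, b in zip(delays, delays[1:]))
                  / (len(delays) - 1)) if len(delays) > 1 else 0.0
        # Reordering: arrivals whose sequence number goes backwards.
        by_arrival = sorted(got, key=lambda p: p[2])
        reordered = sum(1 for a, b in zip(by_arrival, by_arrival[1:])
                        if b[0] < a[0])
        return {"drop": drop, "delay": delay,
                "jitter": jitter, "reordered": reordered}

    # Invented toy trace around a beacon transition: one loss, one late,
    # reordered packet.
    trace = [(1, 0.0, 0.030), (2, 0.1, 0.131), (3, 0.2, None),
             (4, 0.3, 0.345), (5, 0.4, 0.430), (6, 0.5, 0.620),
             (7, 0.6, 0.615)]
    print(happy_packet_metrics(trace))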
The delay is pretty noisy in the first place. The event happens - you know, so this guy gets a lot worse on the new path, which it moves over to in a few seconds. This is a close-up of that - not very interesting. Here's another anomalous situation. Things were slowly getting better all the time. So probably the source node was sick in some fashion and carrying congestion, and the congestion may have been going down. But notice that, you know, this guy gets very noisy after. So his old path was good. His new path is worse. Then we have endless of these graphs. And we decided what we want to do is see if there is a relationship between the amount of loss of data packets and the amount of BGP announcements. In other words, is BGP noise correlated with the data loss? So this is the CDF of the number of updates and the duration of packet loss during a transition from both to A - in other words, these. It doesn't look very exciting. ispA is lost - about the same story. Here it is going from just ispB to both - in other words, A is recovering. So on and so forth. So we're sort of plotting duration of BGP announcements - how long the noise went on - and packet loss. If there was a correlation, we'd expect to see a line somewhere like this. We don't. Same thing for the BGP updates - this was on a transition to A. Here's one on a transition to B. Same uninteresting news. Here's one with BGP duration again, and aggregated - it's all beacon events. Again, no patterns on the diagonal, which is what we were looking for. So it's very hard to say that what we see is a correlation between the number or duration of BGP announcements and customer data loss. There was that anomalous guy whose stuff bounced. We went after it. It's some sort of radio link that he's transferring to. There he is a little closer. Here's the nonsense - just anomalies we've seen. There's another way of looking at it. The sites prefer B, so we drop B. We go from AB to A. There's Germany - during the routing change, packet loss gets worse. Russia - packet loss gets much worse. Toronto. Australia - it's better when they lose their provider. They're upside down there anyway - who knows. MIT, Berkeley, Poland, Texas. I don't know what you do down there, Geoff. OK, same story for transferring the other way - in other words, they prefer B, so we recover B. Prefer B - go to B only. So if they prefer B and you go to B, why are these guys seeing anything in that routing change? It goes on like this. Here is a correlation between loss rate and AS hop count. It doesn't look really exciting. The next one I think does, yes - router hop count and duration of the BGP announcements. Now this figures: the more routers in the chain, the longer it's going to take to converge. OK, that one is about what we expect. So this seems to say that distant sites experience more loss. There is a correlation between a site's routing preference and the type of transition. In other words, if they prefer A they'll have higher loss when they go from AB to B. If they lose their preferred provider, they're in trouble. The correlation between loss rate and AS or router hop count is very weak. At some sites the loss rate during normal periods is higher than during routing change. Some references, et cetera. I just note sponsors: University of Oregon, National Science Foundation, and so on. Now I'm going to subject you to just one more presentation that I'm sneaking in on you that you didn't expect. I made a presentation, I think, at the last one - again by the same nefarious crew.
I did it a year ago - so it's two meetings back, in Seoul. I told you about BGP beacons, and we have one large international ISP - we have all their announcements for a single-homed BGP beacon. And we discovered that when we made an announcement, things looked simple, even if it was multi-homed. But two hops away, in Chicago, we saw route oscillation due to MEDs, so for one announcement we saw four, and in some circumstances in Chicago, for a single announcement going from null to putting up ispA and ispB, we saw 41 events. So we started working with that ISP and looking at that data, and here are the different colours - they're different routers - and look at all that noise. 25 things for one event, et cetera, et cetera. This is the event of going from - the beacon just turns up and says, announced by B. So they made a simple change and got improvement. The change they made was just - if you remember the talk from a year ago, we believed the problem was in part due to multiple vendors and their different buffering over time before they'll announce a route. So they pushed a little delay into one of the vendors and greatly reduced the noise. And no serious delay. They barely touched the router and got this. So we can help you. Questions? Answers?

RAM NARULA: Are these issues with IPv6?

RANDY BUSH: No idea.

GEORGE MICHAELSON: I think it's a reality check statement to get back to the data plane and talk about the good book. It is astonishingly obvious - a simple truth will always be the case. But there are also interesting analogies in the biological sciences. People are very fond of saying Darwinism, evolution, is maximally efficient. But that's completely untrue. There are lots of inefficiencies in natural evolution. It only has to be good enough; it doesn't have to be perfect. Of course it's better to have good routing, but good enough is good. I think it's a good context to remind people of.

RANDY BUSH: I don't have to run faster than the lion. I only have to run faster than my friend.

PHILIP SMITH: Thank you very much to Randy. Now it is my turn to pretend to be Gert Doering.

(Thanks hosts and sponsors)

Just to remind you about the onsite noticeboard: if there's any up-to-date information, it will all be available on there. The opening plenary is available in the archive already, on the APNIC 18 meeting web site. The MyAPNIC demo is on all day at the help desk area just outside this room.

RANDY BUSH: And it's really good.

PHILIP SMITH: And it is really good! The help desk is also available - come and have a chat with the APNIC hostmasters when you're available. If you haven't had a look at the MyAPNIC demo, please go and have a look at it. Take your opportunity to go and speak with the APNIC hostmasters, indeed any of the APNIC staff. It's quite rare to see them out in this part of the world, so take that opportunity that you've been offered. Gert Doering couldn't make it from Germany. He did want to come, escaping the German autumn to come down to what I suppose we'd call partial summer here. He's done the IPv6 routing table report for probably the last three or four years, mostly at the routing working group or maybe the IPv6 working group at the RIPE meetings - I don't remember which one, but I've seen most of his presentations. I'll give this on his behalf. Please direct questions towards him; his email address will be up at the end of the slides.
General overview: have a look at some of the numbers, some pictures, graphs, trends, things that should be there, some conclusions and some recommendations. The presentation is on the APNIC web site. He's been updating it every day over the last few days. I've given the latest one to APNIC, so hopefully it's actually up there; failing that, you can get it from that URL. As of 30 August - in other words, three days ago - there were 457 AS numbers visible. Compare that to 421 a few months back, so steady growth in AS numbers is happening. Of those, 281 are origin-only, 164 are origin plus providing transit, and 12 are transit-only ASes. As a side note, those of you who see my routing report will see that I quote some of the statistics, so Gert has been trying to copy the same format so you can compare between v4 and v6. v6 has a bit to go. There is a mixture of RIR and 6Bone space being announced. 300 ASes originate one RIR prefix. 44 announce just 6Bone prefixes; 46 announce both 6Bone and RIR space. 27 ASes originate two RIR prefixes; 10 announce both /32s and /35s. Those of you working in v6 space will remember that the original allocation was a /35, and those who had that got a free upgrade to a /32. Some are still in transition between the two sizes. 28 ASes are announcing more than that. The largest one is an announcement of 56 prefixes, which is quite amazing. 14 ASes still announce a prefix as both a /32 and a /35; all these paths are observed from AS 5539, which is, I believe, SpaceNet in Germany. So why are people announcing two prefixes? The first one is the 6Bone-to-RIR migration: some people started off in the experimental IPv6 Internet, so they got this 3ffe address space. When, I suppose, real IPv6 address space appeared, these people migrated, or are starting to migrate, over to the 2001::/16 address block that's used for that. So the example he has there is Cisco: we announced both our address blocks and are transitioning out of the 6Bone space. The next one - the migration from /35 to /32: some people are announcing the /35 they originally received from the regional Internet registries, and they have started announcing the /32 as well, the free upgrade so to speak. Then we have experiments and/or leaks, possibly. This example is originated by AS 17382: they're sending out the /32, but they're also announcing two subnets of this, two /48s. The next one - the multi-uplink or multihoming experiments. There you see three /48s appearing. The final one - mergers and acquisitions - or something. Or even different business units of the same company: AS 3303 is announcing two /32 address blocks. So if we look at the number of prefixes received: of /16s we see one, and that is the 6to4 address block, which I guess we would expect. There is a single /20, which is coming out of the RIR address space. There are 38 /24s coming from the 6Bone address space. We see a single /27, and 38 /28s, also from the 6Bone address space. The 6Bone started off handing out /24s for the experimental network, then downsized that to /28s, which is why we see the two address space sizes from 6Bone blocks. As for /32s, we see 363 from RIR space - those are the allocations that the registries are making. We also see 31 out of the 6Bone address block. Of /33s we see two; of /35s we see 40. /36s to /39s, nothing at all, but we see three /40s, three address blocks from /41 to /45, and 101 /48s. A /48 is, I suppose, the minimum address space that's assigned to an end site. Then, smaller than that, /52s to /60s, nothing at all.
A single /64, then nothing from /65 to /128. The /64 prefix is basically an anycast prefix, so we expect to see multiple origin ASes - many autonomous systems, service providers and others, will provide this facility. If you look at the list there, Gert sees about ten or twelve 6to4 prefix blocks being announced, from ten or twelve autonomous systems. There has been some research done on non-publicly-visible 6to4 relays; David Malone reckons there must be around 34. They could well be there. There are some more-specific prefixes from the 6to4 block being announced; that's as predicted by RFC 3056. If we look at the routing table growth over the last 36 months, you see a steady increase like that. One or two obvious jumps, here and there, which I'll talk about in a minute, as well as a big hole here - he doesn't know what caused that, probably a breakage in the feed. So it's a steady growth. Comparing the RIR address block to the 6Bone prefixes over the last 36 months: the 6Bone is the pink or red graph there. It was incrementing slowly. It reached a peak about this point and has been decrementing since - we see this steady decline there. Whereas the RIR space is showing quite a steady increase. Over the last four months, among the notable events, we had a leakage here from AS 17382, which started announcing 19 /48s; for good measure, a couple of months later they announced another 30, just to make the table bigger. I would guess somebody's been looking at the slides, because yesterday they pulled it out again. So the 50-odd /48s disappeared from the IPv6 routing table. Apart from that, there's nothing really spectacular happening. Comparing /35s to /32s: the number of /35s, if you look at the graph, was rising quite steadily; then the new size was released, so to speak, and the number of /35s started dropping steadily. It's still hanging in there, around 40 to 45 announcements, whereas the /32 size has been showing a fairly steady increase. There are still quite a few /35s visible, and I guess the move is on to remind the remaining people to stop announcing those in favour of the /32s. Some numbers: 684 LIR blocks allocated out of the 2001::/16 address space. ARIN have 120, APNIC 169, RIPE 385, LACNIC 10, as of the end of last month. That's a bit of an increase compared with four months ago - sorry, three months ago - where it was 595. You see the steady growth. The RIPE region has got the lion's share. Some of the root servers have IPv6 addresses already; some are visible on root-servers.org, some are registered in BGP. 382 visible, 684 allocated - there's a bit of a gulf there. There are some very large allocations seen; those are listed: NL-Benelux, Vectantnet in Japan, and Ataconet in Austria. Looking at the graphics by regional Internet registry region: the RIPE region has the black graph here - you see the steady growth in the RIPE region announcements. 6Bone, you see the general decline there. The red line is APNIC - you see the jumps from the leakage of AS 17382. At the bottom, ARIN - not quite the bottom - then LACNIC right on the floor. Take the right graph and look at the specifics by country there. You can see again the general growth. I guess Germany's announcing the most prefixes. There's an AS 1654. But the most noticeable one of course is an EU-wide provider - it's my former employer. They leaked a huge amount of prefixes back at the start of the year, so you see that spike there. If we look at the APNIC region, Japan is announcing far more prefixes than anybody else - you see it way up here.
If you look down, there are some other ones. Gert said to me privately that maybe this was a bid by Korea to catch up with Japan in the announcement stakes. The remaining countries are around about 110 prefixes. If we look at the allocated space versus the routed space - to explain this graph, this is the number of prefixes that are routed or allocated, so the blue is what's routed and the light grey is what's been allocated. These are the actual /24 blocks out of the 2001::/16 space. The first two, the /23, went to APNIC, the next two to ARIN, and so forth. Again, you can see what's been allocated versus what's actually been routed - so about 50% there. Interesting observations - he calls it ghost busting. There are some interesting prefixes we see floating around. This one's probably the most interesting. A ghost is basically caused by a BGP withdrawal bug. There's still quite a lot of old - I daresay development and buggy - software that's still being used in the IPv6 network. These paths stay mostly unchanged for weeks and weeks on end. You can kind of track it through the network, and you still see routers hanging on to paths that they don't hear any more. This is the most noticeable one; it's been around for a long, long, long time. You sometimes get accidental hijacks, where people have finger trouble or maybe they don't understand hexadecimal. This was one: a /32 originated by AS 3292 was accidentally originated by AS 29657. That was fixed pretty quickly. It's caused by a static route and people redistributing that into BGP. Some more interesting observations - leaking of Martians: network 1000::/8, then some sub-prefixes of that. Gert reckons it's caused by some problem within probably an - network of some sort - and buggy software. It's been the only documented leak since way back in late 2002 - about this time in 2002. So in that sense it's pretty good going. But it does show there's potential for improving BGP filters. Fourth one - weird AS path leaks. I suppose Gert put this one in because it shows this one transited AS 5539, which is the one Gert manages, and he's quite careful with the prefixes that he permits through his network. The ghost buster flagged this one as a ghost. The problem was really caused by unlimited prefix distribution - in other words, a leaf-node AS offering a full BGP feed to both upstreams, and the upstreams accepting it. In the early days of the 6Bone, pretty much everybody peered with everybody else, and the 6Bone mesh, until quite recently, was quite a mess. There's been quite a lot of effort to try and improve some of this, so the edge of the network doesn't try and provide transit back into the core and so forth. Fifth one - invalid AS numbers. We see this in v4 land as well. AS 4555 - I don't know who owns it, but it comes from the exchange point AS block - announcing a private AS to the IPv6 Internet, seemingly transiting MIT's AS number as well. So maybe it's a problem with MIT, or maybe it's a problem with whoever's running that particular router at AS 4555. Private AS numbers should not be announced worldwide. So there's only really this one left. So, news: the 6Bone - in other words, the 3ffe::/16 address space - is going away. It finishes on 6 June 2006. Private and unallocated AS numbers seem to be out of control, but the ghost routes appear to be under control. The early IPv6 network is starting to deteriorate again - there are quite a number of unsolicited full transit links.
But people are actually looking closer at it - they're looking at traceroutes, making some effort to try and fix things. The overall structure is reckoned to be improving quite a bit, going towards production quality - in other words, the v6 path is no worse than the v4 path. If you've been following the v6 progress over the last three years or so, you'll have seen really bizarre situations where the best path from one country in Europe to another country in Europe would be via Australia or Japan or something really weird. Quite a lot of people, especially in the core of the 6Bone, have made a bit of effort to try and get rid of some of this highly suboptimal routing. Also, the US region is catching up on allocations but still lagging far behind on actually advertising routes. Where to from here? There needs to be more work on filtering recommendations, more work on routing best-current-practice recommendations, at the RIPE routing working group or whoever else would like to do the work. There's still much cleanup to be done: bad tunnels, filters, unsolicited transit relations. I'm probably as guilty as anybody else, because I do run a 6Bone Cisco node and I've got lots of tidying work to do there as well. Bug your upstream providers to offer native IPv6 upstream. Keep an eye on traceroutes to find out which ways packets are travelling and try and get rid of the stupid paths - in other words, paths crossing oceans to go across the street - and consider de-peering non-useful peers. Is it useful peering with someone on the other side of the world? Talk to your peers and help them fix stuff. Speak to the actual IP person, person to person. IPv6 routing recommendations - the MIPP project says: no peerings over bad tunnels; get rid of high RTTs and third parties; apply incoming prefix filters to peers; filter private ASes and overly long paths; don't give unrestricted IPv6 transit to peers unless asked to. I tend to find most people we set up tunnels to will only give their actual prefixes, which is much better than what was happening a few years back, where you just got the lot. Try not to take IPv6 transit from too many upstreams, and avoid taking a single upstream over an intercontinental tunnel. Those are the references. The ghost route hunter is quite an interesting web site - if you're running a v6 node, you could join the ghost route hunter if you're interested. The Merit 6Bone routing report. The list of IPv6 blocks allocated by the RIRs is listed there, and so forth. Questions should go to Gert at Gert@space.net. I will try and answer some of the questions.

GEORGE MICHAELSON: I have about six questions, so it might be better to defer to Randy to get one out of the way.

RANDY BUSH: Could you go backwards two slides, please. I work for a large Japanese IPv6 ISP, and on "no peerings over bad tunnels" - we operationally define bad tunnels as "all tunnels". If you're trying to run a real network, don't peer over tunnels.

PHILIP SMITH: Get your upstream to provide you with native IPv6.

GEORGE MICHAELSON: These are not all questions; these are different, disaggregated, unrelated observations. The 6to4 count - seeing ten to 15, and probing and discovering 42 - that's an amazingly useful information point for those of us in the midstream of looking at deployment of 6to4 reverse - hint, hint, Doug. The total amount of activity here at this time in this network is low. So if we're considering risk-reward issues here, it's ten.
However, set against that, I recently decided to go and think about Teredo a bit more, which is not 6to4 but is ubiquitous - it's an "in every SP2 release" type of technology. I think it could be very interesting if there were a way to measure Teredo distinctly, because I think it is currently lurking in 3ffe space, and so there is potentially an opportunity for us to find something out about this and get some better sense of what is going on in that space. That was an observation. The tunnelling observation - I think there is a question here for people interested in mobile IP. I had tunnel vision - I've talked about people saying we should run the entire network as an overlay, because the real net could provide addresses, there is no address exhaustion - these kinds of pint-of-beer discussions. But the observation from the real world is that when you run tunnels, tunnelled IP - like in a VPN tunnel - is significantly worse. This has got to be thought about for what people will experience when we start to do layered IP behaviours on a global scale. Geoff is shaking his head, but I think there is a real-world experience here that needs to be thought about. Before he comes and rebuts, there was another one that mattered for me. What was it - that's right. There are going to be some significantly large assignments and allocations breaking the /20 horizon by a country mile. It would be interesting if you made some measure of the disaggregation they have to do to get under the radar rather than over the radar. Because they get the /20 or whatever it is - what do they have to announce to get through people's locked-in "reject this, it's junk" prefix filters? This will be interesting. The last one - not all /35s are historical. If you have a /32 but you want to disaggregate it, you can't announce what you really want to - the /48s - because people filter. But the /35 gives you three bits of wiggle room. You have eight virtual subnets of disaggregation that anyone can do for free. I would argue that is enormously useful, and we should not seek to say you must only announce your /32. We should of course say put it in a routing registry, but that wiggle room is useful. I use it.

GEOFF HUSTON: I was going to rebut one thing, which is about tunnels, and refer back to a presentation made by Ken Duro at an IEPG meeting in March this year, where he cross-correlated the performance of v6 through tunnels against the underlying v4 and found the RTT was basically the same as v4. He was pointing out that a fair few of the tunnels he was seeing were remarkably cleanly engineered and didn't go off doing crazy routing. But I'm doing a secondhand report. I think the issue is - go look it up.

GEORGE MICHAELSON: OK. Fair comment.

PHILIP SMITH: Thank you, Geoff, for that, and George for your questions and input. If Gert is watching this - or he'll see the transcript afterwards - I think it's all really useful feedback for him. I'm sure he'll put it in his updated report, which those of you who are going to the Manchester RIPE meeting will probably see there. So, any other questions about this presentation at all, the v6 status? If not, we will move on to the final presentation - a bit of fun, hopefully. 'BGP - The Movie', directed by Geoff Huston.

GEOFF HUSTON: (Pause while presentation uploads) I should give credit where it's due: a lot of the heavy lifting is actually being done by George Michaelson, so it's work we have done together here. This is a quick status report on BGP, then I'll just get into the movie as quickly as I can.
There's the routing table - the picture since 1994, snapshots taken every hour. What you see at the very left is the class C explosion, then you see CIDR-isation; between '98 and 2000 you see the boom; between 2001 and about 2002 you see the correction with the crash. Now you're seeing growth again. This is from all of the Route Views peers, at various levels. A single view - it's fairly obvious what's going on. The last 12 months in the IPv4 routing space: something good happened in the middle of April, something bad in the middle of May. There have been some step functions as large amounts of routes appeared and disappeared from the table. The table is kicking around at 142,000 routes - someone just announced another thousand a few days ago, thank you very much. This is the address span of the entire table. Much harder to figure out what's going on. Not everyone sees the same amount of address space in BGP. Most of the network is connected, but some of the network isn't. I'm showing what's announced into Route Views. There's a lot of qualification - (Question from Randy) This is total reachability. So I included all the aggregates, the lot. So if you're low, you're missing some address space that other folk are seeing. Really strange graph, actually. Let's take a single view. There are flapping /8s. This has gone down to about 3, from 4. One of them is, I think, InterOp's old ShowNet, but I don't know what the other one is - that's the banding that's going on. The boom and bust happened a little bit later here. You saw strong linear growth to mid-2001. It has been growing again since early 2003 quite consistently. Our burn rate is still on average around 4 /8s - in other words, 60 million /32s - per year; that's the current rate of growth there. Fragmentation of the routing table - on the left is percentage. So most folks see that around 50% of their routing table is actually more specifics of covering aggregates. How we got there was, I think, a bit interesting. There's some more data pushed up here. Most of the fragmentation of the address space happened during the Internet boom, from 1998 till 2001, but it never went away. In other words, once bad things happen in the routing table, they never go away. The routing table seems to never forget and never get any better. What appeared during the boom was some sort of rather strange behaviour that sits there inside the routing table and rots. Some folk do strong prefix filtering, some do weaker, but about half of the information in your routing table is covered by other entries. (Next slide) There was strong exponential growth until around 2001. There's a variation of around 500 AS numbers - people do see, in terms of AS numbers, a different type of reachability. Lots of AS numbers, so you'd think the network would be getting longer and stringier. It's not. As it grows it gets denser, so the average AS path length, with all those ASes being added, is still much the same. So the topology of the network is one where at any particular AS diameter there are more paths in there and more ASes, which is interesting when you consider that with routing protocols. Some very naive work about aggregation - naive because it makes sweeping assumptions. We're currently carrying a little over 140,000 routes. If you preserve prepended AS paths and simply knock things together - and ignore the fact there might be - you could take it down to a little under a hundred thousand routes. That's a savage guess. It's probably not as good as that.
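A toy version of that naive aggregation pass - my sketch under similarly sweeping assumptions, not Geoff's actual method - merges sibling prefixes that share an origin AS and drops more-specifics covered by a same-origin aggregate; prefixes and origins here are invented:

    import ipaddress

    def aggregate(table):
        """table: dict of prefix string -> origin AS."""
        nets = {ipaddress.ip_network(p): o for p, o in table.items()}
        changed = True
        while changed:
            changed = False
            for net, origin in list(nets.items()):
                if net not in nets:
                    continue                      # already merged away
                parent = net.supernet()
                sibling = next(s for s in parent.subnets() if s != net)
                if nets.get(sibling) == origin:   # merge sibling pair
                    del nets[net], nets[sibling]
                    nets[parent] = origin
                    changed = True
                elif any(o == origin and other.supernet_of(net)
                         for other, o in nets.items() if other != net):
                    del nets[net]                 # covered more-specific
                    changed = True
        return {str(n): o for n, o in nets.items()}

    table = {"10.0.0.0/24": 65001, "10.0.1.0/24": 65001,  # siblings -> /23
             "10.0.2.0/23": 65001,                        # then /23s -> /22
             "192.0.2.0/25": 65002, "192.0.2.128/25": 65003}
    print(aggregate(table))
    # {'10.0.0.0/22': 65001, '192.0.2.0/25': 65002, '192.0.2.128/25': 65003}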
But it does indicate that a lot of the fragmentation that happens in the network is perhaps not due to traffic engineering but due to some other artefact, and some is due to traffic engineering. If you do stricter and stricter forms of aggregation - so the line below that is stripping out the prepending; then I take out the AS path altogether and go, if the origin's the same, knock it out; then I look at chequerboarding. There's a savage practice out there of taking an allocation and announcing a subcollection of specifics that are not actually connected. So if I actually aggregate across holes, across RIR allocations, you get better theoretical aggregation. Then I really push hard in terms of prefix length filtering. If you try to squash things down, the actual information content in the routing table is around 43,000 routes without traffic shaping - in other words, just straight reachability. If you include traffic shaping - the sort of "my traffic goes down this path, not that path" - the information content appears to be around 100,000 entries.

GEORGE MICHAELSON: Geoff, the slopes of those lower lines are relatively consistent, but all are lower than the top one. It's not just that there's less of them; their growth rates appear to be consistently lower.

GEOFF HUSTON: In other words, the amount of absolute fragmentation is growing.

GEORGE MICHAELSON: But slowly.

GEOFF HUSTON: But slowly. From Gert Doering you saw the IPv6 routing table. Here is the same as Gert saw: 17,000 autonomous systems in IPv4, rather fewer in IPv6 - the latest census I see is around 160. The aggregation potential: v6 has strong aggregation there, and currently the community's small enough to be strong about aggregation. There's actually very little fragmentation happening there, although on a relative scale it's probably about the same. Now I'll get to the movies, 'cause I know time is short. We've tried to figure out where we've got to, and why, and how. So this is the first of these slides. There is a consistent colour pattern. This is a map of IPv4, in /8s - each column is a single bar of 16.7 million addresses. Five colours. If IANA still holds that space and it is usable at some time in the future in the routing table - so it's unicast space - it's coloured yellow. From 85 through to 127, the top of the old B space, and I believe there's 223/8 kicking over there. If the RIRs have been allocated the block and it's still sitting in the RIR pool, it's coloured red. So this is space that's about to be used by the RIRs. Old space that got returned to the RIRs - when I say "got returned", there's no delegation record in the RIRs - so in some of this old class B space you'll see the red as well. If the IETF has reserved space due to a standards action - the top space, class D and E; network 10, network 14, network 0, network 128 - you'll see that in blue. And probably, if you look really, really hard - although I think there's a bug in the program - there should be some blue around there in the old class B space for RFC 1918.

GEORGE MICHAELSON: It shows up in the movie later on.

GEOFF HUSTON: The rest of the space, green and light green, is space that's been deployed, assigned. It's out there. But not all of it's routed. Looking at the routing information, the dark green is space that I see in the routing system itself. The light green is space that's out there but is not being routed - it's legitimately been handed out somewhere, but I can't see it in the routing table.
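Here is a small sketch of that colouring rule - mine, reduced to /8 granularity with invented sample data, not the actual movie code:

    def colour(status, routed):
        """status: who holds the /8; routed: any of it seen in BGP."""
        if status == "iana":
            return "yellow"        # still in the IANA pool
        if status == "rir-pool":
            return "red"           # with an RIR, not yet delegated onward
        if status == "ietf":
            return "blue"          # reserved by standards action
        return "dark green" if routed else "light green"  # delegated space

    SAMPLE = {                     # /8 -> (status, seen in Route Views?)
        10: ("ietf", False),       # RFC 1918
        18: ("delegated", True),   # an old class A that is routed
        25: ("delegated", False),  # delegated long ago, never routed
        58: ("rir-pool", False),
        97: ("iana", False),
    }
    for block, (status, routed) in sorted(SAMPLE.items()):
        print(f"{block}/8: {colour(status, routed)}")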
Here are some of the old historic /8s that got allocated directly by IANA - a whole lot of them are light green. Here's the old class B space - a whole lot that's light green. Even the older C space has a fair deal of light green. The more recently managed space - the space that's actively managed rather than dormantly ignored - is quite efficiently used.

Here's a similar exercise for the autonomous system numbers. We're over halfway through the current 16-bit space of around 65,000. That's your private AS number block. Red is what the RIRs have, including stuff that got reclaimed. In the early part, most of this space is allocated. That's what I see in the routing table; that's what I see unrouted. ASes seem to behave a bit like radioactive decay - half-lives - that is, over a period of time half of them disappear, and that seems to be continuous. (There's a toy sketch of that decay below.)

Here's the IPv6 table. If it really was the whole IPv6 table this wouldn't be here - there'd be a very skinny stripe here and one there. So I've taken the only two bits of active space I can see, which are the /16 block 2001 - that's the left-hand side - and the /16 block 3FFE, which is the right-hand side. What you see with 3FFE is that it's all been allocated out from IANA. 6Bone's got it, but only this much has been handed out and is routed. I can't find a 6Bone registry to colour that any other way, so I've just assumed the whole lot is out there somewhere. Because their routing unit is basically a /24, it's either all or half - 3FFE was dealing in really large blocks. Here's the original set of /23s, right. Here are the original /35 allocations out there - they really were tiny, but a fair deal of it was routed, so in that space there the actual light green is quite small. These are the more recent allocations you've just heard about, coming in from a number of the larger providers on the back of their v4 legacy. So that space there, for example, has hit the delegation files. I believe that one there has come out from IANA and will be delegated at some point soon, so I expect that to change colour. There's another one where, again, the allocations are just happening.

So the movie is trying to say: how did we get to that state? Because that state is about a day or so old. What we've done is comb back through all the data we can find and push it all together. Unfortunately we don't have routing data that goes back that far - the routing data only goes back to 1997. But the delegation data you can clean up. The IANA files are a bit crappy, but if you take them together with the ARIN data, which is really quite accurate, and rewrite some of the crappy IANA data, you can go back to '83 and start developing a reasonable picture. This has now got an image for every day; it's just running through. Let me explain a bit - we're trying to run a tachometer on this to see how fast things are happening: AS numbers at the bottom, IP numbers at the top. When you see an allocation you'll see one of these numbers start to flick. That's pre-BGP, so that is nonsense. It'll take a while. At the moment - 1984, 1986 - /8s are being allocated. Around 1987 the NSF project started up, in the B space rather than the As. A whole lot of universities, firstly in North America and then elsewhere, started heading into the B space. Then RIPE wanted space to open up into Europe, and you started to see the B space really move. Although there's activity in the C space too, /24s are so tiny you can hardly see them moving. This is really when we started to worry about the class B space.
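A toy model of that "radioactive decay" remark, just to show the arithmetic; the five-year half-life and the starting population are made-up numbers, not measurements from the talk.

```python
# Constant half-life decay: of the ASes present at time zero, half are
# gone after one half-life, three quarters after two, and so on.
half_life_years = 5.0      # assumed, for illustration only
initial_ases = 17_000      # roughly the 2004 IPv4 AS count

def surviving(years):
    return initial_ases * 0.5 ** (years / half_life_years)

for y in (0, 5, 10, 15):
    print(f"after {y:2d} years: ~{surviving(y):,.0f} of the original ASes remain")
```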
GEOFF HUSTON: If you notice the amount of space coming out of the class B space at the time, it was enormous - the delegations were really quite solid. There was also some A space moving at around that time as well. I think the Japanese got net 45 around then; you should see it coming in just around there. The B space is moving very, very hard. Now we're starting to do a little bit more BGP. At this point the network was still hierarchical, coming out of the US, and there wasn't an awful lot of AS growth - you're seeing slow growth. 1993 was when we really started to recognise there was a problem here with class Bs and started to think: what other ways of doing this are there? At this point, early '94, there's very strong pressure to get out of the Bs and start using the C space in a better way. You find now the C space is moving quite quickly, because we're now giving out CIDR blocks, and the B space is relatively static. We're about to get to the point where routing takes over. CIDR is moving on pretty hard; we're doing it in routing. Now the AS numbers are showing very strong growth. What's happening now is that, rather than the network getting massively bigger in terms of its address space, it's getting massively bigger in terms of its routing capacity and diversity. Routing has come in now. The B space has only been half used - what got allocated years ago was not getting routed, and what got allocated in the A space is really not getting routed. When you don't look after address space, it goes to the dogs. That's what we're seeing here. Over there the RIR system is allocating at the edge; the RIR system is allocating here. Here's the AS space - quite a lot happening here, huge amounts. The network is getting more complex faster than it's getting bigger in address space, and there's the half-life of decay - when ASes get old, they disappear. The boundary is moving so fast it's amazing. 2003. This is the recovery phase. The As are opening up in the IP space, but the AS numbers are growing really quickly. Occasionally Route Views misses it. There you go - that was quick. Hopefully quick enough. Any questions? I don't know where we got these credits from, but they're pretty amazing. There is a movie on IPv6. Nothing much happens - the allocations are a couple of large blocks and that's about it. It's probably not worth playing in this forum, but there is a comparable movie with the data that we have, yeah.

TIMOTHY GRIFFIN: Looks like we need larger AS numbers.

GEOFF HUSTON: Yeah, that picture of the AS numbers is pretty bad. I did some analysis about two years ago thinking we had three or four years before we had to worry. The AS number space is still pretty solid. Four-byte ASes are the way to go (there's a sketch of the 32-bit arithmetic below). If it's going to take some time for vendors to implement, then we've got to do something about recovery. If vendors come in with kit quickly over the next two years and give us some time to test and deploy, we don't have to worry - the recovery stuff is irrelevant if you go to 32 bits. This is the fastest growth point. In the IPv4 space the growth rates aren't as bad. As I've shown in other work, the growth into these areas here is quite steady - you could say it's NATs, a whole bunch of things - but we've got well over one decade, two, possibly three, growing into these spaces. That's not the same picture with AS numbers; that's much faster. And if you look at the recovery opportunities, that's just amazing. There's so much space sitting there in unmanaged allocated space.
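For the 32-bit arithmetic mentioned above, here's a small sketch using the "asdot" convention (high 16 bits, a dot, the low 16 bits) that was later standardised for 4-byte AS numbers; the example AS values are arbitrary.

```python
# 16-bit versus 32-bit AS number space, plus asdot rendering.
OLD_SPACE = 2 ** 16    # 65,536 values
NEW_SPACE = 2 ** 32    # ~4.3 billion values

def to_asdot(asn):
    """Render a 32-bit AS number as asdot; plain integer if it fits in 16 bits."""
    high, low = divmod(asn, 65536)
    return str(asn) if high == 0 else f"{high}.{low}"

def from_asdot(s):
    """Parse asdot or asplain notation back to an integer."""
    if "." in s:
        high, low = (int(x) for x in s.split("."))
        return high * 65536 + low
    return int(s)

print(f"16-bit space: {OLD_SPACE:,}; 32-bit space: {NEW_SPACE:,}")
print(to_asdot(65546))      # -> 1.10
print(from_asdot("1.10"))   # -> 65546
```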
GEOFF HUSTON: If we used that unmanaged space as well, I think you'd buy about another ten years out of this space - if you actively managed the As rather than shutting your eyes and hoping they went away. Any other questions, comments? Thank you.

PHILIP SMITH: Thanks very much, Geoff.

(APPLAUSE)

That's all we have for the routing SIG for this APNIC meeting. I'd like to thank all the speakers - Tim, Randy, Geoff and, in his absence of course, Gert Doering. Thank you for coming. May I remind you of the mailing list - you can go to the web page there on the screen. If you're not a member, please join. If you are a member, please contribute. If you would like to contribute a presentation to the next meeting of the routing SIG, that will be during APRICOT in Kyoto, Japan, at the end of February 2005. If it's not in your diary, please put it in. If you haven't made travel plans, please think about doing that also. See you all in about six months. The BGP signing BOF is in about 15 minutes. Thank you for coming.

(End of online sessions for Thursday)

Time: 5.45pm