______________________________________________________________________

DRAFT TRANSCRIPT

SIG:           Routing
Date:          Wednesday 1 March 2006
Time:          2.00pm

Presentation:  Routing update
Presenter:     Geoff Huston
			       
______________________________________________________________________


PHILIP SMITH:

OK, while Geoff is setting up for his routing update presentation,
I will go through the housekeeping list that I omitted to do at the
start.

As I was saying, if people want to ask questions at the end of the
presentation, please come up to the microphone. I think there's
also a travelling microphone wandering around as well if you're not
mobile. State your name and affiliation, again, for the benefit of
the microphone, and also I should point out that this session is
being broadcast and audio-casted so it makes it more important that
people use the microphone so that the people who have joined us
from elsewhere can hear what's being talked about.

Afternoon tea is in the level 2 foyer area, so it's basically that
way, to my left, to your right.

The social event is this evening. If you've got your ticket, please
bring your ticket and the details of where it is are listed on the
back of your ticket. APNIC staff will collect the ticket from you
when you board the bus and that will actually leave - the bus that
is - will leave from level one plaza deck outside. Last bus will
be 7:10, so please don't miss it.

Next item - MyAPNIC and the policy flash demo are running all day
at the APNIC help desk. The help desk is available during break
times - morning, lunchtime and afternoon breaks.

Onsite notice board - again, it's advised to have a look at the
onsite notice board on the APNIC website for any last-minute updates
and so forth. There's a special session at 4pm today in meeting
room 3 discussing the APNIC fee structure. That's an open session.
Anyone with an interest in that topic is welcome to attend and is
invited to participate in discussions, although I'd much prefer you
to come to the Routing SIG but that's up to you. That's the
housekeeping.

So the next presenter is Geoff, who will be giving us a routing
update.

GEOFF HUSTON:

I will. Thank you. Good afternoon.

I seem to do these at every routing SIG, giving you an idea of
what's happening inside the BGP routing table. I've got three parts
to this presentation today - one is a status report and then I'm
actually looking at work based on a question that Vince Fuller asked
me a few months ago that I found it interesting to answer and further
observations after that.

Normally I use hourly snapshots I pull from routeviews but this
time I used a complete dump of the data and I must thank Stephan
Millet of Telstra for assisting with some of the data used in the
presentation. My disk is now full. Thank you.

Usual picture. This is 2005 January through February, the BGP
prefixes. It might look like November and December were tailing off
but be assured that January and February of this year, you have
come back again and routing is back on once more.

What does it look like? You put a line across and go, "The number
of prefixes in the default-free zone across last year rose from
150,000 to 175,000 prefixes in 12 months." So life is still increasing
the way it always was. The amount of address space is kind of
interesting - 4.4 billion addresses in IPv4 if you try and use them
all. It started at 1.36 billion at the start of the
year and finished at 1.5 billionish. Those big jumps there - there
are still two /8s that appear and disappear like lighthouses. It
amazes me that there are /8s that flap but there are and there they
are. You can draw a line across the top of this and I've eliminated
the /8s, smoothed it out and then you see pretty cleanly that the
amount of address space rose 1.36 billion to about 1.5 billion
addresses. Some seasonal variation. Some of you took holidays over
the Northern Hemisphere summer and you're slightly below average
but then got back to work in October and decided to add more addresses
into the network. We appreciate that and thank you.

LAUGHTER

The total number of AS numbers - it's the same kind of curve. Very
consistent. Unlike address space, AS number appearances on the Net
keep romping through so the trend line is spot on from 17,500 up
to 21,000. Somehow ASes are remarkably consistent unlike addresses
or routing table entries. The addresses that keep on appearing,
appear almost like clockwork. It's strange. So the vital statistics
- prefixes up by 18%. Roots and more specifics - are we getting
better or worse at this business of only advertising the aggregate?
And the answer is no - no better and no worse. Because the number
of basic root advertisements that, if you will, encompass new space,
rose by 17% to 85,500 but the number of more specifics predominantly
/24s but a few others in the mix, also rose by about the same amount.
So around 50% of the network is still more specifics and around 50%
are basic root prefixes. The amount of addresses rose by 10% so the
granularity of advertisements is getting smaller, not larger, again,
yes, you know, lots of little advertisements because the address
space is not growing as quickly. The number of AS numbers up by
14%. What can I say? The average advertisement size is smaller.
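
EDITORIAL ILLUSTRATION:

The remark that the average advertisement is getting smaller follows
from prefixes growing faster than address space. A minimal sketch of
that arithmetic, using only the round figures quoted above (about
150,000 to 175,000 prefixes, 1.36 billion to 1.5 billion addresses).
The function name is made up for illustration and the inputs are
rounded, so treat the result as indicative only.

```python
def average_span(addresses, prefixes):
    """Average number of addresses covered per advertised prefix."""
    return addresses / prefixes

# Round figures quoted in the talk for the start and end of 2005.
start = average_span(1.36e9, 150_000)
end = average_span(1.5e9, 175_000)

print(f"start of year: {start:,.0f} addresses per prefix")
print(f"end of year:   {end:,.0f} addresses per prefix")
print(f"change:        {100 * (end - start) / start:+.1f}%")
```

On these rounded inputs the average span per prefix shrinks by
roughly 5% over the year.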

Address origination per AS is getting smaller. ASes are moving down
the food chain. Interestingly, the average AS path length is remarkably
steady at 3.5 ASes. 14% more ASes but the diameter of the network
is constant, the density of interconnection is increasing. It's an
interesting question - whether this is uniform density increase or
whether there are black points where the amount of density is
increasing.

In other words are exchange points and similar of their ilk actually
gaining strength or is the bombing-out of the long-distance transit
fibre market causing interconnection to appear across longer spans?
Some work is happening, I think in Adelaide, right now around
topology of the network and I'd be keen to see if they're working
on that. It's interesting because the denser the mesh, the more
badly BGP behaves when it tries to converge. So trying to understand
if the explosions are global or local is probably an interesting
thing to understand. However, on a macro level, the network is
getting denser. The advertisement granularity is getting smaller.
More interconnections, more specifics. By contrast, this was v6.
Similar growth except you've got to look at the numbers. The number
of prefixes rose from a phenomenal 700 to a phenomenal 868, I
believe. More patchy - you can obviously see that some people decided
it was a bad time. In August, they mucked around with v6, and then
got bored. After that, they all went home again.

More noisier - the advertised address span. This is weird. Two big
spikes and everything going down. In other words, the blue line is
actually decreasing each time. Why is that? I took away the 6Bone.
And now what you actually see - and this is this issue that a /20
is much, much, much, much bigger than a /32. So here are all the
/32s, blip, blip, blip, /20. That's a /21 and that's a /20. It's
hard to show you the growth in v6 address space apart from saying
two whacking great big allocations happened that year and got
advertised and a few little ones but you can't see them.
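
EDITORIAL ILLUSTRATION:

Why two large allocations swamp everything else on the chart: each
bit shorter in the prefix length doubles the address span. A small
sketch that expresses the prefix lengths mentioned above in units of
a /32. The helper name is hypothetical.

```python
def slash32_equivalents(prefix_len):
    """How many /32 blocks fit inside one IPv6 prefix of this length."""
    return 2 ** (32 - prefix_len)

for plen in (32, 24, 21, 20):
    print(f"/{plen} = {slash32_equivalents(plen):>5} x /32")
```

A single /20 therefore covers as much space as 4,096 separate /32
advertisements.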

There's the 6Bone. You've now got precisely four months to quit and
so far each of you are quitting. Each jumped down as a /24. The
6Bone is slowly being turned off. Here's that combined view now. I
don't know if it's any clearer. That's the big picture, that's the
bit left after 6Bone and that's the 6Bone slowly flying off and
without those two big allocations being advertised, that's what we
actually have. You just can't see it. Someone's given me a laser
pointer. The bit without the big allocations.

Advertised AS numbers - noisier. Probably because 500 to 600 - so
there are 600 of you playing in the v6 game as far as I can see,
some in 6Bone, some not.

So what can I tell you about this? Prefixes up by 21%, roots by
15, more specifics by 21. Naughty, naughty, naughty. You're not
meant to disaggregate in v6. Stop it!

LAUGHTER

The amount of address space went up by a phenomenal 50% because two
allocations happened to be advertised and went straight up. ASes
up by 20%. The average advertisement size as a result of those two
massive ones is getting enormously large and origination per AS is
getting large but only because of those two factors. The path length
I can't give you much of a view on. Such a small network that you
can't see what the average AS path is and the interconnection
degrees, I can't tell. This is a network that continues to go large
with little overlays at the edges and trends really aren't there
yet.

Part one.

Part two - more interesting. If you were buying a router and you'd
like it to live for three years inside your network doing what it's
doing and it's trying to run in a default-free zone, what spec would
you tell the vendor to build to? You know, how many prefixes, how
many prefix updates per second? Vince wanted me to answer two
questions - v4 and v6. I took the easy one and did v4. It's hard
to predict v6 because it's such a small network. I can't tell. I'd
like to try and have a shot at this answer in a v4 context.

And for this I'm actually taking a macro view so I've taken the
entire set of updates and withdrawals from AS1221 for 2005. Because
you're inside a relatively busy network, there's a whole bunch of
local updates also happening from inside the network so I've basically
tried to filter out everything that I don't think came from the
default-free zone so smacked out a whole bunch of updates from
there.

What I'm trying to do is to see if I can relate the number of updates
and withdrawals against the number of entries in the RIB. And at
the same time, I'm also looking at the CPU load records from that
router that was supplying all the updates and again try and see if
there's a relationship between the amount of CPU that's being used
in that router on some kind of granularity against the table size.
Now, if that's the case, if you can do a table size predictive model
and you know the relative number of updates and withdrawals in CPU,
you have a vague idea of how big the thing should possibly be
somewhere in three to five years' time. So that's the methodology
of trying to answer that question.

Updates per day for the year in millions - default-free zone. So
I've filtered out a fair few. At the start of the year, somewhere
around 300,000 updates per day, BGP updates, messages, were being
caught at the router. By the end of the year, it was slightly under
600,000. That's big, an enormous amount of updates. Notice also
that this is not uniform. Now the law of very, very big numbers
says that, if each of you contribute a little, and there are 21,000
ASes, that line should be smooth. So the law of large numbers isn't
working. The variation in that number is so big, up to 50%, that I
suspect that each of you aren't contributing a little. And that's
what I want to talk about a bit later. Notice that astonishing
variation. That is not nice.
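
EDITORIAL ILLUSTRATION:

The law-of-large-numbers argument can be made concrete. If each AS
contributed an independent, similarly sized share of the updates, the
relative spread of the daily total would shrink with the square root
of the number of ASes. The sketch below assumes independent and
identically distributed contributions, which is an idealisation
introduced here and not something measured in the talk. The per-AS
variability values are also assumed.

```python
import math

def relative_spread_of_total(n_sources, per_source_cv):
    """Coefficient of variation of a sum of n independent, identically
    distributed contributions, each with the given coefficient of
    variation (standard deviation divided by mean)."""
    return per_source_cv / math.sqrt(n_sources)

n_ases = 21_000  # AS count quoted in the talk
for cv in (1.0, 5.0, 10.0):  # assumed per-AS day-to-day variability
    total = relative_spread_of_total(n_ases, cv)
    print(f"per-AS cv {cv:>4}: total varies by about {100 * total:.1f}%")
```

Even with very erratic individual sources, the total under this model
would vary by only a few percent, far from the swings of up to 50%
described above. That gap is what points to a small number of heavy
contributors.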

The number of prefixes per update message. How efficient are we at
packing it into single updates? Getting worse. In other words, the
update is getting closer to describing one prefix, not a bulk of
them. The network, in terms of the routing policy granularity and
update is decreasing, granularity is getting smaller and the number
of prefixes vary a lot. Some folk are doing very strange things.
I'm trying to understand that. Number of update messages per day -
unfortunately, it's close to double, as I said, it's highly variable
and it shouldn't be. The number of prefixes per update message is
falling and I'm just wondering if this is actually due to this
increasing use of ASes to do multihoming at the edge, that I'm
actually seeing what used to be an ISP with 50 prefixes, now an AS
with 20, and a whole bunch of ASes underneath it that are starting
to multihome.

But for some reason, the number of prefixes is falling. And now
I'm trying to understand - it seems to be that the update rate is
increasing faster than the table rate. That update rate was almost
double. In other words, the number of updates happening in the
network is increasing faster than table size. Is there some kind
of multiplicative factor going on? Or is something else happening?
Some kind of thing getting larger than the routing table itself?
This is not good news.

Now, maybe if I stop looking at update messages and start looking
at individual prefixes. How many prefixes change per day and what's
their trends? What I've done now - I'm actually looking at prefixes,
these are updates, those are withdrawals, in millions. So the
withdrawals, 200,000 going up to 300,000 in a day. Huge number of
withdrawals. The updates of prefixes is even noisier. This is weird.

I can put a trend line on the prefix update rates and, you know,
yes, it has increased. Around 800,000 update prefix messages. You
know, the prefixes that actually got updated each day by the end
of the year.

Withdrawal rates, you're going to actually start to see - that is
an exponential line, and even though it's noisy, there's clearly
an exponential growth factor in the withdrawal rates. So high
variability and approximately exponential but at different rates.
The updates are growing faster than withdrawals. So now, can I
relate that to the size of the network? The actual number of entries
in the network itself? So that's the default-free zone across the
year, that's 100,000, that's 170,000 and you can see pretty clearly
there that that's not linear now. It really is a bend going on
there. I've smoothed it out and done a first-order differential.
The default-free zone in terms of number of entries in the RIB is
growing faster than linear and is actually growing order n squared.
If I look at it as an order two polynomial - how many RIB entries
in three to five years, Vince?

Somewhere between 275,000 in three years and 375,000 in five years
would be a prediction. My guess at the confidence interval was about
20% so it's not that confident, but that fit isn't bad. That appears
to be - if you're looking at three to five years, that appears
to be the metric you're looking at.
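
EDITORIAL ILLUSTRATION:

The projection method described here, fitting an order-two polynomial
to the table size and pushing it forward, can be sketched as follows.
The data points are toy values chosen only to show the mechanics.
They are not the measured RIB sizes and the output is not the
forecast quoted above. Function names are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Normal equations for the columns x^2, x, 1.
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0..x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    a, b, c = coeffs
    return a * x * x + b * x + c

# Toy observations (time step, table size in arbitrary units).
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.1, 3.9, 7.2, 10.8]

coeffs = fit_quadratic(xs, ys)
print("fitted (a, b, c):", tuple(round(v, 3) for v in coeffs))
print("extrapolated value at x = 6:", round(predict(coeffs, 6), 2))
```

Extrapolating a polynomial well beyond the fitted range is sensitive
to noise in the curvature term, which is consistent with the wide
confidence interval mentioned above.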

Now, I've done the next thing, which is, for each RIB entry -
100,000, whatever - how many updates per RIB entry over the year?
If that was linear, that would be growing at the same rate as the
table itself. If it's more than linear, then the number of prefixes
being altered each day is growing faster than the table size and
this is the number of withdrawals per RIB entry. Is that weird? One
withdrawal per RIB entry, on average, every day, the entire table
is withdrawn or there are a very small number of prefixes that are
doing an awful lot of work - withdraw, update, withdraw, update.
But that's one. Every single entry is updated three times every day
or there is a small number of prefixes that are, you know, pushing
up an awful lot of iron very quickly and that's growing very fast
and, again, very noisy.

So I can answer you at least at a gross level, Vince, on what I
think will happen on withdrawal and update, inside three years,
you're going to have to cope with around 1.7 million prefixes being
changed every day and, by five years, that will grow up to, you
know, around 3 million, which is, you know, an enormous amount. And the
withdrawal rate, similarly, should be around 1.5 million withdrawals
per day. Which should keep your routers busy.

How fast would your router have to spin? Again, same kind of
technique. I got from Steph the actual 5-minute and 1-minute CPU
loads and across the year in question, the router had a brain
transplant twice and this is a PRP2. Isn't it cool?

So what I tried to do was normalise that and make everything a PRP2.
So what I actually got here was this - this is what happened across
the year, the one-minute load rate on the PRP2 increased by that
rate and just pushed it up per RIB entry. This is growing faster
than the RIB. It appears to be that, when I push that forward, if
I'm doing a unit of one by the end of this year, by the end of five
years, I'll need four times that amount of processing power to cope
with that load. That appears to be the projection. Today at 176,000
prefixes, 700,000 update rates per day, withdrawals of 400,000,
you'd need around 250 Mbytes of memory and need 30% of a one-gig
processor.

Three years time - 275,000 prefixes, just under 1.7 million updates.
Almost a million withdrawals, double the memory and 75% of that
processor and in five years, you'll need a new processor absolutely.
About 120%.
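
EDITORIAL ILLUSTRATION:

The three projection points can be compared against today's baseline
to see which quantity grows fastest. The sketch below uses the figures
as quoted in the talk, rounded ("just under 1.7 million" is entered as
1.7 million, "around 3 million" as 3 million). The dictionary layout
and function name are made up for illustration.

```python
# Figures as quoted in the talk; "cpu" is the fraction of the reference
# route processor the speaker estimates would be needed.
projection = {
    "today":   {"prefixes": 176_000, "updates_per_day": 700_000,   "cpu": 0.30},
    "3 years": {"prefixes": 275_000, "updates_per_day": 1_700_000, "cpu": 0.75},
    "5 years": {"prefixes": 375_000, "updates_per_day": 3_000_000, "cpu": 1.20},
}

def growth_factors(table, baseline="today"):
    """Express every figure as a multiple of the baseline row."""
    base = table[baseline]
    return {
        label: {key: row[key] / base[key] for key in row}
        for label, row in table.items()
    }

for label, factors in growth_factors(projection).items():
    print(label, {k: round(v, 2) for k, v in factors.items()})
```

On these figures the table roughly doubles in five years while the
update rate and the processor load both grow about fourfold, which
matches the "four times that amount of processing power" remark.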

It seems awfully low. I think it is low. You really are talking
about trying to cope with peaks, convergence is about speed, rather
than reliability. Got to get there faster. It's per-second peak
rates, not loads, that is the problem and it assumes that BGP isn't
going to change. And there's been more than enough words over the
last year or two that we need to do something around securing BGP.
If you think you need to be able to do that inside five years' time,
you may want to think about what exactly are the factors for the
router you're going to buy in terms of security-related protocols
- are we doing IPSEC in our peering or something similar, incremental
workload and so on.


So I would actually say that, if I was going to spec one out, I'd
at least want 500,000 entries in the RIB, no sweat. I'm going to
need an awful lot of adjacency RIB space. If I'm going to be
conservative and say, if I do this, I think I'm OK, then it's about
6 million prefix updates per day. I think I need at least a two-gig
route processor memory and probably a 5GHz processor for route
processing. What was a Cray1? I think it was less than that, wasn't
it?

How good is that number? What's going on here? I've got a couple
more seconds. Is this uniform? I don't think so. I don't think all
of you are behaving so well as that. Is this skewed? If so, how
skewed?

289,000 prefixes were actually announced that were different. Now,
the table is only 179,000 prefixes. So there are actually 127,000
prefixes that appeared for some period and aren't there any more.
So people are leaking. And then they pull it back but there's been
an awful lot of leaking. The number of prefixes that had no updates
at all through the year, congratulations, were 12,640. Well done,
gold star, tick, elephant stamp. Everyone else doesn't, they did
some kind of update.

This is a cumulative histogram of what's going on. 50% of the
prefixes contributed less than 10% of the updates. 60% of the
prefixes contributed less than 20%. 80% contributed just on
20%. So the top 20% of the prefixes were pretty bad and the top 1%
contributed 15% of the update load. OK. Let's name them.

LAUGHTER

So, if you see yourself on this list, you're on this list for one
of two reasons - one, you're multihoming and you don't know how
because, somehow, you managed 158,000 prefixes for the year, of
which 20,000 were actually just flips in the first-hop AS, the next
hop and, of course, you flap like crazy. All of these folk flapped.
Some of them also re-homed as well. So have a look there. You may
be there. You may wonder why you're there. If you're there and
wonder why you're there, go to the tutorials. Let's have a look at
these people because some of them are systematic and some of them
are night-time stuff-ups.

Systematic - Hong Kong Supernet, this was a prefix that was active,
withdraw update. Green is the flap and the red is an attribute
change so for a period of at least four months, this one prefix
managed to generate around 1500 update messages per day for one
prefix, most of which were withdraw, announce, withdraw, announce.
Precisely what information that added or subtracted from the routing
table beats me with a stick. Somehow, they got a clue in September.
Well done.
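
EDITORIAL ILLUSTRATION:

The distinction drawn on the slide between a flap and an attribute
change can be sketched as a classifier over one prefix's update
stream. The definitions below are assumptions made for illustration:
a flap is an announcement that follows a withdrawal, and an attribute
change is an announcement that replaces a still-present route with
different attributes. The event format is hypothetical and the AS
paths use documentation AS numbers, not real networks.

```python
def classify_events(events):
    """Count flaps and attribute changes in one prefix's update stream.

    events: chronological list of ("A", attrs) announcements and
    ("W", None) withdrawals.
    """
    counts = {"flap": 0, "attribute_change": 0, "duplicate": 0}
    current = None      # attributes of the route currently announced
    withdrawn = False   # True after an announced route is withdrawn
    for kind, attrs in events:
        if kind == "W":
            if current is not None:
                withdrawn = True
                current = None
        elif kind == "A":
            if withdrawn:
                counts["flap"] += 1
                withdrawn = False
            elif current is not None:
                key = "duplicate" if attrs == current else "attribute_change"
                counts[key] += 1
            current = attrs
    return counts

events = [
    ("A", "64496 64500"),
    ("W", None),
    ("A", "64496 64500"),   # flap: back again after a withdrawal
    ("A", "64497 64500"),   # attribute change: different path
    ("W", None),
    ("A", "64497 64500"),   # flap
]
print(classify_events(events))
```

Counting per day and per prefix with a classifier of this kind would
give the two coloured series described above, under the stated
definitions.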

Here's another one. This is from ICARE in Hong Kong. This is straight
traffic engineering. This is Hong Kong Supernet again. They have
four upstreams and they're moving prefixes around.

This is one in Turkey. Someone out there in vendorland sold them
something in June that they shouldn't have bought.

LAUGHTER

Because this is systematic sustained 500 updates per day and
moving across multiple upstreams. This one is an interesting case.
Here's another one from Turkey. They went and bought even more of
it at the end of September and did even better with it. Why is this?

Again, here's another one, Amphibian Media, I think they're related
to the folk who do inbound route traffic engineering and here is
another one - Merit. Surely they should know better. I looked for
this network a couple of days ago and it's gone. They must have
seen it flapping and sent it off into routing hell because I couldn't
see it a week or so ago. Phenomenal amount of updates from folk
like Merit. Here's the last one again and, oh, it's our friends
from Turkey. They really did buy something in June. I think they
tested it in April and really turned it on in June. Shouldn't have
happened any other way. And Number 10 - again, a US one, I think
they're InterNAP-related. Systematic, absolutely systematic, so
this is no accident.

Is it prefixes or Autonomous Systems? Look at that curve. This is
the autonomous system one. Go back by 10, look at this one, so,
while there might be a small number of prefixes creating all these
updates, there's a tiny number of autonomous system numbers generating
all these updates. The top 1%, the top 2%, the top 3%, 3% of the
autonomous system numbers generated half of the updates. Thank you
very much. Well done.

LAUGHTER

Here's another way of looking at it. Red is the actual number of
updates. The green is just the top 50 ASes. So the top 50 ASes do
half the updates. 50 people cause your router to have a problem.
50 people cause BGP to have a problem. Let's name them. Here's the
first one, we met them before, the folk who bought this wonderful
thing in June, AS 9121, 206,000 rehomes since June. If they'd run
it for a year, they'd be effectively off the planet and so on. You
could see a lot of multihoming in 17557 and 721. 721 probably should
know better and an awful lot of flaps on all of them.
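
EDITORIAL ILLUSTRATION:

The "top 50 ASes do half the updates" observation is a concentration
measure: sort origins by update count and take the share of the
busiest few. A minimal sketch with toy counts keyed by documentation
AS numbers. None of these values come from the measured data and the
function name is hypothetical.

```python
def top_share(update_counts, k):
    """Fraction of all updates generated by the k busiest origins."""
    total = sum(update_counts.values())
    if total == 0:
        return 0.0
    busiest = sorted(update_counts.values(), reverse=True)[:k]
    return sum(busiest) / total

# Toy data: one very noisy origin and four quiet ones.
toy_counts = {64496: 9000, 64497: 500, 64498: 300, 64499: 150, 64500: 50}

for k in (1, 2, 5):
    print(f"top {k}: {100 * top_share(toy_counts, k):.1f}% of updates")
```

Run against per-AS update counts for the year, the same calculation
with k = 50 would reproduce the comparison between the red and green
curves.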

Here's the signature for our friends in Turkey of the total number
of prefixes they originate and, yes, they actually tried it in
February, liked what they saw. I think they bought it in April,
turned it on hard in June, took a holiday over August, came back
to work and then something blew up in December. This is bad stuff.
I actually had a look at them. They seem to have six upstreams and
my guess is that they're using a tool like OER and doing more
specific juggling to try and get their incoming traffic evenly
balanced. And my suspicion is that they could stop all this if they
bought more bandwidth. Because what's going on as far as I can see
- and you can look them up in BGPlay and have fun with these guys
but you will find that the routes flip across all six at 4:00 in the
morning. I know they're hardworking ISPs but I don't think anyone
is up doing massive config changes as fast as they can type at 4:00
in the morning. Something is happening doing bad things globally.
More tutorials and a word of caution to the vendor - you shouldn't
have sold it to them.

The next one, Korea Internet Exchange, number two. This is not
traffic engineering. This is some pathological condition that lasted
for a while, caught it and fixed it.

MCI Europe - traffic engineering.

Pakistan Telecom. A combination of the two? I can't tell. These
enormous spikes are weird but there's a background of activity. So
that's probably harder to put a signature on.

Ah, Hong Kong Supernet, our friends. Whatever it was stopped in
September and we're all very grateful.

TPG here in Australia, again, you know, they're number 6 on the
list, really high spikes intermittently. Can't figure it out. Yet.

721, our friends in the US military. Something must have happened
in June somewhere that required a huge amount of updates in BGP.

Hong Kong Supernet has two ASes and here's the second one. They
just keep on having more fun than you can imagine.

DACOM in Korea and the Korea National Computerisation Agency - those
are individual problems that happened over a few days.

We're not seeing a level of uniformity in update rates that would
make my answer to Vince any good at all. I'm only looking at 100 ASes,
I believe. And I'm looking at 100 ASes whose behaviour pattern is
pathologically bad and they contribute the overwhelming bulk of
updates now and have done for at least a year and I suspect there's
two reasons.

One - there's a whole bunch of automated inbound traffic engineering
software that does route prefix juggling that is stuffing up the
entire network with millions of updates and withdrawals. Literally
millions, in fact hundreds of millions. That sustained consistent
update rate is killing us. If you want to do inbound traffic
engineering, consider buying more bandwidth instead. The rest of
us will thank you. I don't think the routing system can withstand
that kind of abuse.


There's something more than MED oscillation. You shouldn't see
withdrawal. It just simply flips. What I'm seeing in these unstable
ones is a massive number of withdrawals with moving around in the
prefix. We have isolated incidents of unstable configurations that
are causing massive load rates.

So, as I said, the overwhelming number of updates are generated by
an underwhelming number of sources. The uncertainty in the trend
models I gave you is extremely high. That means that, if you really
want me to give you an answer, Vince, on what you need to buy in
three to five years. My answer is I don't have a clue. Thank you.

APPLAUSE

RANDY BUSH:

Geoff, two things. Why did you throw out the internal route
fluctuations when the router is going to have to handle them?

GEOFF HUSTON:

This was the question as distinct from the investigation. The
question was actually asked, "If I was buying a default-free zone
router," and when I start factoring in AS 1221, how do I sort of
know that that's a typical kind of pattern? So I thought the way to
do this - and AS 1221 doesn't have an awful lot of upstreams so,
if you look at its topology, it's actually got a couple of domestic
peers and you see this one upstream - I'm kind of looking truly at
a default-free zone single path and that seemed a good baseline to
take for these measurements.

RANDY BUSH:

This is what I need with a single router in the default-free zone
that doesn't connect to anything else.

GEOFF HUSTON:

I'm talking about the floor, not the ceiling.

RANDY BUSH:

And you don't want to know what router to buy. It's what router to
sell, since Vince is not a vendor.

The other thing is those /8s you saw flapping which you don't
understand - I gave a lightning talk at the end of the last NANOG,
where we discovered that there are a number of /8s being used by
spammers. They announce the /8, they hit you with spam from the
dark space in it, from the unused space in it and withdraw the /8.
And that's those flapping /8s.

GEOFF HUSTON:

Thank you. We should name them. I'm not sure it will do any good
because they're stolen.

RANDY BUSH:

We did.

GEOFF HUSTON:

You named them. But they're stolen.

RANDY BUSH:

Right. They cover some sparse allocations and what they're doing
is using it for agile spam generation. They announce the /8 and go,
"ping, ping, ping, ping," and turn it off.

GEOFF HUSTON:

Interesting. Thank you.

PHILIP SMITH:

Any other questions for Geoff?

If not, we should move on. We're a little bit behind in time.

Next up we have Randy.