Routing SIG, Thursday February 26, 4:00 pm-5:30 pm

PHILIP SMITH:

OK, we should make a start. Welcome to the APNIC 17 Routing SIG. Just 
some administration and announcements before we start proceedings 
properly - first housekeeping announcement is use the onsite notice 
board for update announcements. Please check that. There is a Jabber 
chat client service available, web casting and, of course, the live 
transcripts. Please let colleagues know that these services are 
actually available for you. The APRICOT closing event is this evening. 
You need a yellow event ticket to get on the bus so this is the yellow 
event ticket. If you don't have one, please go to the Secretariat. I 
think there are about a dozen or so tickets left. The buses leave at 
6.00 pm from the hotel lobby, OK, so please be there 6.00 pm, not much 
later.

The APNIC Members' Meeting is tomorrow morning. Registration starts at 
8:30 and you need to collect your new name badge. Your APRICOT badge 
will not work, OK? The meeting will be held in this room, which is 
Unity 1, just in case you're not sure which room you're in.

OK, so that was the housekeeping. This, as I was saying, is the 
Routing SIG. It's chaired by Randy Bush and myself. If you need to 
speak to us or want to speak to us, the e-mail address is there on the 
screen. You can join the mailing list through the APNIC website. If 
you're going to ask questions of our speakers, please use the 
microphones. Don't shout out from the middle of the room. There are 
two stationary mics there, and there are two roving mics that will come
and find you if you don't want to walk to the stationary mic.

Please speak slowly and clearly. If you're going to ask a question, 
announce your name and affiliation just for the benefit of the 
stenographers. OK, our agenda is very short. Well, we have only one 
item on our agenda, I should say. Unfortunately, Phillip Harris, our
other speaker, has had to go back home for personal reasons. So we'll
postpone that part of the content until a future date. So we have one 
speaker who is Geoff Huston, who will be talking basically about 
allocation versus announcement, comparison of the RIR IPv4 allocation 
records with global routing announcements and other things.

GEOFF HUSTON:

We seem to have a fair deal of time so Phil and I thought that maybe 
it would be useful to go back through some motivational material first 
and actually have a look at the entire space of what we're talking 
about which is global routing on the Internet. So the first of these 
slides - lots of cute colours, lots of cute points. It's a bit like
television, always compressed. There's actually a description of the 
Internet table for the last 10 years from 1994 through to 2004. And 
the first thing we're looking at here is the number of individual 
entries that exist in the routing table. And, in some ways, that's a 
kind of metric of the size of the Internet and the pace at which it's 
growing. There are a number of interesting things here, some of which 
are only partially obvious, but I'll go through them.

Right at the back here in 1994, there was a suspicion we were running 
out of size, that the number of routes in the Internet was
going to grow very dramatically and life was going to get very crappy 
in a couple of years. You can't see much before there but it's 
actually quite a neat exponential growth curve. It was at that point 
that classless inter-domain routing was pushed on to the world and a
push was made to introduce it. From March IETF onwards in that year, you
notice a slowdown through 94 in the growth of the routing table as 
operators move from advertising /8s, /16s and /24s into aggregating 
that and reducing the number of advertisements. And what was 
exponential growth up until around 1998 turned into linear growth of 
the routing table which is not a bad level. Around there, I actually 
managed to find the second series of data, so this data here comes 
from the Netherlands.
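
The aggregation step described here can be illustrated with a minimal
Python sketch. The four /24 prefixes are made-up values in private
address space, not routes taken from the slides; the standard library's
ipaddress.collapse_addresses merges them into one covering prefix.

```python
import ipaddress

# Hypothetical input: four contiguous /24s announced separately.
separate = [
    ipaddress.ip_network(p)
    for p in ("10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24")
]

# collapse_addresses merges adjacent and overlapping networks.
aggregated = list(ipaddress.collapse_addresses(separate))

for net in aggregated:
    print(net)
```

Four routing table entries become one /22, which is the kind of
reduction operators were being asked to make.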

I flicked over to Australia and there's now two things, one from 
Australia and one from the Netherlands, tracking through that growth. 
If the Internet routing table is an indicator of the growth of the 
Internet, then it's reasonable to say that the Internet boom started 
in March 1999 and what was a relatively gentle linear growth in 
aggregated routes turned into a mad splash of fragmentary routes from 
1999 until 2001. And over that period, the routing table more than 
doubled, close to tripled, in size. The problem that we had at the 
time was that that pace was far faster than Moore's Law. Moore's Law 
says that the number of transistors in silicon will double over 18
months and the amount of computation you can put in silicon doubles 
every 18 months. The worry that we had about here was that, if that 
trend continued, the Internet would get faster or bigger than we could 
make silicon for within a few years. And there was certainly some 
concern around then that we were going to get to the point where the 
size of the Internet would exceed the capability of silicon to 
actually manage it.

Around there two things happened - one - a recognition within the 
community that we really did have a problem. And secondly - the end of 
the Internet boom. Between the two, 2001 was certainly a somewhat 
different year. The other thing that happened in 2001 is one of the 
more remarkable data sources came online. This is a very large router 
operated at the University of Oregon that takes around 40 separate BGP 
feeds and allows you to split them out so now, instead of just tracking
two BGP tables, you actually see that I'm tracking around 40. 
Interestingly, not everyone sees the same Internet. This view up here 
is actually internal to Telstra's autonomous system.

And what you find is there are a large number of routes that are 
purely internal and they're very fine-grained - /29s, /30s and /32s 
and there are quite some thousands of them. Most ISPs would have the 
same thing. Internally, you have a lot of detail, externally, you 
aggregate and the bulk of the aggregation is there. So most folk see 
within about 7,000 or 8,000 routes of each other. Down here is 
interesting. I think the major point is actually Verio. There are a 
couple of others. A number of major transit ISPs do do prefix-length 
filtering and what they do is they publish a set of /8 prefixes and 
they publish the minimum prefix size they're prepared to carry and 
they're running a somewhat smaller routing table. Interestingly, it's 
the same growth level, the same growth trend, but they're down by 
around 10,000 routes.

Let's have a look at more of this because it kind of gets interesting. 
This is the amount of address space that we're consuming in the 
routing table. Currently, some 2 billion or a little under /2 
addresses have actually been allocated, about 1.8 billion. At the 
moment, 1.3 billion are advertised and the other half a billion 
addresses are dark, they're not advertised globally on the Internet. 
This is not a 10-year curve. It starts at 2000 so it's just the last 
four years. Interestingly, the growth went on linearly until early 
2002. There was a leveling of the amount of address space. Whatever 
was going on there, whether NATs were very popular in 2002, were they? 
Flavour of the month? Or something else that was more social and 
economic. It's actually hard to believe because there were a lot of 
DSL rollouts at that point.

For whatever reason, the growth rate was a lot lower across 2002. By 
2003 we're back on a more gentle curve and then, just at the end of 
2003, we're back into where we were again. It would be interesting - 
and if someone wants to do it - to actually look at snapshots of the 
routing tables to understand why we're getting these rather massive 
shifts in the growth of the address space but, nevertheless, yes, 
there are some shifts. How big are the prefixes we're advertising? 
Same four-year period - 9,000 /32s, 10,000, 11,000, 12,000 - over the 
last four years the average prefix length has certainly come down a 
little bit and we're now steady at around 10,000 entries. 10,000 is a 
/19, /20 average as I recall, I think it's about a /19. Although, in 
the last few months, it's come down again. There are a couple of more 
finer-grained prefixes - a large number actually - altering that 
average prefix length that's being advertised.

The other thing that's evident in the BGP table is that there's a lot 
of noise, noise where the information doesn't add anything. It's quite 
common to see a large advertisement, maybe a /16. And then smaller 
advertisements within that same space - /20s, /24s, etc. So we have 
130,000 entries in the routing table and, these days, around 65,000 of 
them, or half of the routing table, add no new information. And, in
general, the reason why people do this is often to do with local 
resilience and local traffic engineering. Unfortunately, it's very 
difficult to scope that. It must be very difficult to scope that kind 
of local advertisement because no-one seems to be able to do it 
successfully.

And a huge amount of the global table, half of it, appears to be 
driven by what appears to be an expression of local policy - probably 
not the best thing to happen. And, as you see, it grows and continues 
to grow. That's the percentage - 50% of the routing table, 50% pretty 
steady over the last few years. How much address space? If half the 
routing table is more specifics, if half of the load in routers is 
more specifics, you'd expect, I suppose naively, that half of the 
address space would be treated this way? No. One sixth of the address 
space, in fact, slightly less, around 11% of the address space, is 
over half of the routing table. So the local policies we're talking 
about are actually local policies on very, very small prefixes 
predominantly /24s, 256 entries.
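
A minimal Python sketch of how this kind of measurement could be made
is given here. It is an illustration under stated assumptions, not the
method behind the slides: a more specific is treated as adding no new
information when a shorter prefix from the same origin AS covers it,
and the table, prefixes and AS numbers are all hypothetical.

```python
import ipaddress


def split_covered(table):
    """Split (prefix, origin_as) pairs into roots and covered more specifics.

    Assumption: a more specific "adds no new information" when a shorter
    prefix from the same origin AS already covers it. A real analysis
    would also compare AS paths.
    """
    routes = sorted(
        ((ipaddress.ip_network(p), asn) for p, asn in table),
        key=lambda r: r[0].prefixlen,
    )
    roots, covered = [], []
    for net, asn in routes:
        if any(asn == root_asn and net.subnet_of(root) for root, root_asn in roots):
            covered.append((net, asn))
        else:
            roots.append((net, asn))
    return roots, covered


# Hypothetical table: private prefixes and private-use AS numbers.
table = [
    ("10.0.0.0/16", 64512),
    ("10.0.16.0/20", 64512),
    ("10.0.1.0/24", 64512),
    ("10.1.0.0/16", 64513),
]
roots, covered = split_covered(table)

# Share of table entries versus share of advertised address space.
entry_share = len(covered) / len(table)
address_share = (
    sum(net.num_addresses for net, _ in covered)
    / sum(net.num_addresses for net, _ in roots)
)
print(f"{entry_share:.0%} of entries, {address_share:.1%} of address space")
```

In this toy table the covered more specifics are half of the entries
but only about 3% of the address space, the same shape of imbalance as
the one described above.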

So that what's happening is that the mice, the little things, are 
actually dominating the routing globe. If you get rid of those small 
specifics and look at the prefixes that actually add information, 
across that four-year period, it's actually quite a consistent curve 
so that, where we saw all these bumps and anomalies, I suspect that a 
lot of it is due to traffic engineering and multihoming in a local 
context and, globally, the growth of the Internet over the last four 
years has been remarkably even and, interestingly, in routing terms, 
not in socioeconomic terms, but in routing terms, relatively linear 
rather than exponential. Oh, one last thing. The number of Autonomous 
Systems, the number of separate entities that push stuff into the 
routing table, the number of unique Autonomous Systems - the last five 
years - that's the end of the boom. I was only measuring from one 
point there but it's quite a strong curve.

Around the middle of 2001, you see quite a pronounced knee and now
we're in a post-boom build-out, very linear, and here's where I started
looking at Route Views. Not only do all of the neighbours
of Route Views see a slightly different set of advertisements, they
also see a different number of Autonomous Systems and the folk doing 
prefix filtering actually don't see up to about 1000 Autonomous 
Systems.

It's not clear to me that they actually don't see their routes. I 
believe that, in most cases, even when you prefix filter, what you 
cloud out is fine detail. You can still get your packets there anyway. 
Perhaps that's an area where more work can be done. That's an overview 
of the routing table and the basic message at this particular point is 
that the growth is no longer exponential and very hard, the growth is 
actually more gentle and linear.

Let's move on to one other report before I get on to the presentation 
and this is a resource that you may want to look at for your own 
autonomous system. It's a thing called cidr-report.org. What does it 
do?

It tries to look at which particular originating ASs are generating 
prefixes that could, in theory, be dampened down into being local 
rather than global. And here are the autonomous system numbers, the 
number of networks that are actually advertising and the number that 
could be removed if true aggregation was happening. And you find all 
kinds of interesting folk there with all kinds of aggregation 
possibilities listed through. The other thing I try and look at is who 
is adding and who is removing routes every week? So last week, 
something that looks very much like being in South America added 
another 14 prefixes into the global routing table and all these other 
folk added, you know, seven, six, five, four, three, two, etc.

Quite a few. Busy week. The folk who increased their number of entries 
into the routing table, the number of folk who aggregated or decreased 
and so on. Leading me into my talk is one more section of the report. 
I'm interested to understand how many people receive an allocation 
from an RIR and advertise it as that allocation and how many people 
receive an allocation from an RIR and fragment it and advertise more 
specific routes. Because I'm wondering if our policies, which 
implicitly suggest that what you get allocated is what should be 
advertised, are actually working or not. So there's a section on the 
report here that actually looks at who is advertising fragments of an 
initial RIR allocation and how many of those things are actually 
fragments.

And, as you find, there are a fair few interesting folk there and I'm 
sure you can read it as well as I can, some from this region, some 
from America, some from closer to home. They're actually advertising 
more specifics of the RIR allocation. So that neatly leads me into the 
presentation that I'd like to do this afternoon, the talk being a
comparison of what the registries allocate versus what comes up in the 
global routing table.

As I pointed out, a number of ISPs introduced prefix-length filters on 
the routes they accepted to try and dampen down some of this noise. 
Although not everyone does it, a fair few ISPs do some form or other 
of prefix-length filtering. We've had a number of accidents over many 
years where a large block leaks all the specifics, operator error 
whatever, and sometimes that brings down other people's routers. So 
prefix length filtering is actually defensive and many folk do it. And 
the filters are typically based on the RIR allocation units. So if, 
out of a particular block, the minimum allocation is a /20, then you 
might find that the prefix filter that has been put in for that
particular /8 is set to a /20 and, if people fragment and separately 
advertise little bits, it doesn't spread across the entire Internet, 
the prefix filters catch it in various points. So the implication of 
all of this is that there is, I believe, a relatively widespread view 
out there that what the RIR allocates is what we should see in the 
routing table as a single aggregate. And that the intention of those 
folk who fragment advertisements is generally things about local 
multihoming, local resilience, spreading your load between two 
upstreams and a certain amount of traffic engineering and, in theory,
those fine-grained fragments should be scoped, no export, explicit 
communities that try and limit its promulgation.
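
The prefix-length filter described here can be sketched in Python as
follows. The per-/8 limit and the fallback limit are hypothetical
values chosen for illustration; a real filter would be built from the
minimum allocation sizes the RIRs publish for each block.

```python
import ipaddress

# Hypothetical policy: longest prefix accepted inside each /8.
# Real filters are built from the minimum allocation sizes the RIRs publish.
MAX_LEN_BY_BLOCK = {
    ipaddress.ip_network("10.0.0.0/8"): 20,
}
DEFAULT_MAX_LEN = 24  # assumed fallback for blocks with no explicit entry


def accept(prefix):
    """Accept a route only if it is no longer than the limit for its block."""
    net = ipaddress.ip_network(prefix)
    for block, max_len in MAX_LEN_BY_BLOCK.items():
        if net.subnet_of(block):
            return net.prefixlen <= max_len
    return net.prefixlen <= DEFAULT_MAX_LEN


for candidate in ("10.16.0.0/20", "10.16.4.0/24", "192.168.0.0/24"):
    print(candidate, "accept" if accept(candidate) else "reject")
```

A /24 fragment inside a block whose minimum allocation is a /20 is
dropped, while the covering /20 is still accepted.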

In theory these fragments shouldn't leak around the world. The kind
of question that I had in my mind is how good is the assumption that 
what you get allocated is what gets advertised? And, if that 
assumption was good or bad, have things been changing recently, are we 
getting better or worse at actually tending to the health of the 
routing system? So the methodology is relatively simple. If you wander 
through the RIRs' data repositories you'll see delegated files which 
are effectively a log of the allocations they perform and their size 
and you can take that log of prefixes and sizes and compare it to a
dump of the BGP table, which is what I did. So the first thing to look
at is the last 13-odd months, across 2003, and what I found was 4,500
allocations which is actually, I think, a higher number than it should 
be and I think some of the early historical transfers from ARIN to 
RIPE got redated as they came through. So I suspect the real number
is closer to 1,000 but I don't know. I can only take the data that I 
have.
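
A minimal Python sketch of this comparison is given here, assuming the
pipe-separated layout of the RIR delegated statistics files
(registry|cc|type|start|value|date|status) and a BGP dump already
reduced to a list of prefixes. The records and routes are made up for
illustration, and the three categories (unannounced, exact, fragmented)
are one possible reading of the method, not the actual scripts used.

```python
import ipaddress


def parse_delegated(lines):
    """Turn IPv4 records of a delegated file into a list of networks."""
    allocations = []
    for line in lines:
        fields = line.strip().split("|")
        # Skip comments, the version line and summary lines.
        if len(fields) < 7 or fields[2] != "ipv4":
            continue
        if fields[6] not in ("allocated", "assigned"):
            continue
        first = ipaddress.IPv4Address(fields[3])
        last = first + int(fields[4]) - 1  # value is a count of addresses
        allocations.extend(ipaddress.summarize_address_range(first, last))
    return allocations


def classify(allocations, routes):
    """Compare each allocation with the routes announced inside it."""
    result = {"unannounced": [], "exact": [], "fragmented": []}
    for alloc in allocations:
        inside = [r for r in routes if r.subnet_of(alloc)]
        if not inside:
            result["unannounced"].append(alloc)
        elif inside == [alloc]:
            result["exact"].append(alloc)
        else:
            result["fragmented"].append((alloc, len(inside)))
    return result


# Hypothetical delegated records and BGP prefixes.
delegated = [
    "apnic|ZZ|ipv4|10.0.0.0|65536|20030101|allocated",
    "apnic|ZZ|ipv4|10.1.0.0|4096|20030201|allocated",
    "apnic|ZZ|ipv4|10.2.0.0|8192|20030301|allocated",
]
bgp = [
    ipaddress.ip_network(p)
    for p in ("10.0.0.0/16", "10.1.0.0/20", "10.1.0.0/24", "10.1.1.0/24")
]

report = classify(parse_delegated(delegated), bgp)
for category, items in report.items():
    print(category, items)
```

Dividing the number of routes found inside fragmented allocations by
the number of fragmented allocations gives a fragmentation rate of the
kind quoted in the talk.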

So, with that in mind, we press on. A quarter or a little less than a 
quarter of those allocations aren't in the routing table. Interesting. 
In the last year, a quarter of the things that got allocated aren't in 
the routing table. You'd say, "Maybe that's all those RIPE early 
allocations." It's not. Even in APNIC, LACNIC and ARIN, where 2003 
allocations really happened in 2003, even though the allocation 
happened, it hasn't been advertised. However, 3,600 are. Of those 
3,600 registry transactions, the routing table has 11,000 entries so, 
on the whole, everyone takes an allocation and puts two more specific 
fragments on it. Naughty people. I looked at it again because I 
actually don't think that's the case. So I looked a little more at 
those 3,600 allocations and found that most of them were advertised 
precisely. If they were allocated a /16, they advertised a /16. So the
bulk of it, 8,000 advertisements, actually comes from 1,000
allocations, roughly. So 80% of the allocations are good but 20%, one 
fifth, are bad and the fragmentation rate is around 6.6. So it appears 
that most of us are doing the right thing globally and a fifth of us 
are actually splashing a lot of fragments out to the Internet. That's 
a large and busy table. Here's what people got, here's what they 
advertised. The predominant thing is to take /15s, /16s, /17s and even 
/20s and slice and dice them into /24s.

Someone must have a textbook out there that is remarkably well read. I 
don't know why /24s but that's it. Most of it is just slicing and 
dicing into /24s. Not good data, not a healthy look. I thought maybe 
if I limit myself to the last one-and-a-half months, are we getting 
any cleverer. In looking at the last one-and-a-half months, 520 
allocations, 217 aren't announced yet. They're very recent so that's 
probably about right. They've just got their block, it may be some 
time before we see it. 303 are announced and 576 routing 
advertisements. So rather than a multiplier of three, it's a 
multiplier of two. Getting a bit better. Are we? Oddly enough, the 
folk who are reading that particular textbook from the 1970s are still 
reading it and still doing it because, even in the last six weeks or 
so, 78% of the allocations are correct but 22% are fragmented and it's 
4.6 that are fragmented. It's the same proportion who are doing 
exactly the same now. How many are in this room? 20. Four of you are 
doing it. In the same table, what's the most popular fragmentation 
point - /24s, although I noticed some /16s, which are actually quite 
large allocations.

These folk should know what they're doing - fragmented into /21s and 
globally announce them. I'm kind of interested now that nothing seems 
to be changing. That doesn't tally. We must be getting better at the 
job.

So rather than just looking at a few years previously, I looked at the 
entire table, the big picture for the last 20 years. Lots of numbers, 
aren't they fun. Pretty graph. What I've tried to track is the 
allocation versus the advertisement. People were allocated /16s and 
advertised them as /16s but also love advertising /24s and you see
that the most common way of fragmenting space is a /24 and there's also
fragmentation around the /19, /20. Local traffic engineering. So 80% 
of the advertisements are 'as is'. Has that always been the case? 
Around 80% of the advertisements are 'as is'. If we did this exercise 
in 1989, it would actually be 90% of the advertisements would be 'as 
is'. We got a lot worse until 1994 when we really got bad. Between 89 
and 94, people were accurate. Through the boom, through to 2001, the 
routing space also fragmented. The boom was driven by, if you will, 
fragmentation in there. And, since then, we've been slowly getting 
cleverer again. So it's not all bad news. Have a look at the number of 
fragments versus the number of allocations. It is actually now very
indicative of the problem space. Between around '97 and 2001, we were 
excessively fragmenting the space. Since then, we've actually been 
getting a whole lot better in managing to actually advertise what we 
receive as an aggregate. And last, and not least, look at the number 
of allocations, the proportion that are advertised as fragments. 
Again, you see that same curve that, since around 2001 when we peaked 
at around 50%, we're now getting better. This is good. So you see that 
spot where the boom occurred, the number of routing table entries just 
went straight up. That was actually fragmentation of the space rather 
than raw demand of addresses. What we actually saw was a number of 
small businesses come up, grab addresses from wherever they could, 
fragment and advertise them out to anywhere they wanted. And, at the 
end of the boom, we actually managed to restore what we might call 
'business as usual', although at a slightly higher level.

So, as I see it, the routing table tends to have unbounded growth when
the RIR allocations aren't matching the natural tendency of the
industry to operate. That is, when the allocations are too big, the
address space gets fragmented down into smaller entities.
Since late 2000, oddly enough, the level of fragmentation has dropped. 
So what do people do? They take an allocated block and slice and dice 
it into /24s. That's what the textbook said. But the textbook was 
written before BGP had NOEXPORT in it as far as I can tell. If you 
want to do it locally, you don't want the world to see it. Around one 
fifth of the operators out there, the ASs, do an awful lot of 
fragmentation and everyone else actually manages to do aggregated 
advertisements. When the RIRs started allocating /21, /22s and /23s, 
that will actually match the end point. You don't actually see further 
fragmentation of those smaller blocks. Those smaller blocks appear to 
match quite precisely the end point of the business. And, yes, since 
around 2000, we have been getting better at it and the BGP table is 
certainly better behaved which is, I suppose, good. Questions. Oh, and 
thank you.

RANDY BUSH:

Randy Bush, IIJ. Do you believe that, for most of the fragmenting
announcements, if I filtered strongly, my packets would still get to the
destination?

GEOFF HUSTON:

Yes, I believe it and, yes, I can show you data for it, yes. 
Fragmentations are not without covering aggregates on the whole;
when you actually look at particular Autonomous Systems, what you see 
is the original /18 and then maybe a /19 and, just to be on the safe 
side, a /20 and maybe some /24s as well all inside the same block. So 
prefix filtering certainly allows you, I believe, to see precisely the 
same Internet as much as anyone does see the same Internet and is a 
reasonable practice to try and limit the amount of noise you are 
seeing in terms of updates and so on of those prefixes.

RANDY BUSH:

And load on router.

GEOFF HUSTON:

And load, yes. So, yes, prefix filtering is probably a reasonable 
thing to do to actually reduce routing load and increase, if you will, 
the efficiency of your overall system. Interestingly, a number of you 
folk are actually fragmenting right now, one in five. These online 
reports will actually tell you who they are. Type in your own AS and you
will see. And it's probably worth doing. Because it's actually quite 
easy to cooperate with your upstreams and reduce the scope of the 
fragment. It's quite OK to take a feed from provider A and provider B
and advertise more specifics to each to balance your incoming load. 
But, quite frankly, the rest of the world isn't interested and perhaps 
you should be giving them a community - maybe your upstream has a 
community you can tag the routes with to say, "Look, suppress this 
from here on through."

Not interested in having it promulgated further, it's a local problem. 
If you are a transit provider, you should be offering your customers 
communities that allow them to say, "This is a more specific. Don't 
on-advertise. There are covering aggregates. It doesn't matter." So, on
both the customer and the transit side, there are things you can do 
right now, simple things around communities, that actually limit this 
noise. Half of the routing table is this kind of noise. So the more we 
do in this, the more headroom we get in the routing system and the 
fewer spurious fine-grained updates go all the way round the world 
being amplified by BGP. This is probably a good thing.
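
The scoping idea can be sketched in Python using the well-known
NO_EXPORT community from RFC 1997. This is a simplified model of an
upstream's export decision, not router configuration: confederations
are ignored, and the routes and their layout are hypothetical.

```python
# NO_EXPORT is the well-known community 0xFFFFFF01 (65535:65281) from RFC 1997.
NO_EXPORT = (65535, 65281)


def exportable_to_external_peers(routes):
    """Return the routes a router may advertise outside its own AS.

    Simplification: BGP confederations are ignored.
    """
    return [r for r in routes if NO_EXPORT not in r["communities"]]


# Hypothetical routes received by an upstream from one multihomed customer:
# the covering aggregate is untagged, the more specifics carry NO_EXPORT.
received = [
    {"prefix": "10.0.0.0/17", "communities": set()},
    {"prefix": "10.0.0.0/20", "communities": {NO_EXPORT}},
    {"prefix": "10.0.16.0/20", "communities": {NO_EXPORT}},
]

for route in exportable_to_external_peers(received):
    print(route["prefix"])
```

The upstream still uses the /20s to balance the customer's incoming
load, but only the covering /17 goes on to the rest of the world.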

PHILIP SMITH:

Have you contacted any of the people who are announcing these /24s to 
ask them why? Are they leaking their BGP on purpose or do they really 
believe that we still live in the old Internet or what?

GEOFF HUSTON:

Personally, no, Philip. I am aware that a number of individuals around 
the world have done various efforts of contact with varying degrees of 
success but, sometimes, when you contact them, you see a change. Other 
times when you contact these folk, evidently, the answer is either no 
answer or the answer is, "It's really complicated. It's so complicated 
we can't fix this. Wow, this is complicated" and that's the end of the
conversation. I'm actually not sure. But this fragmentation has been 
around for a long time and those efforts to try and fix it at that 
grass roots level haven't been as good as they could be I suppose.

RANDY BUSH:

I might as well confess. Of course, I was the one who had the filtered 
policies and, in fact, we could reach everybody, etc and going round 
begging them to stop polluting was ineffective. I wrote to a large telco
in the Pacific and he told me he was doing it for traffic engineering 
reasons and so on and so forth and so it was easier just to filter.

GEOFF HUSTON:

And, in some ways, filtering the noise out yourself gives you back 
some of your router space. It knocks off some of the updates in the 
processing load. It gives you back some of your routers and you don't 
lose connectivity by filtering, as far as I can tell. All these 
fine-grained fragments are actually covered by aggregates.

PHILIP SMITH:

A question for the audience. Are there any operators here? Do you do 
any filtering of these fragments that are being announced, for example
like Randy has been doing? If so, do you want to comment?

ANDY LINTON:

Yes, we do and yes, we filter.

PHILIP SMITH:

No other questions, comments?

PB PATIL:

We are a no-export community. We are three. We have three ISPs. We 
export for traffic engineering purposes.

PAUL WILSON:

Hi. Paul from APNIC. We've heard quite a few times that routing tables 
have grown quickly but they are by no means too big for modern routers 
and modern routers could cope with routing tables of much larger size 
at least in static terms and, furthermore, that what causes load on 
modern routers these days - I can see Randy preparing - what causes 
load is the dynamism in the routing system, the number of updates 
which are - which need to be handled and so forth. Geoff, you provide 
a static view of how big the routing table is but you made reference 
to routes leaking out and trafficking the world and being amplified. 
I'm wondering if there is work being done in terms of the volume of 
announcements, the size of announcements and withdrawals and that sort 
of dynamic nature.

RANDY BUSH:

Yes, research has been done in that area and smaller - longer prefixes 
churn more. Also, just because giant routers, for which I must pay 
$500,000 can handle it doesn't mean that that's what I want to employ 
in a small, multihomed POP, right? So, you know, there's - 
consideration often makes sense in this world.

PAUL WILSON:

While the routing table may have exceeded Moore's Law during the dotcom
boom, on average, over the last 10 years, it's been well below, I
believe. So where are we really in terms of what we can, what we can 
handle versus the size of the table that we're being asked to handle?

GEOFF HUSTON:

I've been a keen student of some of Randy's presentations and what I 
see is that the interaction of various instances of BGP often turns a
single announcement or withdrawal into a flood of announcements further
away. I have not seen a comprehensive analysis of logs of updates and 
withdrawals in major transit spots although I suspect that, if you 
looked at something like the RIS, the RIPE routing information 
service, you'd find a wealth of data. The frightening thing is, it is 
so much data, it's actually quite difficult to process. What we're 
trying to understand in terms of scale is that you can't take a 
snapshot of logs. You've actually got to take a time series and do 
very, very deep analysis if you're trying to analyse the whole. So 
far, we've been unable to - I haven't seen any good papers in this 
subject.

We haven't been able to do that and the reason why I'm a keen student 
of the work Randy and his associates have been doing is because 
they're doing experiments on single prefixes and trying to understand 
the amplification factor. So we have this feeling that, as it grows, 
there's a multiplier in terms of the total load that the system has to 
carry and we're becoming aware, by those experiments, as to the 
multiplication factor. The next piece of work, which is a lot of work, 
is to understand the total update load on the system as a whole. The
suspicion is that these fine-grained prefixes which are intended to be 
traffic engineering local load-balancing are actually less stable than 
the aggregates and that's by and large what we're seeing, although it 
needs a little bit more analysis and that tends to suggest that the 
higher degree of fragmentation, the worse position we're all in.

PAUL WILSON:

Thanks. I've got one more question which is in the area of training 
and education. Geoff, you gave some suggestions earlier about how ISPs 
might avoid, lessen their impact on the routing table, tread more 
lightly on the routing table, if you like. How strongly would you word 
those suggestions and are your suggestions strong enough for them to 
be incorporated by the APNIC Secretariat into the sort of training 
that we're doing quite regularly around the region. Should we be 
promoting better routing practice in the training efforts that we're 
undertaking?

GEOFF HUSTON:

Yes. Simple communities can solve a whole heap of these issues much 
faster, I think, than almost anything else. But, having operators 
understand how they can apply communities to dampen down more 
specifics doesn't appear to be obvious. Here's an example where you 
see an original allocation of a /17 sliced and diced into /20s all 
from the same AS. So there's no new information about routing going on 
here. There's no new policy going on. And the rest of the world 
actually doesn't need to see any of that stuff. It's not new 
information. And, if the upstream and this lowly AS cooperated on
communities and dampened down this, we'd have a much smaller routing 
table with a much reduced overhead in terms of superfluous load.

PHILIP SMITH:

Are there any other questions or comments that anybody has at all?

GEOFF HUSTON:

I have one question and I think David O'Leary is here and I'd like to 
direct it to you. At the time when we were facing large problems in 
the late '90s and we had a routing table of around 100,000 entries, 
there was this general suspicion that, if we ever grew, at that point 
in time, to around 500,000 entries, we'd have meltdown. But it seemed 
to be that the margin of error in the routing equipment of the day was 
around 500,000 odd entries. If you were going to make an approximation 
as to what current technology is able to do, is it possible to give an 
estimate of how big a table could be?

DAVID O'LEARY:

I guess this is an 'it depends' question, right? How many entries can 
we put into a T640 versus what can run in the real world and I don't 
have that number for you. I guess - well, I haven't thought about that 
specifically, I haven't looked at that, we haven't done those tests 
in, kind of, real-world Internet because I think one of the dynamics 
we're seeing now is with other services being turned out in boxes, the 
edge devices are seeing more routes than the core devices, possibly, 
and there are a lot of local other stuff, VPNs and so on. So there's 
kind of higher demand there and, what's the biggest we've seen? I 
don't know, but certainly a lot bigger, a lot bigger than where we are 
in today's Internet.

GEOFF HUSTON:

Are we talking millions rather than hundreds of thousands? Let me 
prompt you.

DAVID O'LEARY:

Yeah, absolutely, yeah. I think one of the things that's happened is 
we have some competition now.

RANDY BUSH:

Is that storage or proportional to return?

DAVID O'LEARY:

You know, that's a good question. We don't see boxes running at 100% 
utilisation on CPUs. If you do, tell us and we'll try to figure out 
what's wrong. We don't see BGP as being what's hurting large routers. 
Now, I mean, that's not to say there aren't, you know… as Randy said 
earlier, there are smaller routers out there on the edges that are 
being challenged but, you know, I'm the wrong guy to talk to about 
small routers, pretty much. I do have a question though. One is a kind 
of an observation and you can talk about this offline if you want, but 
you said that, prior to '89, I think it was, I don't remember exactly 
what date, you said there was a 90% correlation between the 
allocations and the advertisements and I'm confused as to why it's not 
100% correlation in that. I'm not sure how you configure on a router 
prior to 1992 in terms of how to, you know, summarise routes and 
de-aggregate routes. You get allocated a class B and you advertise that.

GEOFF HUSTON:

This is as a result of my methodology. What I don't have is routing 
snapshots for every day going back 30 years. I took today's routing 
snapshot and compared it to the historical allocation and what I find 
is that a lot of the old historical allocations are still in today's 
routing table unaltered. And it's only at around 1988, the allocations 
that happened around then, that they actually started to get 
fragmented. Now, whether they were fragmented in 1994, in 1998 or 
yesterday, I can't tell. But the original allocation is largely 
intact, which is actually quite strange and it's only the ones that 
happened in the late '80s that you start to see this fragmentation 
happening.

DAVID O'LEARY:

Maybe it's strange, maybe it's not. At least with any customer, end user, 
I've talked to - if you ask for some address space back or ask them to 
renumber, that's always a challenge. Especially the people who bought 
their class Bs in, whatever, 1986 - there aren't a lot of them 
volunteering to give back space.

GEOFF HUSTON:

Right but, sort of, since then, in the mid '90s, you see that 
fragmentation.

DAVID O'LEARY:

I think there are all these dynamics here, and one of the reasons why 
it's hard to figure out what's going on is that there are so many 
factors interacting. I know there are some interested people in this 
room, but getting local exchange points - do we think that's actually 
making a difference in terms of visibility of advertisements? Does the 
world of Asia Pacific or Australia or something look different in 
Poland when an exchange point is involved in Indonesia? I mean, in 
theory, it should, right? To you, does it?

GEOFF HUSTON:

I have no data to answer that.

DAVID O'LEARY:

Yeah, I didn't expect but it would be interesting to try to figure 
that out, if it does actually really - in terms of the exchange point, 
people think of those kinds of things. Again, there's so many factors 
here that drive all these numbers. It's hard to tell which ones are 
going to have the most impact - it's a lot more work.

GEOFF HUSTON:

From my view in looking at this, the observation that they're actually 
consistently covered by aggregates tends to suggest that the specifics 
are designed for specific response locally. And, to my mind, the 
first-up answer is, bilateral use of communities between transits and 
upstreams will get rid of a huge amount of load and noise from the 
routing system and get rid of these fragments, because they just don't 
appear, logically, to have global [inaudible]. Otherwise, they 
wouldn't be covered by aggregates anyway.

PHILIP SMITH:

OK. Thank you very much.

RANDY BUSH:

I just want to be specific and say thanks because you're about the 
only one doing research at this level and this stuff is critical to 
operations.

GEOFF HUSTON:

Thank you.

PHILIP SMITH:

Are there any other questions at all? Any other comments? Anybody got 
any other things they want to discuss or bring up?

OSAMA DOSARY:

Actually, this is relevant to Randy's question - to remove specifics, 
longer prefixes, what would be a good breaking point - /21, /18?

GEOFF HUSTON:

I mean, the assumption that's being made in a lot of places here is 
that the allocation suits the business and the business should be able 
to advertise that as an aggregate and that, when you slice and dice 
into more specifics, it's predominantly because you want to alter 
incoming traffic through various policies and paths. Now, if that 
assumption that what you get allocated is what you advertise is wrong, 
then, when we start talking about minimum allocation sizes and 
allocation windows, maybe it's a good time to expose whatever 
conditions you have where that assumption [inaudible] doesn't hold. But 
the predominant idea in the routing - I'll use the word philosophy 
here - is that an autonomous system is also a single unit of policy 
and you should be able to, from a global perspective, be seen as a 
single unit, even if your local interactions are more finely grained. 
I suppose the answer is we shouldn't be fragmenting at all if you're 
doing it right.

OSAMA DOSARY:

OK, we might be on the file you mentioned. This has to do with where 
we're located. We are a subsidiary of a larger telecom and we need to 
allocate more addresses and they have their own peering arrangements 
that are separate from us, right, and we are not allowed to request an 
allocation for them. We had to break up our own allocation for them.

RANDY BUSH:

I want to address your first question. When IIJ was a filtering Nazi, 
we filtered on the minimum allocations, on the prefixes being 
allocated by the registry. So we knew for each - the registries 
document this, you can find it on their websites - for each part of the 
address space what was the longest prefix they allocated and we 
filtered on that boundary and, for the traditional class B space, we 
filtered on /16 and it worked.

OSAMA DOSARY:

Are you talking about reachability?

RANDY BUSH:

Yes.

OSAMA DOSARY:

OK, how do you know it worked?

RANDY BUSH:

Because for hundreds of thousands of customers, we had no complaints 
of "I can't get to X". We also studied the unfiltered routing table and 
showed that, essentially, for any prefix, it was covered by a shorter 
prefix with similar pathing.

OSAMA DOSARY:

So, as a recommendation for filtering incoming routes, we can just, you 
know, just filter at /16. Is that your recommendation?

RANDY BUSH:

For Bs. For the B space, /16, right? For the A space, pretty much a /8, 
except that now portions of the A space are allocated to the 
registries for slicing and dicing into longer prefixes, so you have to 
treat each /8 differently. 60/8 you want to do on a - oh, some 
registry person tell me, /23 or /22 or something. 206, I think, you 
can do on a /19. Right? Each registry that has a block has a page that 
says, for what range, what prefix is the longest prefix they allow 
for. So the filter is, you know, 50 entries, but you're not filtering 
packets, you're filtering route announcements, so you're not going to 
load your router down. Am I making sense?
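
The per-block filtering described above can be sketched in Python as a small table lookup. This is a hedged illustration, not any operator's actual filter: the names `MAX_LENGTH_BY_BLOCK`, `DEFAULT_MAX_LENGTH`, `max_length` and `accept` are hypothetical, the /22 for 60/8 and the /24 fallback are placeholder assumptions, and the real boundaries would have to come from each registry's published minimum allocation sizes. It handles IPv4 only.

```python
import ipaddress

# Hypothetical table: (address block, longest announced prefix accepted).
# Placeholder values for illustration; not real registry policy.
MAX_LENGTH_BY_BLOCK = [
    (ipaddress.ip_network("0.0.0.0/1"), 8),     # traditional class A space
    (ipaddress.ip_network("128.0.0.0/2"), 16),  # traditional class B space
    (ipaddress.ip_network("60.0.0.0/8"), 22),   # assumed; "/23 or /22 or something"
    (ipaddress.ip_network("206.0.0.0/8"), 19),  # "I think" /19
]
DEFAULT_MAX_LENGTH = 24  # assumption for space not listed above

def max_length(net):
    """Longest acceptable prefix length for the block containing net."""
    matches = [(blk, length) for blk, length in MAX_LENGTH_BY_BLOCK
               if net.subnet_of(blk)]
    if not matches:
        return DEFAULT_MAX_LENGTH
    # The most specific matching block wins, so 60/8 overrides 0/1.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def accept(prefix):
    """True if an announced IPv4 prefix is no longer than its block allows."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen <= max_length(net)

for announced in ["130.5.0.0/16", "130.5.4.0/24",
                  "60.1.0.0/20", "206.10.32.0/20"]:
    print(announced, "accept" if accept(announced) else "drop")
```

The check is applied to route announcements, not packets, so a table of a few dozen entries like this is evaluated only when routes are learned.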

OSAMA DOSARY:

So you'd have to break it down, depending on the range, like, for A, 
you just do it as if it was A class, and for B as if it was a B class. 
But we have a couple of customers that actually are - they gave us 
from an A-class, so, if we did that kind of filtering, how would that 
affect the traffic?

RANDY BUSH:

All they have to do is get it to you and you can slice and dice it to 
your customers. All I have to do is get the ball into your park. The 
fact that you slice it up, that's your problem.

PHILIP SMITH:

OK. Do we have any more contributions? No, nothing at all. OK, well I 
would like to thank Geoff Huston for a very, very interesting 
presentation, full of amazing information, as usual. And thank you to 
all the contributors who gave very enlightening and interesting 
discussion afterwards as well. It was quite useful. So we're finished 
about 25 minutes early. Hopefully, that will give you enough time to 
get ready for the APRICOT closing social this evening. Just a reminder 
- the buses leave at 6.00 pm from the reception - or the lobby, the 
front of the hotel. And remember you need your yellow ticket to get on 
the bus. Otherwise, you don't get there. OK? Thank you very much 
everybody for coming and we will see you at the next APNIC meeting.

APPLAUSE