Routing SIG, Thursday Aug 21, 4:05-5:30 pm

PHILIP SMITH:

I think we will make a start to this. It's just after four o'clock. 
Welcome to the routing SIG. I've put the agenda up on the screen 
there so you can see what we're going to talk about. But, before I 
actually make a start to this, I have some announcements I would 
like to make. First one is to thank Korea Telecom for being 
platinum sponsor for this meeting and, secondly, the MyAPNIC demo 
is on all day outside the meeting room. I'm sure you've seen it 
sitting out there just behind the door at the rear of the room. 
Finally, also, check the onsite notice board for announcements. So 
that's on the APNIC website if you have a look there for current 
announcements. The URL is there, if you'd like to check the status 
of the onsite notice board.

My usual virus or worm status. The system with this IP address - 
221.143.6.120. That user is being devious. Because I put up the 
address at lunchtime, he got a different address. You can't escape. 
I am watching all the time. You will get your Internet access back 
when you get your laptop updated at the door. You can change your 
address if you like. I will block outgoing access. It's 
irresponsible to come in here with infected PCs. You shouldn't 
connect to a public Internet if you can't set up your PC properly. 
That's my personal opinion. The address is 221.143.6.120.

OK. Outstanding business that we had to close off from last time 
around. The first one was the routing SIG charter. We discussed 
the routing SIG charter at the APNIC meeting during APRICOT in 
Taipei. This was basically the charter that the routing SIG 
examined - important issues of Internet routing and policy in the 
Asia Pacific region and globally. These issues would include 
Internet routing table growth, de-aggregation of provider blocks, 
routing stability and flap damping, routing security and the 
Internet Routing Registry. That's what the meeting agreed. After 
the meeting, I put this on to the routing SIG mailing list, 
received no comments whatsoever, so this is now formally adopted 
as the charter of the routing SIG.

Also for all participants, this is more my personal request than 
anything I have to do but I would like you all to speak slowly and 
clearly as English is not the native language for the majority of 
meeting participants. Also, if you're going to ask questions or 
make comments, please use the microphone. We have microphones 
running around the room for you. Also try and use simple and clear 
sentence structure rather than getting too complex.

The other open action item I forgot to put on the slide is that, 
at the last SIG meeting, it was proposed that Randy Bush be my co-
chair for the routing SIG. There were no objections to this at the 
meeting. Again, I posted this to the SIG mailing list, no 
objections there either so Randy is now formally my co-chair. 
Welcome. OK. So that's the preliminaries. Now let’s get on with 
the agenda we've got here. We've got five presentations so we 
don't have a great deal of time to go through those. Speakers have 
a maximum of 20 minutes each. Hopefully we can fit all this in. 
The first one is from Geoff Huston which is about hunting the 
bogons. 


GEOFF HUSTON:

Thanks, Phil. OK. This is about hunting the bogons. Let me explain 
a little bit about this. Firstly, it's research activity that's 
being supported by APNIC and what I'm trying to do is to 
understand to what extent do we see advertisements in the routing 
table of address blocks or autonomous system numbers that aren't 
registered anywhere as being allocated? Those are what I would 
call a bogon. The most common example that one finds is leakage of 
RFC 1918 private address space. Most commonly, what you'd see 
is instances where network 10 appears from time to time on the 
routing table. But, if you're really trying to understand what is 
allocated and what is not, you've actually got to wade through a 
reasonable amount of data. And this is the list of sort of primary 
sources here.

The first primary source is actually the IANA registry report that 
indicates what blocks, /8s, have actually been allocated to the 
RIRs for use in further assignments and what blocks are reserved 
and are not to be used. So, if a block is marked by IANA as 
reserved and you see it in a routing table, that to my mind is a 
bogon. The next level down is to actually have a look at 
assignments that have been undertaken by the RIRs. And, by looking 
at the stats files, the stats file is actually a summary of all of 
the allocations that the RIR has performed. Those files are now 
updated periodically. This slide pack is slightly old and, in the 
last couple of weeks, all four RIRs update that stats file on a 
daily basis so now you have an accurate view of what address 
blocks have been allocated. If it's not in a stats file, it's getting 
to be pretty dubious but the stats files unfortunately are not 
complete so what I've had to do to get a reasonable set is to 
include Whois data if I can dig it up to get a second set of 
network blocks that are listed in Whois that appear to be 
allocated or assigned in some form or fashion.

I've been asked to swallow the microphone! Firstly, having a look 
at the IANA registries, there are two primary sources for AS 
numbers and address space. And, actually, having a look at them, 
there are some problems in trying to analyse that data. It's not 
actually a very clean registry ... path as you'll find that some 
blocks are sort of assigned as reserved but then have a user 
listed beside, such as 36/8. Is it really still Stanford's or is 
it reserved? I don't know. And blocks 49 and 50 are marked as 
returned to IANA but still listed as a Joint Technical Command. 
I'm not sure if they're reserved or not. I can't understand why 
the top-level blocks, 240 and upwards, are marked the same as the 
lower-level blocks. It's actually quite difficult to make a clean 
judgment from the IANA data as to what is unicast, assigned and, 
you know, available for use and what isn't. I would certainly 
appreciate a slightly more consistent IANA registry file. There 
are also some listings in RFC 3330 that, actually, I don't 
understand. 

The one that I find rather strange is actually 223.255.255.0/24. 
It's marked as reserved but subject to allocation. I actually 
thought that most of the reserved blocks are subject to allocation 
and I'm not sure if that's a bogon or not. There's a URL at the 
bottom, www.potaroo.net/IPAddrs. You can make some sense of it 
there. The next source is the stats files which are all of the 
allocations that the RIRs have performed. They're updated every 
day. The problem is that it's not quite all the allocations. The 
early ones, which are part now of the Early Registration Transfer 
process, ERX, are only listed in the ARIN area and actually aren't 
in the stats files cleanly. The Whois data actually contains a lot 
of additional records, around about 6,000 or 7,000 at the moment. 
Some of them are clearly nonsense. 

If you look up RIPE's Whois database, you'll find an entry for a 
/24 drawn out of network two, obviously nonsense. When you start 
parsing the Whois data, it's not clear that you're seeing real 
addresses all the time. Also in RIPE, they produce issue files and 
it would be good if the issue files were the same as the stats 
file. They're not. So I actually go to Whois and have a look as 
well. There are some references there to what I use. Again, OK, 
but here's these blocks that I'm finding in RIPE. If someone is 
here from RIPE, have a look at Whois 2.6.190.56/29 and you will 
get an answer. If the Whois is meant to be an authoritative 
listing of space, then there are some anomalies here. I'm also 
seeing the same from the other registries to some extent or 
another but I am assured the RIRs are working on this. 

I'm also seeing things in the BGP table that actually have no 
visible allocation whatsoever and, if you wander over to the US 
military Whois server, you find a whole bunch more of both 
addresses and AS numbers. For example, the block of autonomous 
system numbers from 1451 to 1533 appear to have been allocated 
over to the US military but there is absolutely no record anywhere 
that I can find that actually says this is what happened. So I'm 
making the assumption that the military aren't lying. So on that 
assumption and on a whole bunch of other assumptions, what's a 
bogon? If it's not listed in the IANA - sorry if it is listed in 
the IANA registry as reserved, it's a bogon. If it's not listed in 
a collection of RIR stats files, then it's a bogon and if it's 
being advertised, then ... really there. What I'm seeing could be 
a bogon. What I'm also seeing could be an inconsistency in the 
data. 
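
The two-part test just described (listed as reserved, or not covered by any recorded allocation) can be sketched roughly as follows. This is a minimal Python illustration only, not the code behind the report; the function name `classify` and the sample block lists are hypothetical, and IPv4 only is assumed.

```python
import ipaddress

def classify(advertised, reserved_blocks, allocated_blocks):
    """Classify one advertised IPv4 prefix as 'reserved', 'unallocated' or 'ok'."""
    prefix = ipaddress.ip_network(advertised)
    reserved = [ipaddress.ip_network(b) for b in reserved_blocks]
    allocated = [ipaddress.ip_network(b) for b in allocated_blocks]
    # Rule 1: anything touching a block listed as reserved is a possible bogon.
    if any(prefix.overlaps(r) for r in reserved):
        return "reserved"
    # Rule 2: anything not covered by a recorded allocation is a possible bogon.
    if not any(prefix.subnet_of(a) for a in allocated):
        return "unallocated"
    return "ok"

# Hypothetical sample data, not taken from any registry file.
RESERVED = ["10.0.0.0/8", "240.0.0.0/4"]
ALLOCATED = ["192.0.2.0/24", "198.51.100.0/24"]

print(classify("10.1.0.0/16", RESERVED, ALLOCATED))
print(classify("192.0.2.128/25", RESERVED, ALLOCATED))
print(classify("203.0.113.0/24", RESERVED, ALLOCATED))
```

A prefix that spans several adjacent allocation records would be reported as unallocated by this sketch, which is one way an inconsistency in the data can look like a bogon.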

What I'm not seeing, what I'm not reporting on, are folk who 
intentionally steal address space, invalid allocations of address 
space. So, if someone is hijacking address space, I can't see it. 
I don't do any checks based on origin AS or anything else. 
Hijacking is not part of this report and I don't report on it. 
Other folk may be working on that. 

I'll go live in a second. You'll actually find the bogon listing 
in www.cidr-report.org. It's updated every hour and includes a 
list of possible bogons. I first started this in May and found 54 
autonomous system numbers that looked dubious. There has been some 
very good work. It's come down to 17 at the start of July, so 
there has been some work in cleaning up autonomous system numbers. 
This is what I'm finding right now. The US military, the DoD 
network, some stuff out of Harnet as announced by 3362. I can't 
find any record for AS 3363. It doesn't exist. I can find nothing 
for AS 4665 but it does appear in the BGP routing table. The other 
thing I'm looking at is IP addresses. When I started this in May, 
I found 264. It's down to 173 so it's getting a little bit better. 

So what? Why am I doing this? Addressing is important. It's 
probably why you're here and, in terms of trying to make the 
network work properly, the integrity of the network works on 
uniqueness of addresses, but uniqueness actually depends on 
matching records to reality. So that what we want, what all of us 
need if we're going to rely on the Internet as being a reliable 
system, is that what is routed is real and, if it's not real, it 
shouldn't be in the routing table. So, on the list that I'm about 
to splash up, if you find yourself there, what should you do? 

If you're listed, then figure out why and the first thing to do is 
to check your own records to check that you really do have that 
autonomous system number or IP block being assigned to you and 
continuing to be assigned to you. Check your own records. And then 
go back to your own RIR and consult with them as to their records. 
Because, ultimately, there are only two ways to get off this 
report and sending mail to me isn't one of them. Either stop 
advertising what you don't have. It's a very clean way of getting 
off. Or work with the RIR to update their data to reflect reality. 
IANA and the RIRs are certainly working on resolving the 
inconsistencies in all this and, as a more general mechanism, a 
web page is fine, but it would be good to have tools to allow this 
to happen interactively. 

So let's have a look at the CIDR report and see what we can find 
in the current listing of bogon addresses. I may have to up the 
size of that font. So, here are some of the addresses that I'm 
seeing that I can't find any valid allocation record for. The 
first one is being routed by AS 11492 and they're described in 
the database as CABLEONE and here's the block showing that 
advertisement. I can't find any records for that. As you see, in 
terms of address space, there is a reasonable amount of it as we 
keep on scrolling down and down and down. So go to the web page 
and have a look yourself to see if you're listed. There's no 
shortage of these things. 

Secondly, the list of autonomous system numbers that I can't find. 
For example, I can find no record for autonomous system 1495 
actually being allocated. Yet AS 668, which is this chap here, is 
announcing that address. So there is the list of announcers and 
these are the bogons that I'm seeing. This one here is interesting. 
It's actually multihomed, announced by both of these ASs but I can 
find no record that that AS is actually validly in the system. So, 
yes, please have a look. If you find yourself listed there in that 
resource, please check your own records to make sure that you 
actually do have the autonomous system or address space that 
you're advertising and then make sure, if you really do have it, 
that your RIR knows that you have it and that you can get yourself 
listed back into the stats files. And that is the end of the 
report. Any questions or comments on that? 


BILL WOODCOCK:

As somebody who used to own one of the ISPs, one of the ways that 
people get on there is by customers or people representing 
themselves to be customers, returning blocks to RIRs, perhaps 
without authorisation and it's really unfortunate that we've had 
all the hijacking problems lately because there were a lot of very 
good database cleanup efforts which were under way. That's pretty 
well stalled all the cleanup because a lot of the cleanup was sort 
of case-by-case decision-making, RIR staff having to look closely 
and make a judgment call and, of course, all of their freedom of 
decision-making is gone out the window with hijacking. 


GEOFF HUSTON:

So what you're saying is that, in the same way that ISPs have an 
issue, when a customer comes along and says, "I have some address 
space, please route it," how can you tell? It's quite difficult. 
You're suggesting that sometimes address space gets returned to a 
registry from folk who don't own it and the registry conducts the 
return and pulls it back into its allocated pool. 


BILL WOODCOCK:

Yes. 


GEOFF HUSTON:

I can only agree but the issue is how do we find that out? This 
looks in the routing table and says, "I can see an advertisement 
and, no matter how I search, there is no allocation information 
that encompasses that."


BILL WOODCOCK:

I'm not disagreeing with your tool. I'm just pointing out one of 
the reasons for the problem.


PHILIP SMITH:

Any other questions at all? No? OK, if not, thank you Geoff and 
we'll move on to your next one - IPv6 CIDR report.


GEOFF HUSTON:

This is actually a very quick report and it's more to let you know 
it's there if you want to use it. This is the URL of the CIDR 
report structured for IPv6 and what it's trying to do is much the 
same as the CIDR report does for IPv4. It's trying to track the 
aggregation behaviour inside the routing table and provides an 
analysis of the v6 space in terms of the combined RIR address 
space and the current 6BONE address space. The way it works is 
that it takes a full snapshot of the IPv6 routing table as seen 
from AS 1221 every hour and then analyses that table to have a 
look at address ranges and aggregation possibilities. The overall 
picture with v6 is certainly that there's not an awful 
lot of aggregation that could be performed on the table. Currently, 
the v6 routing table as I see it from AS 1221 has around 500 
entries and, if I do various forms of aggregation, and there are 
various forms of aggregation that can be done, you can take the 
full prepended AS path and only aggregate when the paths of both 
of the candidate blocks of addresses match precisely. Or, a 
slightly looser condition, you can remove all the prepending and 
just look at AS paths compressed and see if you can combine. Or, 
finally, the most vicious form of potential aggregation, if two 
adjacent IPv6 address blocks share a common origin AS, then you 
can suggest they'll be aggregated. So, what you find here are four 
colours - and it doesn't look quite obvious because the second and 
third are at almost the same data point. The red is the routing 
table. The bluish colour - let me point to it - is actually using 
AS path compression and the bottom one here is using just AS 
origin itself. So overall, the picture is right now there's very 
little fragmentation of the report itself. 
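
The three aggregation tests described above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the report's own code: the names `KEYS` and `aggregate` are hypothetical, the AS numbers and prefixes come from documentation ranges, and only the merging of two adjacent sibling prefixes is modelled.

```python
import ipaddress
from itertools import groupby

def strip_prepending(path):
    """Collapse runs of the same AS number, e.g. (1, 2, 2, 3) -> (1, 2, 3)."""
    return tuple(asn for asn, _ in groupby(path))

# Three ways of deciding whether two routes are "the same" for aggregation.
KEYS = {
    "full_path": lambda path: tuple(path),   # prepended path must match exactly
    "compressed_path": strip_prepending,     # prepending removed first
    "origin_only": lambda path: path[-1],    # only the origin AS is compared
}

def aggregate(routes, key_name):
    """routes maps prefix -> AS path; returns the prefixes left after merging."""
    key = KEYS[key_name]
    table = {ipaddress.ip_network(p): key(path) for p, path in routes.items()}
    changed = True
    while changed:
        changed = False
        for net in list(table):
            if net.prefixlen == 0:
                continue
            parent = net.supernet()
            sibling = next(s for s in parent.subnets() if s != net)
            # Merge two halves of the same parent when their keys agree.
            if table.get(sibling) == table[net] and parent not in table:
                table[parent] = table[net]
                del table[net], table[sibling]
                changed = True
                break
    return sorted(str(n) for n in table)

# Hypothetical routing table: four adjacent /34s with the same origin AS.
ROUTES = {
    "2001:db8::/34": [64500, 64501, 64501, 64510],
    "2001:db8:4000::/34": [64500, 64501, 64510],
    "2001:db8:8000::/34": [64502, 64510],
    "2001:db8:c000::/34": [64502, 64510],
}

for name in KEYS:
    print(name, aggregate(ROUTES, name))
```

On this sample the strictest test leaves three entries, path compression leaves two, and origin-only matching collapses everything to a single /32, which mirrors the ordering of the three conditions in the talk.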

I'll quickly go through the report. The number that you get is 
actually quite high. The largest number of prefixes announced by a 
single AS. Because the view is internal to AS 1221, you end up 
seeing all of AS 1221's detail. I'm looking for the largest 
address span announced by any AS, which currently is VERIO, which 
is announcing a /23 v6. You can look at some plots of that data, if 
that's what you want to look at. 

The next part of the report may be hard to read - it's very small 
font. It tells you the ASs that actually could do the most good in 
aggregating. Telstra could get down from 23 to 3. Because it's an 
internal view, that's not possible. From outside AS 1221, I would 
hope that you only actually see three prefixes. I'm also getting 
an internal feed from PCH so, again, even though PCH is listed 
second here, I think that's actually not a very good data point. 
What it's showing is that there's very little aggregation that 
could be performed on the v6 address space at the moment. The 
other part that may be interesting for people who are actually 
tracking the adoption of IPv6 is actually listing who is adding 
prefixes into the routing table each week. So there is someone 
from Portugal, someone from Demon in the Netherlands, someone from 
SpeedKom in Germany that have added prefixes in the last seven 
days. Someone has changed from two prefixes to four in Belgrade, 
Yugoslavia and Mojo Networks has withdrawn its only prefix that it 
was announcing. 

So, on that scale, you can see from day to day who is coming and 
who is going. You can see in aggregate form what is going on. In 
terms of total activity, most of the activity is around the /32, 
/48 level. Yes, there are /48s in the global network and there are 
/64s in the global network, or there were. It's now gone back down 
to zero. Some of them slip out globally. I'm only seeing one 
bogon in this table and it happens to be an autonomous system 
number. It's a private one coming out of PCH. It's not real simply 
because the period I'm having ... the PCH exposes internal detail 
that you probably wouldn't see. 


PHILIP SMITH: 

Thanks very much Geoff. OK, next up, we have two presentations on 
the IRR basically. The first one is a survey of utilisation of IRR 
objects by Kengo Nagahashi and after that we have improving the 
reliability of the IRR database from Masasi Eto.


KENGO NAGAHASHI: 

I'm from the University of Tokyo. This is the agenda for this 
presentation. First, we talk about the background about this 
research. Second, we talk about the goal to achieve. Third, we 
point out related work about this research. Finally, we describe 
our approaches. 

First, background, what benefit IRR offers? There are many 
benefits to using ... getting contact information. Second, router 
configuration, such as some applications. There are many benefits 
in IRRs. Firstly, how many IRR objects are registered? Is all 
routing information registered in IRR? So, our goal is to 
understand divergence between IRRs and BGP prefixes. 

So our goal consists of three parts. One is how many prefixes are 
registered in IRRs? Second - what difference in IRRs, such as RADB, 
RIPE, APNIC or other IRRs? So there's many considerations such as 
region, history, operation. Third - is IRR very practical for 
router configuration? So these are the three items of our goal to 
achieve. And there is some related work here. We point 
out the RIPE RRCC. The point of similarity with us is comparing 
with BGP routing table and IRR database. So we are also using BGP 
routing table, IRR database. The difference point is we analyse 
RADB, RIPE, APNIC, JPIRR database and also unified database, which 
I will describe later. 

So our approach is how to match IRR and BGP routing table? There 
are two ways of matching method. One is exact match. Exact match 
is very straightforward method. That just means the IRR prefix and 
the BGP prefix is identical. The second point is the best match 
method. The best match is the collation of networks. If IRR 
registers this prefix and BGP announces this prefix, then it is 
correction network and is a match. So this we apply two matching 
methods to our BGP routing table. Also there is eBGP multihop from 
two ISPs. 
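
The two matching methods can be sketched in Python as below. The transcript does not define best match precisely, so this sketch assumes that a BGP prefix is a best match when it is contained in a registered, less specific IRR prefix; the function names and sample prefixes are hypothetical.

```python
import ipaddress

def match_kind(bgp_prefix, irr_prefixes):
    """Return 'exact', 'best' or 'none' for one BGP prefix."""
    bgp = ipaddress.ip_network(bgp_prefix)
    registered = [ipaddress.ip_network(p) for p in irr_prefixes]
    if bgp in registered:
        return "exact"   # identical prefix and length
    if any(bgp.subnet_of(r) for r in registered):
        return "best"    # assumed meaning: covered by a registered prefix
    return "none"

def match_ratios(bgp_prefixes, irr_prefixes):
    """Return (exact ratio, best ratio); best match includes exact matches."""
    kinds = [match_kind(p, irr_prefixes) for p in bgp_prefixes]
    exact = kinds.count("exact")
    best = exact + kinds.count("best")
    return exact / len(kinds), best / len(kinds)

# Hypothetical sample data.
IRR = ["192.0.2.0/24", "198.51.100.0/24"]
BGP = ["192.0.2.0/24", "198.51.100.128/25", "198.51.100.0/25", "203.0.113.0/24"]
print(match_ratios(BGP, IRR))
```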

So this is a summary of IRR database. IRR database number of route 
objects as of 2003/08/11. RADB is here, RIPE is here, APNIC is 
here and JPIRR is here. So, using the unified database. So what is 
unified database? Here is a definition of unified database. What 
is unified database? Unified database is a combination of RADB, 
RIPE, APNIC and JPIRR. But there are a couple of duplicated 
objects. So we removed duplicated records. What part of duplicated 
I will describe later. 

So why unified database is needed? So, as you know, routing 
information is worldwide spread but, in IRR route objects, this is 
regional spread. So ideal database that covers all regions. So 
that is unified database. So we make unified database for this 
research. So unified database is … number of duplicated objects. 
That is the exact match method. So RADB and RIPE and APNIC and 
JPIRR. So total number of route objects in unified database is 
total number of route objects minus duplicate of these is 92,500 
records is total number of unified database. So we apply unified 
database and BGP routing table. So here is a snapshot as of 
2003/08/11. 
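
Building the unified database amounts to merging the registries and dropping duplicates. A minimal Python sketch follows; it assumes that two route objects count as duplicates when both prefix and origin AS are identical, which the talk does not state explicitly, and the name `unify` and the sample records are hypothetical.

```python
def unify(*registries):
    """Merge route objects from several registries, keeping the first copy."""
    seen = set()
    unified = []
    for registry in registries:
        for prefix, origin in registry:
            key = (prefix, origin)   # assumed duplicate key: prefix plus origin
            if key not in seen:
                seen.add(key)
                unified.append(key)
    return unified

# Hypothetical sample records, not real registry contents.
radb = [("192.0.2.0/24", 64500), ("198.51.100.0/24", 64501)]
ripe = [("192.0.2.0/24", 64500), ("203.0.113.0/24", 64502)]
print(len(unify(radb, ripe)))
```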

The exact match ratio is 47%, that is about 26,149 of 54,663 that 
matched. And about best match, the ratio is 94%. That means, this 
ratio. Also, this is a summary - total summary of unified database, 
RADB, RIPE, APNIC and JPIRR. So this is also a snapshot of unified 
database. The difference with our picture is the Y axis is the 
number of prefixes. This is exact match and this is best match. 
This is about RADB unified database. The distribution of RADB is 
here. So, as we talk about best match - utilisation of best match 
is very high, 94%, about /24. Exact match is low - 47%. So best 
match is indeed very high, so are we happy using best match for 
routing configuration? Is it still indeed best match is valid? So, 
we need to investigate validation of Origin-AS. Validation of 
Origin-AS is we are focusing on only RADB and RIPE. Number of 
total best match prefix is in RADB is here and about correct 
origin, that is here and incorrect origin is in here. So incorrect 
origin ratio is in RADB is 65%. In RIPE, total number of best 
match prefix is here, correct origin is here, incorrect origin is 
here and incorrect origin ratio in RIPE is 34%. 
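
The origin validation step can be illustrated as follows. This is a hedged Python sketch, not the authors' tool: it assumes a BGP route has a correct origin when any covering route object lists the same origin AS, and `check_origins` and the sample data are hypothetical.

```python
import ipaddress

def check_origins(bgp_routes, route_objects):
    """Count BGP routes whose origin AS agrees with a covering route object."""
    objects = [(ipaddress.ip_network(p), asn) for p, asn in route_objects]
    result = {"correct": 0, "incorrect": 0, "unregistered": 0}
    for prefix, origin in bgp_routes:
        net = ipaddress.ip_network(prefix)
        # Origins registered for this prefix or for a less specific one.
        covering = {asn for obj, asn in objects if net.subnet_of(obj)}
        if not covering:
            result["unregistered"] += 1
        elif origin in covering:
            result["correct"] += 1
        else:
            result["incorrect"] += 1
    return result

# Hypothetical sample data.
OBJECTS = [("192.0.2.0/24", 64500), ("198.51.100.0/24", 64501)]
ROUTES = [("192.0.2.0/25", 64500), ("198.51.100.0/24", 64510),
          ("203.0.113.0/24", 64502)]
counts = check_origins(ROUTES, OBJECTS)
print(counts)
```

The incorrect origin ratio quoted in the talk would then be the incorrect count divided by the number of best-match prefixes, that is, correct plus incorrect.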

So why high ratio of incorrect origin data in RADB? So one cause 
is obsolete data, for example, invalid origin at RADB is this one. 
So this is conclusion. So first of all is how much prefixes are 
registered in IRR? The unified database on average in best match 
is very high - 92.4%. Second goal is what difference in IRRs such 
as RADB, RIPE, APNIC? So, as I already said, RADB stores many 
unmaintained objects; in contrast with this, RIPE stores more 
maintained objects than RADB. We can't understand why incorrect 
origin ratio about this conclusion. For example, RADB incorrect 
origin ratio is 65% and in RIPE incorrect origin ratio is 34%. So, 
final goal is - is IRR practical for router configuration? Current 
accuracy of RIPE IRR is high. Therefore, it is relatively 
practical to make router configuration using RIPE authoritative 
IRR objects. 

Future - future investigation is still needed to clarify these 
differences between RIPE and other databases. You can see some 
statistics on this site. This is the end of my presentation.


PHILIP SMITH: 

Are there any questions at all? No? Everybody is very quiet this 
afternoon. OK, thank you very much. Next presentation is Masasi 
Eto.


MASASI ETO: 

Improving reliability of IRR database. Prefix validation using IRR 
database. Improvement of consistency among AS policies on IRR 
database. Our goal is improving reliability of IRR database. So 
more widespread use of IRR. Prefix validation using IRR database. 
So, one of severe problems in interdomain routing is hijacking. 
Why? The reason for this problem is one AS propagates invalid 
origin prefix. For example, in this figure, but AS 4133. In this 
AS 1 and AS 2. So, countermeasure approaches. One approach is to 
authenticate prefixes in BGP updates. BGP routers exchange 
certificates. There are several candidates, such as S-BGP, soBGP. 
BGP holds over 120,000 prefixes. So it takes a long time to deploy. 
So the motivation is to check a correct prefix by lightweight and 
simple way. What we need to check? To identify invalid origin 
prefix. To use certificates is too heavy, same as S-BGP, soBGP. How 
to verify it - we are using database. Our approach is first - 
router download request for database. Our second is response 
prefix/origin-as pairs. So example one and example two: Just 
checking. So, we need simple protocol. Download router requests 
download to DB. Frequency is once a day. The second is response. 
Database responses to router. Response Prefix/origin-as pairs in 
database. Problems to be solved? 

Future work - to hold 120,000 prefix/origin-as pairs is overhead. 
Utilisation of IRR - all entries are registered in IRR database. 
Duration of update - is once a day too long?
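
The download-and-check idea outlined above might look roughly like this in Python. It is only a sketch under stated assumptions: the class `OriginTable`, the `fetch` callable that stands in for the database download, and the exact-string comparison of prefixes are all hypothetical and do not describe any real router or IRR interface.

```python
import time

class OriginTable:
    """Cache of (prefix, origin AS) pairs, refreshed at most once per max_age."""

    def __init__(self, fetch, max_age=24 * 3600, clock=time.time):
        self._fetch = fetch        # hypothetical download function
        self._max_age = max_age    # "once a day" by default
        self._clock = clock
        self._pairs = set()
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._pairs = set(self._fetch())
            self._loaded_at = now

    def is_registered(self, prefix, origin_as):
        """True when the announced pair appears in the downloaded table."""
        self._refresh_if_stale()
        return (prefix, origin_as) in self._pairs

# Usage with a stand-in download function returning hypothetical data.
table = OriginTable(lambda: [("192.0.2.0/24", 64500)])
print(table.is_registered("192.0.2.0/24", 64500))
print(table.is_registered("192.0.2.0/24", 64510))
```

The `max_age` parameter corresponds to the open question of whether a daily refresh is too long; a shorter value trades extra load on the database for fresher data.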


GEOFF HUSTON: 

Do we want to take questions now?


PHILIP SMITH: 

Should we take questions now? OK. Let him finish the presentation 
first and then we'll take questions.


MASASI ETO: 

Generate router configuration from routing policy registered in 
IRR with RtConfig. However, there are many inconsistencies in 
the database. As a result, when we generate the router 
configurations from IRR database, the connectivity between them 
will be lost: IRR inspects only policy's syntax. To find out 
inconsistencies systematically, we need to check inconsistency of 
the information from other peers. This is an example of 
inconsistency of importing. In this figure, AS 1 and AS 2 are there. 
AS 3, 4 and 5. This policy, which is configured to AS 2, 3, 4 and 5. 
Then on other hand... in this policy, this is missing by accident: 
AS 1 couldn't establish connectivity with AS 5. Next, this is an 
example of inconsistency of exporting. AS 1 and AS 2 are under the 
same. In this case, AS 2 according to the contract. However, in 
the policy of AS 1, to import is missing by accident. As a result, 
AS 1 couldn't establish the connectivity with AS 5. Classification 
of inconsistencies. In this research, we found out all the 
inconsistencies systematically. To examine how many 
inconsistencies there are in our database. And to prevent 
increasing inconsistency with checks, which consists of stuff, and 
it prevents increasing inconsistencies. Database checker inspects 
how many inconsistencies exist on unified IRR database. We're 
inspecting the inconsistency. There is an example of query to 
examine the policy. Users will see this in here. If policy checks 
any inconsistencies, it takes 30 seconds for inspection. This 
figure shows how many inconsistencies are on the policy of each AS. 
We have found that 55.8% of ASs have at least one inconsistency. The 
detail of the inconsistency is shown in this table. It shows the 
level of each inconsistency. In this table, inconsistency number 3 
and number 4 have 36%. Here, AS existed in IRR database. Future 
work - we are going to deploy a policy checker on JPIRR and 
collect practical data. In the future, we will start a service to 
notify result of investigation to JPIRR users periodically. Thank 
you for your listening. 
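
The kind of cross-check between peers described in this talk can be sketched as below. This Python example is an illustration under assumptions, not the policy checker itself: policies are reduced to plain sets of peer AS numbers for import and export, and the function name and sample policies are hypothetical.

```python
def find_inconsistencies(policies):
    """Report peers whose policy lacks the matching import or export."""
    problems = []
    for asn, policy in sorted(policies.items()):
        for peer in sorted(policy["export"]):
            # If asn exports to peer, peer should import from asn.
            if asn not in policies.get(peer, {}).get("import", set()):
                problems.append((peer, "missing import from", asn))
        for peer in sorted(policy["import"]):
            # If asn imports from peer, peer should export to asn.
            if asn not in policies.get(peer, {}).get("export", set()):
                problems.append((peer, "missing export to", asn))
    return problems

# Hypothetical policies: AS 5 exports to AS 1, but AS 1 forgot the import.
POLICIES = {
    1: {"import": set(), "export": {5}},
    5: {"import": {1}, "export": {1}},
}
print(find_inconsistencies(POLICIES))
```

A syntax check alone would accept both policies here; the missing import only shows up when one AS's policy is compared against its peer's.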


PHILIP SMITH: 

Any questions?


GEOFF HUSTON: 

Was it proposed to load the entire IRR database on every router or 
was it merely the origin prefix/origin-as pairs? Can we go up a slide 
or two? More. More. Right. So, as far as I can see, neither of 
these sort of approaches seem practical to me. Having every 
router that's doing external peering requesting a download from the 
database all the time doesn't strike me as a reasonable approach 
here, even only once a day. Nor the router issuing all these 
requests to the database. I'm not sure that I understand that just 
doing prefix-origin AS matching really gains you an awful lot in 
securing integrity of the routing system. I was kind of wondering 
are there other approaches that might be more scalable. Because 
what you've suggested here is simple but I'm not sure that it 
scales?


KENGO NAGAHASHI: 

What is better way to improve this kind of quality? Do you have 
any idea?


GEOFF HUSTON: 

To my mind, I prefer the approach of S-BGP. Have information injected 
that you can trust. And then have its transportation within the 
routing system undertaken in such a way that it is not corrupted. 
The approach that you're suggesting here is kind of like half 
locking the door. It's still unlocked. And security doesn't really 
get answered that way. So while I understand the intent of what 
you're trying to do, in this particular space, I suppose I'm 
offering the perspective that realistically we do need to go down 
a path of S-BGP if we want a routing system that actually carries 
authentically and validly, carries.


KENGO NAGAHASHI: 

This intention is very simple. So this is what I try.


GEOFF HUSTON: 

The initial observation that you make is contents of most of these 
routing databases are extremely badly maintained. And part of the 
reason why they're badly maintained is that there's no motivation 
for operators to make it accurate. Why should an operator spend 
money, time and money, maintaining a database that isn't used? So, 
it's sort of a chicken and egg situation, that if it's not used, 
it's not going to get maintained. If what you're suggesting in 
terms of using it is only a small part of what needs to be 
addressed in a secure BGP environment, then you haven't got over 
this barrier of making it worth my time to maintain my data. So I 
suppose I'm suggesting here that we're yet to see in the operator 
community a strong motivation why you would maintain the integrity 
of IRR data and the only reason why I can see it gets maintained 
is that if an RIR looks at that data in conjunction with an 
allocation request, then I'll update the data about a second 
before I launch the request and I'll leave the data go rotten 
until I next need to talk to the RIR and that means most of the 
data is not maintained.


PHILIP SMITH:

Any other questions? If not, thank you very much for your 
presentations.


RANDY BUSH: 

What you've seen so far are some measurements of the static data 
in the IRR and Geoff comparing the IRR to what's actually 
announced on the network. We're doing some routing research and 
trying to look at BGP as it performs in the network and comparing 
what we think BGP is to what we're getting. We believe BGP works 
by an announcement and a withdraw coming into some router and then 
the BGP mesh - this router announces and it withdraws and says, 
"I'm making an announcement and withdrawing." What we see in the 
world is that announcement and withdraw come in and this router 
says all sorts of things - a lot of noise, OK? And we're trying to 
understand how much, and why this happens. Is it due to problems in 
the BGP design? Is it due to problems in router implementations of 
BGP? Is it due to configuration itself? 

One of the ways we're looking at this is we have these things we 
call BGP beacons, which is a router that announces a prefix into 
the global Internet and then we can watch that from other places. 
And we know the specific prefix it's announcing. This is an 
example of one beacon. And we know when it is announcing and 
withdrawing that prefix. It's doing it at well-known times. So 
that beacon is going up for two hours, the announcement, dropping 
it for two hours, announcing, withdrawing, on a fixed schedule of 
an NTP timer, etc. This is a single-homed beacon. We have a 
multihomed beacon that has a more complex schedule. It announces 
to two ASs: at midnight it announces to one, at 2 o'clock it goes 
to announcing to two, at 4 in the morning it announces to one, and 
at 6 in the morning it announces to none. It's connected to two 
ISPs and it's 
simulating a circuit to one of them going down, both of them going 
down, etc. We have instrumented one large ISP, a global ISP - I 
can't tell you who it is - this is over multiple months and we 
have measurements coming from all their routers, all routers that 
peer with other ISPs, and we're injecting a beacon in Seattle. 
That happens to be in my rack so it's in Seattle and we're 
watching it at all the edges of that ISP. We've actually relayed 
more than one ISP but we're only going to look at this one today, 
OK?

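
As a sketch of the schedule just described (an illustration added 
to the transcript, not something shown in the talk), the two beacon 
timetables can be written as small functions. The function names 
are hypothetical, and two things are assumed rather than stated by 
the speaker: that the cycles are aligned to 00:00 UTC, and that the 
multihomed pattern repeats every eight hours.

```python
from datetime import datetime, timezone


def single_homed_announced(t: datetime) -> bool:
    """True if the single-homed beacon prefix is announced at time t.

    Assumption: the 4-hour cycle (2 h announced, 2 h withdrawn) starts
    with an announcement at 00:00 UTC. The talk only says the times
    are well known and driven by an NTP-synchronised clock.
    """
    return (t.astimezone(timezone.utc).hour % 4) < 2


# Number of upstream ASs announced to in each 2-hour slot, as described
# for the multihomed beacon: midnight 1, 02:00 2, 04:00 1, 06:00 0.
MULTIHOMED_SLOTS = [1, 2, 1, 0]


def multihomed_upstreams(t: datetime) -> int:
    """Number of upstream ASs the multihomed beacon announces to at t.

    Assumption: the 8-hour pattern is in UTC and repeats through the day.
    """
    hour = t.astimezone(timezone.utc).hour
    return MULTIHOMED_SLOTS[(hour % 8) // 2]
```

Knowing the expected state at any instant is what lets an observer 
separate the one intended change from the extra updates it sees.
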
And what we actually were hoping for - we were looking for some 
other measurements about other things. But there's an old saying 
that the sound of discovery is not "eureka!", it's "oh my God". 
This is the fourth "oh my God" we've hit in this. Watch for the 
notation - 2003, July 1, at 2000, the router made an announcement 
and the Seattle router where we're measuring saw that one 
announcement from AS A, in other words, ISP 1, it saw the 
announcement - this is the AS the beacon is in - 3130 is the beacon. 
If the beacon is multihomed, this announcement says it went from 
no announcement to an announcement to ISP A. This says, "We 
switched from A to B and we saw the withdraw of B and we see the 
announcement of B." Pretty simple, pretty quiet. This is right 
next to where the beacon is. Remember the beacon was in Seattle 
and we're measuring it, we're looking at this router to see this 
announcement. We go to Chicago for a simple announcement of just 
raising the beacon to one ISP. We see four announcements and 
what's happening is an oscillation between four different nodes. 
We want to know why it's doing the oscillation, why is it doing 
this? Here we see much more complexity. Here we're going from no 
announcement to announcing to ISPs. We see 41 events - 39 
announcements and two withdraws in the middle of them. This 
happens in 26 seconds from the announcement. The announcement 
starts at 1300. At 1300:26 we see the last announcement in the 
sequence. They don't charge extra for the announcements but the 
vendors tell me that BGP is rock-solid stable. If this is rock-
solid stable, I wouldn't want to be standing on that rock. But, in 
fact, we are all standing on this rock. OK? 

Why is this happening? OK. Really, BGP is a path vector protocol. 
Remember RIP? OK? And it is a distributed computation in time and 
delay and this delay is made work by the timing of when the 
announcement is made, the final router, and I connect. If I am a 
router and I connect to four other routers, there's actually in 
the BGP specification something called the MinRouteAdvertTimer 
that says that I delay before propagating that route. I give it a 
chance to stabilise. Well, in fact, 30 seconds is advised and 
implementations vary. Some don't do it at all. OK? But - so - but, 
no matter what, the difference here you're going to see, the fact 
that some vary and some don't do it at all just exaggerates, just 
makes this much worse than it might be, even if they were all 
identical, 
the same problem would occur - it's just a statistical phenomenon. 
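
To make the timing effect concrete (an illustration added to the 
transcript, not from the slides), here is a minimal sketch of a 
router holding a newly learned route until each peer's timer 
allows it. The peer names, timer values and timestamps are made 
up; a timer of 0 stands for an implementation that does not apply 
the delay at all.

```python
def advert_times(route_learned_at, last_advert_at, mrai_by_peer):
    """Earliest time each peer can be told about a newly learned route.

    The update to a peer is held until `mrai` seconds have passed since
    the previous advertisement sent to that same peer.
    """
    times = {}
    for peer, mrai in mrai_by_peer.items():
        earliest = last_advert_at.get(peer, float("-inf")) + mrai
        times[peer] = max(route_learned_at, earliest)
    return times


# Hypothetical router with four peers and mixed timer settings (seconds).
mrai = {"peer1": 30, "peer2": 30, "peer3": 5, "peer4": 0}
# Hypothetical time of the last advertisement sent to each peer.
last = {"peer1": 90, "peer2": 75, "peer3": 98, "peer4": 99}

times = advert_times(100, last, mrai)
for peer, t in sorted(times.items(), key=lambda kv: kv[1]):
    print(f"{peer}: told at t={t}")
```

One route learned at a single instant leaves the router at four 
different times, so downstream routers hear about it in a staggered 
order and may briefly pick paths they later abandon.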

Of course, being from Seattle, I think Seattle looks better just 
because it's much nicer than Chicago. We also have much better 
latte. Here you are in Lotte World and you can't even get a latte. 
The messages in transit or queued up are not shown, MEDs and IGPs 
are not always shown. One sequence is explored - I want to know if 
I can understand why I'm seeing that oscillation, why I'm seeing 
that withdraw? I just want an example. Now, we're going to look at 
the actual topology. 3130 is connected to ISP A. These routers 
have long MinRouteAdvertTimers. These are the MEDs and we just 
know that X is less than Y and ISP A happens to have two ASs and 
they're actually using BGP configuration. So these are eBGP and 
these are iBGP and here's my monitor and here is - so, in reality, 
this is very common. The customer is connected to one of the 
aggregation routers, there are multiple aggregation routers, they 
are connected to two backbone routers and the backbone routers 
have links from Seattle to Chicago. OK? Very simple, normal ISP 
configuration for a large ISP. 

So, in state one, the announcement comes out. So R says, "My path 
goes to here." He propagates to here first and says the path goes 
to here, OK? Propagates here, path goes to here, we're now going 
to get the announcement here, the path goes here, so this is the 
AS path we get. C says "35 is the MED and 3130 is the origin". OK? 
We then get the next announcement in that this one comes through 
and this one comes through. This one finally decided to propagate 
this way. First, he propagated this way. Now, he's propagating 
this way. Oops, 34 is less than 35 - new announcement. OK? S2, 
notice, hasn't heard from R yet, because R has a long 
MinRouteAdvertTimer. R tells S2, OK, so S2 now goes this way. A 
tells B, because A tells B this route became invalid, so B sends a 
withdraw. So now we have the withdraw. Then, we have the new 
announcement because I go over here - MED is 35. And then, oh, I 
can go this way, MED 34. And then, finally, R announces to S2. 
Oops, my arrow is wrong there because R - oh, yeah, it connects to 
A, goes there, path 35 again. It comes over here... and then we 
settle again, path and MED 34 and it's simple topology.  
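
The walk-through above can be approximated in a few lines (an 
illustration added to the transcript). This is a simplified sketch, 
not the topology on the slide: a single router compares only MED, 
lower winning, and sends a message whenever its best route changes. 
The neighbour names and the order of events are hypothetical; the 
MED values 34 and 35 are the ones mentioned in the talk.

```python
def replay(events):
    """Replay (neighbour, med) updates; med=None means a withdraw.

    Returns the messages this router would send on, one per change of
    its best route. Only MED is compared (lower wins), which assumes
    every other BGP attribute is equal.
    """
    candidates = {}  # neighbour -> MED of the route it currently offers
    best = None      # (med, neighbour) currently advertised, or None
    sent = []
    for neighbour, med in events:
        if med is None:
            candidates.pop(neighbour, None)
        else:
            candidates[neighbour] = med
        new_best = min(((m, n) for n, m in candidates.items()), default=None)
        if new_best != best:
            if new_best is None:
                sent.append("withdraw")
            else:
                sent.append(f"announce MED {new_best[0]}")
            best = new_best
    return sent


# Hypothetical arrival order at one router for a single beacon announcement.
events = [
    ("via_B", 35),    # slower path heard first
    ("via_A", 34),    # better MED arrives, so re-announce
    ("via_B", None),  # worse path goes away: no visible change
    ("via_A", None),  # best path invalidated: withdraw
    ("via_B", 35),    # path comes back
    ("via_A", 34),    # and settles on the lower MED again
]
messages = replay(events)
print(messages)
```

A single origin event turns into several announcements with a 
withdraw in the middle, purely from the order in which the 
intermediate updates arrive.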

We saw this: Simple announcements can be very noisy, even singly homed. 
In reality, when it's multihomed, it has two routers. One goes to 
a different ISP. That ISP peers with the original ISP. That ISP 
uses also a slow-announcing router. First, this guy announces 
quicker cause it's a fast announcer. We come this way, he gets 
this path, so he says, "Oops, AS A AS B gets the short MED, but 
let's not worry about the MED." Here, the slow path starts to 
learn, OK? OK, so it comes here and he's starting to learn it but 
he hasn't made it down there yet. So I'm still converged through 
AS B. And I have this path too. Now, I have this path, which I 
will take, because this is a customer of here, I will prefer it to 
a peer route. Remember, we always prefer customers to peers, 
otherwise you get inconsistent routing. So, now, AS A is the path. 
Then, that same switch we had last time, so we get a withdraw. 
Then, we settle on this path, so we have A:B. Then, we settle for 
this path, prefer it because it's a customer, we have A. So here's 
all the things we have. I believe that with multiple S nodes and 
multiple X nodes - OK, multiple of these and multiple of these 
connecting over here - we can have multiple withdraws in the 
sequence which we were seeing in the original. And it's been shown 
in the lab - in other words, a place with real routers and racks 
of them - that there are reasonable configurations that NEVER 
settle, that you raise the multihomed announcement and it will 
keep going forever. 
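
The customer-over-peer rule mentioned above can be sketched as a 
comparison function (an illustration added to the transcript). The 
preference numbers are made up, and "B" stands in for the talk's 
AS B; the only thing taken from the talk is that a customer-learned 
route is preferred to a peer-learned one.

```python
# Higher preference wins; the values are illustrative, not from the talk.
PREFERENCE = {"customer": 200, "peer": 100}


def best_route(routes):
    """Pick the best of `routes`, each a (relationship, as_path) tuple.

    Customer-learned routes beat peer-learned ones regardless of AS-path
    length; path length only breaks ties within the same relationship.
    """
    return max(routes, key=lambda r: (PREFERENCE[r[0]], -len(r[1])))


via_peer = ("peer", ("B", "3130"))      # heard first, through the peer ISP
via_customer = ("customer", ("3130",))  # heard later, directly from the beacon

print(best_route([via_peer]))                # only the peer path is known
print(best_route([via_peer, via_customer]))  # switches to the customer path
```

Because the peer path arrives first and the customer path later, 
the router announces the longer path and then replaces it, which is 
the switch from the path through B to the direct path in the talk.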

You also might want to see Tim's paper on iBGP configuration 
issues that will give you some suggestions for configuring your 
system so they won't have these issues. But my friend Curtis 
tells me that, if route withdraws are treated immediately and 
changes are propagated more slowly - that's MinRouteAdvert - then 
the withdraw is order one, the route addition is order one and the 
addition of a better route is order one and a route change where 
the better route is removed is order one. That's idealistic. It 
has nothing to do with what's actually happening on the net. 
Questions?


GEOFF HUSTON: 

It appears from what you're saying here that the 
MinRouteAdvertTimer doesn't time all the interfaces at the same 
time.


RANDY BUSH: 

Specified not to.


GEOFF HUSTON: 

It's turning a relatively coherent wave front of change into 
fragmented change that induces oscillation.


RANDY BUSH: 

It's not by interface, it's by peers. For instance, if you have a 
multimedia interface... 

Secondly, there's a trade-off. If you announce everything 
immediately, you will have - this is shown by simulation. We 
unfortunately don't have reality measurements because we weren't 
measuring in 1994 before MinRouteAdvert was introduced. Simulation 
shows that - no delay - the network converges more quickly but 
with a lot of noise. As you increase the delay, the convergence 
time becomes longer - it takes longer for the information to 
propagate - but there are far fewer announcements. Now, as you 
continue increasing MinRouteAdvert, convergence time keeps getting 
worse, but it does not get much quieter. And, in fact, simulation 
shows that the point in the curve where increasing MinRouteAdvert 
doesn't reduce noise much more but only delays propagation - in 
other words, what is the maximum useable MinRouteAdvert - is 
dependent on the complexity of the topology and its .... Does that 
help?
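
The shape of that trade-off can be seen even in a toy model (an 
illustration added to the transcript, and far simpler than the 
network simulations referred to). The sketch below looks at one 
router and one peer: changes that arrive while the timer is running 
are held, and only the latest state is sent when it expires. The 
change times are invented for the example.

```python
def forwarded(change_times, mrai):
    """Times at which a router sends updates to one peer.

    `change_times` are the instants at which the router's best route
    changes. A change is sent at once if the timer is idle; otherwise
    it is held and later changes arriving meanwhile replace it.
    """
    sends = []
    pending = False
    for t in sorted(change_times):
        if pending and sends[-1] + mrai < t:
            sends.append(sends[-1] + mrai)  # timer expired: flush held change
            pending = False
        if not sends or t >= sends[-1] + mrai:
            sends.append(t)                 # timer idle: send immediately
            pending = False
        else:
            pending = True                  # hold; newest state wins
    if pending:
        sends.append(sends[-1] + mrai)      # flush whatever is still held
    return sends


# Hypothetical burst of best-route changes, in seconds.
changes = [0, 1, 3, 7, 12, 26]
for mrai in (0, 5, 15, 30, 60):
    out = forwarded(changes, mrai)
    print(f"timer={mrai:>2}s  updates={len(out)}  last at t={out[-1]}s")
```

With these invented inputs the update count falls from 6 to 2 as 
the timer grows to 30 seconds, and going to 60 seconds sends no 
fewer updates but delivers the final state later.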


GEOFF HUSTON: 

A little bit and I'm comparing it to stuff that I only dimly 
remember about hold-down timers and RIP because I'm going back 10 
years. In the RIP hold-down timer model, the hold-down timer was 
expressed for all of my RIP neighbours and, when the hold-down 
timer expired and I released, I released the same information at 
the same time on all interfaces. 


RANDY BUSH:

But nothing really happens at the same time and remember, as you 
were looking at this stuff, that none of these routers implemented 
MinRouteAdvert. Even though they're on zero, the announcement 
doesn't come out at exactly the same time on both. 


GEOFF HUSTON: 

What I'm wondering - it's almost a simulation question - is that 
if MinRouteAdvert wasn't per peer and it was a local timer and you 
released simultaneously when the timer expired, have you looked at 
that in simulations?


RANDY BUSH: 

It helps but my point is that this essentially is that and you 
still get switching. It's the simultaneous of zero but the reason 
that MinRouteAdvert is there is that if there were more 
announcements - and there were no announcements. What particular 
value MinRouteAdvert has here is not interesting.


GEOFF HUSTON:

I was wondering if you have a non-zero hold-down happening on S1, 
S2, A and B, but it was a true RIP-style hold-down, would it do 
what the router vendors think MinRouteAdvert actually does?


RANDY BUSH: 

My point was that it will change the statistical probability of 
the worst events. It will not remove them.


GEOFF HUSTON:

I'd agree with that. It feels intuitively right.


RANDY BUSH: 

It just changes the curve. And this is inherent in the protocol 
design. There are things you can do in your topology to reduce the 
problem and that was Tim Griffin's paper to which I referred.


GEOFF HUSTON: 

Which gets down to almost an invariant that routing complexity has 
a lot to do with mapping policy over topology and scaling factors 
has less to do with the actual amount of prefix load you're 
actually carrying. Right. It's great work. Thank you, it's a 
wonderful presentation, very illuminating.


PHILIP SMITH: 

Thank you, Geoff. Any other questions? OK. Well, thank you very 
much, thank you Randy. As Geoff said, a very interesting 
presentation. Well, that brings us to the end of the routing SIG. 
Has anybody got anything else they want to say or talk about in 
the three minutes that we have left? If not, I'm going to do my 
usual virus announcement. We have two more that appeared within 
the last 20 minutes so, if you're in the room, hang on. I'll get 
my wire. You are up here - 221.143.6.155 and 221.143.6.156. You 
got it about 20 minutes ago because that's when you started 
blasting the network. I suggest you disconnect from the network, 
go to the APNIC help desk and find Darrin who will put it right 
for you. OK, apart from that, I have nothing else to say. Just a 
quick announcement that the APOPs BOF, starting at six o'clock, 
it's down on your program as being in the Emerald Room. It will be 
held here because there is no other BOF and we just want to make 
use of this facility, given that it's all set up and ready. APOPs 
BOF will be at six o'clock. You have a half-hour comfort break 
and, if you'd like to come back for the Operations BOF at 6 pm, I 
look forward to seeing you then. Otherwise, if not, thank you for 
attending and we'll see you again for APRICOT 2004 in Kuala Lumpur 
at the end of February next year. Thank you and thanks to the 
speakers. 

APPLAUSE