______________________________________________________________________

DRAFT TRANSCRIPT
Routing SIG
Wednesday 1 March 2006
2.00pm
______________________________________________________________________

PHILIP SMITH: Good afternoon, everyone. Welcome to the Routing SIG. This is the first of two sessions of the Routing Special Interest Group. Unfortunately, we're competing a little bit with the routing and operations track, but such is the scheduling when we have a conference the size of APRICOT. Randy Bush and I are the chairs of this special interest group. Randy was somewhere here. Where's he gone? Oh, he's right there, hiding at the back. Randy and I are co-chairs of this special interest group. We hope we've put together an interesting enough program for you all. I would like to start off by thanking all the speakers who have volunteered their content and their time to participate in this session. The mailing list, if you're interested in subscribing and you're not already a member, is as on the screen. It's sig-routing@lists.apnic.net and you can subscribe to that through the APNIC mailman system. We don't actually have any action items for the Routing SIG, which is fine, so we don't have to look through any of those. All we really need to do now is start with the actual presentations, and first up we have David Hughes, who will be talking about BGP convergence.

DAVID HUGHES: Thanks, Phil. Right, as Phil mentioned, the general subject matter is BGP convergence. More specifically, I cover better handling of what we've been calling silent peer failures. It's something that is growing more and more important as more and more people take up services delivered over some of the newer technologies offered by the telcos. One small apology I make before I get started is that the networks I'm involved with running are predominantly Cisco-centric.
Where possible, I've tried to determine through public documentation the availability of certain features for Junipers but, obviously, without exposure to the equipment and software, some of it may not be accurate. But every effort has been made to provide as much detail as possible. So, the basic overview of the presentation: a quick look at what we term a silent peer failure, and then a look at some of the technologies being worked on by various groups that, in the future, will hopefully make this a problem of the past. Unfortunately, as I mentioned, they are technologies of the future, so we'll have a quick look at what can be done to help with the problem now, and then we'll present some operational experience from what we've done with those tweaks. So, a quick overview of what we term a silent peer failure. A normal network situation - a couple of routers. Unfortunately, these days, more often than not, you will not have a direct layer 2 connection between the two devices. With the prevalence of technologies such as metro ethernet, the chances of you having direct layer 2 connectivity between the two devices are getting lower and lower. So, in this situation, we have routers A and B connected via a switch, with BGP between them, and B has a default route from A, A being the next hop. At some point in time, the link between A and the switch disappears. Of course, router B's interface is still up and it will quite happily continue forwarding traffic to what it thinks is still a very valid next hop. Obviously, router A not being there, that traffic is just going to be black-holed for quite some time, unfortunately. The concern here is the time it takes BGP to actually determine that that is no longer a valid path. So that's what we're terming a silent peer failure - when the sending router has no idea that the peer has gone away.
So the main problem with this is that the time to detect the loss of the peer is ridiculously long. Looking through the BGP spec, we've got two timers that are of interest: the hold timer and the keep-alive timer. The hold timer is the maximum number of seconds that a router will wait between keep-alive messages from its peer. The hold time is a negotiated value - when the session is established, the two boxes will specify what they'd like to use for the hold timer and the lower of those two values will be used, within reason. The keep-alive timer is historically set at one-third of the hold timer and, if a device has not sent a message in that time frame, it should send a keep-alive to keep the session alive. The problem is that standard implementations on devices are using ridiculously large values for these timers. On Cisco in particular, the default keep-alive timer is set at 60 seconds, giving you a hold time of 180 seconds. If that peer fails silently, you can and will blackhole traffic for up to 179 seconds. Juniper's much better in that they run 30-second timers with a 90-second hold timer, and that is actually compliant with the RFC. The RFC does specify - it doesn't specify, but it recommends 30-second keep-alives. Even in the best situation, on a default config on a Juniper, you're still going to blackhole traffic for up to 89 seconds. If you're trying to offer, you know, your classic five nines of reliability that everyone is trying to achieve, however unrealistic that may be - which gives you 316 seconds per year of downtime - you can't possibly have the situation where you might blackhole traffic for up to 179 seconds running standard Cisco timers. If you offer SLAs like that, you cannot run standard timers. How can we get around these problems? There are a couple of things coming out that do help. An obvious one that is being made more and more available is next-hop tracking for BGP.
The whole premise is that it's event driven from your IGP: if the next hops of prefixes in your routing table disappear from the IGP, those prefixes will be dropped. There's no waiting for the timers to expire, and for your internal network it's a very good solution. It's a fundamental premise of routing that your next hop should be reachable, and it's quite astounding that it's taken this long for something as simple as keeping an eye on the next hop to reach mainstream routing platforms. The downside, however, is that it's obviously not so good for your eBGP sessions unless you're running iBGP with your upstream, which is obviously not going to be happening. From my reading of the documentation, Juniper has no such mechanism at this point in time. I would stand to be corrected but, from my reading, that's not the case. To make it worse, it's limited on IOS at this point in time - shown there is the list of trains where you'll find it. Unfortunately, for people who do a lot of aggregation using 6500s, there's no sign of it there. It's a good feature but it's not widely available at this point in time. Another emerging technology that helps considerably is bidirectional forwarding detection. BFD, for those who haven't come across it before, is basically a very aggressive hello protocol. It's designed specifically to run, where possible, between the forwarding planes of the devices, leaving the control planes out of the equation. The premise there is that, if the control plane gets busy for whatever reason, it doesn't have an impact on the forwarding planes' ability to talk to each other to maintain the consistency of that circuit. The way it's been implemented, it works over basically anything you'd think of - directly attached connections, tunnels, you name it, it works over it. And, reading through the RFCs, it's designed to be very aggressive. Timers can be specified in microseconds.
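As a rough illustration of why BFD detects failures so much faster than BGP keep-alives, here is a minimal sketch of the BFD detection-time calculation, following the semantics of the BFD drafts (later RFC 5880): the effective interval is negotiated between the peers, and failure is declared after a configured number of missed packets. The function name and figures are illustrative, not from any vendor's implementation.

```python
# BFD detection time model: the effective interval is the slower of
# what we are willing to receive and what the peer is willing to send;
# failure is declared after detect_mult consecutive missed packets.
# Intervals are carried in microseconds on the wire.

def bfd_detection_time_us(local_rx_min_us: int,
                          remote_tx_min_us: int,
                          detect_mult: int) -> int:
    """Worst-case time to declare the session down, in microseconds."""
    interval = max(local_rx_min_us, remote_tx_min_us)
    return interval * detect_mult

# e.g. 50 ms intervals with a multiplier of 3 -> 150 ms detection,
# versus 89-179 seconds for default BGP hold timers.
assert bfd_detection_time_us(50_000, 50_000, 3) == 150_000
```

Because the negotiation takes the slower of the two sides, an aggressive router cannot force a conservative peer to receive packets faster than it has said it can handle.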
The next thing is that the two main guys behind the IETF draft are one from Cisco and one from Juniper, and everyone is pushing it as something everyone should have, so hopefully we'll get widespread deployment, and it's a good solution for eBGP sessions. Unfortunately, it's only good for eBGP sessions if your upstream supports it and, particularly in this country, I'm not sure how long it will take to see a monopoly telco provide something like BFD on its sessions. Another slight negative for BFD is that it does require some interaction back up to the routing protocols, so there are different standards being written, once again in the IETF working groups, on the interaction between BFD and BGP, OSPF and IS-IS. As far as trying to get any support for BGP goes, it's incredibly limited. Shown up there are the platforms where you might be able to find it. Mostly we're seeing that it's being deployed for IGPs only at this stage, which doesn't help us a lot. If we had next-hop tracking being able to yank prefixes as a group on an iBGP failure, that would be great. If you want to, you can jump in at 12.4T and you get it for eBGP in there as well. If you have a CRS-1 lying around not doing anything, you can have it there too. I didn't have one at the time of testing. A general solution for today - I mean, the underlying fundamental problem here is that the timer values we inherit from the default configurations of our routing devices are insanely long. So the obvious solution is to just decrease the timers. We had a bit of debate on one of the public Cisco-orientated mailing lists about my proposition that this was a good idea, and one participant in particular was pushing the argument that using a high-level mechanism like BGP timers to try to detect an underlying link failure was fundamentally flawed. It is but, if there's no other mechanism, I don't see what choice we have.
IOS supports configuration, within the BGP standard, to tweak the timers to whatever you wish. You can drop them to whatever you want to run internally or externally to try to get around this problem. The only problem is that your upstream, if you're doing this externally, has every right to reject your session if it doesn't like the values you've presented. The RFC specifies quite clearly that a hold time of 2 seconds or below isn't an option and it will definitely be rejected there. However, up until very recently, there has been no way for the upstream to actually reject the session based on the timers. There's a very recent 12.0S release of IOS that has a new knob attached to it so you can configure what you believe would be a reasonable minimum value for the hold timer but, as I said, that's only recently been added. It's only in 12.0S, nothing newer, and hopefully it won't take too long to get further out there in the wilderness. Juniper doesn't have any such feature. So hopefully that means we should be able to tweak our timers to something more reasonable without our upstreams getting angry. On the other hand, if you have large aggregation routers, it might increase the load, and that might be something to keep an eye on. If you're running incredibly aggressive timers, you might see some issues if the router gets busy. We haven't had any problems like that. We've been running this in production for some time now. We did some trials and it worked fine under the stresses we could think of. We didn't see instability of the peer sessions. We rolled it out across our upstreams. Webcentral has connectivity to basically all the Tier-1 upstreams. We don't have links to upstreams all over the place, so we don't get that problem. We're running 5-second keep-alives. We're getting maximums of 14 seconds of blackholing if we experience a silent failure, and we haven't seen a single issue so far.
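The timer arithmetic running through the talk can be checked with a small sketch - a minimal model using the quoted figures and the BGP spec's negotiation rules, not any router's implementation:

```python
# Worst-case blackhole window after a silent peer failure, for the
# timer values quoted in the talk. The negotiation rules (the lower
# of the two proposed hold times wins, and a hold time of 1 or 2
# seconds must be rejected) follow the BGP spec, RFC 4271.

def negotiated_hold_time(ours: int, theirs: int) -> int:
    """BGP uses the smaller of the two proposed hold times."""
    return min(ours, theirs)

def acceptable_hold_time(proposed: int) -> bool:
    """Hold time must be zero (timers disabled) or at least 3 s."""
    return proposed == 0 or proposed >= 3

def worst_case_blackhole(hold_time: int) -> int:
    """Traffic can be black-holed for up to hold_time - 1 seconds
    before the hold timer expires and the session is torn down."""
    return hold_time - 1

assert worst_case_blackhole(180) == 179   # Cisco defaults: 60/180 s
assert worst_case_blackhole(90) == 89     # Juniper defaults: 30/90 s
assert worst_case_blackhole(3 * 5) == 14  # the talk's 5 s keep-alives

# Five nines of availability allows roughly 316 s of downtime a year:
downtime_budget = round((1 - 0.99999) * 365.25 * 24 * 3600)
assert downtime_budget == 316
```

The "hold time is three times the keep-alive" convention is why dropping the keep-alive to 5 seconds gives the 15-second hold time, and hence the 14-second worst case, quoted above.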
Because we have no next-hop tracking internally, we're looking at rolling this out internally with one-second keep-alives in the network itself. It's something we're going to be trialling in the near future. We think that will give us something a little bit closer to what we need until the other features become available. So, in summary, we do have a light at the end of the tunnel. This is a problem that hopefully will be relegated to history once people get on board with the likes of BFD and provide us IGP-based next-hop tracking in more commonly used releases. This is a very good interim solution and, as I said, the 3 x 5-second keep-alives have been working very well - and if you're happy to run Cisco default values, then you obviously don't care about reliable passing of your traffic. Any questions?

PHILIP SMITH: Are there any questions for David? In my haste to get started, I forgot the housekeeping list. If there are any questions, can you please come up to the microphone and state your name and affiliation before you ask your question. Thank you very much, everyone.

APPLAUSE

PHILIP SMITH: OK, while Geoff is setting up for his routing update presentation, I will go through the housekeeping list that I omitted to do at the start. As I was saying, if people want to ask questions at the end of the presentation, please come up to the microphone. I think there's also a travelling microphone wandering around as well if you're not mobile. State your name and affiliation, again, for the benefit of the microphone, and I should also point out that this session is being broadcast and audio-cast, so it makes it more important that people use the microphone so that those who have joined us from elsewhere can hear what's being talked about. Afternoon tea is in the level 2 foyer area, so it's basically that way - to my left, to your right. The social event is this evening.
If you've got your ticket, please bring it - the details of where the event is are listed on the back of your ticket. APNIC staff will collect the ticket from you when you board the bus, which will leave from the level one plaza deck outside. The last bus will be at 7:10, so please don't miss it. Next item - MyAPNIC and the policy flash demo are running all day at the APNIC help desk. The help desk is available during break times - morning, lunchtime and afternoon breaks. Onsite notice board - again, you're advised to have a look at the onsite notice board on the APNIC website for any last-minute updates and so forth. There's a special session at 4pm today in meeting room 3 discussing the APNIC fee structure. That's an open session. Anyone with an interest in that topic is welcome to attend and is invited to participate in discussions, although I'd much prefer you to come to the Routing SIG - but that's up to you. That's the housekeeping. So the next presenter is Geoff, who will be giving us a routing update.

GEOFF HUSTON: I will. Thank you. Good afternoon. I seem to do these at every Routing SIG, giving you an idea of what's happening inside the BGP routing table. I've got three parts to this presentation today - one is a status report, then I'm looking at work based on a question that Vince Fuller asked me a few months ago that I found interesting to answer, and some further observations after that. Normally I use hourly snapshots I pull from Routeviews, but this time I used a complete dump of the data and I must thank Stephan Millet of Telstra for assisting with some of the data used in the presentation. My disk is now full. Thank you. The usual picture. This is January 2005 through to February this year - the BGP prefixes. It might look like November and December were tailing off, but be assured that in January and February of this year you have come back again and routing growth is back on once more. What does it look like?
You put a line across it and go, "The number of prefixes in the default-free zone across last year rose from 150,000 to 175,000 prefixes in 12 months." So life is still increasing the way it always was. The amount of address space is kind of interesting - there are 4.4 billion addresses in IPv4 if you try to use them all. It started at about 1.36 billion at the start of the year and finished at 1.5 billion-ish. Those big jumps there - there are still two /8s that appear and disappear like lighthouses. It amazes me that there are /8s that flap, but there are, and there they are. You can draw a line across the top of this - I've eliminated the flapping /8s and smoothed it out - and then you see pretty cleanly that the amount of address space rose from about 1.36 billion to about 1.5 billion addresses. Some seasonal variation: some of you took holidays over the Northern Hemisphere summer and were slightly below average, but then got back to work in October and decided to add more addresses into the network. We appreciate that and thank you.

LAUGHTER

The total number of AS numbers - it's the same kind of curve. Very consistent. Unlike address space, AS number appearances on the Net keep romping through, so the trend line is spot on, from 17,500 up to 21,000. Somehow ASes are remarkably consistent, unlike addresses or routing table entries. The AS numbers that keep on appearing appear almost like clockwork. It's strange. So, the vital statistics: prefixes up by 18%. Roots and more specifics - are we getting better or worse at this business of only advertising the aggregate? And the answer is no - no better and no worse. The number of basic root advertisements that, if you will, encompass new space rose by 17% to 85,500, but the number of more specifics - predominantly /24s, with a few others in the mix - also rose by about the same amount. So around 50% of the network is still more specifics and around 50% is basic root prefixes.
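The arithmetic behind two of the quoted figures can be checked as a back-of-the-envelope sketch, using the rounded numbers from the talk rather than the raw data:

```python
# Back-of-the-envelope check on two figures quoted in the talk.

# Roots versus more specifics: 85,500 root prefixes out of ~175,000
# total table entries.
roots = 85_500
total = 175_000
more_specifics = total - roots            # 89,500
share = more_specifics / total            # ~0.51: "around 50%"
assert round(share, 2) == 0.51

# Advertised address span: ~1.36 billion to ~1.5 billion addresses.
addr_growth = (1.50e9 - 1.36e9) / 1.36e9  # ~10% growth in the year
assert round(addr_growth * 100) == 10
```

Both results are consistent with the talk's "around 50% more specifics" and "addresses rose by 10%" figures.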
The amount of addresses rose by 10%, so the granularity of advertisements is getting smaller, not larger - again, lots of little advertisements, because the address space is not growing as quickly. The number of AS numbers is up by 14%. What can I say? The average advertisement size is smaller. Address origination per AS is getting smaller. ASes are moving down the food chain. Interestingly, the average AS path length is remarkably steady at 3.5 ASes. 14% more ASes, but the diameter of the network is constant, so the density of interconnection is increasing. It's an interesting question whether this is a uniform density increase or whether there are particular points where the density is increasing. In other words, are exchange points and similar of their ilk actually gaining strength, or is the bombing-out of the long-distance transit fibre market causing interconnection to appear across longer spans? Some work is happening, I think in Adelaide, right now around the topology of the network and I'd be keen to see what they find. It's interesting because the denser the mesh, the more badly BGP behaves when it tries to converge. So trying to understand whether the explosions are global or local is probably an interesting thing to understand. However, on a macro level, the network is getting denser. The advertisement granularity is getting smaller. More interconnections, more specifics. By contrast, this is v6. Similar growth, except you've got to look at the numbers. The number of prefixes rose from a phenomenal 700 to a phenomenal 868, I believe. More patchy - you can obviously see that some people decided it was a good time in August, mucked around with v6, and then got bored. After that, they all went home again. Noisier, too, is the advertised address span. This is weird. Two big spikes and everything going down - in other words, the blue line is actually decreasing each time. Why is that? I took away the 6Bone.
And now what you actually see - and this is the issue that a /20 is much, much, much, much bigger than a /32. So here are all the /32s - blip, blip, blip - a /20. That's a /21 and that's a /20. It's hard to show you the growth in v6 address space apart from saying that two whacking great big allocations happened that year and got advertised, plus a few little ones, but you can't see them. There's the 6Bone. You've now got precisely four months to quit and, so far, each of you is quitting. Each jump down is a /24. The 6Bone is slowly being turned off. Here's the combined view now. I don't know if it's any clearer. That's the big picture, that's the bit left after the 6Bone, and that's the 6Bone slowly flying off; and without those two big allocations being advertised, that's what we actually have. You just can't see it. Someone's given me a laser pointer. The bit without the big allocations. Advertised AS numbers - noisier. Probably because it's 500 to 600 - so there are 600 of you playing in the v6 game as far as I can see, some in the 6Bone, some not. So what can I tell you about this? Prefixes up by 21%, roots by 15%, more specifics by 21%. Naughty, naughty, naughty. You're not meant to disaggregate in v6. Stop it!

LAUGHTER

The amount of address space went up by a phenomenal 50% because two big allocations happened to be advertised. ASes are up by 20%. The average advertisement size, as a result of those two massive allocations, is getting enormously large, and origination per AS is getting large, but only because of those two factors. The path length I can't give you much of a view on - it's such a small network that you can't see what the average AS path is, and the interconnection degree I can't tell. This is a network that continues to go large, with little overlays at the edges, and the trends really aren't there yet. Part one. Part two - more interesting.
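The point that a /20 dwarfs a /32 on an address-span plot is just prefix-length arithmetic - each extra bit of prefix length halves the span. A one-line sketch:

```python
# Address span covered by an IPv6 prefix: each extra prefix bit
# halves the span, so a /20 covers 2^(32-20) = 4,096 times the
# space of a /32.

def v6_span(prefix_len: int) -> int:
    """Number of /128 addresses covered by a v6 prefix of this length."""
    return 2 ** (128 - prefix_len)

assert v6_span(20) // v6_span(32) == 4096
```

Which is why the handful of /32 allocations are invisible blips once a single /20 lands on the same graph.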
If you were buying a router and you'd like it to live for three years inside your network, doing what it's doing, running in a default-free zone, what spec would you tell the vendor to build to? You know - how many prefixes, how many prefix updates per second? Vince wanted me to answer two questions - v4 and v6. I took the easy one and did v4. It's hard to predict v6 because it's such a small network; I can't tell. I'd like to have a shot at this answer in a v4 context. And for this I'm taking a macro view, so I've taken the entire set of updates and withdrawals from AS1221 for 2005. Because it's inside a relatively busy network, there's a whole bunch of local updates also happening from inside the network, so I've basically tried to filter out everything that I don't think came from the default-free zone - smacked out a whole bunch of updates from there. What I'm trying to do is see if I can relate the number of updates and withdrawals to the number of entries in the RIB. At the same time, I'm also looking at the CPU load records from the router that was supplying all the updates, and again trying to see if there's a relationship between the amount of CPU being used in that router, at some kind of granularity, and the table size. Now, if that's the case - if you can do a table-size predictive model and you know the relative number of updates, withdrawals and CPU - you have a vague idea of how big the thing should possibly be somewhere in three to five years' time. So that's the methodology for trying to answer that question. Updates per day for the year, in millions, in the default-free zone. So I've filtered out a fair few. At the start of the year, somewhere around 300,000 BGP update messages per day were being caught at the router. By the end of the year, it was slightly under 600,000. That's an enormous number of updates. Notice also that this is not uniform.
Now, the law of very, very big numbers says that, if each of you contributes a little, and there are 21,000 ASes, that line should be smooth. So the law of large numbers isn't working. The variation in that number is so big - up to 50% - that I suspect each of you isn't contributing just a little. And that's what I want to talk about a bit later. Notice that astonishing variation. That is not nice. The number of prefixes per update message: how efficient are we at packing prefixes into single updates? Getting worse. In other words, the update is getting closer to describing one prefix, not a bulk of them. The network, in terms of routing policy granularity per update, is changing - the granularity is getting smaller - and the number of prefixes varies a lot. Some folk are doing very strange things. I'm trying to understand that. The number of update messages per day - unfortunately, it's close to double; as I said, it's highly variable, and it shouldn't be. The number of prefixes per update message is falling and I'm just wondering if this is actually due to the increasing use of ASes to do multihoming at the edge - that what used to be an ISP with 50 prefixes is now an AS with 20 and a whole bunch of ASes underneath it that are starting to multihome. But, for some reason, the number of prefixes per update is falling. And I'm now spending time trying to understand this - it seems that the update rate is increasing faster than the table. That update rate almost doubled. In other words, the number of updates happening in the network is increasing faster than the table size. Is there some kind of multiplicative factor going on? Or is something else happening - something growing faster than the routing table itself? This is not good news. Now, maybe I should stop looking at update messages and start looking at individual prefixes. How many prefixes change per day and what are the trends?
What I've done now is look at prefixes - these are updates, those are withdrawals, in millions. So, the withdrawals: 200,000 going up to 300,000 in a day. A huge number of withdrawals. The updates of prefixes are even noisier. This is weird. I can put a trend line on the prefix update rates and, you know, yes, it has increased - around 800,000 prefix updates, the prefixes that actually got updated each day, by the end of the year. The withdrawal rates - you're going to actually start to see that that is an exponential line and, even though it's noisy, there's clearly an exponential growth factor in the withdrawal rates. So, high variability and approximately exponential, but at different rates - the updates are growing faster than the withdrawals. So now, can I relate that to the size of the network - the actual number of entries in the table itself? So that's the default-free zone across the year - that's 100,000, that's 170,000 - and you can see pretty clearly that that's not linear now. There really is a bend going on there. I've smoothed it out and done a first-order differential. The default-free zone, in terms of the number of entries in the RIB, is growing faster than linear - it's actually growing at around order squared. If I look at it as an order-two polynomial - how many RIB entries in three to five years, Vince? Somewhere between 275,000 in three years and 375,000 in five years would be a prediction. My guess at the confidence interval was about 20%, so it's not that confident, but that fit isn't bad. If you're looking at three to five years, that appears to be the metric you're looking at. Now, I've done the next thing, which is, for each RIB entry - 100,000, whatever - how many updates per RIB entry over the year? If that was linear, that would be growing at the same rate as the table itself.
If it's more than linear, then the number of prefixes being altered each day is growing faster than the table size. And this is the number of withdrawals per RIB entry. Is that weird? One withdrawal per RIB entry - on average, every day, the entire table is withdrawn, or there is a very small number of prefixes that are doing an awful lot of work: withdraw, update, withdraw, update. But that's one withdrawal. Every single entry is updated three times every day, or there is a small number of prefixes that are, you know, pushing an awful lot of iron very quickly, and that's growing very fast and, again, very noisy. So I can answer you, at least at a gross level, Vince, on what I think will happen with withdrawals and updates: inside three years, you're going to have to cope with around 1.7 million prefixes being changed every day and, by five years, that will grow to, you know, around 3 million, which is, you know, an insane amount. And the withdrawal rate, similarly, should be around 1.5 million withdrawals per day. Which should keep your routers busy. How fast would your router have to spin? Again, same kind of technique. I got from Stephan the actual 5-minute and 1-minute CPU loads and, across the year in question, the router had a brain transplant twice, ending up as a PRP-2. Isn't it cool? So what I tried to do was normalise everything to a PRP-2. So what I actually got was this: this is what happened across the year - the one-minute load rate on the PRP-2 increased at that rate - and I've just plotted it per RIB entry. This is growing faster than the RIB. It appears that, when I push that forward, if I'm using a unit of one by the end of this year, by the end of five years I'll need four times that amount of processing power to cope with the load. That appears to be the projection. Today, at 176,000 prefixes, update rates of 700,000 per day and withdrawals of 400,000, you'd need around 250 Mbytes of memory and 30% of a one-gig processor.
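These projections can be reproduced with a little arithmetic. As a sketch, the quadratic below is simply interpolated through the three quoted anchor points (roughly 176,000 RIB entries today, 275,000 in three years, 375,000 in five); the coefficients come from those rounded figures, not from the underlying data, and the processor scaling just applies the quoted four-times factor:

```python
# Quadratic ("order two polynomial") extrapolation of RIB size through
# the three figures quoted in the talk, plus the processor scaling.

def quadratic_through(p1, p2, y0):
    """Coefficients (a, b, c) of a*t^2 + b*t + c passing through
    (0, y0), p1 and p2, by direct elimination."""
    (t1, y1), (t2, y2) = p1, p2
    c = y0
    a = ((y2 - c) / t2 - (y1 - c) / t1) / (t2 - t1)
    b = (y1 - c) / t1 - a * t1
    return a, b, c

a, b, c = quadratic_through((3, 275_000), (5, 375_000), 176_000)

def rib(t: float) -> float:
    """Projected RIB entries t years out."""
    return a * t * t + b * t + c

assert round(rib(3)) == 275_000
assert round(rib(5)) == 375_000

# Processor: ~30% of a one-gig processor today, with the load
# projected to need four times the processing power in five years.
today_cpu = 0.30
in_five_years = today_cpu * 4
assert round(in_five_years, 1) == 1.2   # ~120%: you need a new processor
```

The 120% figure is why the five-year answer is "a new processor, absolutely", even before allowing for peak rates or any future security machinery in BGP.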
Three years' time: 275,000 prefixes, just under 1.7 million updates, almost a million withdrawals, double the memory and 75% of that processor. And in five years you'll absolutely need a new processor - about 120% of it. It seems awfully low. I think it is low. You really are talking about trying to cope with peaks; convergence is about speed, rather than reliability - you've got to get there faster. It's per-second peak rates, not average loads, that are the problem, and it assumes that BGP isn't going to change. And there have been more than enough words over the last year or two to say that we need to do something about securing BGP. If you think you need to be able to do that inside five years' time, you may want to think about what exactly the factors are for the router you're going to buy in terms of security-related protocols - are we doing IPSEC on our peerings or something similar, the incremental workload and so on. So I would actually say that, if I was going to spec one out, I'd want at least 500,000 entries in the RIB, no sweat. I'm going to need an awful lot of adjacency RIB space. If I'm going to be conservative and say, "If I do this, I think I'm OK", then it's about 6 million prefix updates per day. I think I need at least two gig of route processor memory and probably a 5GHz processor for route processing. What was a Cray-1? I think it was less than that, wasn't it? How good is that number? What's going on here? I've got a couple more seconds. Is this uniform? I don't think so. I don't think all of you are behaving that well. Is this skewed? If so, how skewed? 289,000 different prefixes were actually announced during the year. Now, the table is only 179,000 prefixes, so there are actually 127,000 prefixes that appeared for some period and aren't there any more. So people are leaking. And then they pull it back, but there's been an awful lot of leaking. The prefixes that had no updates at all through the year - congratulations - numbered 12,640. Well done - gold star, tick, elephant stamp.
Everyone else did some kind of update. This is a cumulative histogram of what's going on. 50% of the prefixes contributed less than 10% of the updates. 60% of the prefixes contributed less than 20%. 80% contributed just on 20%. So the top 20% of the prefixes were pretty bad, and the top 1% contributed 15% of the update load. OK. Let's name them.

LAUGHTER

So, if you see yourself on this list, you're on this list for one of two reasons - one, you're multihoming and you don't know how, because, somehow, you managed 158,000 updates for the year, of which 20,000 were actually just flips in the first-hop AS, the next hop, and, of course, you flap like crazy. All of these folk flapped. Some of them also re-homed as well. So have a look there. You may be there. You may wonder why you're there. If you're there and wonder why you're there, go to the tutorials. Let's have a look at these people, because some of them are systematic and some of them are night-time stuff-ups. Systematic - Hong Kong Supernet. This was a prefix that was active: withdraw, update. Green is the flap and the red is an attribute change, so, for a period of at least four months, this one prefix managed to generate around 1,500 update messages per day - for one prefix - most of which were withdraw, announce, withdraw, announce. Precisely what information that added to or subtracted from the routing table beats me with a stick. Somehow, they got a clue in September. Well done. Here's another one. This is from ICARE in Hong Kong. This is straight traffic engineering. This is Hong Kong Supernet again. They have four upstreams and they're moving prefixes around. This is one in Turkey. Someone out there in vendorland sold them something in June that they shouldn't have bought.

LAUGHTER

Because this is systematic, sustained - 500 updates per day - moving across multiple upstreams. This one is an interesting case. Here's another one from Turkey.
They went and bought even more of it at the end of September and did even better with it. Why is this? Again, here's another one, Amphibian Media - I think they're related to the folk who do inbound route traffic engineering - and here is another one - Merit. Surely they should know better. I looked for this network a couple of days ago and it's gone. They must have seen it flapping and sent it off into routing hell because I couldn't see it a week or so ago. Phenomenal amount of updates from folk like Merit. Here's the last one again and, oh, it's our friends from Turkey. They really did buy something in June. I think they tested it in April and really turned it on in June. Shouldn't have happened any other way. And Number 10 - again, a US one. I think they're InterNAP-related. Systematic, absolutely systematic, so this is no accident. Is it prefixes or Autonomous Systems? Look at that curve. This is the autonomous system one. Go back by 10, look at this one - so, while there might be a small number of prefixes creating all these updates, there's a tiny number of autonomous system numbers generating all these updates. The top 1%, the top 2%, the top 3% - 3% of the autonomous system numbers generated half of the updates. Thank you very much. Well done. LAUGHTER Here's another way of looking at it. Red is the actual number of updates. The green is just the top 50 ASes. So the top 50 ASes do half the updates. 50 people cause your router to have a problem. 50 people cause BGP to have a problem. Let's name them. Here's the first one - we met them before, the folk who bought this wonderful thing in June, AS 9121 - 206,000 rehomes since June. If they'd run it for a year, they'd be effectively off the planet and so on. You could see a lot of multihoming in 17557 and 721. 721 probably should know better and an awful lot of flaps on all of them. 
Here's the signature for our friends in Turkey of the total number of prefixes they originate and, yes, they actually tried it in February, liked what they saw. I think they bought it in April, turned it on hard in June, took a holiday over August, came back to work and then something blew up in December. This is bad stuff. I actually had a look at them. They seem to have six upstreams and my guess is that they're using a tool like OER and doing more-specific juggling to try and get their incoming traffic evenly balanced. And my suspicion is that they could stop all this if they bought more bandwidth. Because, as far as I can see - and you can look them up in BGPlay and have fun with these guys - you will find that the routes flip across all six at 4:00 in the morning. I know they're hardworking ISPs but I don't think anyone is up doing massive config changes as fast as they can type at 4:00 in the morning. Something is happening, doing bad things globally. More tutorials and a word of caution to the vendor - you shouldn't have sold it to them. The next one, Korea Internet Exchange, number two. This is not traffic engineering. This is some pathological condition that lasted for a while; they caught it and fixed it. MCI Europe - traffic engineering. Pakistan Telecom. A combination of the two? I can't tell. These enormous spikes are weird but there's a background of activity. So that's probably harder to put a signature on. Ah, Hong Kong Supernet, our friends. Whatever it was stopped in September and we're all very grateful. TPG here in Australia - again, you know, they're number 6 on the list, really high spikes intermittently. Can't figure it out. Yet. 9121, our friends in the US military. Something must have happened in June somewhere that required a huge amount of updates in BGP. Hong Kong Supernet has two ASes and here's the second one. They just keep on having more fun than you can imagine. 
DACOM in Korea and the Korea National Computerisation Agency - those are individual problems that happened over a few days. We're not seeing a level of uniformity in update rates such that my answer to Vince was any good at all. I'm only looking at 100 ASes, I believe. And I'm looking at 100 ASes whose behaviour pattern is pathologically bad and they contribute the overwhelming bulk of updates now and have done for at least a year, and I suspect there's two reasons. One - there's a whole bunch of automated inbound traffic engineering software that does route prefix juggling that is stuffing up the entire network with millions of updates and withdrawals. Literally millions, in fact hundreds of millions. That sustained, consistent update rate is killing us. If you want to do inbound traffic engineering, consider buying more bandwidth instead. The rest of us will thank you. I don't think the routing system can withstand that kind of abuse. There's something more than MED oscillation going on - with MED oscillation you shouldn't see withdrawals; it just simply flips. What I'm seeing in these unstable ones is a massive number of withdrawals with the prefix moving around. We have isolated incidents of unstable configurations that are causing massive load rates. So, as I said, the overwhelming number of updates are generated by an underwhelming number of sources. The uncertainty in the trend models I gave you is extremely high. That means that, if you really want me to give you an answer, Vince, on what you need to buy in three to five years, my answer is I don't have a clue. Thank you. APPLAUSE RANDY BUSH: Geoff, two things. Why did you throw out the internal route fluctuations when the router is going to have to handle them? GEOFF HUSTON: This was the question as distinct from the investigation. 
The question was actually asked, "If I was buying a default-free zone router..." and, when I start factoring in AS 1221, how do I sort of know that that's a typical kind of pattern? So I thought the way to do this - and AS 1221 doesn't have an awful lot of upstreams so, if you look at its topology, it's actually got a couple of domestic peers and you see this one upstream - I'm kind of looking truly at a default-free zone single path and that seemed a good baseline to take for these measurements. RANDY BUSH: This is what I need with a single router in the default-free zone that doesn't connect to anything else. GEOFF HUSTON: I'm talking about the floor, not the ceiling. RANDY BUSH: And you don't want to know what router to buy. It's what router to sell, since Vince is now a vendor. The other thing is those /8s you saw flapping which you don't understand - I gave a lightning talk at the end of the last NANOG, where we discovered that there are a number of /8s being used by spammers. They announce the /8, they hit you with spam from the dark space in it, from the unused space in it, and withdraw the /8. And that's those flapping /8s. GEOFF HUSTON: Thank you. We should name them. I'm not sure it will do any good because they're stolen. RANDY BUSH: We did. GEOFF HUSTON: You named them. But they're stolen. RANDY BUSH: Right. They cover some sparse allocations and what they're doing is using it for agile spam generation. They announce the /8 and go, "ping, ping, ping, ping," and turn it off. GEOFF HUSTON: Interesting. Thank you. PHILIP SMITH: Any other questions for Geoff? If not, we should move on. We're a little bit behind in time. Next up we have Randy. RANDY BUSH: OK. Credit also to Steve Bellovin - a lot of his work is represented here. What is routing security? It is not router security; it is not defending your router against attacks that are similar to the attacks that will happen to your Windows or real computer or whatever. 
The unique threat is attacks using the routing protocols. It's routing security, not router security. What are they doing it for? To divert traffic and to alter traffic. We have some ability to lessen the danger but not enough. Steve Bellovin published stuff in '89. Work accelerated in 1996. Kent et alia did good stuff in 2000. Why so little progress? The problems are technically difficult. OK. Simple routing, as Geoff just showed you, is not simple. Complex problems in routing are exceedingly complex. When some of the best computer scientists in the world are studying your stuff as a behavioural phenomenon, you should know you're in trouble. So it's not traditional communication security. All the boys and girls in your big stack telcos know that it doesn't apply very well here. So it's new ground. The installed base is big and the transition problem is big. And vendors aren't seeing people with big bucks saying, "Do it." Whether those people don't exist, or the vendors' vision is poor, is not clear. Please go to the tutorials for normal operational security. What do we want to ensure? That an ISP who announces something owns it - origin of prefix. If a router announces a path to X, that it can actually deliver the packets there. And if it tells me it can get me to some place, did some place authorise it to take me there? If I'm told I can get taken to Geoff, did Geoff want his traffic to go there? We won't get into that, Geoff. OK. What's different? The well-studied communication and host security issues are all about buggy code or bad protocol design. In this case, the protocol design is as good as we can do, and the code is - well, we won't go there. But we're still vulnerable because of a dishonest member of the game. Somebody with whom we are connected, or somebody with whom they are connected, is lying. OK. 
So hop-by-hop authentication is not sufficient because just because I have a secure connection to Geoff doesn't mean Geoff's a good guy. OK. What does the attacker want? The normal situation is site A sends along this path a nice financial transaction, from A to X to Y to B. Z, the attacker, wants to divert the traffic, strip off and keep the dollars and let the traffic go on, maybe or maybe not, so that everybody thinks everything's fine and they kept the dollars. Of course, it may not be dollars; it could be critical information, it could be - I won't go there. All sorts of stuff. How does the attacker do it? This is going to be a simplified model. I think the original title said it was an oversimplification, so excuse me. On a hop-by-hop basis, the attacker owns some router and lies about the cost, and we must assume that random routers on the Internet can be owned. The current price, the going price in the black hat market for the enable password to an ISP router, is five credit card numbers. That's pennies. So they own some of your routers today - or mine too. OK. How does Z do it? OK, there are some costs, so the path here is 5+5+5 is 15; I'm not going to go the other way because it's 30. Y told X and Z that its cost to B was 5. X told A and Z that its costs are, "I can get you to Y in 5 and to B in 10." And A is told these are the costs. Z lies and says, "I can get you there cheaper." So the traffic gets handed to Z. And Z just got lucky. Why is it a hard problem? X does not really know what Z's links are. It has to believe what Z's saying. X does not really know what Y's links are either, so they all trust each other regarding cost. OK. Validating prefix ownership does not help, as nobody lied that B owned the target. OK. Using a routing registry like a peering map does not help because nobody lied about who connected to whom. OK. 
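The cost-lying attack reduces to a toy best-path selection. A sketch - the router names follow the talk's example, but the cost values are illustrative:

```python
# Toy model of the attack: A simply believes advertised costs to B and
# picks the cheapest-looking neighbour. Costs here are illustrative.
def best_next_hop(advertised):
    # A picks the neighbour advertising the lowest cost to B
    return min(advertised, key=advertised.get)

honest = {"X": 10, "Z": 30}   # Z's true cost is worse, so A uses X
print(best_next_hop(honest))  # X

lying = {"X": 10, "Z": 5}     # Z lies: "I can get you there cheaper"
print(best_next_hop(lying))   # Z -- the traffic is handed to Z
```

Nothing in the protocol model lets A distinguish the two cases, which is the whole problem.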
One approach - and Steve gets credit for the underlying model - is that B cryptographically signs when it tells Y, "Hello, Y, my cost for B is 5." So it is signing forward when it announces. Y signs messages to X and Z, encapsulating B's message. It is signed by B saying this cost, and Y adds in front of it, "I can get you to Y for 10, signed by Y." So Y is committing to that and, by the way, here is B's message, which you can verify - and I'm not making it up, because B created this signature and the only way B could have created it is if B can get you there for 5. Y can get you to B for 5; I can get you to Y for 10 and Y can get you to B for 5. Z can only sign now that it can get to Y for 10 and Y can get to B for 5. It can't tell you the lie that it did before. You can verify this cryptographically. But this costs. So... use caching and pre- or delayed validation. OK. The fact is, for those who had the misfortune to be in the database session and hear about my version of the PKI model a couple of hours ago, you can - or even in Steve's model - you can validate who owns what when you receive the database; you don't have to wait till you receive the routing announcement. And if you can't get it right now, be optimistic, say, "OK, I'll believe it for the moment," and go check it slowly - delayed validation. Hopefully computers will get a little faster and, as Geoff has just pointed out, most routing announcements are boring, because you've already heard them recently. So if the router had a cache of what it already had validated, even though that was withdrawn, those people in Hong Kong are going to tell it to you again. GEOFF HUSTON: In the next second? RANDY BUSH: And you won't even have to revalidate it. Just when we need revalidation the most, it is done. Geoff pointed out that probably that stuff is boring. OK. Trust issues - how does X know the public key of ISP Y? How does anyone know prefix ownership? Those are the two core questions. 
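The nested "sign forward" idea can be sketched in a few lines. To keep the example short, HMAC with shared per-router keys stands in for the public-key signatures the real proposal (sBGP) uses - the keys and message formats here are entirely hypothetical:

```python
import hmac
import hashlib

# Hypothetical per-router keys. HMAC is a stand-in for asymmetric
# signatures purely to keep this sketch self-contained.
KEYS = {"B": b"b-secret", "Y": b"y-secret"}

def sign(router, message):
    # Append "router:tag" so a verifier can recompute the tag over message.
    tag = hmac.new(KEYS[router], message, hashlib.sha256).hexdigest()
    return message + b"|" + router.encode() + b":" + tag.encode()

def verify(signed):
    message, _, trailer = signed.rpartition(b"|")
    name, _, tag = trailer.partition(b":")
    expected = hmac.new(KEYS[name.decode()], message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected.encode())

# B signs its announcement; Y encapsulates B's signed message and signs that.
b_msg = sign("B", b"B: my cost to B is 5")
y_msg = sign("Y", b"Y: I can get you to Y for 10 / " + b_msg)

print(verify(y_msg))                               # True: Y's outer signature holds
inner = y_msg.rpartition(b"|")[0].split(b" / ", 1)[1]
print(verify(inner))                               # True: B's inner signature holds
```

Z can re-announce Y's signed message but cannot forge a cheaper inner claim, because it cannot produce B's or Y's tag over a different message.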
Who owns the address space and who is announcing it? OK. Address space ownership and ASN ownership, by the way, both luckily fall into a natural hierarchy with IANA at the root. So the allocations go to RIRs, and RIRs can sign allocations to ISPs, LIRs, NIRs and xIRs. Who issues the certificate? Who certifies an RIR, or IANA? Who certifies an ISP's identity, or an RIR's? Is it a web of trust? Issuing the certificate of who owns that address space can be separated from issuing the certificate for the address space prefix. In other words, the ISP's identity is separable from the space that is being allocated to them. I'm going to do something horrible. I'm going to go to another presentation, if I can see here. And I don't have my glasses. OK. This is an allocation by APNIC. It is an allocation to me as the ISP. There's my identity and my public key. And it allocates that address space with that expiration, and APNIC signs the delegation. Sorry, Steve, you'll survive the experience. We'll have some opportunities to discuss this. OK. And similarly for an AS number it gives to me, and when I make the announcement, it's from one of the ASes I own, of address space I own, and you can verify it all. OK, how are the certs distributed? Is it administratively, by ftp, etc? If that doesn't do a good job of it, then somebody should come up with that in v4, please. An out-of-band protocol - a new cert distribution protocol. OK. Is there an in-band protocol? There are some secure BGP suggestions that propose that as yet another extension to BGP, and some will work out how to do it with the DNS. So, that was it. I made it. Steve, by the way, did a lot of this work and helped me, and IIJ pays my salary. Go for it, Steve. STEVE KENT: I liked most of it, until we got to the, "We take these blobs and sign them." 
The way you describe them doesn't correspond to a standard way of issuing certificates from a format perspective, because you tied them together - if you go back to the slide, whichever presentation that was. There. You either issue an identity cert or you issue an attribute cert, but those are issued in separate ways. Someone can issue an identity cert. RANDY BUSH: They can do it from anywhere. STEPHEN KENT: And someone else has the responsibility of figuring out whether or not that identity cert represents the people they're dealing with and then issues the attribute cert. The question that comes to mind is why would I add the extra level of that? RANDY BUSH: They already do that. When I contract - where is somebody from an RIR? Sam, you're the victim. When I sign a contract with APNIC, the paper part of the contract binds her identity. Real space, my physical address, my signature - that is a legally binding contract. I should hand them the certificate when I do that, the one I will use to transact this stuff. That is my identity. I assert that in the binding legal contract. I might have gotten that cert anywhere. In 95% of the cases, no-one wants to do their own cert management and they'll pay APNIC $10 and APNIC will give them their cert. OK. But Telstra - Steve, where are you? - I believe paid tens of thousands of dollars to verify it with someone to get a Telstra CA. And I am not going to tell Telstra how to do their internal security management; they are a 500-kilogram gorilla and they'll tell me how they want to do their security management. What's interesting here is that the identity certs of the ISPs or end sites need no attestation because they're only used in two ways. They're only used either as part of a business transaction - I sign a financial email that I send to Sam, or I sign a DNS request that I send to Sam, or I sign an allocation request I send to Sam. 
And, Sam, I signed it with the private key that matches the public key in the cert I gave her when I paid her money and signed a contract. STEPHEN KENT: If I, as an RIR, go to the trouble of issuing an attribute cert to someone, the work is exactly the same as if I issued a public key cert? RANDY BUSH: It doesn't have a key - it doesn't have keys on it - and therefore I cannot use it to sign requests to them. STEPHEN KENT: But your slides have attribute certs in them, issued to the holder of two identities. You've added this additional thing that a registry in this case has to do, and I'm not sure why one would add it, given that the work you would have to do to do that is just as much as if you just issued them a cert with a public key in it, binding those resources in the first place. RANDY BUSH: Because what we really do in our operational world - this just is. I like sBGP because it matches the BGP I know and is very congruent with it. I like this because it matches our transactions. For instance, my identity will be the same and the public key I will announce will be the same for my transactions with APNIC and with - where is ARIN? Is there an ARIN? STEPHEN KENT: You always get to choose the public key anyway so you can have the same... RANDY BUSH: Not if it's in the resource cert. STEPHEN KENT: You can. You get to choose what your public key is; if you choose to give the same public key in two resource certs, which are public key certs with extensions to them, you can choose to do that. And that won't break anything. The second concern now is that you said the only way you use the cert, the identity cert, is in local transactions with the registry, for instance? RANDY BUSH: Bound in the attestations. STEPHEN KENT: You can only validate a public key cert by walking some chain from an anchor point. Everyone has to be a trust anchor for everyone else, which is a very significant problem. RANDY BUSH: No, no. I get the BGP announcement from Geoff. 
With the two of those objects - one being the ASN, and one being the IP space - I go to the IP space and I chase the IP space and I can verify the signatures all the way up to the IANA that encapsulated the attestations of who owned the space. STEPHEN KENT: If the space is only allocated there, you can't chain. RANDY BUSH: I'm going up a chain of the signed attestations. And there is a question of what name space these are in. I agree. But we have a natural name space, which is the IP identifier and the ASN. Here you go, Geoff. At least Geoff and I are used to storing these madly to run them for research. And I somehow think they make a very natural name space which will be run by other means. I think Geoff wants to speak. STEPHEN KENT: I watched you at the point where you talked about binding together attribute certificates and an identity certificate to which they are bound. A resulting thing like that is not something that PKIs deal with from a chain perspective. It can be a digitally signed blob, but we only chain certificates - and we only chain public key certificates, not something else. So it would be a new thingy of some sort and I'm having a hard time relating to that. PHILIP SMITH: Just a quick comment. GEOFF HUSTON: What Randy is saying is that there is an implicit chaining inside the series of attribute certificates up the food chain by nature of aggregates and more specifics. If I see, down at an ISP level, a particular prefix, we can search through for an attribute certificate that expresses an aggregate of that. My question to Randy was: while I can see that, how precisely does it work in the AS number space? Or are you assuming AS number block arrangements? RANDY BUSH: AS numbers are allocated - I'm sorry. AS numbers are very like the IP space - they don't have dots in the middle of them. But IANA does hand out blocks of them to the RIRs. So there are ranges of these. GEOFF HUSTON: You would say it's the range cert and would use the same thing. OK. 
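Geoff's "implicit chaining by aggregates and more specifics" amounts to a containment search. A sketch with Python's standard `ipaddress` module - the prefixes and attestation labels below are hypothetical:

```python
import ipaddress

# Sketch of implicit chaining: find every attested prefix that covers a
# query prefix, from the broad IANA/RIR aggregate down to the ISP's
# more specific. Prefixes and labels are hypothetical.
attested = {
    "203.0.0.0/8":    "IANA -> RIR aggregate",
    "203.0.112.0/20": "RIR -> ISP allocation",
}

query = ipaddress.ip_network("203.0.112.0/24")
covers = [p for p in attested
          if query.subnet_of(ipaddress.ip_network(p))]

# Sorting by prefix length reproduces the chain, broadest first.
print(sorted(covers, key=lambda p: int(p.split("/")[1])))
```

For AS numbers the same idea works with numeric ranges instead of subnet containment, which is the point of the "range cert" remark.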
RANDY BUSH: I will fess up that you're at the edge of how far I've thought this out - and so has Steve - and, luckily, I think we have a couple more meetings in the next five days to beat on each other further on it. But this maps the business and social relationships that we actually have. It's intuitive to us. But, again, we're harping on this stuff when this presentation was supposed to be about the routing security problem. George, bail me out. PHILIP SMITH: Thank you very much, Randy. Next up we have George Michaelson. He will give us an update. GEORGE MICHAELSON: A rather horrible slide pack this is, especially for a workshop, but it's a fairly good reflection of the work we've been doing. I'm going to cover what our current goals are here. I've got quite a lot of stuff. Some examples of the activity and where we think we're going to go next. The immediate short-term goals we were looking at were to try and get something that was a demonstrator, something that would get us off first base, with a prime focus on using free and open source. We badly wanted to avoid re-implementing anything. We have a body of code that we do in-house and it's built around Perl and the mod_perl extension to Apache 2. It's quite an investment we've made in the last 2.5 years and it's worked very well for us. We were looking to make use of that same mechanism. That means it had to fit in with this thing that is a particular way that we're developing code at APNIC. We're interested in trends in the wider community. We're looking at things which are doing REST, which is a way of doing things. We were looking at using XML encoding because there is so much code out there that can use XML. If you can pump the load on to the client side, make things happen on people's own iron rather than yours, that lets you get rid of quite a few of the problems. We've been doing the bootstrap work to support the basic infrastructure. 
I've got to say, the infrastructure work - this is really primitive stuff. How do we handle a cert, where do we store it and how do we make statements out of it? It's not up at the service level where we say this is a valid cert. It's much lower than that. But we've been able to learn from experience in the bootstrap phase. We actually found there's quite good code out there. There's a library called Convert::ASN1. It maps the ASN.1 into a Perl hash structure and it had a module for parsing certificates. That was just a test demonstrator but it's quite good. Although it's very badly documented, the code worked very well and we were able to use this to understand how to construct ASN.1 sequences and interact with them. We targeted OpenSSL deliberately because we thought it was the most successful. We thought hard about going into the OpenSSL code. But it is frighteningly complicated. It's bizarre what Tim and Eric do to make SSL happen and their interface is very confusing. It has abstractions to make your tea, cut your hair - it does everything. It is poorly documented. We steered clear of that. Instead, we're looking at using its command line interface. We have basic functions. We can issue a CRL and do basic things very straightforwardly. Verification - that's built into this tool. If you present a well-formed certificate and CRLs, you can say, is this cert OK? The problem is they've written the verification in a way that doesn't understand extensions. If you think about it, how can you write a generic tool that will understand a totally arbitrary extension? You can't. And what we've basically had to recognise is that this is, for us, at this stage, an inherently two-phased process. We're going to verify the crypto stuff; the extensions we'll have to do out of band by looking at them ourselves. The other thing we're getting is tools to convert from the PEM ASCII-encoded form. You get opportunities to look at things, and the Perl is a big advantage to us. 
But there is that problem that it's undocumented and pretty complicated. A quick overview - the way it works is it has an MS-DOS-style config file. You give a block a name and do variable-value assignments. We found there's an option to add an extra config file. We decided we'll add it as an external config and pass it in. There's a weird hack where you put in any flags about its criticality or importance or what colour of milk you drink, and say the rest is a DER-encoded sequence, and embed it as hex. So, if you've got ASN.1 and put it into a sequence, you can bang any extension into the framework and use it to construct the signing. An example would be an arbitrary member - they have one ASN, 17814. They've got a /20 of space as well. Using this mechanism, this is what it would look like in the encoding in a config file to run through OpenSSL. We've defined an arbitrary extension name in the configuration file and we have the mandatory components the RFC says we have to have. We must pass down the CA bit because you have to use the certificates. The subject key identifier has to be put in as a hash, the authority key identifier has to be present, and a key usage string. There's a bunch of functions you can do. The "critical" here is the interesting one. This has the behaviour of making people not be able to do things with your cert. It puts restrictions on. I will say I have a suspicion there are hacks out there that ignore things and do things with or without them. In a community of well-behaved people, I think the flags are quite good. Sorry, Steve, is that possibly me being naughty or fair? They're only mandatory if you play by the rules. STEVE KENT: I suspect that, in most widely used applications, the handling of a critical extension is not what it should be. GEORGE MICHAELSON: I noticed that it said you couldn't use X, Y, Z after you've used it. If you're happy, that's good for you. People think there are things you can do. The important stuff is the second set. 
These are the mandatory extensions that aren't in the standard set of attributes. There's an OID tag, and there are two which are defined by the RFC - the IPv4 address and IPv6 address components, both of which are critical. If your ASN.1 is as good as mine was 20 years ago - there is something like a number field, and it encodes a representation of that number or prefix. To give you an example of what we get from the Perl, in the cert-running program we do a configuration print against the OID. And the PDU is the string equivalent of the stuff we were seeing before. Does my voice not reach this microphone from over here? You get a hash array which consists of a high-level identifier that's in the ASN.1 and has the decoded value. That's the payload and this is the instance of data we were looking at. The instance is the policy identifier and this is its payload value. That's an assigned number that's managed by IANA. I think Steve may run the registry that assigns those? Yes or no. Russ does it. Russ assigned that number. So creation is just a command line. You're doing a call of a CA function, passing in your own config, passing in your extension file and saying, use what's inside that. And it's just absolutely normal command line arguments for certificate processing. You could look on 100 web pages - "How do I make my own certificate?" - and see this sequence with very small variations. We basically copped out in the short term. We weren't comfortable with the ASN.1 part of encoding these objects. It's not that we don't think ASN.1 would work, but we are one step back. We modelled the resource signing phase as constructing an SHA-1 signature across a body of text. We're doing detached signatures. We've bypassed the issue of what it looks like and how to manipulate them - that's still coming up. We can sign anything. We have tested signing an RPSL object. There is a problem in the toolkit: if you want to apply the verify function, you must have the public key in the ASN.1 format. 
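For the curious, the DER blob that goes into the config can be built by hand. This is a sketch per my reading of the RFC 3779 ASIdentifiers structure, using the single AS 17814 from the example above; the encoding has not been checked against OpenSSL's output:

```python
# Hand-encode (DER) an RFC 3779 ASIdentifiers payload for a single AS,
# AS 17814 from the example above. Sketch only -- structure per my
# reading of the RFC, not verified against OpenSSL.
def der(tag, content):
    assert len(content) < 128        # short-form length only; fine here
    return bytes([tag, len(content)]) + content

as_id   = der(0x02, (17814).to_bytes(2, "big"))  # INTEGER 17814 (0x4596)
choice  = der(0x30, as_id)                       # SEQUENCE OF ASIdOrRange
asnum   = der(0xA0, choice)                      # [0] asIdsOrRanges
payload = der(0x30, asnum)                       # ASIdentifiers ::= SEQUENCE
print(payload.hex())                             # 3008a006300402024596
```

A hex string like that, prefixed with the extension's OID and criticality flag, is exactly the kind of thing that gets embedded in the OpenSSL config block George describes.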
I don't know if you've ever seen how OpenSSL does it. It's made available as a text dump of the elements followed by a body. You have to do the conversion into ASN.1 if you want to use its command tool to do a verify. It doesn't tell you that anywhere in its doco, which was painful. We're expecting to have to put some tools into our own facilities to provide this to people, to help you. If you want to see one of these babies, this is what an example certificate looks like. Can you all see that and read that? Great. We're deliberately using a name which is quite clearly not valid - a "do not use this in the wild" name. We have short-life validity. We wanted to get into a cycle of having aged certificates that people could test against. We have a whole bunch of mandatory components. And down here, you have a typo, because the people that submitted the code into OpenSSL got one finger wrong. So instead of SBP you've got SQP. Almost every OpenSSL out there will see that string until one minor upgrade goes through. You cannot actually see the elements. They present as arbitrary data. We expect we'll have to write code and submit it to the OpenSSL community to present this in a structured manner and show people what the elements are, so you can do extraction and manipulate it yourself. Then you get the certificate as a bunch of text at the end. The current status - what we did is we took all of the top-level resources that we've given out to our membership and generated certificate instances for all of them. Our own file is about 8 to 10 K of text and that is a single certificate that covers the entire space we have responsibility for. We can do somewhere around 1,000 signings in 30 minutes - not that I would suggest in your wildest dreams you should re-sign the entire state of the world in one hit. That's silly. If you have to, it's not expensive. On a Dell 1750, which is a 2-gig CPU, you could do it. Most of your time is spent doing I/O. 
The crypto component of it is very small. I don't know why I keep seeing the same numbers come up and I'm wondering if every single certificate in the world has the same prime number. Isn't that bad? SPEAKER FROM THE FLOOR: It's the exponent. GEORGE MICHAELSON: It surprised me - we're told the way this stuff works is that you have huge numbers that are relatively prime. SPEAKER FROM THE FLOOR: But the public exponent that's used, e, can be any one of a variety of things. Two to the 16th plus 1 is a good choice. GEORGE MICHAELSON: Basically, at this point, if you say, "Trust me," it's OK. SPEAKER FROM THE FLOOR: It's a good number. GEORGE MICHAELSON: I dropped $200 worth of glassware one day and they were cross with me, so I didn't do too well. I went to the beach 100 times in one year, so. The other thing about this is we've deliberately made the certificate names a blind. We haven't given people institutional names. I don't know what your telco is; we've created this space using names based on this arbitrary number field. We chose a completely random prefix, FC00, and thought it was a nice number space to use. Every one of these certificates has a 40-bit centrally assigned unique local identifier magically associated with it as a value. You can work out who it is because you can plug the AS or the IP into whois to find out who has got it. But we're not making a data dump. These are flattened and we think there is potential in deliberately having certificates with anonymous names. There are many reasons to think about the names that get attached to these things. This is important. We're playing with an effectively artificial flat name space. We thought it was useful. We have made the configuration files and the private keys available. Never, ever give people your private keys. However, we are. If you want to play with this stuff, you need the private key in order to do the signings. 
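The exchange above is about the standard RSA public exponent, the Fermat prime F4. A quick check - this is general RSA background rather than anything from the slides:

```python
# The public exponent seen in nearly every certificate is the Fermat
# prime F4 = 2**16 + 1 = 65537: it is prime, and with only two 1-bits
# the public-key operation needs very few multiplications.
e = 2**16 + 1
print(e)                  # 65537
print(bin(e).count("1"))  # 2 -- low Hamming weight, cheap to exponentiate
```

So seeing the same value in every certificate is expected: the modulus differs per key, while the public exponent is conventionally shared.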
Our database - I've put the URL up there - has, for every certificate, the private key. So you could download these and pretend to be that person and use that certificate to do signings and tests. When we go live, we won't do that. I think that's about it. We're just starting to come up to speed on our own framework to manipulate these things. We're about three months behind with some other stuff, but we'll have our facilities able to manipulate these. And the next steps - we've got test certs from BBN. Charles Gardener has been amazing. He's been checking off stuff. We have a nice body of code. We want to put up this little demonstrator to give people a chance to pick some numbers, be a certificate, do validations, even make some bogus certs and verify that you can show they're not valid, and put up a time line to get this stuff out the door. A lot of this is subject to other work. That's it, folks. APPLAUSE RANDY BUSH: Randy Bush. GEORGE MICHAELSON: How are you? RANDY BUSH: I'm good. The RIRs, and particularly APNIC, since this is my home region, are getting this done because this is long lead-time stuff, and there is going to be some Monday morning when we wake up to serious routing attacks and this stuff has to have all been done beforehand. The vendors aren't solving the problem for us; you're doing the substrate that has to be done first, and thank you. GEORGE MICHAELSON: As crash dummy here, I'd say that it's really comforting that the open source tools are going towards our goal. Some of the behaviours are a little worrying, but it's not a big step to give them back some feedback that improves the quality of what they're doing. I'd be fairly confident that what we get out the door may not be the fastest or the best, but it will fly. I don't think the community is going to have a problem here. I think we can fulfil the role that's needed. PHILIP SMITH: Thank you very much, George. 
So this brings us to the end of the first session of the Routing SIG. After the break, we have three presentations to entice you back. And I hope you'll come back for the second half in about 20 minutes' time. Thank you. PHILIP SMITH: Welcome to the second half of the Routing SIG. Before I actually introduce Randy, our first speaker - as you probably noticed, the network has gone down and, yep, it is being fixed as we speak. So it just means you'll have to pay attention. You really will have to pay attention rather than just pretending to pay attention. Anyway, second session, we have three speakers. First up is Randy Bush, then Greg Hooten and then finally Henk Uijterwaal. Henk is just fresh off a flight from Amsterdam, so I've put him last on the agenda so he can recover from that experience. First up is Randy. RANDY BUSH: You picked the worst time to lose the Net, huh? Then, people would have something to do instead of having to listen to me. Still, you win some, you lose some. OK, this talk is really a few years old, but Philip asked me to give it because I think he thinks it's still relevant. It's about complexity, and really what it's about is simplicity. The Internet actually does work. Right? I know this will come as a shock to most people here but, you know - VINCENT FULLER: It's not working in this room right now. NARELLE CLARKE: Trust me, it's still out there. RANDY BUSH: It is still out there. IP forwarding really works. MPLS switching is a label look-up; IP forwarding is an IP look-up. It's all done with TCAMs; it's all the same story. Actual measurements show the quality of service is just fine. If you remember Steve Casner's measurements of transcontinental-US VoIP - jitter, etc, etc - just connected up to the commercial network, it works. OK. Anyway, QoS is a decision of which packets to drop. 
I don't know about you, but I get paid not to drop packets. So there are reasons that the Internet has taken over the data world and has taken over the communications world, and so trying to turn it back into the other seems to be swimming upstream. Reliability and resiliency are the core strengths of the Internet. The Internet was designed to provide reliable service over unreliable infrastructure. Somebody was talking about the reliability - Geoff was talking about the reliability issue. The idea is components are going to be unreliable. They will be almost as unreliable as humans. OK? But the Internet handles routing around problems. Right? Our weakness is security, as it was once the telcos' - by the way, if you remember, they used to mix control and data - 2600 and Captain Crunch and all that stuff, OK? IP routing yields as good a service as MPLS switching, and better in cases of multiple failures. Routing will find a way around. MPLS - you'd better have configured it. To quote Mike O'Dell, the hero of many of us, the real problem is scaling. All other problems come from that. If you can make it scale, the game's over, OK? Complexity is the arch-enemy of scaling, and this is key. Because, if you do something complex, your costs are non-linear as you scale. The telco culture started to glorify complexity as a competitive tactic in the 1970s and into the '80s. They wanted to compete with each other, so the big 500kg gorillas added feature, feature, feature and hung boxes on the sides of switches and boxes on the sides of boxes in order to provide perceived features to compete with each other. But look what it did to them - Geoff showed you this morning the wonderful chart of how those people are dying on the profit and loss statements, and they're dying on the earnings per share, and they're dying on the capital market. OK? And we're all in a commodity market. We all buy from the same vendors as the competition. Right? 
Making things complex will only raise your operational costs and raise your capital costs. I do have to remind you of RFC 1925 section 2.3 - "With sufficient thrust, pigs fly just fine." The question is, do you want to pay for the fuel? Out of your income statement? I don't. And who cares about flying pigs anyway? 'The Hitchhiker's Guide' has a wonderful saying about the Sirius Cybernetics Corporation and their products - "It is very easy to be blinded to the essential uselessness of them by the sense of achievement that you get from getting them to work at all." LAUGHTER How many of us are working with networks where we're amazed when we get them to work? Well, maybe we've put junk in there we shouldn't have. "In other words - and this is the rock solid principle on which the whole of the corporation's galaxy-wide success is founded - their fundamental design flaws are completely hidden by their superficial design flaws." OK? Stop building artificial make-believe circuits on top of switching on top of circuits. OK? I have worked for a number of - I have worked for the world's largest telco and I've worked for the world's ex-fourth-largest telco, which no longer exists because it pursued this path. And so I'm now anonymously going to tell you which place I learned this. But the optics people in the telco, the people who are responsible for fibre, said, "We can give them all the real circuits they want." Building circuits on top of layer 2 is costly to the company and damaging to the company. The problem is two things - one is the internal cost model, and that company had an internal cost model where, if you were the first user of a fibre and you just wanted one line of that fibre, you had to pay for the whole thing, and this is very common in the telcos. So instead of buying another fibre, they buy more routers, switches, or whatever you call these monsters these days, and build MPLS on top of them. 
And the second one is what Geoff referred to this morning, the convergence game, which really isn't convergence. It's one department, which has been at political war with the other departments for the last 100 or 150 years, saying, "We can provide this converged network and therefore we will subsume the people - the ATM people, the Voice people, the IP people, etc - and we will give you one network and, oh, we'll manage it all." Now, what's interesting is what they did was they took a profitable frame relay business and - hard to believe as it still was, late in the '90s and into the 2000s - even a profitable ATM business, and turned them into an unprofitable MPLS business. Where the smarts are is the big difference. Traditional Voice had stupid edge devices - the telephone instrument we all know and love with that dialler button on it - and a very smart core: these monstrous switches that are very sophisticated. The Internet has smart edges - this computer, undoubtedly it's smarter than I am, but that's easy - with sophisticated operating systems, applications, etc, etc, and a very simple, stupid core which does packet forwarding. And a key thing here, which Geoff was pointing out this morning, underpinning innovation, which is critical, is that adding an entirely new Internet service, such as Skype, such as HTTP, etc, etc, is just a matter of distributing an application to a few consenting desktops - let's forget NATs. And you've fielded it. You do not have to change the core. Think about what it takes if you want to add a service to the telco Voice networks - massive time, massive money, and you have to change the whole core of the network. Where is the reliability? The Voice network has very smart central offices which are heavily armoured and have rooms full of battery backup, etc, etc. The Internet assumes component failure and achieves reliability through the redundancy in the protocol designs. 
For instance, the root servers can be seriously attacked without anyone noticing, and people have to actually show measurements of which ones weren't reachable when, because none of the users knew. Right? The protocols find a working one and remember it until it fails. Carrier class reliability - we've got five nines, we can give it to you. The famous 5ESS switch regularly has five nines in operation and has even hit six nines in the field. We think we want that in routers and other Internet boxes. Can we achieve this? Let me tell you a secret about the 5ESS. Somebody designed the 5ESS with a poor (pause) - there goes another noun - relational breakdown of its data structures. So the data is redundant all round inside the switch. So there has to be a supervisory function which continuously runs and cleans up the internal inconsistencies in those data structures. It is the majority of the code. And, if it's removed, the switch crashes in a few hours. And that's your five-nines reliability. Can you imagine this approach scaling to Internet routing? You can't distribute that. OK? It does not play here. Spread it across the layers. Again, RFC 1925 - "It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases, this is a bad idea." Don't do it. This is why ATM-1 failed in the Internet. It tried to solve QoS, traffic engineering, circuit simulation, all that stuff. RFC 1925 again says, "Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works." And we are now facing ATM-2. Trade-offs across the layers, or how to get power and simplicity. L2/L3 technologies such as Frame, IP and MPLS have costs proportional to software costs. They drop very slowly. Fibre bandwidth and pricing seem to follow Moore's law - much like hardware cost. Which do you want to bet on? OK? 
So, instead of increasing the L2/L3 cost with pseudo-muxing, DWDM is your friend. Every year, they get twice as many bits out of the same piece of glass. Bet on it. The cost of bandwidth is falling faster than 32 feet per second squared. Routers aren't costing less. They're costing more. My OPEX is going up because of the complexity. Get a clue. What do you do? Bet on simple and cheap. Layer-1 costs are driven by hardware. Layer-2 is driven by software. Provision the bandwidth you need. What happens when fibre keeps falling and Google, Yahoo, etc, provide cheap transport and the last monopoly is broken and peer-to-peer dominates? And VoIP keeps exploding, even though it doesn't take much bandwidth? There's only so long the government and lawyers can save the telcos. The second question I wanted to ask Geoff this morning - and I'll ask it now - is I think I see a game being played. It's especially visible in the States, but it's leaked here and it's leaking to Asia, and that is that the trademark and copyright lawyers on the right hand are trying to label content as property so that, on the left hand, the entrenched carriers who are being protected can sell the transport of a commodity product instead of a commodity service. So that's what's being called - what was it? Oh, God - he asked the question... um, the whole thing where they're doing, "We'll charge Google for carrying their bits but we'll carry our bits better." The net neutrality issue, etc. So what's happening is, on the right, they're productising content - the motion pictures association, the record association, etc - and, on the left, they're nailing it as, OK, "Now, we will give you differentiated carriage of that." GEOFF HUSTON: I'll have a quick response to that. Yes, that is plan A from the media industry, and plan A isn't working. Expect movies to have more insidious product placement, because plan B is to distribute the movie more but pack the ads inside so you actually can't filter them out. 
My suspicion is that movies will be sponsored by various media outlets - Coca-Cola, etc. And that's their plan B. This whole issue of the telcos defending their space - because they employ a lot of people and otherwise the Department of Social Security would have a huge problem on their hands - is now this last desperate card they're playing. The media stuff - Google, I think, has proved that the media industry is strippable, and they are working through it. BitTorrent is proving that the traditional distribution systems are inefficient, but that doesn't mean there isn't still a strong industry there. There is. You just place the ads in different spots. RANDY BUSH: That's what's happening with the whole media thing. The newspapers are losing the ad revenues. The newspapers are going online and now Google is ahead of them and they're going to be in deep yoghurt. So the lawyers aren't going to save it, but they're working hard to muddle it in the meantime. Telcos have to save themselves. They're going to try and climb up the stack, but what they need to do is get in front of the technology. If VoIP is so cheap, then provide it already. Provide innovative services - not video on demand but mediated peer-to-peer, right? And do it as a commodity service with simplicity, not complexity. Because, if you complicate your network, you're just going to take any money you might have made and throw it right down the drain. OK? Going back to the cannibalisation of the frame relay business by MPLS - what happened to the profit side, the margin of the frame relay business, was it got turned into capital expenditure to put in more and more MPLS routers and into the OPEX to manage a very difficult technology. And so your margin went down the tubes through complexity. OK? So - we strongly suspect that, with enough complexity, we can operate an approximate Internet in polynomial time and dollars. That's a researcher's joke. Sorry. 
We are working on a proof that operating the Internet can be made to be NP-hard, and then we'll just wonder where the profits went. Just like the voice network. We never learn. The United States didn't learn from Vietnam and we didn't learn from the telcos. I think that's the show. ED LEWIS: I agree with what you're saying about complexity. But there are some things that the telco companies have, some services like emergency phone calls - we have 911 in the States, I don't know the number here. Looking at trying to put that stuff into the Internet, you start seeing a lot of really complex solutions that are above the telco line, in the software now. It looks like we're just pushing the complexity around sometimes to achieve some of the services that we've had for years in the telephone system. RANDY BUSH: I think Geoff is going to do a better job of this one than I. GEOFF HUSTON: When you start arguing with desperate people who see the problem as whether they're going to be in business next year, they start bringing up a whole bunch of reasons why they're socially useful and why you should fund their continued existence, and most of the stuff about 911 is actually nonsense. Indeed, realistically, there's the issue of where and why there is a rollout there. The telcos actually operate a damn fine SDH switch with Voice at the moment, and they'll continue to operate for some time yet. This is not a here-and-now problem. What they're truly trying to argue is that deregulation is hurting a lot, because other people are taking niche points and taking money away from them. You're seeing desperate people clutch at straws and argue why their role is still necessary and important and why the money should come in this way. I'm not sure it's believable, but that's the case they're making. RANDY BUSH: Another way of looking at it is, why isn't the demand being made of my television that I should be able to make an emergency call over it? Why isn't the demand being made of my car? 
Why is it being made of the Internet service? All you're trying to do is stack stuff on top of it because, oh, my God, I can make it look like a circuit. But, if you don't do that, your head won't hurt so much. ED LEWIS: This is an essential service we've provided for years and now the Internet has to do it. I'm looking at it as someone using the Internet and watching us trying to replicate the same service. RANDY BUSH: Don't do it. If it hurts, stop. You're trying to solve the wrong problem, or else solving it in a disastrously wrong way. ED LEWIS: The comment that led me to this was the comment about battery back-up with telephones. People say, when we have power outages, you can pick up the old POTS line and call someone. RANDY BUSH: Go pick it up. Just because I bought a car doesn't mean I'm going to let go of the POTS line, so, if you want that service, get that service. Don't try to impose it on automobiles or televisions. ED LEWIS: Sometimes we take having no complexity too far, making it too simple. RANDY BUSH: I wouldn't do something like that. PHILIP SMITH: Any other questions for Randy? NARELLE CLARKE: Item number 8 in RFC 1925 says, "It is more complicated than you think." RANDY BUSH: 1925, yeah. PHILIP SMITH: Thanks very much, Randy. OK, next up we have Greg Hooten talking about real-world use of route analytics technology. GREG HOOTEN: Hi. My name is Greg Hooten and I'm from Packet Design. We put out a product called Route Explorer and I want to talk to you today about route analytics and how it's used and where it's used. A little history about the company - I'm not going to go through it. You'll see it in the slide set if you want to pull it off the Web. So why route analytics? I've worked for a lot of large ISPs. Some of them went belly up. Most of them couldn't manage their networks. And a lot of the reason was because everything was focused on Layer 2. How do you manage a Layer 3 network? 
It wasn't really known, and HP OpenView really built up this big process at Layer 2 to figure out how the Layer 3 worked, and that didn't work either. Most Layer 3 problems are caused by misconfigurations in the routers and by hardware failures, and those cause the majority of the problems in the networks. What we saw at @Home Corporation, where I worked before this, was that, when we froze the network - for example, for a holiday - essentially our outages went down by over 90%. So nobody is allowed to make changes on the network - outages went away. That didn't catch on with a lot of people, but we tried to minimise our outages by being smarter about how we did them. But there was no way to really test, except by brute force, whether the Layer 3 was working or not. Route analytics leverages the strength of routing by listening to the routing protocols. So, if I'm listening to OSPF, I listen to the LSA information, record what's happening in the network, keep it historically, and that allows me to diagnose the problems that are happening in the network, either currently, or historically, or over time. So, if a problem happens once, it may happen again. If it's happened 15 times, it probably won't show up in a Layer 2 management system but it will show up in the routed infrastructure. Layer 3 is designed to survive through redundancy and through rerouting. That's good and bad. In one way it says, you know, "I'm going to try and get around this problem," but it also disguises that there is a problem, so we're trying to get visibility into that Layer 3 topology. Simple topology (refers to slide). Here are the routers, here's the path across the network, some BGP, some colours - it looks like a diagram. It can be changed any way you want. We tried to keep it as simple as possible. 
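The idea of keeping routing events historically, rather than polling the current state, can be sketched in a few lines. This is an illustrative toy only (the event format and link name are invented, not Route Explorer's): an intermittent problem that a point-in-time poll would miss shows up as repeated transitions for the same link.

```python
from collections import defaultdict

# Toy event log for routing changes, in the spirit of recording what
# the IGP announces and querying the history over a time window.

class RouteHistory:
    def __init__(self):
        self.events = defaultdict(list)   # link -> [(time, state), ...]

    def record(self, t, link, state):
        self.events[link].append((t, state))

    def transitions(self, link, start, end):
        """Count state changes for a link inside a time window."""
        states = [s for (t, s) in sorted(self.events[link]) if start <= t <= end]
        return sum(1 for a, b in zip(states, states[1:]) if a != b)

h = RouteHistory()
for t, state in [(0, "up"), (60, "down"), (65, "up"), (120, "down"), (125, "up")]:
    h.record(t, "rtr1-rtr2", state)     # hypothetical link name

# A one-off poll at t=130 would see the link "up"; the history shows
# it actually changed state four times.
print(h.transitions("rtr1-rtr2", 0, 130))   # -> 4
```

A real system would of course decode OSPF LSAs from a passive adjacency rather than take hand-fed tuples, but the query-over-history pattern is the same.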
Through the reports and through a very lightweight touch on the network - peering with OSPF - you get the data about what the routers are saying in the network: what routes are up, what the metrics are, what paths are down, what links are down. And then you build up over history a timeline of what has happened and do analysis on that type of data. These are some of the companies that are interested in this type of stuff (refers to slide). So MPLS seems to be our big focus recently. I don't really know why. It's a really good way of selling very large routers to the edge of your network instead of smaller routers to the edge of your network. But MPLS VPNs are catching on in a lot of Tier-1 ISPs. One of the big problems that they're running into is change validation - saying, "I'm adding a new customer to my network. How do I know it works?" What they're doing right now is they're doing as much testing as they can, calling up the network and saying, "Alright. Try it." When you go out and add new routes to the network, how do they know that works? So they're running into per-customer reachability issues, privacy issues - are the customer routes really distinguished one from another? - and policy issues - am I getting the hub and spoke that I really want, or am I getting the mesh that I want? And these are the problems that they're running into - customers leaking routes between each other because of misconfigurations, providers trying to monitor the policy as they change it, provider misconfigurations from various manual processes. This is a summary page of the changes that are happening in a network. Detail based on customers - you can name them pretty much anything you want - but what we're looking at is trying to baseline what's happening in a network. So I've got three active PEs, 10 active routes for this customer, a baseline of three routes, so I've got seven new routes. The question is why? 
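The baseline comparison just described reduces to set arithmetic over observed routes. A hedged sketch (the customer name and prefixes are invented for illustration, matching the "10 active, baseline of 3, 7 new" example above):

```python
# Toy sketch of change validation against a per-customer route baseline.
# Customer names and prefixes are made up for the example.

baseline = {"customer5": {"10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"}}

observed = {"customer5": {"10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24",
                          "10.2.0.0/24", "10.2.1.0/24", "10.2.2.0/24",
                          "10.2.3.0/24", "10.2.4.0/24", "10.2.5.0/24",
                          "10.2.6.0/24"}}

for customer, routes in observed.items():
    base = baseline.get(customer, set())
    new = routes - base          # routes seen now but not in the baseline
    withdrawn = base - routes    # baseline routes no longer seen
    print(customer, "active:", len(routes),
          "new:", len(new), "withdrawn:", len(withdrawn))
# -> customer5 active: 10 new: 7 withdrawn: 0
```

The "why?" question is then answered by comparing the `new` set against the work order for the planned maintenance.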
So the process is really more like, if I'm making a change in the network, someone's responsible for that. Someone's responsible for saying, "I'm going to add a new PE. I'm going to add a new set of routes to my network." There's also an acceptance process of, alright, once those routes are in, once the new routers are in, how does that get accepted into my network as an operational piece of equipment? And so what we're trying to do is give that data to those groups, so when I make a change to this network - customer 5, they've got 10 active PEs, two withdrawn routes, 100 new routes - why do they have 100 new routes? So being able to look at these routes and say, "These routes were assigned to this PE at this time and these are the active routes, with this route distinguisher," gives you the ability to view it in more detail, rather than saying, "I planned a maintenance and executed that maintenance and this is the result of that maintenance." So we've got those 100 new routes that were put in for customer 5. 13 are coming from brand new PEs for that customer. They're not part of the baseline. Is that a mistake? If they look at the work order, they should be able to tell. These are the new sets of PEs that were supposed to be added. If they weren't, did PepsiCo get crossed with Coca-Cola? Is this something we want to do? Then you can do justification on your maintenance before you contact the customer and say this is working or this is not working. If there's instability in the network, then we can display that instability, and you want to do it with as much detail as you can. So here is just a graph of it over time, but we could also give the details about what routes were being affected at that time. So we had a large WAN RFP that required a routing convergence SLA. So when does a network converge? Right now, it's difficult to measure. What parts do you measure? 
Do you measure when OSPF starts to go away, or once the timers have expired and the changes are occurring in the network? There's no standard for it, so what we're doing is monitoring the convergence based on the propagation delay across the network. Is that the ideal solution? No. But it worked well enough for this company to justify to their customers that they were within the specifications that were required. Do we want better solutions? Of course. Right now, being able to say, "I have a monitoring system on one side, a monitoring system on the other," and being able to generate route changes into the network - inject and withdraw a loopback address, a /32 - measuring that propagation delay across the network, you can see this is the change and this is your graph of the change over a long period of time. Does that meet the SLA that you're interested in? By being able to look at BGP over long periods of time, you can see changes like huge deluges of routes coming into the network, whether it's a redistribution from OSPF into BGP or new changes, loss of peering - how does a loss of routing in AS1237, seven hops away from my network, affect the way that I send data out of my network? Does it have an effect? If it does have an effect, do I want to find out why, and do I want to change that process? So we've seen this at Tier-1 ISPs all over the place. UC Berkeley, new route leakage into the network - why did it come about? Where are the routes coming from? Being able to determine which customer was advertising that new set of prefixes, where did they shift from at the entry to my network? So, in this case, instead of looking at 108,000 prefixes that shifted, or 330,000 events, we're going to create a pretty picture about it - and it's actually a movable picture, so we can play over time what's changing in the network. We see a large loss of prefixes across one edge. 
So, in this case, CalREN-2 is advertising prefixes across this edge and losing prefixes across this edge, and so it's fairly simple to see what's changing in the network. It's easy for a customer on this side to contact CalREN-2 and say, "Look, you're losing peering here, you're picking this up. It's not affecting us right now because we have duplicate address advertisements, but it needs to be fixed." And, in this case, it was fairly short. The more complex problems are where Sprintlink may be losing connectivity or, in this case, where Qwest lost peering with CalREN, or withdrew a bunch of routes to CalREN, and instead of having a simple back-up, the back-up path was six hops. Really difficult to find from route advertisements. Really simple to see from a graphic. Being able to categorise route changes before and after with a delta - here's a 96,000-route change based on next hops - and being able to visualise those either through a table or through a graphic makes it easier to diagnose why that was happening in your network. Probably most of you have heard of MED oscillations. Probably not too many people have seen those oscillations in the network. This is a very small cut of the data we collected at a Tier-1 ISP about a year-and-a-half ago. The interesting thing about it was it went on for over two weeks. It consumed the full processing power of three GSRs and nobody really knew what was going on, except that there were three GSRs that were saturated. The changes were happening so quickly - essentially at the speed of the three processors of the GSRs - that, by the time they typed a command to see what routes were changing, they were 100 route changes behind. The changes were happening so rapidly, and they caused so much change in the network, that essentially the GSRs were useless. They started having to route around thousands of routers to try and get data connectivity back into major parts of their network. 
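The root cause of MED oscillation is that MED is only compared between routes learned from the same neighbouring AS, which can make pairwise best-path preference intransitive, so the chosen best path depends on comparison order and can cycle forever. A minimal sketch with invented routes and a deliberately simplified decision process (real BGP has many more tie-break steps):

```python
# Routes are (name, neighbour_AS, MED, IGP cost) - all values invented.

def prefer(a, b):
    """Simplified standard rule: MED compared only within the same
    neighbour AS; otherwise fall through to IGP cost."""
    if a[1] == b[1]:                    # same neighbour AS: lower MED wins
        return a if a[2] < b[2] else b
    return a if a[3] < b[3] else b      # different AS: lower IGP cost wins

r1 = ("r1", "AS1", 0, 10)
r2 = ("r2", "AS2", 0, 5)
r3 = ("r3", "AS1", 1, 3)

# Pairwise preference forms a cycle: r1 beats r3 (same AS, lower MED),
# r3 beats r2 (lower IGP), r2 beats r1 (lower IGP) - hence oscillation.
assert prefer(r1, r3) == r1
assert prefer(r3, r2) == r3
assert prefer(r2, r1) == r2

def prefer_always(a, b):
    """always-compare-med: MED compared across ASes first, then IGP."""
    if a[2] != b[2]:
        return a if a[2] < b[2] else b
    return a if a[3] < b[3] else b

# With always-compare-med the relation is transitive: r2 wins regardless
# of the order the comparisons are made in.
assert prefer_always(prefer_always(r1, r2), r3) == r2
assert prefer_always(prefer_always(r3, r1), r2) == r2
```

This is why the fix described in the next part of the talk works: making MED comparable across all neighbour ASes restores a consistent total ordering.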
Eventually they shut down two of the routers and were ready to replace them with larger GSRs, which would have just exacerbated the problem. But we were able to record some of the data that was happening at that time and show them what was happening. They implemented always-compare-med and that essentially eliminated the problem. So, over time, when you're capturing large amounts of data, you can also do analysis over long periods of time and look at simple changes, even for a single prefix on the network. A customer /24 was flapping continuously, on a one-minute cycle, for approximately a week. They always got back-up through a NAP, but this new service that they bought from the Tier-1 - they spent a lot of money on the primary access - essentially was doing them no good. We were able to pull out the data and look at that and show them where the problem was and how bad the problem was. So this is a fusion, a new type of fusion, that we're working on. A gentleman named Pabo Yu (?) started this; Steve Casner and Van Jacobson decided that, if we understand how the routing works in the network, then there must be a way to take NetFlow data, figure out where the NetFlow data is coming from and where it's going to in the network and then, instead of listening to it throughout the network, flow it across the network. So the idea here is, instead of collecting NetFlow data everywhere in my network, what I really want to do is collect it at the key points in my network - entrance points, key data centres. Given that I understand how my routing topology works, I want to be able to take that data and then flow it out across that topology. What that gives me is the ability to say, "If something changes in my network, based on either routing or on data flow, how will that affect the other circuits that are now being overloaded? Do I buy more bandwidth? What happens if I lose a router? Where will those packets go? Will they flow the way I expect them to? 
Will they saturate the circuits that I have as back-ups?" In a complex network, even with complexity coming from redundancy rather than from things like MPLS, the question is, where will my data go? It's designed to be resilient. I want to make it as predictable as possible, so being able to fail pieces of equipment, proactively testing them, seeing where yesterday's data would have flowed if this router went away, gives me the ability to predict what will happen in the future if I really do lose pieces of equipment. So being able to look at the routing topology, being able to look at NetFlow data and being able to fuse those two together gives me another tool that will allow me to better understand how my Layer 3 topology is working, how the data will flow in the network, what capacities I need for the future, what peering I need or what private peering I need rather than buying transit from service providers. So it's another way of looking at the same types of data we've been collecting for a long period of time, and hopefully it will add a little bit more clarity to the Layer 3 network, rather than trying to build that clarity up from a Layer 2 network and then trying to interpret from Layer 2 what the Layer 3 topology will do. Are there any questions? I was either very successful or you're very asleep. OK. PHILIP SMITH: No questions for Greg? No. GREG HOOTEN: Great. Thanks a lot. PHILIP SMITH: OK then. Thank you very much. APPLAUSE PHILIP SMITH: OK, next up and last speaker is Henk Uijterwaal. He's talking about ASNs missing in action. HENK UIJTERWAAL: This is work I did together with the RIPE NCC on ASNs missing in action. OK. So, first I'll talk about why we started doing this. We started to look at ASNs. I'm not going to explain what an ASN is, because I assume you all know that. What you do need to know about ASNs is that each AS needs its own unique identifier. They are assigned in a hierarchical way - IANA, RIRs, LIRs and users - and that guarantees uniqueness. 
You have to be able to identify each AS uniquely. Another observation is that AS numbers are a limited resource. At the moment, an AS number is 16 bits, which means 65,536 possible values. A couple of ranges are set aside for private use, and you can't use all of the remaining numbers either, so you only have 64,510 of them available on the net. So, who gets an AS? It's quite simple: if you want an AS, you go to your local RIR and you ask for one. They all have policies - the five regions have five different policies, all based on a single document with some regional variations added. If you're in a region and you think you need an AS, you just go to your RIR and ask for one. I won't explain the policies in detail; the important part is at the bottom of the slide, and it says you get the AS for as long as you need it. If you don't need it anymore, you're supposed to return it. I then started to look at whether the ASes that are handed out are actually used. Something seems fairly obvious: the Internet is a network of ASes, so if you look at the RIB in your router, you should find all the ASes that are in use at a particular time. And ASes, as I said on the previous slide, are assigned on demonstrated need: somebody shows that they need an AS, connects it to the net, and it should show up in your router. So you would expect all the assigned ASes to be in the RIB. Not quite. I looked at this, three years ago by now, and I noticed a couple of things. In early 2003, the RIRs had assigned 20,000 ASNs and were assigning 300 new ones per month. If I looked at the RIBs on a couple of routers, only 14,000 ASNs were visible, with only 200 new ones showing up every month. So, of the 20,000 assigned, some 6,000 were missing, and about 100 more were being handed out every month than were actually showing up on the net. So, yeah, my question was: what's happening here? This work is the result of a study trying to find out what was happening here, or what is happening here. To study this problem, you need some data. Fortunately, there are quite a few data sources around. 
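The 16-bit pool arithmetic above is easy to check. This is a minimal sketch using the standard private-use range (64512-65534); note the talk quotes 64,510 usable ASNs, which implies a couple more values are held back than this simple calculation accounts for.

```python
# Back-of-the-envelope check of the 16-bit ASN pool figures.
TOTAL = 2 ** 16                 # 65,536 possible 16-bit AS numbers
PRIVATE = 65534 - 64512 + 1     # 64512-65534 are for private use: 1,023 ASNs
RESERVED = 1                    # AS 65535 is reserved
usable = TOTAL - PRIVATE - RESERVED

print(TOTAL, PRIVATE, usable)   # 65536 1023 64512
# The talk quotes 64,510 usable ASNs, i.e. a couple more values were
# treated as unavailable; the order of magnitude is the point here.
```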
The first data source is the RIRs' Stats Files, which list all the ASes they have assigned and the dates of the assignments. These are daily reports now; they had weekly and monthly reports before. Of course, you can always work your way back: take today's file, remove everything that was assigned in the last day, and you know what was assigned as of yesterday, and so on backwards. If you do that, you find some small differences, as sometimes these files change. So we worked through all these files, corrected the mistakes and double counting we found, and ended up with a list of all the ASes that had been assigned at any particular time. The second thing you can look at is what's happening on the net - the RIBs. There are a couple of projects around that collect RIBs from around the place. One is the RIPE NCC's RIS, a project that collects RIBs and all BGP updates from about 450 peers, IPv4 and IPv6. We looked at the data from 18 August 2000 to 1 August 2005. We took the AS paths and broke them down into their component ASes, and as soon as an AS showed up, we counted it as used on that particular day. That gives a long list of ASes and the days on which they were used. You sometimes see private AS numbers - we removed them - and you also find that people make typos and things like that, so we removed all ASes seen for less than a week. And then there's a third data source: the CIDR Report, a weekly report on the Internet as seen from AS4637, available since 1994, which includes all the ASNs seen in the RIB. What do we have after this? We have two lists. One is the ASNs assigned - the RIR Stats Files; in theory, those should all be out on the net. The second is the ASNs in use - RIS and the CIDR Report; that's practice. The normal thing you would expect is an ASN that appears in both lists: it's assigned and somebody is using it. But we found differences - ASNs that appear in only one of the lists - and there can be two reasons for that. 
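The processing just described - breaking AS paths into component ASNs, dropping private AS numbers and short-lived "typo" ASNs, then comparing the result against the assigned list - can be sketched in a few lines. This is a toy illustration, not the actual RIS tooling; the AS numbers and dates are purely illustrative.

```python
from datetime import date, timedelta

def asns_in_use(observations, min_days=7):
    """observations: iterable of (as_path_string, date) pairs, e.g. from
    RIB dumps. Returns the set of ASNs seen on at least min_days distinct
    days, with private AS numbers (64512-65534) removed."""
    seen = {}  # asn -> set of dates on which it appeared
    for path, day in observations:
        for token in path.split():
            asn = int(token)
            if 64512 <= asn <= 65534:   # private use: ignore
                continue
            seen.setdefault(asn, set()).add(day)
    return {asn for asn, days in seen.items() if len(days) >= min_days}

# Toy data: AS 65000 is private, AS 99 appears only one day (a "typo").
obs = []
d0 = date(2005, 8, 1)
for i in range(10):
    obs.append(("3333 1103 65000", d0 + timedelta(days=i)))
obs.append(("99 1103", d0))

used = asns_in_use(obs)
assigned = {3333, 1103, 42}          # hypothetical Stats File contents
print(sorted(used))                  # [1103, 3333]
print(sorted(assigned - used))       # [42]  - assigned but never seen
print(sorted(used - assigned))       # []    - seen but not registered
```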
First, ASNs in use but not assigned. Some of that is inappropriate use, and sometimes there is a problem with the registration mechanism - we'll get to that later. Then there are the ASNs missing in action: ASNs in use but not registered. Over the course of the five years we found 436 ASNs used but not registered. Some of them were used only for a short while; 255 were still visible on 1 August of last year. If you look closer at them, 215 are in the RIPE NCC's ranges. We went through the basement of the RIPE NCC and looked at files other than the statistics files - old registrations, 10 years old or more. Digging through those, we found data for 214 of them. These ASes are probably registered, as they are in those files, but they don't show up in the publicly available files. That's something that can be corrected. For the one remaining, we still have no idea who owns it, who is using it, or if, how and when it was registered. The remainder were reported to ARIN and APNIC. The interesting thing is that a lot of them are in the ARIN ranges but are not found in ARIN's files. This is presumably a problem with the transfer of data: over the years, records were moved from one registry to another, and seven of them fell through the cracks. People often ask how accurate the files are. We have 33,000 ASNs assigned and about 41 without data, and it's probably a lot less. Given this is a mechanism that has been running for 15 years or so, 0.12% with no records is not bad. Next, here are a couple of curves. The blue one is the data available from 2002 onwards, and the purple one is working your way back - simply removing things from the first Stats Files and working backwards. Then the red one is what we see in RIS, and the green is what you see in the CIDR Report. You can see a couple of things. First of all, the assigned curve is a fairly straight line. Here, around 1999, is the start of the Internet bubble. 
Now, the Internet bubble lasted a couple of years, and in late 2001, early 2002, you can see its effect on what is visible on the net. You don't see that effect in the assignments - people were still making plans, still getting ASNs assigned that never appeared on the net. The other thing is that, down here, the assigned and seen curves used to be pretty much parallel lines: yes, a number of ASNs were missing, but the gap was stable. However, if you look at the last couple of years, the difference - the number of ASNs missing - is growing. For the modelling later on: if you look at this graph, the behaviour looks linear. We looked at it, and we think that the growth of the number of ASNs assigned is indeed linear. We did a couple of tests, fitting linear and exponential curves, and it still seems to be linear. Just to show that, this is the last couple of years: the solid black line is the linear fit, and the dotted line is the exponential; the exponential starts to deviate at the end. Growth rates - so how many ASNs are appearing every month? Three lines here. The most important one is the red one: new allocations every month. The blue line is what's disappearing. If you look at the red line, since 2002 it's been pretty much flat at about 284 a month. The blue line varies, but it's pretty flat there too. And one thing you should note is this bit at the end - I'll explain it on the next slide. So these are the growth rates for all five registries, and I then split it up into the various regions. The first one is ARIN. Here are three curves: the red - new assignments; the green - reassignments; and the blue - what is disappearing. And you see a couple of things. The first is that from 2002 to 2004 there was no recovery, and then ARIN started to recover ASNs - that's the blue area there. The second thing you see is that the green and red lines - new assignments from the never-used pool versus reassignments of ASNs used before - deviate: what is being recovered is being reassigned up here. 
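The linear-versus-exponential test mentioned a moment ago can be sketched with ordinary least squares: fit a straight line to the monthly totals directly, fit the exponential by fitting a line in log space, and compare the residuals. The monthly totals below are hypothetical (linear growth plus noise), not the real Stats File counts.

```python
import math

def lsq(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def rss(xs, ys, f):
    """Residual sum of squares of model f on the data."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical monthly totals of assigned ASNs: linear growth plus noise.
months = list(range(48))
totals = [20000 + 284 * m + ((-1) ** m) * 30 for m in months]

a, b = lsq(months, totals)                          # linear model
c, d = lsq(months, [math.log(y) for y in totals])   # exponential, in log space

lin_err = rss(months, totals, lambda m: a + b * m)
exp_err = rss(months, totals, lambda m: math.exp(c + d * m))
print(lin_err < exp_err)   # True: the linear model fits this data better
```

The same comparison run on data that really grows exponentially would come out the other way, which is the point of the test.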
Look at the RIPE NCC and you see a couple of things. Very little recovery. And the other thing is that the curve seems to go up - it is clearly going up. So far, this is compensated by ARIN's recovery efforts, so the total is still growing as a linear curve. If ARIN ever stops doing this, or has recovered everything it possibly can, the total might cross from linear back to exponential. You can see the recovery effort from early 2003 here. The next slide is the fraction of ASNs seen: I took what shows up on the net and divided it by everything assigned over time. In 1998 this was only about 40%, and it grew quite nicely, but for the last couple of years it has been pretty flat at about 63% visible on the net. And there are some numbers here: 33,000 were assigned, 20,000 are out there - 60%. Another 5,000 were used for a while but have been retired. The next interesting observation is the age of retired ASNs. We plotted how long an ASN was seen before it disappeared from the net. People use an ASN for something like 50 or 60 months and then stop using it. You can also plot this over time: this axis is the time when the ASN was assigned, and this is the fraction still used - this is 2005 and this is 2004. Of the ASNs assigned in 2004, about 80% were still in use, and then it goes down pretty rapidly: of the ones assigned 10 years ago, only 40% are still active. Next thing I'll look at - wait a second - why does this drop? There are two effects that cause these drops. The first is that sites go out of business, and when a site goes out of business, the need for its ASN disappears. But the people there are scrambling to get their CVs ready and find new jobs; the last thing on their minds is to send an email to the registry saying, "We got an ASN from you and now you can have it back." And there is very little recovery effort from the registries' side as well. 
The second thing that happens is that networks merge. When there is a merger, one ASN usually disappears: the networks behind ASN1 and ASN2 are combined and called ASN1, and ASN2 gets lost. But there is no incentive to return it to the unused pool, because if you ever needed one again you would have to go through the registration process and apply for a new one. That causes the drop there. Then, the activation delay: how long does it take for an ASN that you apply for to appear on the net? This is data for the APNIC region. On the bottom is the difference in days between assignment and appearance, and the three curves - pink, blue and purple - are the various years. You can see a couple of things. The first is that if you wait two months, 60 days, about 40% has appeared. If you wait 200 days, a little over half a year, about 2 out of 3 have appeared. If you wait a really long time - a year and a half - the curve essentially flattens off: only 80% has appeared even after a year and a half, and it's fairly constant after that. The other way to look at this: 20% of the ASNs that have been assigned never appear at all. So much for the observations. I then looked at the policies. All the regions have policies on what you have to do with an ASN. ARIN's policy says there must be plans to use the ASN within 30 days of assignment. In the RIPE NCC region, there's no formal policy - there was a discussion on the mailing list three years ago. And here in the APNIC region, the policy is that you must meet the requirements reasonably soon after receiving the ASN. So, comparing the policies with reality: they all say a month, three months, soon - but in practice the time between assignment and appearance on the net is a lot longer. And the second problem is that 20% of the ASNs that are assigned never make it onto the net, even though there was demonstrated need. Those are the raw numbers. Now some modelling. 
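The activation-delay curve just described can be computed directly from pairs of assignment dates and first-seen dates. This is a minimal sketch; the records below are hypothetical and only illustrate the shape of the curve, including the tail of ASNs that never appear.

```python
from datetime import date

def appeared_within(records, days):
    """records: (assigned, first_seen) date pairs, with first_seen=None
    if the ASN never showed up on the net. Returns the fraction of ASNs
    visible within `days` days of assignment."""
    hits = sum(1 for a, s in records
               if s is not None and (s - a).days <= days)
    return hits / len(records)

# Hypothetical assignment records (not APNIC's actual data).
recs = [
    (date(2004, 1, 1), date(2004, 1, 20)),   # up in under a month
    (date(2004, 1, 1), date(2004, 2, 10)),   # ~6 weeks
    (date(2004, 1, 1), date(2004, 3, 15)),   # ~2.5 months
    (date(2004, 1, 1), date(2004, 11, 1)),   # ~10 months
    (date(2004, 1, 1), None),                # never appeared
]
print(appeared_within(recs, 60))    # 0.4 - visible within two months
print(appeared_within(recs, 600))   # 0.8 - the never-appears tail remains
```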
The first question people ask is: when will the Internet run out of 16-bit AS numbers? 33,681 were assigned as of last year, so about 30,000 are still available, and we have 284 assignments per month - which means you run out around 2016. That is sort of worrisome. If we cannot get ASNs then, what do we have to do now? The first thing you can do, instead of solving the problem, is postpone it. First, we can reclaim what disappears: 284 minus 105 gives 179 net assignments per month. You can be a bit more aggressive and also reclaim what is never used; that brings you down to about 60 net assignments per month, and the pool then lasts something like 33 years. If that's not long enough, the next thing is to make ASNs a bit longer. There is a proposal for that - I'm not going to make any more detailed predictions here - a draft that has been around in the IETF for a while, extending AS numbers. Based on this work, and several studies by Geoff Huston, there is now a policy proposal in all five regions which says we'll start handing them out. Handing them out is one thing; they also have to be implemented. You have to update your routers and get them deployed, and you'll need a couple of years for that. So whenever your next router purchase comes up, make sure that this draft is implemented - that's something that requires a push from your side. There are other ways to make things last longer. If you don't want to upgrade your kit, you might think about changing the policies. The current policies are basically demonstrated need, yet 20% never makes it to the net - so it's probably too easy to demonstrate need, and the policies could be revisited. That's something for the various RIRs, not for this forum. A couple of final things. The essential property in this game is uniqueness: you want an ASN to be unique, and you don't want it to be reassigned while somebody is still using it. That's the first obstacle to reusing them: there is no good mechanism for recovery. 
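The depletion arithmetic above boils down to pool size divided by net assignment rate. A sketch, using the round numbers from the talk; note this simple constant-rate model doesn't exactly reproduce the talk's 2016 and 33-year figures, which presumably come from a more detailed projection.

```python
def months_until_empty(pool, assigned_per_month, returned_per_month=0):
    """Months until the free pool is exhausted at a constant net rate.
    Returns None if the net rate is not positive (pool never empties)."""
    net = assigned_per_month - returned_per_month
    if net <= 0:
        return None
    return pool / net

POOL = 30000  # roughly what remained of the 16-bit space at the time

no_reclaim = months_until_empty(POOL, 284)       # ~106 months, ~9 years
reclaim    = months_until_empty(POOL, 284, 105)  # ~168 months, ~14 years
aggressive = months_until_empty(POOL, 60)        # 500 months, ~42 years
print(round(no_reclaim / 12), round(reclaim / 12), round(aggressive / 12))
```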
A solution to the recovery problem might be the certification efforts that are going on in the APNIC and RIPE regions. Certification is very simple: a certificate shows that an ASN is assigned to somebody, and it is valid for a one-year period. You can always renew it if you still need the ASN, so you never have to renumber. You get an ASN, you start to use it, and after a year, if you still need it, you renew the certificate. If you don't need it, the certificate expires and, as time goes on, the ASN can be reused. It has one requirement: people need to check these certificates when setting up peerings. However, that's probably something that will become standard practice anyway as work on securing the routing system is deployed. To summarise: 284 ASNs are assigned per month from the unused pool, but the actual growth is only about 160. The pool will be empty some time between 2013 and 2016. There are ways to make it last longer - reclamation and certification. If people want to do this, it will last longer; if people don't, then one should start thinking about deploying 4-byte ASNs. There's a full report on the RIPE website. And with that, I have to acknowledge a couple of people who helped with the data, and a few transport companies as well. (refers to projector slide) What else - any questions? PHILIP SMITH: Any questions at all? DAVID CONRAD: Do you think there should be global consistency with regard to the ASN reclamation policies? HENK UIJTERWAAL: Well, my personal opinion would be that it would be good if this was done across all RIRs - there should be no advantage in going to one registry over another because of policy differences. But I don't really have a strong opinion on this. CHRIS CHAUNDRY: The one you said was totally unknown, early on in the talk - surely you have the path information from the RIB, and surely the peers the ASN is connected to know who the number belongs to? HENK UIJTERWAAL: You can try to find that out. Inside the RIPE NCC building, we couldn't find any data on it. 
PHILIP SMITH: No other questions or comments? OK, thank you very much. So that brings us to the end of the Routing SIG, the two sessions. I would like to thank all seven speakers we've had this afternoon for their presentations. We've finished a few minutes early, so you've got about 40 minutes now between this session and APOPS, which is the next session in this room. The APNIC social event is tonight. Those who are going, please remember the last bus leaves at 7:10 - if you're late, you'll miss out, so please be there on time. Directions for the transport are on the back of the ticket, in case you're wondering what to do. Otherwise, thank you all for coming. Thanks very much to the stenographers for their work. See you at the next Routing SIG in six months' time. Thank you. APPLAUSE