______________________________________________________________________

DRAFT TRANSCRIPT
SIG: IX
Date: Thursday 2 March 2006
Time: 4.00pm
Presentation: Exchange point operational experiences
Presenter: Stephen Baxter
______________________________________________________________________

CHE-HOO CHENG: Once again, slides are over there (indicates screen) and the transcript is over there (indicates screen). Because we actually have nine talks for this session, I think we need to start on time, and each speaker will only have 10 minutes, so we have to be quick. OK, let's start the session. This is the second session of the IX SIG. I am Che-Hoo, co-chair of the IX SIG. I won't go through the housekeeping notes because Philip has already mentioned those. OK, let's get started. Our first speaker is Stephen Baxter and he will talk about exchange point operational experiences. 10 minutes, Stephen. OK, Philip said you have 15 minutes.

STEPHEN BAXTER: Very kind. Thank you. Good afternoon. As already mentioned, I'm Stephen Baxter from PIPE Networks. Some of you may have been in the peering BoF yesterday, where I did some gratuitous self-promotion, so I won't do that today. I'll talk about the operational experiences we had starting and running PIPE Networks, as well as a bit about how we do IX operations, to give you some sort of context.

A little bit about PIPE Networks. We're a company in Australia. We run peering points. It's effectively a distributed peering point model whereby we take peering to concentrations of ISPs. We've got 14 sites in six cities and, in each of the cities, the sites are connected by our own metro 10-gig on our own dark fibre, so it's all good. It's a good, stable, profitable business. If you have any questions, please don't hesitate to ask. If I'm going too fast for over there (indicates stenographer), I'm sorry. That's just me.

We peer. We're effectively an MLPA exchange, a single-VLAN type of set-up. There is a lot of bilateral peering as well. It's a single VLAN. We work with a business partner to provide ATM access into our peering points; we provide the routing for that to occur and participants can get their own PVC into it. It slightly breaks the Ethernet model, but it allowed us to build a good critical mass early. We use route servers for routing updates. We have a centrally managed web-based route entry system and we automatically generate filter lists on a day-by-day basis. I'm an ex-ISP operator and we can't resist writing and building our own operational systems. The system that runs it was written in-house and has been maintained in-house ever since.

A bit of a disclaimer here (refers to slide). I'll get into stories about what customers have done to us later on. I've removed names to protect the guilty. The reason we can reveal this information is because, unlike a peering co-op - and this might get up Bill's nose - we're a commercial operator and we have staff that care about our peering. So when something goes wrong, we know about it. We do a lot of pre-emptive checking, and the people I have looking after this spend all of their time ensuring there are no accidents across that fabric.

We're going to talk about what happens at the peering points, the type of routers we use, switching, how we do the routing, some other experiences and some suggestions along the way.
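As a rough illustration of the route entry and daily filter generation just described, here is a minimal Python sketch. The customer ASNs, prefixes and data layout are invented for illustration; this is not PIPE's actual in-house system.

    # Minimal sketch: turn customer-registered routes into per-peer allow-lists.
    import ipaddress

    # Hypothetical routes entered by customers through the web-based system.
    registered_routes = {
        "AS64500": ["203.0.113.0/24", "198.51.100.0/22"],
        "AS64501": ["192.0.2.0/24"],
    }

    def build_prefix_filters(routes_by_asn):
        """Return a sorted allow-list per customer ASN; anything unregistered is implicitly denied."""
        filters = {}
        for asn, prefixes in routes_by_asn.items():
            # ip_network() validates each entry before it reaches the filters.
            filters[asn] = sorted(str(ipaddress.ip_network(p)) for p in prefixes)
        return filters

    if __name__ == "__main__":
        # Imagine this running once a day (the talk mentions 7:30am) and the
        # result being pushed to the route servers.
        for asn, allow in build_prefix_filters(registered_routes).items():
            print(asn, "permit:", ", ".join(allow))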
A lot of this is probably common to a lot of peering points around the world and, if it is, I'm not trying to tell you how to suck eggs; it's just exactly what we've done. Funnily enough, a lot of people use our peering points for transit avoidance, least-cost routing. That might seem obvious but it needs to be stated. We're not as big as LINX, but we do a decent peak of about 1.2 gigabits a second - that's a 7-day average there. In Australian terms, it's nice, it's quite big. We don't have a lot of the spurious on-net peer-to-peer content that a lot of Australian-based operators might receive a lot of traffic from. It's all fairly genuine traffic with real content.

BILL WOODCOCK: Before going on, can you explain what that is measuring?

STEPHEN BAXTER: That is measuring the total Ethernet port traffic in and out.

BILL WOODCOCK: Across all of your locations?

STEPHEN BAXTER: Across all of our locations.

Transit supply. A lot of customers use PIPE as a mechanism to source either primary or secondary Internet access. They do this either via a secondary BGP session across the common VLAN, or they'll nail up some sort of IP-in-IP tunnel, VPN or GRE, and some of them use sessions between Linux boxes to get that next layer above. They use things such as the NetFlow tools or the Cisco rate limits to control how much traffic flows to or from a particular peer.

There's a lot of layer 2 forwarding resale. In Australia, the way you actually access a lot of the DSL and broadband offerings now from the wholesale carriers is via MLSs and Layer 2 sessions, effectively. A lot of guys flick that across the exchange fabric to pick up those tail services - we see that delivery. Instead of going to the wholesale carriers and picking up multiple tails, they can pick them up via the exchange points. It's an interesting use of the peering fabric.

Intra-city high-speed, low-cost fabric - there's been a bit of consolidation in the industry since we started, not as much as before we started. As a result, a lot of businesses have merged and consolidated. They'll retain their peering link in two locations in the same city and use it as a back-up or a transmission path between the locations. With the 10-gig metro rings we deployed, it's easy to do that, but we don't encourage it. We try to move them onto their own VLAN - it shouldn't be on the shared fabric for a start, and it tends to skew our peering statistics. We like to think that what we're seeing on our peering ports is actually peering traffic.

Access to USENET feeds. We make those available, read access, to our ISP customers' customers. We used to see a lot of use of those servers until we started moderating the groups we carry, so they don't carry as much copyright-protected content as they used to. As a result, the usage has gone down.

I had another one in here about one ISP stealing another's customers. One ISP in Brisbane found a good DoS path and it didn't cost much to teach the other ISP a lesson. We jump on that stuff and stop it, but it was comical for an hour or so.

We use Cisco IOS-based routers. It's hard to pass up. There are primary and secondary routers, and all of our routers are in diverse locations. While we use the same platform, they're in different physical locations. All our locations are backed up with diverse rings as well, so we can lose a site and still get transit between the other sites, or transmission between the other sites.
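Relating to the transit-supply use mentioned earlier, where customers use NetFlow-style data and rate limits to control how much traffic flows to or from a particular peer, here is a minimal Python sketch of that kind of per-peer accounting. The record format, ASNs and byte caps are assumptions for illustration only.

    # Minimal sketch: sum bytes per peer ASN so a per-peer cap can be checked.
    from collections import defaultdict

    # Hypothetical flow records; a real deployment would read these from a
    # NetFlow collector.
    flow_records = [
        {"peer_asn": 64500, "bytes": 1_200_000},
        {"peer_asn": 64501, "bytes": 300_000},
        {"peer_asn": 64500, "bytes": 900_000},
    ]

    def bytes_per_peer(records):
        """Sum bytes by peer ASN for one accounting interval."""
        totals = defaultdict(int)
        for record in records:
            totals[record["peer_asn"]] += record["bytes"]
        return dict(totals)

    if __name__ == "__main__":
        caps = {64500: 2_000_000}  # hypothetical per-peer byte budget
        for asn, total in bytes_per_peer(flow_records).items():
            flag = " (over cap)" if total > caps.get(asn, float("inf")) else ""
            print(f"AS{asn}: {total} bytes{flag}")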
We use some Linux, Quagga and Zebra based platforms for low-value content. So we have some free content areas where people put in mirrors and other file transfer areas like that. It was a really cute solution at the time. Those boxes, with an appropriate-sized CPU, can transfer reasonable traffic and they've been kicking along ever since.

The routing table size is about 4% of the global table. If we did aggregation, I don't know how big it would be after that - a lot of our customers aren't the best at aggregating. I often joke that I could do it on 2500-series routers but, if I did, my customers would walk away, so we use 7000-series routers. We haven't deployed flap controls on the routing table and that's been good for us. We've recently deployed a multicast framework. We've set that up and zero ISPs have taken advantage of it, which is a shame. We've put a certain system out there with respect to multicast and we're waiting for people to say if it's good or bad.

We first started using Cisco 3500-series switches. They didn't have any 10-gig support, which was tough when we started bursting a gig regularly in places, so we looked to replace them. We needed a lot better port control and, on that platform at the time, it wasn't available. Spanning tree has always been an issue for us, especially customers sending it to us - we hate that. We ended up going to smaller locations, and they've been quite nice until yesterday in one location, which is another story. The spanning tree control has been fantastic. Great for the price. At the time, it was hard to pass up, so we've got a lot of 10-gig capability in the network.

Routing - the way we route is that all customers have to enter their routes into a centrally managed web-based system. From that we actually burn access filters every day at 7:30 in the morning and, basically, if you don't tell us you've got it, we're not going to let you route it. It's about that simple. Each IX point has its own ASN - I don't know if APNIC likes us for that, but we've got lots of AS numbers. We use IP space from APNIC under the IXP allocation rules. Interestingly, we monitor all the routes our customers send to us and it's a really great diagnostic tool. If a customer rings and says they're seeing routing issues, we can look at the graph, see the dip or rise in routes and jump on the problem quite fast. We don't mandate IRRs. The first few customers we brought on board basically recoiled in horror and didn't want to do it, so we took it out. We don't require it now. It's all based on customer feedback.

Some other interesting experiences. When all the TCP vulnerabilities became an issue, within a day we issued passwords and pushed them out automatically to the routers. We tied down that hole there. We see a lot of cool stuff that customers send us, God bless them, a lot of our customers. A weekly or daily incident is a customer sending us a default route. There's never any accident - we get to see in the system what's happened. Lots and lots of global tables. The amount of IGP we see is staggering at times - a lot of RIP v1 from certain customers - and all parts of the RFC 1918 address space come along, not to mention everything from /32s onwards. You know when an ISP de-aggregates into /32s, you're in trouble.

Some suggestions from us to peers. Don't turn on commands unless you know what you're doing - ip verify unicast reverse-path is one of them. We ask them to take that command off when things aren't working properly and magically it works. If you don't know what you're doing or haven't got as much experience in routing, please turn it off.
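Here is a minimal Python sketch of the kind of sanity checks implied by the incidents above (default routes, RFC 1918 space, de-aggregation down to /32s). The thresholds and sample announcements are illustrative assumptions, not PIPE's filtering code.

    # Minimal sketch: flag obviously bad announcements from a peer.
    import ipaddress

    RFC1918 = [ipaddress.ip_network(n) for n in
               ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

    def problems(prefix: str):
        """Return a list of reasons an announced prefix looks wrong."""
        net = ipaddress.ip_network(prefix)
        issues = []
        if net.prefixlen == 0:
            issues.append("default route")
        if any(net.subnet_of(r) for r in RFC1918):
            issues.append("RFC 1918 space")
        if net.prefixlen > 24:
            issues.append(f"overly specific (/{net.prefixlen})")
        return issues

    if __name__ == "__main__":
        for p in ("0.0.0.0/0", "192.168.1.0/24", "203.0.113.7/32", "203.0.113.0/24"):
            print(p, problems(p) or "ok")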
Critical routing solves so many issues. The BGP decision algorithm - you shouldn't rely on it to make a commercial decision, because it makes really poor ones. It's got all sorts of knobs and dials that you can use to get a good commercial outcome but, by default, it's not the smartest. It's got no idea what your transit provider is charging. Please turn off spanning tree.

ACLs and communities - you can tell us what location a route has originated from and we'll leak that out as a community. That's good when an ISP peers in more than one location, so people can make intelligent decisions based on where a route comes from if we're seeing it in more than one location.

Common aggregation policies if you're peering and buying transit - that's our biggest bugbear. We exist by sending lots of traffic across a peering point and, when a provider sends us a /19 but puts it out to the rest of the world as 32 /24s, you don't see as much traffic. They ring up and say they're prepending out to the transit provider - the most specific route always wins. I'm sorry, guys. Try it again.

We had one fantastic customer who brings their secure management network onto the IX VLAN. Don't use spanning tree - it's not so much an issue with the newer switches; we'll shut your port down, which is fantastic, but you still ring us and get angry. Don't do it.

Thank you very much. That's time for me. That's how we do peering at PIPE Networks.

CHE-HOO CHENG: We can only take one question. No? Thank you.

STEPHEN BAXTER: I'm disappointed.
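As a footnote to the aggregation suggestion above, here is a minimal Python sketch of why the 32 /24s announced to transit attract the traffic even when the /19 is available across the peering fabric: longest-prefix match wins before prepending is even considered. The prefixes and labels are invented for illustration.

    # Minimal sketch: the most specific route always wins, regardless of prepending.
    import ipaddress

    # The aggregate is learned via peering; the de-aggregated /24s via transit.
    routes = [("198.51.96.0/19", "peering")] + [
        (str(net), "transit")
        for net in ipaddress.ip_network("198.51.96.0/19").subnets(new_prefix=24)
    ]

    def best_route(dest_ip, table):
        """Longest-prefix match, the tie-breaker a router applies first."""
        dest = ipaddress.ip_address(dest_ip)
        matches = [(ipaddress.ip_network(p), via) for p, via in table
                   if dest in ipaddress.ip_network(p)]
        return max(matches, key=lambda m: m[0].prefixlen)

    if __name__ == "__main__":
        net, via = best_route("198.51.100.25", routes)
        print(f"198.51.100.25 -> {net} via {via}")  # the /24 via transit wins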