RIPE 58 - Amsterdam, the Netherlands

The plenary session commenced as follows:
CHAIR: Welcome back from lunch, hope you enjoyed it. It's time to start with the plenary programme, the EOF. We have three presentations in this slot. Two of them have to do with DNSSEC and its deployment and the third one is on reverse traceroute. So, without further ado, we will start. The first speaker is Olafur Gudmundson.

OLAFUR GUDMUNDSON: Hello. I am using the headphone mike, which I have never used before. I am not going to give any introduction to DNSSEC because I assume everybody here knows all the gory details and all that. But I will go on to talk about the TAR issue: what it is and what we can do about it.

Yes. What is a TAR? A Trust Anchor Repository is something that stores Trust Anchors. The DNSSEC deployment project published a discussion paper about a year ago on TARs and it's good background reading. This is the definition that I am going to be using today in this talk. Basically, if somebody collects Trust Anchors for their own use and doesn't make them available to anybody else, it is irrelevant; we shouldn't care about it and we don't need to set policies about it.

Then there is always the question: why do we need DNSSEC TARs? Well, we are going to have a little problem: not everybody is going to have a signed parent, or the parent may have a parent that is not signed, etc., etc. And until we have universal deployment of DNSSEC, we may have TARs in existence. There is a second problem: if we allow multiple DS digest algorithms, people may start using ones that are not universally supported, and to overcome that hurdle and be able to change, people will start publishing TARs around it. There are other issues.

If you have a business relationship with somebody, like a collaborative project and you don't trust the parents to do the right thing and you want to have an informal exchange or a formal exchange of Trust Anchors to use, then you may set up a private collaborative TAR.

What are the big issues? Well, there has been a disagreement on what a TAR is and what its role is and, to a large extent, that has been related, in my opinion, to people looking at TARs from a certain point of view that is related to what they do rather than taking a global view.

Also, a lot of people have opinions on what TARs can list and what they cannot list. We are also dealing with a relatively new technology, so there is very little experience of what is going to be done. There is no guidance right now anywhere, and I personally have the opinion that the way to deal with TARs in the long term is to simply give people guidance on what they should do and how a TAR should be operated.

Now, let's start talking about what the main functions in operating a TAR are.

TARs list Trust Anchors, and they get them somehow; how the Trust Anchors get in is what I call the TAR admission policy. There is a very wide spectrum of what these different policies can be. You can have something ranging from a scanning discovery process where, basically, somebody goes around poking at every domain name they can find, looking for DNSKEY records and pulling them in. You can go to the point of saying you are going to have to jump through the following hurdles before you can have your trust anchor listed with me, for example saying you have to get a certificate from one of the certificate authorities and then I will list you. Or, if this is like the Inner Mongolian or Outer Mongolian DNSSEC advocates group, if one member says this is a good TA they can put it into their Trust Anchor Repository, and basically you are trusting the other members then.

Then, there is the policy question. Even if the parent is signed, should the trust anchor be added?

AUDIENCE: No.

SPEAKER: Somebody yelled out the answer "no", if that was not picked up. Then there is a question of how the TAR is maintained. A TAR that is write-once, I don't think is a good TAR and nobody should use it. If it is not operated, if they don't check their contents frequently, it is basically useless or misleading information. The question is: does the TAR follow RFC 5011, basically for key rollover? If a key is in there and it follows the process described there, does that make a change in the TAR, and how quickly does it get into the TAR? If a TAR says yes, we follow 5011, but they only check it once a year, they are not really following the standard. For operators, it is important to know how frequently the Trust Anchors are scanned, because if you want to do a key rollover within the tightest constraints possible, you would like the TAR to scan, say, every week or every day.

The revoke bit must be honoured by the TARs, or they have to tell you if they don't. In the cases where you have registered to add the keys into the TAR, what is the process when the owner says I want you to change the contents? Is that done automatically, is it done through some other checks and balances, etc., etc.

Deletions. This is probably going to be the one that people will be dealing with most, when they don't want to be listed in a TAR. But in the case when a domain goes insecure, or a domain disappears or is transferred, the TA should disappear. If a registrant requests it, it should go away. Then there is the case of what happens if you can't talk to the TAR, if you don't have a language in common, if you have no relationship, etc. What should a TAR do if the publisher of the key is doing things strangely? Should it drop it, should it keep it? If a TAR is listing a domain whose parent suddenly gets signed, how long a grace period should be provided until the Trust Anchor is taken out? And under what conditions could it stay in? This is a policy issue for a TAR. It could also be a policy issue for the owner when they go in and ask for a removal.

OK. How is a TAR accessed? There are several ways. Basically there is the one via configuration: you download a file, rsync a file, something, and you put it inside the validator and tell it here are the Trust Anchors you trust. You can do it on the fly, like with a lookup into a database that is for your validators, or you can do it with a DNS query such as the DLV lookup. Each TAR has to have a policy on who can access it, because TARs somehow need to be operated, and if you want to have stable operation, that requires funding. The funding can be provided, for example, by subscription, and the subscription can be paid by the people who are adding TAs to it or by the people using the TAR. One of the things that people have to think about before they start using a TAR is what the long-term viability of the TAR is and what will happen if the TAR disappears overnight. If you are a zone operator using DNSSEC and there are TARs out there that are including keys with limited discrimination, what can you do to control whether you get listed or not? There is a bit in the DNSKEY record that says this is a secure entry point into my zone. One proposal is that TARs should only list keys that have this bit set. Basically, if a key doesn't have it, that means you don't want it used as an entry point into your zone, even if you list it. Yes.
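As a rough illustration of the flag bits mentioned in this part of the talk, the following sketch tests a DNSKEY flags value for the Secure Entry Point bit and for the RFC 5011 revoke bit. The bit values are those of the DNSKEY flags field; the function name `tar_may_list` and the admission rule it encodes are hypothetical, following the proposal described by the speaker rather than any existing TAR's policy.

```python
# DNSKEY flags field bit values.
ZONE_KEY = 0x0100  # the key is a DNSSEC zone key
REVOKE = 0x0080    # RFC 5011: the key has been revoked
SEP = 0x0001       # Secure Entry Point


def tar_may_list(flags: int) -> bool:
    """Hypothetical admission rule: list only non-revoked zone keys
    that carry the SEP bit."""
    is_zone_key = bool(flags & ZONE_KEY)
    is_sep = bool(flags & SEP)
    is_revoked = bool(flags & REVOKE)
    return is_zone_key and is_sep and not is_revoked


if __name__ == "__main__":
    # 257 is the usual flags value of a key-signing key, 256 of a zone-signing key.
    for flags in (257, 256, 257 | REVOKE):
        print(flags, tar_may_list(flags))
```

Under this rule a zone operator could keep a key out of such a TAR simply by publishing it without the SEP bit.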

Even if you have a signed parent, it is not a bad idea, to avoid problems, to still follow the RFC 5011 procedure when you do a key rollover, just to give the people who are somehow hostages of some TAR somewhere in the world a chance to catch up.

If you are really concerned, you should be monitoring the TARs that are in existence and contacting them. But it is going to be very hard, if we have 1,000 TARs, to find out who they are and where they are operated. It's going to be the same problem as we have with the spam blacklists. How many of them are in existence today? 50, 100? I don't know.

We have a few live examples of TARs. IANA published their interim TAR, which they are going to operate until after the root is signed, and which lists the Trust Anchors of TLDs. This is a good example of a limited scope TAR. It will only list one level down. And all listings in this zone are at the request of the operator of the domain underneath them, so there is no scanning and this is full cooperation between the two parties. The access is very widely available. You can download it via http, you can look it up, you can use ftp, and they say you are supposed to have rsync access. That hasn't worked for a week, or at least not when I tried it. They provide a tool to convert their file format into one that can be plugged straight into DNS validators such as BIND and Unbound, and they sign the file, so you can check it with the PGP signatures on it. And it is very easy to use. All changes to the file are directed by the owner of the key.

Another example is the one ISC operates, which they call their DLV. Their admission criterion was at first registration, followed up by an e-mail which contained a request to put a record into the zone and sign that record, just to prove that you had access to the key and the zone contents, to prevent somebody from registering a Trust Anchor that they didn't have authority over. But recently they went to a hybrid model, and it now includes the ITAR contents. The access to this is basically via the DLV mechanism, which is a code change: the validator appends a string to a name, looks up type DLV for it, and what comes back is basically a DS record. And anyone in the world can access it. Right now, all updates are handled via the registration form, but they say that soon they are going to provide 5011 support.
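The "appends a string to a name" step can be sketched as follows. This is a minimal illustration, assuming the lookaside procedure in which the validator appends the DLV domain to the name and, when nothing is found, strips the leading label and tries again. The function name `dlv_candidates` is hypothetical, and `dlv.isc.org` is used as the default only because ISC's registry is the one under discussion.

```python
from typing import List

DLV_TYPE = 32769  # RR type code assigned to DLV


def dlv_candidates(name: str, dlv_domain: str = "dlv.isc.org") -> List[str]:
    """Return the owner names a validator would query for type DLV,
    most specific first, for the given domain name."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    suffix = dlv_domain.rstrip(".")
    return [
        ".".join(labels[i:] + [suffix]) + "."
        for i in range(len(labels))
    ]


if __name__ == "__main__":
    for owner in dlv_candidates("www.example.com"):
        print(owner, "type", DLV_TYPE)
```

No network access is involved here; the sketch only shows which names would be looked up.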

On the other extreme is this TAR, which basically takes anything. They go out and scan any top level domain they can get the contents of; for any domain name they see, any key record that looks like a Trust Anchor will go in there, and if you want, you can go and click on a web page and put the name in there. Currently, there are over 33,000 DNS Trust Anchors listed in this system. The access is either via http or via a DLV mechanism, which they are just starting to provide, and they monitor the zones that they have Trust Anchors for using 5011.

Before I started this talk, a good colleague of mine said I would be killed by the end of this talk. This is the reason why:

To be listed or not to be listed. Some people have the opinion that their Trust Anchor is under their control and nobody should ever list it without asking them for permission first. Others say DNS information is published and I can do whatever I want with it. I don't think these people will ever agree. So the best thing we can do is somehow tell TARs: please don't list this key in your TAR. Another extreme is: if you list my key, here is my contact information, please let me know that it is listed in your TAR so that if something goes wrong I can contact you. We could possibly use the SEP bit to signal whether a key can be listed in a TAR or not, or we could add another bit, but there has to be some guidance or people will do the wrong thing.

Then, there is the bigger validation issue on the downstream. If the contents of a TAR you trust differ from what comes out of DNS, which one do you trust? There is no one right or wrong answer to that. As a DNS guy I would say DNS should be on top, but I am sure there are others that are going to disagree with that statement.

So fire away.

CHAIR: OK, thank you. Any questions for Olafur? Please state your name when you go to the microphone.

AUDIENCE: Steve Kent, BBN. Just say no. This is one of the good things to remember from the Reagan era. Would you encourage people to run their own parallel DNS hierarchies?

SPEAKER: No.

AUDIENCE: Why do you think TARs are a good idea, other than the ITAR, which has a clear scope for a narrow point in time and a reason for going away? Which is another way of saying I wasn't persuaded by the reasons you gave for doing this.

SPEAKER: BBN, for example, has a registration in .com. .com is not signed, and it's not going to be signed for at least another two years. If BBN decided to use DNSSEC today, would it be useful for them to have their key listed in ISC's DLV for people who wanted to use it, or not? You are a security guy, you tell me.

AUDIENCE: I would tell our IT organisation just what I suggested here: no.

SPEAKER: Wait for .com?

AUDIENCE: Yes. Wait. DNS has the advantage that the public key infrastructure environment has lacked for many years, which is that it is an authoritative hierarchy that the world tends to rely upon. You don't have to use that word "trust", which is not transitive and not qualitative. Giving that up by creating mechanisms to codify TARs, as opposed to just saying no, encourages all the problems that the public key infrastructure arena has had for a very long time. So taking those on, and you did a good job of enumerating all the bad things that could arise, I am not sure all but enough of them to make people want to rethink this, would encourage me to say no, just to put pressure on the people above you in the hierarchy to sign, and don't create mechanisms, don't create IETF standards that would encourage this behaviour by standardising it and giving it an imprimatur in standard Internet fashion. That would be my perspective.

SPEAKER: Thank you.

CHAIR: I think Keith was next.

Keith Mitchell: ISC. Thank you for the overview of the TARs, including our DLV. If I may respond to the previous questioner: why are TARs a good idea? I think we would all like to live in a world where the root is signed and there are no unsigned islands of trust, disconnected and disjointed. Unfortunately, we are not in that world yet. I think it's pretty clear that DNSSEC is on a straightforward path to deployment. I think one of the values of messing around with TARs now is that we can gain operational experience of putting DNSSEC out. I mean, over the past month or so, I won't say it was straightforward, but we have learned some valuable lessons about how to do DLV and how to make some fixes to BIND as well. And I think that we can learn from that. I think that, to some extent, practising and putting TARs in place is a way of showing that this technology actually works and that there are benefits, and I think that in itself creates incentives for the root to be signed and for the whole hierarchy to be deployed. That is really my comment. The other thing I will say is that we are doing a discussion panel about TARs at the meeting on Saturday, which is open to anyone who is interested. Thanks.

SPEAKER: Keith, can I put you on the spot? Are there any lessons we can learn from TARs that we cannot learn from the operation of, say, .SE or .BR?

AUDIENCE: I think there are potentially -- I think there are lessons that can be learned, because it's about trust between organisations, not necessarily just within one registry. There are scaling issues there. There are still lessons, yes, and there are also lessons about the various software platforms too. It's not unequivocal but yes, I think there is still value.

CHAIR: Thank you.

AUDIENCE: Can I control the distribution of my zone's KSK? No, I can't. Well, that is a reason I don't sign, right. Seriously speaking, I guess I agree with you on that point, you can't really control the distribution, and let's not go down the path of whether the DNS data is public or not. But speaking as one of the co-chairs of the DNS working group, I think I remember that this community has made a very strong statement on what it believes a TAR is and what properties a TAR has, focusing on the IANA TAR but listing a lot of very interesting properties and non-properties. So what I wonder is why we are having this discussion right now and here, and do you think, and if so why, it helps deployment to, say, bastardize the term TAR for no good reason, only pleasing everybody who is doing some Trust Anchor collection and spreading that around the world? Which everybody is free to do, but please can we stop calling this a TAR, because people have some expectations of what properties a TAR has.

OLAFUR GUDMUNDSON: My opinion is that if you are a user of a TAR, you should know its policies, you should know what you are buying, and yes, if the TAR is following a policy you deem acceptable you can use it, but if it is not doing what we call safe operation, you shouldn't be using it. Unfortunately, there is sometimes no way to discover what the policies are. Yes.

AUDIENCE: The point was that if a certain set of policies is followed, then it's a TAR; if it's not, then it's not. And with the stuff you have been listing, where this is a TAR and that is a TAR and my nephew and my dog are all TARs, that doesn't really help deployment. That is what I am saying. Many of these very enthusiastic deployment aids just backfire.

SPEAKER: Yes.

CHAIR: Peter, question for you: who owns the meaning of the word TAR? I mean, you just said this is a TAR, that is not a TAR. It's not clear to me by whose definition something is a TAR or not.

AUDIENCE: Don't lead me into temptation. I regret that I didn't actually register that term. I think the one answer I can give is that, again, we had some very strong statement, probably coming out of this or a similar room, on what properties we would like to see in the TAR, in one TAR.

CHAIR: In one TAR?

AUDIENCE: Well in the TAR and I think for the purpose of the enhanced user experience, excuse the term, it would be very helpful to stick with that.

CHAIR: Here is a comment to you, and possibly to Jim as well, as co-chairs of the DNS working group. RIPE is not a standards body, but RIPE has made recommendations in the past, recommendations on flap damping in the routing working group and so on. Perhaps the DNS working group could consider putting its view down in a RIPE recommendation and take it from there. I don't think this is a closed subject, and this is why I invited Olafur to come here and present; pretending that it is doesn't actually make it a closed subject. The discussion is going on and that is why I think it's useful to discuss it.

AUDIENCE: There is no doubt about that, that discussing the properties of these key collections is very helpful, sure.

SPEAKER: Yes. And whether the RIPE DNS working group wants to say just what Dr. Kent said, just say no.

Jim: Thank you. I want to pick up on some of the points that were made by Steve, Peter and Keith at the mike and the points you were making yourself. To be flippant first of all, if we want a definition of what a TAR is, I think the definition is what Wikipedia says it is. But I think your idea of having the DNS working group scratch its head and maybe come up with a RIPE recommendation is a good one, and I think the foundations we have laid and the attributes we think are appropriate would get us pretty much 90 percent of the way to defining what we as a community think a TAR should be, and I think that is where we should try and get something done. And in that context, a TAR probably then pretty much means something that is very close to what ICANN is doing, or IANA is doing rather, that is the ITAR, or the TAR, because it deals with TLDs and stuff of that nature. And I get a little bit annoyed with what Keith was saying and I strongly disagree with him. DLVs in a very loose sense of the word are TARs, because keys are stored there, so it is a repository of trust material, but it doesn't have the attributes of a TAR according to the definition we came up with in the working group. There is all this policy stuff; these are important attributes for what you consider this Trust Anchor to be worth. And I think actually DLV type things are very destabilising: there are usually no policies around them at all, you have no way of verifying the authenticity of where the data came from, and that is another cause for concern. I think there is a longer term problem: DLV is the work of the devil, because in my opinion DLV is actually taking pressure away from TLD operators to sign their zones. If I have got a delegation in .com and I can't get it signed, my option now is to find the flavour of the month DLV guy to sign my key, so at least that is something to get my zone signed, but it takes the pressure off Verisign to sign .com.

CHAIR: I will let the man with the horns and red tail speak.

AUDIENCE: I am not quite sure why Jim is getting quite so worked up. I am certainly not asserting that DLV is in any way more authoritative than any TAR. It is primarily a tool to allow people to do dynamic lookups on existing TARs. And --

Jim: I think we had a discussion on what is or isn't a TAR. A TAR has got a whole bunch of policies and stuff written about it, and most of these DLV things don't have these, so probably they don't count as TARs.

AUDIENCE: I am not going to give you chapter and verse here but we do have policies as to what goes into TARs and just because somebody else says it's a TAR doesn't mean that that is what I am saying. Or ISC is saying. And I mean, I take your point about it does take pressure off. On the other hand if you have a very large number of domains within a zone that are registering with some third party TAR I think that sends a pretty strong message to the TLD operator maybe they should be doing something about signing.

Jim: I would disagree; they are going somewhere else to get their DNSSEC stuff.

CHAIR: You have some time on the agenda to discuss this further. The objective now is to give everyone an overview and take the nitty-gritty detail at the end. Thank you very much. Thank you, Olafur.

(Applause)

CHAIR: Next speaker up is Eric Osterweil.

ERIC OSTERWEIL: My name is Eric Osterweil and I will be presenting some joint work that I have done with Dan Massey and Lixia Zhang, my advisors at various institutions.

So, the work I am going to be talking about is basically some observations that we have made over the past few years about DNSSEC as it's rolled out. Mostly pointing out things I think we ought to be aware of, not pointing out things that spell doom and gloom.

So, I don't know that I actually need to go through this too closely or in too much detail, but DNSSEC helps prevent things like cache poisoning. I don't think I need to tell you guys anything, because I think European ISPs have pretty much blazed the trail for DNSSEC deployment.

But the real question that I am going to start with is: has DNSSEC overstressed DNS? Has it done things that maybe some expected but didn't actually anticipate being a real problem? DNSSEC has added lots of data to the standard DNS packets. Keys can be anywhere up to 4 Kbits in length, and zones are supposed to have at least two of those, potentially more, especially with rollover standards. Each of these keys has signatures associated with it, and these signatures of course vary in length according to the size of the key that generated them. So what is happening is that resolvers and name servers need to start sending each other DNS packets that are larger than before. And so, I am going to talk specifically about what problems have come from this that we have been able to observe and quantify.

So I will start off with some background, and I don't want to beat anyone to death with that, so I will try to be sensitive to "we know this already". I will talk about the set-up that I need to give you for the types of path work that justify our observations, I will give you brief background on how we make our measurements, and I will show you some results and give you what I think might make sense to keep things afloat as we go.

Before I get into this, how many people would rather not hear any background because you know it perfectly well? OK. We are not quite at 50 percent, so I will try to go fast, and you know, hum or snap or something like that if it's just completely redundant.

So DNSSEC, at a high level, provides 3 basic services: origin authenticity, data integrity and secure denial of existence. It helps you be sure you have got data from the owner of that data, that it was not modified on its way to you by having records inserted or removed, and, in the event there is no data, it can prove to you there is no data, thus keeping an adversary from telling you every time you ask for data that there is nothing there.

Very quickly, it does this with public/private key cryptography: the zone generates a key pair, keeps the private portion, and the public portion is entered as a record into the zone, so resolvers can get that public key and use it to verify the signatures that come attached to all the data. Signatures are generated across RRsets: if I have an A record for www in my zone and it has four IPs in it, the signature covers all four, so you can't insert or remove an IP from that without me knowing.

Very briefly: what we have here is a standard NS set after applying the crypto to it, with the signature that makes sure it's safe. And that is pretty much all I am going to say about DNSSEC, does that seem sufficient? So the large message support is separate from DNSSEC. It uses a mechanism called EDNS0, which is RFC 2671, and it uses a negotiation in which a resolver can say in its queries: I can accept messages of the following size in response, my buffer size essentially is this big. So name servers get that message and say that is fantastic, because I have something that is too big to fit in the standard 512-byte DNS message, but you said you could take a 4K packet and that is great, I have a 3.99K packet or something like that, so I will send that to you. Of course, this doesn't take into account the network path, and the RFC actually says that if there is any kind of PMTU understanding, that should be taken into account when this negotiation happens, and that in the event that there is a problem fitting data over that initial advertisement, finding a message size that makes sense is considered preferable to the outright use of TCP. But what really happens today is: how is a resolver supposed to know what is ever going to work? 4K will probably work, I will just go with that. That tends to be what happens, and unfortunately what that leads to essentially is false advertising. My resolver has now advertised that I can take a 4K packet in response, and the path simply won't tolerate that in some cases. And so I will show some evidence of this.
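To make the buffer size advertisement concrete, here is a minimal sketch of a DNS query carrying an EDNS0 OPT pseudo-record, built with the standard library only. The wire layout (OPT type 41, the advertised UDP payload size carried in the CLASS field, the DO bit in the flags half of the TTL field) follows the EDNS0 format; the helper names `build_query` and `advertised_bufsize` and the fixed query ID are illustrative assumptions, not part of any tool mentioned in the talk.

```python
import struct

TYPE_DNSKEY = 48
TYPE_OPT = 41
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = TYPE_DNSKEY, bufsize: int = 4096,
                dnssec_ok: bool = True, qid: int = 0x1234) -> bytes:
    """Build a query whose OPT record advertises `bufsize` bytes."""
    # Header: ID, flags (RD set), 1 question, 0 answers, 0 authority, 1 additional.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    # OPT: root owner name, TYPE=41, CLASS=payload size,
    # TTL=ext-rcode(8) | version(8) | flags(16, DO is the top bit), RDLEN=0.
    ttl = 0x00008000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, bufsize, ttl, 0)
    return header + question + opt


def advertised_bufsize(message: bytes) -> int:
    """Read the payload size back out of a message built by build_query."""
    rtype, size = struct.unpack("!HH", message[-10:-6])
    if rtype != TYPE_OPT:
        raise ValueError("message does not end in an OPT record")
    return size


if __name__ == "__main__":
    query = build_query("example.com", bufsize=4096)
    print(len(query), advertised_bufsize(query))
```

Nothing in this advertisement reflects what the network path can carry, which is the "false advertising" the speaker describes: the value is simply whatever the resolver was configured to claim.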

And so this is really fast and it's not meant to bore you guys, it is for the sake of completeness. A network path is a sequence of links, and each link has an MTU, the maximum size packet that can fit over it, and the smallest one of those along a path is the path maximum transmission unit, the thing the path can tolerate, the PMTU. Unfortunately DNS makes this a little more complicated because of middle boxes: firewalls and NATs and SOHO routers. The problem is that these devices have, for various reasons in the past, decided they know how DNS should look, and when they see a packet that doesn't look like that, they drop it just to be safe. And so there are all sorts of different reasons that they will drop a packet, it's not always because of inspection, but nonetheless these are above layer 2, above layer 3, and they are essentially in some cases limiting DNS packets from getting through. So for the rest of this talk I essentially overload the term PMTU to mean basically anything along the path that impedes me from getting a message. It may not be L2 or L3; it just means that the path has a maximum transmission unit of some kind that I have either exceeded or not.

So, what is very important now is to be able to discern clearly, when there is a network problem, whether it's a PMTU problem or something else; the last thing we want to do is see a network outage and tally that as a PMTU problem. The way we say you could discern this is: a random drop should be easily overcome by retransmission. If something got zapped in the network as a random drop, I should be able to retry a few times with a reasonable timeout, and that should overcome random drops. The other thing that we want to distinguish from is, let's say, the name server being down; I don't want to say I did a walk down to 512 and couldn't get anything through. If I can issue the same query at roughly the same time with a different buffer size advertised and I get an answer, the name server is not down. If I can't get an answer but at the same time someone else somewhere else can get an answer, the name server is not down. If that first advertisement is always a problem but I can get to it other times, the name server is not down. And if TCP works, the name server is probably up. That is, at a high level, the way we tell these apart.

SecSpider is our measurement system. We currently have 8 pollers stationed around the world, in Europe, Asia and North America. We are definitely always looking for more, so these are not any kind of gold standard; they are people being very friendly with us. Each is a very, very tiny DNS reflector, and we use TSIG to make sure they are not open, but we are always looking for more. If anybody is interested in hosting a poller, you won't notice it's there. It takes no memory footprint or CPU, and I would very much enjoy chatting with you, just because it really helps us. Eight points around the world is nice, but it's tough to quantify problems with that. The best we can do is show evidence of problems now.

So I have sort of played with the graph a little bit to characterise our PMTU walking. I wrote the code by doing a flowchart first, so either this is right and I am showing it to you, or it's wrong and we do it better in the code.
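The rules of thumb for telling a PMTU problem apart from a random drop or a dead name server can be written down as a small decision function. This is a sketch of the reasoning as described in the talk, not SecSpider's actual code; the `Observations` fields, the `classify` function and the returned labels are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What one poller learned about one failing query (hypothetical fields)."""
    retry_succeeded: bool          # same query, same buffer size, answered on retry
    smaller_buffer_answered: bool  # same query answered with a smaller advertised size
    other_poller_answered: bool    # another vantage point got an answer at about the same time
    tcp_answered: bool             # the same query worked over TCP


def classify(obs: Observations) -> str:
    """Apply the rules from the talk in order."""
    if obs.retry_succeeded:
        # Random drops are overcome by retransmission.
        return "random drop"
    if (obs.smaller_buffer_answered or obs.other_poller_answered
            or obs.tcp_answered):
        # The server is demonstrably up, so the failure is tied to this
        # path and this advertised size.
        return "likely PMTU problem"
    # Nothing gets through by any means we tried.
    return "name server unreachable"


if __name__ == "__main__":
    print(classify(Observations(False, True, False, False)))
```

The ordering matters: retransmission is checked first so that a single lost packet is never counted as a path size limit.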

So in order for SecSpider to be triggered to do a walk, it has to receive 3 successive timeouts, and it starts off with a default buffer size of 4K. The reason I call this the default buffer size, and I failed to put the citation this far up, is a recent result from one of the root name servers showing that over 60 percent of the traffic it saw advertised 4096, so we start with that as the default. If that times out, we try TCP initially to make sure that TCP works at all, and then we start doing a binary search between the lower bound of 512, into which we presume DNS messages will always fit, and that is a presumption, and the upper bound of 4096. So we do a binary search to find the exact threshold without a truncation bit set, or find that there is no size that will fit across that path; in other words, right after we get a truncation bit, if we make the message any larger we get a drop. I should probably back up. One of the things that is actually pernicious about this problem is that if you issue a query that exceeds the PMTU, you are happy: I can have a 4K response. The name server sees the query, says that is no problem, I can fit a message, sends it back and it gets dropped in the network. You don't get a truncation bit, you are not told to retry with a smaller size. You get nothing. And then you try 2 times and you get nothing each time because it gets dropped. It's a particularly nasty problem that way, because you don't know to do a walk. If you get a truncation bit, you do a walk or use TCP, but at least you know something.
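The walk described here is a binary search over advertised buffer sizes, and can be sketched as below. This is an illustration under stated assumptions rather than the SecSpider implementation: `probe` is a hypothetical callback that sends the query with a given advertised size and reports "answer", "truncated" or "timeout"; the search assumes the upper bound has already timed out and that every size above the threshold also times out. `simulated_probe` is a toy model of a path, included only so the sketch can be run without a network.

```python
from typing import Callable, Optional, Tuple

LOWER_BOUND = 512   # size into which DNS messages are presumed always to fit
UPPER_BOUND = 4096  # default advertised size that triggered the walk


def pmtu_walk(probe: Callable[[int], str], lo: int = LOWER_BOUND,
              hi: int = UPPER_BOUND) -> Tuple[Optional[int], bool]:
    """Return (largest size that still gets a response, whether it fits).

    (None, False) means even the lower bound timed out. (size, False) means
    only truncated responses get through: no size fits over this path.
    """
    outcome = probe(lo)
    if outcome == "timeout":
        return None, False
    # Invariant: lo got a response, hi is known or assumed to time out.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        result = probe(mid)
        if result == "timeout":
            hi = mid
        else:
            lo, outcome = mid, result
    return lo, outcome == "answer"


def simulated_probe(path_limit: int, min_answer: int,
                    full_answer: int) -> Callable[[int], str]:
    """Toy path: the server trims its reply to the advertised size when it can,
    sends a small truncated reply when it cannot, and the path silently drops
    anything larger than path_limit."""
    def probe(size: int) -> str:
        if size < min_answer:
            return "truncated"  # small reply with the TC bit set gets through
        sent = min(size, full_answer)
        return "answer" if sent <= path_limit else "timeout"
    return probe


if __name__ == "__main__":
    print(pmtu_walk(simulated_probe(1500, 1400, 3000)))  # a size that fits exists
    print(pmtu_walk(simulated_probe(1500, 2000, 3000)))  # no size fits
```

The second return value separates the two outcomes the speaker distinguishes: a walk that finds a usable size, and a walk that ends at a truncation bit with any larger size being dropped.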

Without more ado, here is the citation I mentioned, which said a reasonable default is probably 4K. And so I think I said most of this. What I am going to show you is essentially 3 different things. First, how often does 4096 work. Then I am going to show you how often 4096 doesn't work, but if you had done a walk or had a crystal ball, there is a size that would work: if you had just said 1.5K or 2K, that would have worked. And then I will show you how often there just is no size, where you basically have messages that are too big to fit over the PMTU to one of our pollers. And so, in the spirit of where we are, I will show you results that are sort of a snapshot from our poller at NetLabs, and we are very grateful to NetLabs, who hosted our very first poller and have hosted it for a long time, and I will compare it to a SOHO router, which as you might expect doesn't do as well.

Here are the 3 curves. The top one, the red one, shows you -- you know, here it is just a number, it's not representative of anything -- how many zones we could query successfully at any given point. The X axis is date: we have June, July, August, so this is essentially a two-month window. And you see most of the queries go through great. The green line down below shows you how often we need to initiate a walk but could actually find a size that fits, and maybe remember it for next time or whatever. And the blue line is the bad one: that says the zone is serving DNSKEYs that exceed the PMTU and essentially won't fit through to my resolver. So you can see our poller has trouble with about ten zones.

So here is a SoHo router in Los Angeles. All zones are queried at exactly the same time from all of our pollers. Here you can see the same number of zones are successful, it's hard to tell because it's log scale, but 10 K and, you know, 100 zones need to be PMTU walked. Clearly the other poller has very good connectivity and the SoHo router has this big problem, and roughly the same number of zones are serving keys that won't fit. Without this PMTU walk we get this silent drop problem: I asked for a key, I guess there is no key. So clearly, it matters where you observe these things from. So here is a summary of all of our pollers and what they have seen. What you can see here is that the green bars are plotted against the Y2 axis, the one on your right, and show how many times each of our pollers needed to initiate a PMTU walk; this is an absolute number. Some have a lot more trouble than others. The red bars show what percentage of those walks succeed, meaning there is a size that fits. So you can see, for example, poller number one: it polls some number of times, it almost doesn't matter how many, but when it does PMTU walk, almost 100 percent of the time it's successful and finds a size that would fit. Versus poller 6, which has much, much more trouble: it issues a lot of PMTU walks and roughly half of them succeed and half of them don't fit, at the same time that we poll from all these other pollers. One thing may come to mind: is this just a few zones kicking off a problem? How many zones are actually causing each of these PMTU walks? Because this is a summary of what we have seen over time. And that leads me to probably the most irritatingly complicated graph I have ever generated; I only keep it around because I think it's expressive once I manage to explain it to myself and others. 
What I have here is each of our pollers plotted along a separate curve, and what you note is that the X axis is the percentage -- it's a rank order of zones, but what it says is that a zone at, for example, tick 20 for one poller or another has to issue PMTU walks 20 percent of the time. A zone at zero never has to, versus at 100, every time we query that zone we have to do a PMTU walk. On the Y axis you have what percentage of the zones this covers. So it's a cumulative distribution function on the Y axis. So for example, poller 0, the red curve that sits in the middle: what that essentially says is that 70 percent of the zones, plotted against the Y axis, have to do PMTU walks roughly 20 percent of the time or less. So about 20 percent of the zones fit within that bucket. At the same time, for poller number 6, the dotted line on the bottom, what you can see is that 60 percent of the zones, coming across from the Y axis, need to do PMTU walks 90 percent of the time or less, so that comes all the way up to this elbow. Or, 40 percent of the zones have to do PMTU walks at least 90 percent of the time, so from that tick to the far axis. And yes, that took me a while to wrap my head around, so I would be happy to field a question about that.

So this is a little more -- to be more succinct in how to represent this. This is picking up from a metric that I have a citation for here, which tries to quantify what the availability problem is. So along the X axis we have each of the zones, and we actually haven't looked at our entire corpus; we have tried to look at zones that we call production, and we only make that distinction because we want to try and prune out zones that are testing, just because they may not be trying to make sure their keys are available. By the way, it's not meant to be absolute ground truth, just something meaningful. There are about 5,000 zones that fit this bill. What you can see is that over on the left axis you get some zones whose availability dispersion is basically, some pollers can see me and at the same time some have to PMTU walk to me. It gets to about 20 percent. So what you want to have is one, which means everybody can see me the same, all the resolvers that you are representing can see me just fine, and 20 means one could see you most of the time and the rest of them had trouble getting to you. This is a metric that we have measured repeatedly over time, and we use exponential weighting: if one day your zone had a bad day and got a bad availability score because you had an outage or something, it gets smoothed out over time. So this curve is particularly interesting when you contrast it with the same curve in the citation and, unfortunately, it's interesting in a bad way.
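
The exponential weighting mentioned above can be illustrated with a plain exponentially weighted moving average. This is only a sketch of the smoothing idea: the talk does not give the dispersion formula or the weight SecSpider uses, so the function name `smooth_scores` and the `alpha` value are assumptions.

```python
from typing import List, Sequence


def smooth_scores(daily_scores: Sequence[float], alpha: float = 0.2) -> List[float]:
    """Exponentially weighted moving average of per-day scores.

    alpha is the weight given to the newest observation (assumed value);
    a single bad day moves the smoothed score only part of the way.
    """
    smoothed: List[float] = []
    current = None
    for score in daily_scores:
        if current is None:
            current = score            # first observation seeds the average
        else:
            current = alpha * score + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# A zone that scores 1 every day except for one bad day in the middle.
history = smooth_scores([1.0, 1.0, 20.0, 1.0, 1.0])
```

With these numbers the bad day lifts the smoothed score well short of 20, and it decays back towards 1 on the following days.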

So, question? So, I am going to show one more sort of interesting thing, I think, that we have observed. So here is the same type of graph I showed you a few slides ago for the two pollers, except it's blown out over several years. What I draw your attention to is this: here is a big jump in the number -- there is actually a jump in the number of walks we need to do, the green, and in the walks that did not work, the blue. So the walks in green are not all walks; it's just the successful ones. The blue one is the scary one. So this is one of our pollers. I think this is the NLnet Labs poller. This one wasn't born yet, nor was this one. Et cetera, and so forth. What I can say is that, from our apparatus, what we observed is that in September there was a big problem with -- well, there is something that changed that caused a big spike in the number of PMTU walks, and, going back, what you can see is that some pollers started off with walks that worked and wound up with walks that didn't work. So I am not sure exactly what happened specifically yet, but the green shows a concern and the blue shows a problem. And so, I am looking into what happened here. I am not sure I will be able to tell, although I should be able to do more forensics than I have had time to yet, but it makes me want to get to the next slide as fast as possible, which is what can be done now. I don't think this is in any way the end of anything, and I don't think this shows evidence that the protocol doesn't work at all. I think it shows there are other constraints we need to take into account. One of the things I think can be done tactically: register your zone at SecSpider and we will poll it. Each zone has a drill-down page, so you type in your zone name and it brings up your page. It will show you the availability that we see to your zone. You can see a bit of stuff here; I encourage you to poke around, and that way you know. 
I am not sure exactly how else you would be able to tell if you had an availability problem, because you have to literally poll from a bunch of places, and it's not that that is beyond your capability; it may be beyond what you intend to do. This way you don't have to. Sign up with a monitoring service and let us help you monitor. That is the big message here. If there are any issues, whatever, we are absolutely more than willing to help out. Shoot us an e-mail; we are very responsive. More strategically, I don't think that I am the right person to say here is the silver bullet. If you are using SecSpider to monitor your zone and you want to try different configurations, that is an interesting experiment that can teach us a lot about what works and what doesn't. I think we can use the results of that sort of investigation to say what maybe best practices ought to be, whether there are some extra considerations that should be put into those documents, and potentially even incorporate this notion, when we actually issue best practice documents, that there is a problem with this sort of stuff and what you should look out for.

So, in summary, basically, we have this notion of availability dispersion that I have sort of quantified, mostly just for illustrative purposes, and what I really want to underscore is that I think this is only really possible with a distributed monitoring system. I really think that this kind of thing -- I am not saying me, but I am saying for this deployment -- it's imperative to have something like this, where you can, for example, see that spike I showed you. I don't think even the zone owners, I am not sure who they are, but I think not even they may realise this has happened. Right now the deployment is manageable enough that if there is a problem we can get out in front of it. It's before there is wide-scale adoption in which some poor practice has led to a wide-scale outage and people are scared to deploy. I think this is the right time to find a problem and wrap our heads around it; that is a good thing to have caught now. Here are my references, and they are available on the web so you don't have to write them down. And that is it.

CHAIR: Thank you.

(Applause)

Are there any questions at this point?

AUDIENCE: From what I saw, it seems that most of these problems are somewhat closer to the core than closer to your probes, is that a correct interpretation of your data or would you say that some of the problems could actually be very close to the probe instead of the core of the network?

ERIC OSTERWEIL: I think it's probably a lot of times closer to the probes; I tried to highlight that with the comparison to the NLnet Labs poller. There are certainly some that are problematic in the core and I think --

AUDIENCE: I should be thinking of the November -

SPEAKER: I am not sure, I wish I had spent more time investigating it. I actually worry it was some key set that became too large. I suppose you could say for the core or maybe just all -- sorry.

AUDIENCE: Have you tried switching the SoHo router in front of the probe to see what happens, it was SoHo router that --

SPEAKER: As opposed to the carrier network?

AUDIENCE: Yes.

SPEAKER: No, I hadn't tried that.

CHAIR: Peter?

Peter: Good stuff, very interesting.

SPEAKER: Thank you.

Peter: I guess you admitted upfront this is probably more about EDNS0 than DNSSEC, and DNSSEC is just giving the opportunity to make these mistakes and providing the huge payload. From a service perspective I would agree with you that the zones are causing the problems, but the real problem is of course on the way between the probe and one or more of the name servers responsible for these zones.

SPEAKER: Yes.

AUDIENCE: First question would be: have you had the chance to investigate the particular name servers? It could be the case that a single name server or a small set is responsible for a large number of the problems. Second question: did you do, or consider doing, similar measurements in the production DNS, outside or parallel to DNSSEC? Because it might be that many of these zones are kind of experimental and, therefore, the servers are in kind of a different state than usual production stuff would be. SPF or DKIM records would suffer from the same problems.

SPEAKER: You are absolutely right. To underscore something I think you said as well: this is not meant to pick on DNSSEC at all; it has exacerbated a problem that I believe existed anyway. It needs these large messages more than a lot of protocols have; SPF has a similar requirement. I knew I was going to forget your first question. I actually --

AUDIENCE: Whether you had the chance to check whether a particular name server or groups was responsible for most of these problems.

SPEAKER: Yes, I am not sure if you mean like a software version or if you mean an actual IP address.

AUDIENCE: Probably the position of that name server is somewhere in the network topology that exploits or leads to the fact that the path PMTU doesn't work somehow.

SPEAKER: I have that data. I don't have a process but I have the ability to dig into it. If you don't mind dropping me an e-mail saying this is what could be good -- I do have that information.

AUDIENCE: I hope you will still be here on Thursday so we could have a small discussion in the DNS Working Group then.

SPEAKER: Absolutely.

(Applause)

AUDIENCE: From AfriNIC. A comment: thank you for this presentation, which illustrates the need for non-ICMP-based PMTU discovery. It is really needed, and not only for DNSSEC. Every time you have IPv6 in IPv4 we need just such robust methods. And the question is: were you inspired by, or are you implementing, a flavour of RFC 4821, which deals with robust methods --

SPEAKER: I am agnostic. Basically I observe this problem kind of from first principles so I didn't actually -- I understand that there is actually a long history of people with opinions in DNS about PMTU and I haven't gotten involved in that.

AUDIENCE: This is an opportunity to kick that off and put maybe more tools because the RFC has been existing for ages now and it is really very poorly implemented so if DNSSEC is pretext or occasion or an opportunity to do it, let's do it.

SPEAKER: I will admit my naivety, I should read the article.

AUDIENCE: George Michaelson from APNIC. Geoff Huston and myself got into this research experiment of deliberately clamping the interface MTU below the v6 tunnel boundary so that the MSS was guaranteed to fit under at least one and very probably two v6 tunnels. We were purely observing a problem to do with v6 tunnel reliability. This caused a lot of upset to people in various communities, at least two of them at these meetings, and I expected to get some flack from them, but I observed that when we did this on all our NS hosts, I have seen no degradation or drop; if anything I have seen an increase, although I can't demonstrate that yet. Do you have any comment about the advisability of high-availability servers clamping on the host -- not on the link, on the host -- and just short-circuiting this problem? The only downside I see is that I wonder if I have forced traffic to TCP; I don't think so, but I am not seeing a drop of service.

SPEAKER: Yes, honestly, from my perspective I think it's hard for me to have an opinion about that, because from my position that looks like second order. For me the real interest is what should be done in order to have that window actually work properly, not so much about what the -- what the protocol can do to actually engineer that properly as opposed to what we can do assuming the protocol doesn't get it.

AUDIENCE: True, I see the difference between what in a deployment space we have elected to do and what you in this measurement space are addressing but the fact remains you have as a host control over your released maximum defragmented size and you can short-circuit this problem.

SPEAKER: Yes, honestly I am agnostic, but for the resolvers that can get a 4 K packet through, that saves them: there are cases where these packets come bundled with signatures and keys and lions and tigers and bears, and that helps with NS authority. If you restrict that, the name server may strip those out, but that may require an additional query. So yes, I believe that that could potentially be useful or be a lesson. One other thing: I was asked if I had paid attention to production zones vs. cruft zones. We did try to, so we have a simple test to focus on which zones we call production. In order to prune out zones that are clearly operating at test capacity -- we catch some testing zones by mistake and exclude some production ones -- but we do try and do a low-pass filter to try and isolate that. Thank you.

CHAIR: Next talk is on the topic of -- by Ethan Katz-Bassett.

ETHAN KATZ-BASSETT: I am from the University of Washington and I am going to be talking about our system to measure the reverse path back to you from an arbitrary destination without requiring control of the destination. This is work with collaborators at the University of Washington, the University of California, San Diego, and Minnesota.

Traceroute is the most used diagnostic tool today, to check if a destination is reachable; if it is, you can see what path is taken. If it's not reachable you can try to look at where the path seems to be broken. You can use traceroute to troubleshoot problems: if a path seems to be higher latency or longer than necessary, maybe there is an out-of-date local pref. At the University of Washington, we use it to map the Internet, predict performance and compare ISP performance, and I was at RIPE last year in Berlin talking about my system Hubble.

What I realised with Hubble was that traceroute was restricting my ability to tell what was going on. So if we consider the communication between me at the top left and a web server down at the bottom, there is some path that traffic takes from my computer to the web server, in this case the purple path, and there is some path that traffic takes from the web server back to me, in this case the red path, and generally paths on the Internet are asymmetric and different. Traceroute lets you measure the path from you to anywhere, but it just measures the path from you, so here it can measure the purple path but it doesn't give me a way to measure the red path. I would need to run traceroute from the web server. So with Hubble this meant that if the problem was on the forward path I could get an idea of what was going on, but I was basically flying blind if the problem was on the reverse path. I am really only getting part of the story and that was limiting my work. And similarly, it seems to limit what we can do operationally. So I was at NANOG in January and there was a tutorial on troubleshooting problems which said that "the number one go-to tool is traceroute" but that the reverse path is "completely invisible". Some of my colleagues are working with a large content provider to try to troubleshoot performance problems that some of the clients of that content provider are having. They found that 20 percent of the clients were experiencing severely inflated latencies even though there was a server nearby. Some clients in Taiwan were experiencing poor latencies of over 500 milliseconds even though they were being redirected to a data centre in southeast Asia. They suspected a circuitous reverse path, and I have their conclusion up there: in order to more precisely troubleshoot problems they need the ability to gather information about the reverse path back from clients to their nodes.

So we are left wanting the reverse path information in multiple instances, and in fact I think it's useful in many cases. But in order to get it we actually have to run traceroute from the destination we care about, and generally we don't control those destinations. There are a couple of things we can do. We can use public traceroute servers, but there are only a couple of hundred of those in the world. Maybe we can mail a mailing list and ask for help, but obviously that is not a very scalable approach. We can assume symmetric routing, but most paths on the Internet are asymmetric; we end up using these because we have to, not because we want to. What we want is a reverse traceroute that works without control of the destination. We can't just use TTL limiting like traceroute does, because the destination is going to set the TTL to a normal value, and even if there was a TTL expired message along the path, that would go to the destination, and we don't control the destination so we wouldn't see it. Instead of using TTL, we use IP options. IP options are generally reflected in the replies, so they are going to work over both the forward and reverse paths. The first is the record route option. Record route will record the first nine routers along the path; if the destination is within eight hops, then the remaining slots will be filled out on the reverse path. So this is great if we are within eight hops, but the average Internet path is about 15 hops in each direction, 30 round trip, so this is only going to work if we happen to be near the destination we care about. The second option that we are going to use is the time stamp option. Time stamp lets us specify up to four IP addresses; they are ordered, and each will record its time stamp if it's traversed in order. 
So the key is that the time stamp option checks for the routers in order, so a router is only going to stamp if it's the address in the next open slot, not if it appears further on. If we ask for the destination's time stamp first, then any stamp after that will have to have been on the reverse path. So in particular, what we can do if we want to figure out whether a router R is on the reverse path back from D is ping D and ask for D's time stamp and then router R's time stamp. If we see a time stamp from R, we know it's on the reverse path. What this lets us do is use existing Internet maps to see what the adjacent possibilities are, and we can use time stamp to check which one it is. The problem here is that time stamp is filtered all over the place and it's very limited in deployment. So this ends up being great if we happen to be near enough to use record route or if we are interested in a destination that supports time stamp.
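
The in-order stamping rule can be shown with a small simulation. It is a toy model rather than a packet-level implementation: the path is just a list of made-up hop names, and `timestamp_probe` and `on_reverse_path` are hypothetical helpers written for this illustration.

```python
from typing import List, Sequence


def timestamp_probe(path: Sequence[str], prespecified: Sequence[str]) -> List[str]:
    """Return which prespecified addresses get stamped as a packet
    traverses `path`. A hop stamps only if it is the address in the
    next open slot, never one that appears further on."""
    if len(prespecified) > 4:
        raise ValueError("the option holds at most four addresses")
    stamped: List[str] = []
    next_slot = 0
    for hop in path:
        if next_slot < len(prespecified) and hop == prespecified[next_slot]:
            stamped.append(hop)
            next_slot += 1
    return stamped


def on_reverse_path(forward: List[str], dest: str,
                    reverse: List[str], router: str) -> bool:
    """Ask for dest's stamp first, then router's. Because dest must stamp
    before router can, a stamp from router proves it was traversed after
    the destination, that is, on the reverse path."""
    round_trip = forward + [dest] + reverse
    return timestamp_probe(round_trip, [dest, router]) == [dest, router]


# Made-up topology: F1 and F2 on the forward path, R on the way back.
seen_on_reverse = on_reverse_path(["F1", "F2"], "D", ["R", "X"], "R")
only_on_forward = on_reverse_path(["F1", "F2"], "D", ["R", "X"], "F2")
```

F2 is traversed before D has stamped, so its slot is not yet open and it leaves no stamp; that is why a router seen only on the forward path is not mistaken for a reverse hop.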

We now have a couple of techniques that can sometimes measure a reverse hop, but they are only useful close to the destination. We use distributed vantage points and source address spoofing in order to extend the coverage of that. In this picture you see my computer up there, and I am trying to measure the reverse path from the web server back to my computer, and there is another computer at the bottom. Spoofing is going to let us use any vantage point within 8 hops. Let's say that that other computer is within 8 hops. What we are going to do is have it ping the web server with record route enabled, but it will spoof the source address and say it's coming from me. The packet will get to the web server, which will think it came from me and reply back to me along the red path, and so it will be able to measure the red path without having to probe the forward path from me towards the web server. So we can now measure a reverse hop back from anywhere as long as we have some vantage point close to it. I said we are going to use spoofing. We are well aware of the security issues, but we think it's feasible to employ it as a measurement tool with limits on it. We are using a restricted version just on measurement test beds, and we will have a list of who we are able to spoof as: we are only ever going to spoof as nodes that we control, so we are always going to be accountable for the packets that we send. We are basically using the spoofing as a reply-to address, getting the reply to go to a different one of our vantage points. We rate limit and restrict the destinations that we probe. So between the rate limiting and only spoofing other nodes we control, this has little in common with nefarious uses; it's sort of more like the address rewriting that NATs might use, when we are trying to send the traffic to a cooperative node we have. We have sent millions of probes to at least tens of thousands of destinations at this point. We have received no complaints. A small number of spoofing vantage points are able to support our entire system. I will use 30. 
You can filter all other locations and give us a few ports at a few sites, and that is enough to support the system.

As I said before, IP options are generally sent to the route processors, and some routers filter or ignore them in order to reduce the burden and the potential for abuse, so we looked at how much support we see for options. PlanetLab is a test bed with hundreds of computers around the world, and we used that to assess it. We probed a set of IP addresses and saw how many of them supported the options. We found that 58 percent of the IP addresses were within 8 hops of some PlanetLab node, so more than half are close enough to get a reverse hop. Last year Rob Sherwood presented a study where he looked at record route support, and about 9 percent don't record but don't drop the packet; they pass it on without recording.

We also looked at coverage of time stamps. We found about 37 percent of IP addresses gave valid time stamps, and an additional 18 percent replied but with a time stamp of 0, which is not useful in our case, but they are not dropping or filtering the packets. We looked at the top 100 ASs and found 61 of them gave valid time stamps for most of their routers, with the rest dropping the packets or getting filtered along the path. So there is at least some support for the options. Neither record route nor time stamp is a complete solution, but the combination with spoofing ends up being pretty powerful. Now that we have a way of using IP options to measure a hop back from the destination, we need a way to make a path out of that, and we build the path incrementally by exploiting the fact that most routing on the Internet is destination based. This means that once we get to a certain hop along the path, the next hop tends to depend only on the destination, not on the source or the path so far. If we know the path from D goes through router R, we just need to determine the path back from R and stitch it together. This assumption doesn't always hold, but it holds often enough for us to base a system around it. So here, we have a source S trying to measure the reverse path from destination D back to S, and vantage points V1, V2 and V3 around the world that I can use. Some of them can spoof. The first thing I am going to do is issue standard traceroutes from the vantage points to the source; if I intersect one of these paths, I am going to assume the path I care about follows it back. Then I am going to find some vantage point that is within 8 hops of D; in this case let's say V3 is. V3 will then send a record route ping to the destination but spoof and claim it's coming from S, so when that packet gets to D, some number of the hops will be filled out, let's say 7. D will add itself in and reply to S. So when the response gets back to S there is now reverse path information in it: we know R1 is on the reverse path, and it looks something like this. 
We can now repeat that process. We assume destination-based routing, so we just have to get the path back from R1. Let's say V2 is within eight hops of R1: we will send a record route ping from V2, and this time when it gets back to S, say we found two new hops, R2 and R3, so the path looks like this. Now let's say none of our vantage points is within 8 hops of R3, so we can't use record route. This is when we use time stamp. We will look at an established map of the Internet; we just use one of them. And we will look at all the routers that are adjacent to R3. These are the possibilities for the next hop, let's say R4 and R5, and we test them one at a time. So we are going to send a time stamp ping from the source to R3 and we will ask for R3's and then R4's time stamp; if we see R4's, that means it was on the reverse path, because R3 had to stamp first. R3 stamps, and we get the response back. There is a stamp for R4, so we now know it is on the reverse path, and it looks like this. We have now intersected a known path, so we are going to assume it follows V1's blue path the rest of the way. It looks like this. So the techniques combine to let us build up this path incrementally.
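
The incremental construction in this walkthrough can be sketched as a short loop. It is a simplified outline under stated assumptions, not the authors' implementation: `measure_next_hops` stands in for whichever measurement succeeds at a given hop (spoofed record route or time stamp), `known_paths` holds traceroutes from vantage points to the source, and hop R6 on V1's path is invented for the example.

```python
from typing import Callable, List, Optional, Sequence


def build_reverse_path(dest: str, source: str,
                       known_paths: Sequence[List[str]],
                       measure_next_hops: Callable[[str], List[str]],
                       max_rounds: int = 32) -> Optional[List[str]]:
    """Stitch a reverse path from dest back to source, hop by hop,
    relying on destination-based routing."""
    path = [dest]
    for _ in range(max_rounds):
        current = path[-1]
        if current == source:
            return path
        # If we intersect a known traceroute towards the source,
        # assume the rest of the reverse path follows it.
        for known in known_paths:
            if current in known:
                path.extend(known[known.index(current) + 1:])
                return path
        hops = measure_next_hops(current)
        if not hops:
            return None   # caller would fall back to assuming symmetry
        path.extend(hops)
    return None


# The example from the talk: record route yields R1, then R2 and R3,
# time stamp confirms R4, and R4 lies on V1's traceroute to S.
measured = {"D": ["R1"], "R1": ["R2", "R3"], "R3": ["R4"]}
v1_traceroute = ["V1", "R4", "R6", "S"]
example = build_reverse_path("D", "S", [v1_traceroute],
                             lambda hop: measured.get(hop, []))
```

Returning `None` when no measurement succeeds leaves room for a fallback in which the caller assumes some hops are symmetric.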

So I have given you the technique, and now we can look at how often it works. To do this I chose 200 random destinations around the Internet, and the sources were eleven PlanetLab sites scattered around the world. If in a particular case we are unable to get a complete path, we will issue a standard traceroute from the source to the destination, assume that the last hop is symmetric, and try to use reverse traceroute from there; if that doesn't work, we will assume the last two hops are symmetric, and so on. The metric captured in this graph is the number of hops we have to assume are symmetric, so on the X axis we have that, the number of hops. It's a CDF, so the Y axis is the cumulative fraction of source-destination pairs, and the easiest way to explain it is to point out the highlighted points. We see that in just over 40 percent of the cases, the purple point there, we have a value of 0, which means we didn't have to assume any symmetry and were able to measure the complete reverse path. In the median case we had to assume the first hop was symmetric and then measured the reverse path the rest of the way back.

I have just shown that we can often measure a reverse path. We would also like it to be accurate, and one measure of that is whether it's going to give you the same thing you would see if you had control of the destination and measured a traceroute directly. So we are going to use that. We lack ground truth in general, so what we are going to do is use PlanetLab nodes as destinations this time. When we measure the reverse traceroute, we will assume we don't control the destination and just use the techniques that I presented in this talk, and we will compare that to a traceroute issued when we do control the destination. In terms of how often the technique worked, it was almost identical to the previous data set, and what I am capturing here is how similar the two are, the one where we do use the destination and the one where we don't. So on the X axis we have the fraction of hops along the directly measured traceroute that we also saw on our reverse traceroute. This is a CCDF, so to explain it, I will point out a couple of points. The black line is how well you will do if you assume the path is symmetric, and in the median case we see you get 38 percent of the hops. The red line is our technique, and in the median case there we see 87 percent of the same hops that you would see if you issued a direct traceroute. One problem is that it's hard to know if two hops are actually the same when we are trying to compare paths. We know we are doing pretty well, but it's not clear why there is the gap. Traceroute will generally give you the incoming interface on a router, whereas record route will often give you a different interface, and it's hard to know which ones belong to the same router. We borrowed existing techniques, but they don't always work. So some of the gap between the red line and an ideal value, which would be just one, might just be because we don't know that hops are the same. 
To look at that, in addition to looking at whether we were getting the same router, we looked at whether we were getting the same POP, and by POP here I mean a given AS in a given city; we identify POPs based on DNS names. So the blue line captures how well we are doing in terms of POP similarity, and there we see that in the median case we are able to identify 100 percent, all of the POPs that you would see on a directly measured traceroute. So it seems a bunch of the gap is there because we don't know which routers are the same. It's also possible that the traceroute and the reverse traceroute both see correct paths that are different, and that could either be because they are getting load balanced differently or because there was a path change in between when I made the two measurements. So I looked at that by issuing traceroutes one day and the next day, which was about the granularity between the traceroute measurements, and I saw about 26 percent of the traceroutes change. So we are sometimes missing a few hops, but it seems like we are doing pretty well; the differences are either because we can't identify which IPs are on the same router or because path changes or load balancing are slightly distorting the correct path, and this is even considering those cases when we had to assume a couple of hops of symmetry. There might be some other reasons why there would be differences between what we'd see and what traceroute would see. There are hidden routers: about 16 percent of IP addresses turn up in record route but not in traceroute, and a similar percentage in traceroute but not in record route. There might be instances when our destination-based assumption doesn't hold; for instance, if we probe into the middle of a tunnel we might see weird routing. And I am not aware of any case, but maybe some of you will tell me, where packets with options are handled differently than other packets.
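
The similarity measure used in this comparison, the fraction of hops on the direct traceroute that also appear on the reverse traceroute, can be written down in a few lines. This is an illustrative sketch: the addresses are documentation-range placeholders, and the `pop_of` mapping stands in for the DNS-name-based POP identification, whose details the talk does not give.

```python
from typing import Callable, Hashable, Sequence


def fraction_seen(direct: Sequence[str], reverse: Sequence[str],
                  key: Callable[[str], Hashable] = lambda hop: hop) -> float:
    """Fraction of hops on the directly measured traceroute that also
    appear on the reverse traceroute, after mapping each hop with `key`
    (identity for raw addresses, or a router/POP lookup)."""
    if not direct:
        raise ValueError("direct traceroute has no hops")
    seen = {key(hop) for hop in reverse}
    return sum(1 for hop in direct if key(hop) in seen) / len(direct)


direct_hops = ["192.0.2.1", "192.0.2.9", "198.51.100.4", "203.0.113.7"]
reverse_hops = ["192.0.2.1", "192.0.2.10", "198.51.100.4", "203.0.113.7"]

# Hypothetical POP lookup: two interfaces in the same AS and city.
pop_of = {"192.0.2.9": "AS64500-SEA", "192.0.2.10": "AS64500-SEA"}

by_address = fraction_seen(direct_hops, reverse_hops)
by_pop = fraction_seen(direct_hops, reverse_hops,
                       key=lambda hop: pop_of.get(hop, hop))
```

Comparing by address misses the hop whose two interfaces differ, while comparing by POP counts it as a match, which mirrors the gap between the red and blue lines described above.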

So I have shown that our technique works pretty often and can give you results very similar to what you would get if you could measure a traceroute from the destination. I am going to walk through an example, and this is motivated by that original example at the beginning, where the content provider wants to be able to debug. I measured a round trip time of 150 milliseconds to an address in Seattle, and that is 2 or 3 times what you might expect. With current practices you might issue a traceroute towards the destination and check and see if it's indirect. And hopefully you can see the traceroute starts in Florida, goes up to D.C. and comes back down to Florida before continuing across the country to Washington. So that big detour up through DC wastes about 50 milliseconds, and that explains some of the latency inflation, but it doesn't explain the jump between hops 9 and 10, from 53 to 149. So with traditional tools we'd really only have a partial explanation of what is going on. If you look at the fifth hop you will see there is an inflated latency of 1 -- in this case, because that doesn't carry on through to the 6th hop, it's not affecting the round trip. With our tool you can supplement that traceroute with a reverse traceroute and look for indirection along the reverse path. That is what I did. I issued the reverse traceroute: it starts in Seattle, goes down to LA, comes back up and proceeds across the country through Chicago and Virginia, and that does explain the rest of the latency inflation that we saw. If you look closer you will see what is happening: it's going down to LA on Internap, switching over to TransitRail and coming back to Seattle before continuing across the country. So this is not necessarily a problem. It might be the case that Internap and TransitRail only peer in LA and so it has to go down there, but we were able to verify with a traceroute that they peer in Seattle as well. And so maybe it's some out-of-date information that is going on, and we talked to operators from those ISPs and verified this was an unintentionally inflated path. 
Without our tool there is really no way to understand what is going on or who you might want to contact to fix it. To summarise:

Traceroute is a really useful tool, I use it all the time in research. But it doesn't provide the reverse path. Our reverse traceroute technique fixes this limitation and provides complementary information to what you get from traceroute. It's able to give you most of the same hops you would see if you were able to issue a traceroute from the destination, but without -- I think it's really useful for troubleshooting and gives you more of a complete picture. If you look at the slides that are online, there are some extra slides at the end of the talk and I would be happy to talk to people about them.

I have presented a technique that we think could be very useful and we really want people to use it, so I would like to conclude by briefly talking about that. We are building a downloadable tool and doing internal testing right now. In June we plan to put up a website that will let you use the reverse traceroute tool to measure the path back from any destination to our sources. Once we have tested that a bit we are going to go public with the full downloadable tool, which will allow you to use our reverse traceroute system to measure paths back to your source from any destinations that you care about. You can e-mail me, or talk to me while I am here, if you would like to be an early user of that, and we would love you to be able to use it in your work. Also, the coverage of the system is tied to the distribution of vantage points that we have, and so we'd like it if anyone wanted to host a vantage point for us. It would be similar to hosting a public traceroute server, with an equivalent load; we are going to be the only ones who are talking to it, and that will help everyone get better paths. So, we are developing the code right now, with spoofing safety and admission control, and please contact me if you are interested in doing that. Thanks, and I'd love to take any questions or ideas that you might have about this, or applications that we should be looking at.

CHAIR: Thank you very much.

(Applause) Are there any questions for Ethan?

Bill: So did you separate out your success rate on number of hops versus success rate on correctly identifying hops?

SPEAKER: So, are you asking whether the reason that we don't do well in some cases is that we are not identifying all the hops?

AUDIENCE: Or you are identifying too many hops and trying to identify hops that don't exist or whatever.

SPEAKER: Putting in extra hops. The one part I looked at is the first graph that I presented, which was how many hops we have to assume that we can't directly measure, and if you just look at the paths where we don't have to assume any, or only one, then we do better than that 87 percent; I think we get 93 percent of the same hops. I haven't looked much at this question of whether we are -- whether there are extra hops in the middle. You sometimes see that even if you are measuring a forward path; about 16 percent are going to be routers that don't respond to traceroute. I haven't looked specifically at how many.

Shane: You documented some of the limitations of the protocols. I remember quite a few years ago someone had the idea to actually introduce a traceroute that records the full route as an ICMPv6 option. Have you done any protocol design, or documented what you would like to see, so you don't have to do this kind of heavy-handed way of getting reverse traceroute?

SPEAKER: I haven't specifically looked at that. I know a number of people have, with IPMP and a proposal for a v6 record route, but that is not something I have specifically looked at.

AUDIENCE: OK.

AUDIENCE: If I understand how the tool works, it seems it could also be used for quite an interesting survey, which is the deployment of BCP 38. You said that in practice you can spoof IP addresses most of the time; you didn't meet a lot of ISPs where it was prevented, filtered out.

SPEAKER: So, from PlanetLab, about 20 percent of the PlanetLab nodes, out of the couple of hundred that we tried, were able to spoof. The MIT Spoofer project is looking at that and they found similar results: about 20 percent of the hosts across the Internet that they were able to test were able to spoof. There was a discussion on NANOG, it was probably August or September, that was prompted by a talk I gave on a preliminary version of this work, and some operators thought that the 20 percent was actually very low. There didn't seem to be too many people who thought it was high. But yes, check out the MIT Spoofer project if you want more information about who is filtering.

AUDIENCE: Google. I am just curious: in a forward-path traceroute you often reach ECMP nodes and it's kind of difficult to process the data, and on the return path you should also be hitting ECMP. How do you identify parallel paths, and what is your observation on the number of parallel paths you may encounter from source A to B?

SPEAKER: Right. I haven't done -- I would say this was a preliminary study, and so I have seen that we are measuring parallel paths, but I haven't looked that much at how many we are seeing. I think a promising addition to this work, and something that I want to look at, is that since we are using multiple vantage points anyway, we can make redundant probes and try to measure the possibilities, and use that to identify, if we see different routeing from one vantage point, that our destination-based assumption is breaking down. Certainly if you look at the probe data you do see the parallel paths, but I haven't assessed it in detail yet.

AUDIENCE: I would just suggest that you collect some statistics because, apparently, the number of parallel paths that you encounter in the contemporary Internet is quite high.

SPEAKER: It certainly is. I think that the ability to measure any path that might be the right one is a good step, and generally, even if you are seeing parallel paths, they might not be the one that a particular data packet traversed but they will give you a good idea of what is going on.

CHAIR: Thank you very much, Ethan. This was the last talk for the session. It's time for the coffee break. All newcomers might be interested in going upstairs to the balcony over the winter garden, where the RIPE NCC is hosting a meet-and-greet coffee break for newcomers, and where we are asking all the RIPE Working Group chairs to go and talk to people who are attending the meeting for the first time. So see you there.