Access to Public Information

By Dr. Neal Krawetz

I am a huge fan of the Internet Archive (archive.org). They have a fundamental belief that information should be retained. Unfortunately, other organizations seem to be intentionally blocking access.

About Time

When I first spoke with the Internet Archive’s Brewster Kahle, he described their service as providing another dimension for the web: time. Google, Bing, Yahoo, and other search engines are great at showing you what is online right now. However, they don’t show you how a web site has changed over time. This is where the Internet Archive comes in: they record snapshots of web sites so you can see what a site looked like on a specific date.

As an example, Facebook often changes its terms of service. It’s easy to see the current terms. But what did they look like last year or two years ago? This is where the Internet Archive’s Wayback Machine comes in. It occasionally mirrors Facebook’s terms of service, allowing you to see archived versions going back years.

Beyond their Wayback Machine, they also have “collections”. Unlike the web mirrors from the Wayback Machine, the Collections are groups of files uploaded by hundreds of people. For example, I had my own mirror of North Korea’s Flickr account, made shortly after Anonymous compromised the DPRK’s Flickr account and hours before the DPRK deleted it. If I keep these files to myself, then they’ll be safe but unavailable to anyone else. So instead, I created a Collection at the Internet Archive and uploaded them for public access. Now anyone in the world can see pictures from North Korea’s (now deleted) Flickr stream: both the official pictures and the pictures uploaded by Anonymous.

Filtering

I run a web service called RootAbout. This is a search-by-image service that has indexed the Collections at the Internet Archive. (It does not yet index the Wayback Machine; only the Collections.) Although it searches for pictures at the Internet Archive, it doesn’t mirror their content. RootAbout depends on connectivity to archive.org in order to retrieve metadata and thumbnail images.

Over the weekend, my RootAbout server began sending me alerts, informing me that connectivity to archive.org was down. I received one alert on 2016-12-24, a handful on 2016-12-25, and a complete lack of connectivity on 2016-12-26.

Fortunately for me, I’ve been working on a new traceroute system (same general purpose, very different method). This allowed me to rapidly identify the source of the blockage: Comcast. Specifically, te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net (50.248.118.157). Here’s one way I identified it:

  • Identify the target. This is a simple hostname look-up: archive.org has address “207.241.224.2”. I repeated this from a couple of different locations on the Internet, just to make sure it wasn’t a DNS issue.
  • Traceroute. I performed a traceroute to archive.org from my RootAbout server:

    $ traceroute archive.org
    traceroute to archive.org (207.241.224.2), 30 hops max, 60 byte packets
    1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.442 ms 0.410 ms 0.363 ms
    2 * * *
    3 * * *
    4 * * *
    5 * * *
    6 * * *
    7 *^C

    The “* * *” means it timed out. Using Wireshark, I see no reply at all. This means that there is no connectivity right outside of my hosting provider.

  • Varying addresses. Generally, when connectivity goes down, it goes down for an entire subnet. In contrast, filters are often applied to single network addresses. The Internet Archive has a very large network range (207.241.224.0 – 207.241.239.255), but their “archive.org” web server is only located at one address in that range. I decided to test other addresses in that range. Since archive.org’s IP address ends with “.2”, I decided to test “.1” and “.3”.

    $ traceroute 207.241.224.1
    traceroute to 207.241.224.1 (207.241.224.1), 30 hops max, 60 byte packets
    1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.393 ms 0.352 ms 0.304 ms
    2 te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net (50.248.118.157) 2.726 ms 2.709 ms 2.756 ms
    3 hu-1-3-0-3-cr02.denver.co.ibone.comcast.net (68.86.84.161) 3.596 ms hu-1-2-0-5-cr02.denver.co.ibone.comcast.net (68.86.86.125) 4.133 ms hu-1-3-0-0-cr02.denver.co.ibone.comcast.net (68.86.83.5) 3.424 ms
    4 be-10817-cr01.seattle.wa.ibone.comcast.net (68.86.84.206) 29.929 ms 29.921 ms 29.884 ms
    5 hu-0-10-0-1-pe05.seattle.wa.ibone.comcast.net (68.86.88.126) 27.845 ms 27.838 ms 27.804 ms
    6 as11404-1-c.seattle.wa.ibone.comcast.net (23.30.206.34) 29.863 ms 30.063 ms 30.032 ms
    7 cr2-sea-b-te-0-0-0-9.bb.spectrumnet.us (174.127.140.158) 28.229 ms cr2-sea-b-te-0-0-0-8.bb.spectrumnet.us (174.127.140.154) 29.098 ms 29.312 ms
    8 cr1-529bryant-te-0-0-0-18.bb.spectrumnet.us (174.127.140.146) 38.349 ms 38.568 ms 38.576 ms
    9 cr1-200p-a-hu-0-7-0-21-0.bb.as11404.net (192.175.28.143) 39.250 ms 39.527 ms 39.491 ms
    10 agg2-200p-a-te-0-0-0-3.bb.spectrumnet.us (208.76.184.46) 38.693 ms 38.779 ms agg2-200p-a-te-0-0-0-5.bb.spectrumnet.us (208.76.184.50) 38.739 ms
    11 archive.org-BE.demarc.spectrumnet.us (208.76.187.90) 38.493 ms 38.505 ms 38.470 ms
    12 207.241.224.1 (207.241.224.1) 38.767 ms 38.749 ms 38.714 ms

    and

    $ traceroute 207.241.224.3
    traceroute to 207.241.224.3 (207.241.224.3), 30 hops max, 60 byte packets
    1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.379 ms 0.339 ms 0.295 ms
    2 te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net (50.248.118.157) 3.952 ms 4.004 ms 4.117 ms
    3 hu-1-2-0-3-cr02.denver.co.ibone.comcast.net (68.86.84.61) 4.278 ms hu-1-3-0-3-cr02.denver.co.ibone.comcast.net (68.86.84.161) 4.811 ms hu-1-3-0-5-cr02.denver.co.ibone.comcast.net (68.86.84.169) 4.777 ms
    4 be-10817-cr01.seattle.wa.ibone.comcast.net (68.86.84.206) 29.870 ms 29.849 ms 29.811 ms
    5 hu-0-11-0-1-pe05.seattle.wa.ibone.comcast.net (68.86.88.154) 27.909 ms 27.892 ms 27.919 ms
    6 as11404-1-c.seattle.wa.ibone.comcast.net (23.30.206.34) 27.886 ms 27.935 ms 27.926 ms
    7 cr2-sea-b-te-0-0-0-9.bb.spectrumnet.us (174.127.140.158) 29.397 ms cr2-sea-b-te-0-0-0-8.bb.spectrumnet.us (174.127.140.154) 28.120 ms cr2-sea-b-te-0-0-0-9.bb.spectrumnet.us (174.127.140.158) 28.097 ms
    8 cr1-529bryant-te-0-0-0-18.bb.spectrumnet.us (174.127.140.146) 38.419 ms 38.407 ms 39.744 ms
    9 cr1-200p-a-hu-0-7-0-21-0.bb.as11404.net (192.175.28.143) 39.616 ms 39.439 ms 39.397 ms
    10 agg2-200p-a-te-0-0-0-2.bb.spectrumnet.us (208.76.184.44) 39.040 ms agg2-200p-a-te-0-0-0-7.bb.spectrumnet.us (208.76.184.54) 39.703 ms agg2-200p-a-te-0-0-0-2.bb.spectrumnet.us (208.76.184.44) 39.022 ms
    11 archive.org-BE.demarc.spectrumnet.us (208.76.187.90) 38.631 ms 38.623 ms 38.589 ms
    12 wbgrp-registrar.us.archive.org (207.241.224.3) 38.811 ms 38.809 ms 38.802 ms

    This is a clear indication that the filtering is specific to one IP address (archive.org, 207.241.224.2) and not a generic network routing issue.

  • Decoding Hostnames. Many network providers embed informative strings in the hostname. In this case, te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net is extremely informative:
    • “comcast.net” identifies the provider as Comcast.
    • “ibone” identifies Comcast’s Internet backbone service. (This is different from cbone, which is for Comcast-to-Comcast routing.)
    • “910fifteenth.co” identifies the building. This router should be physically located at 910 Fifteenth Street, Denver, Colorado. According to Google Maps, that’s in the heart of downtown Denver. In that building are a couple of big colocation providers, including Level3, CoreSite Denver, and Massive Networks Denver.
    • “pe” typically denotes a peering service. Peering is one of those big net neutrality debate issues. Companies typically pay a premium for a high speed dedicated peering connection that bypasses most of the Internet.
  • Other Routes to Archive.org. I tested traceroute from sites that could connect to archive.org. This includes my office, a coffee shop, and a friend’s home network. They could all reach archive.org (it doesn’t look down to them). However, traceroute showed that none of them went through the same Comcast router (50.248.118.157). Instead, they all followed the same route starting at 68.86.84.206 (Comcast in Seattle). This is consistent with the previous finding, that Comcast’s router in Denver (50.248.118.157) is filtering the network traffic.
  • Other Routes through Comcast. I spot-checked connectivity from my server to other online services. Facebook, Google, Bing, Slack, and many others were working without issue. Then I got the idea to see who else uses Comcast’s router in Denver…

    $ traceroute nih.gov
    traceroute to nih.gov (54.235.145.223), 30 hops max, 60 byte packets
    1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.478 ms 0.437 ms 0.368 ms
    2 te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net (50.248.118.157) 2.543 ms 2.593 ms 2.507 ms
    3 hu-1-2-0-2-cr02.denver.co.ibone.comcast.net (68.86.84.177) 4.450 ms hu-1-2-0-4-cr02.denver.co.ibone.comcast.net (68.86.86.113) 4.408 ms hu-1-2-0-5-cr02.denver.co.ibone.comcast.net (68.86.86.125) 4.350 ms
    4 be-11724-cr02.dallas.tx.ibone.comcast.net (68.86.84.230) 16.845 ms 16.794 ms 16.734 ms
    5 be-12441-pe01.1950stemmons.tx.ibone.comcast.net (68.86.89.206) 16.304 ms 16.259 ms 16.581 ms
    6 50.242.148.102 (50.242.148.102) 17.284 ms 16.171 ms 16.095 ms

    There it is, at hop #2. As far as I can tell, the connection path from my server to most .gov sites routes through 50.248.118.157. This would appear to be someone trying to filter access to or from archive.org for some .gov sites, and my server happens to be seeing the filtering from the other side.
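
The comparison at the heart of these steps (one address never answers while its neighbors in the same range do) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the new traceroute system mentioned above: it assumes Linux-style traceroute text like the output shown in this post, the function names are hypothetical, and the sample traces are abridged from that output. Treat the result as a hint only, since "* * *" can also come from routers that simply do not answer probes.

```python
import ipaddress
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")  # "<hop number> <rest of line>"
IPV4_IN_PARENS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")  # "(50.248.118.157)"


def parse_hops(output):
    """Return [(hop_number, [IPs that answered])] from traceroute text."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:  # lines that do not start with a hop number are ignored
            hops.append((int(match.group(1)), IPV4_IN_PARENS.findall(match.group(2))))
    return hops


def reached(output, target_ip):
    """True if some hop in the trace answered from the target address itself."""
    return any(target_ip in ips for _, ips in parse_hops(output))


def last_responder(output):
    """Return the IP of the last hop that answered, or None if nothing did."""
    answered = [ips[-1] for _, ips in parse_hops(output) if ips]
    return answered[-1] if answered else None


def in_range(address, first, last):
    """True if address lies inside the inclusive range first..last."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return any(ipaddress.ip_address(address) in net for net in networks)


def single_address_filter_suspected(traces, target_ip):
    """traces maps destination IP -> traceroute output text.

    Heuristic: the target never answers, but every other traced address does.
    """
    neighbors = [ip for ip in traces if ip != target_ip]
    return (
        bool(neighbors)
        and not reached(traces[target_ip], target_ip)
        and all(reached(traces[ip], ip) for ip in neighbors)
    )


# Abridged from the traceroute output shown in this post.
BLOCKED = """\
traceroute to archive.org (207.241.224.2), 30 hops max, 60 byte packets
 1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.442 ms 0.410 ms 0.363 ms
 2 * * *
 3 * * *
"""
NEIGHBOR = """\
traceroute to 207.241.224.1 (207.241.224.1), 30 hops max, 60 byte packets
 1 ip-65-183-76-61.rev.frii.com (65.183.76.61) 0.393 ms 0.352 ms 0.304 ms
 2 te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net (50.248.118.157) 2.726 ms 2.709 ms 2.756 ms
12 207.241.224.1 (207.241.224.1) 38.767 ms 38.749 ms 38.714 ms
"""

if __name__ == "__main__":
    traces = {"207.241.224.2": BLOCKED, "207.241.224.1": NEIGHBOR}
    print(single_address_filter_suspected(traces, "207.241.224.2"))
    print(last_responder(BLOCKED))
    print(in_range("207.241.224.2", "207.241.224.0", "207.241.239.255"))
```

Tracing more than one neighbor makes the heuristic stronger, and in_range is a quick way to confirm that the neighbors you picked really belong to the same network range as the target.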

As a regular user on the Internet, you’re probably not taking this same route and will probably never see the filtering. However, if you have a hosting provider that uses Comcast, then you might be seeing this filtering.
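
To check whether your own route crosses the same router, you can run traceroute yourself and look for that address and for Comcast-style hostnames. The following is a rough Python sketch of that check. The hostname layout in the regular expression is an assumption inferred only from the hostnames quoted in this post (it is not a documented Comcast format), and decode_comcast_host and crosses_router are hypothetical helper names.

```python
import re

# Assumed layout, inferred only from the hostnames quoted in this post:
#   <interface>-<role><unit>.<site>.<state>.<network>.comcast.net
COMCAST_HOST = re.compile(
    r"^(?P<interface>[a-z0-9-]+)-(?P<role>[a-z]+)(?P<unit>\d+)"
    r"\.(?P<site>[a-z0-9]+)\.(?P<state>[a-z]{2})"
    r"\.(?P<network>ibone|cbone)\.comcast\.net$"
)


def decode_comcast_host(hostname):
    """Split a Comcast-style router hostname into named parts, or return None."""
    match = COMCAST_HOST.match(hostname.strip().lower())
    return match.groupdict() if match else None


def crosses_router(traceroute_output, router_ip):
    """True if any hop line in the traceroute text reports a reply from router_ip."""
    needle = f"({router_ip})"
    for line in traceroute_output.splitlines():
        if line.lstrip().startswith("traceroute"):
            continue  # the header names the destination, not a hop
        if needle in line:
            return True
    return False


if __name__ == "__main__":
    print(decode_comcast_host("te-0-2-0-26-pe02.910fifteenth.co.ibone.comcast.net"))
```

Hostnames that do not fit the assumed layout simply return None, so the decoder is safe to run over every hop in a trace.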

I reported these observations to the Internet Archive. While they haven’t made any official statement yet, they did confirm that I’m not the only one seeing this.

Why Filter?

I’ve been scratching my head, trying to figure out why someone would want to filter one specific address at archive.org. It isn’t like this address is used to attack the Internet. For example, the Internet Archive runs lots of bots, but they use different addresses than this service. As far as I can tell, the filtered address is only used for the web server at “https://archive.org/”. In other words, the filter blocks access to archive.org, not traffic coming from it.

While I have nothing authoritative, I did come up with some plausible scenarios:

  1. Accident. Someone edited a configuration file and accidentally pasted in the wrong network address for filtering.
  2. Deliberate Content Filtering. A month ago, the Internet Archive made a big announcement: they are building a new backup facility in Canada. This is in direct response to the new President-elect, who has repeatedly mentioned filtering access to information on the web. This filtering could be a first attempt to restrict access to information on the Internet. (He’s not even President yet, and we’re seeing content filtering.)
  3. Deliberate but Wrong Filtering. A week ago, the Internet Archive announced a new priority: mirror content from government sites. Donald Trump believes many baseless conspiracies, including unjustified beliefs about Muslims, China, Russia, and climate change. There is a strong belief that Trump will impose a revisionist history — either removing or altering government documents that include proven facts but run contrary to his unproven beliefs. By mirroring these pre-revisionist documents, the Internet Archive ensures that publicly funded research will remain accessible and unaltered.

    With my traceroutes, I have demonstrated that this router at Comcast sits on a path from archive.org to government sites. Thus, this filtering could be an attempt to prevent the Internet Archive from mirroring these public documents. (If there are no public copies, then who can claim that it was revised?) If this is the reason for the filtering, then the filtering was implemented incorrectly; they banned access to a web server that doesn’t do the mirroring, and didn’t ban the IP addresses used by archive.org’s mirroring bots.

  4. DoS. Maybe someone else wants to deny access to archive.org. All it would take is for someone to poison some critical routers and block network connectivity. But if this is the case, then they really screwed up. They blocked access for a minority of services on the Internet, and not for most users.

There may be some other reason, but these are the only ones I could think up that match the observed blockage.

If the filtering changes, spreads, or is removed, then I’ll update this blog entry. Meanwhile: “Hey Comcast! Why is your router at 50.248.118.157 filtering access to the Internet Archive?”

December 27, 2016 at 06:12PM
via The Hacker Factor Blog http://ift.tt/2hKGUf8
