API from a Cloud/Webhosting Provider?

rick

#1  Edited By rick

After auditing our traffic over a few days I've found that there are A LOT of servers running in Cloud Providers, Web Hosting, Co-los, etc. hitting us with scrapers and heavy API usage.

My question to you is: are you running something in, say, AWS EC2, Azure, Heroku, Rackspace, etc. that is using our API? I'm considering mass blocking all those IPs and more because I really don't see a reason for us to be getting traffic from those IPs. All the traffic that comes from those places is bots, and it accounts for a large percentage of our traffic, taking resources away from real human users.

I've seen nothing from any valid video provider like Roku or any podcasting services coming from the sources I mention, so they won't be affected. Mostly it's scrapers, and scrapers using our API to pull our whole database over and over.

Let me know if you think that's a bad idea. I may end up blocking all those IPs and add exceptions for any of you guys that are in there doing valid things.

UPDATE 8/17/2015 This has happened. If you are getting blocked, please send your CIDR to ipbans (at) gamespot.com with a brief explanation of what you are doing and we'll get you whitelisted.

tjockapa

I'm not familiar with all those terms, but it sounds good, even though I'm at a web hosting provider (binero.se). So maybe a possible exception for that?

chaser324

#3  Edited By chaser324  Moderator

This is probably ok in general, but given how popular those solutions are for web hosting, I can foresee at least some people trying to implement legitimate services and ending up confused or frustrated.

conmulligan

@chaser324 said:

This is probably ok in general, but given how popular those solutions are for web hosting, I can foresee at least some people trying to implement legitimate services and ending up confused or frustrated.

Yeah, there may not be any legitimate API clients on EC2 or Azure now, but that doesn't mean there won't be in future. Mass banning IPs associated with those services seems like throwing the baby out with the bathwater. Would it be impractical to just detect abusive clients from those IPs and revoke their API keys?

AlexW00d

#5  Edited By AlexW00d

Wasn't the whole deal with the API "Yo, if you wanna do something cool with it, go ahead brother"? I know that was before you started working here, but you may piss off a lot of dudes if you just block them all. Especially as a lot of people use it for college projects, or so I've seen from threads on the forums. Obviously you guys have your rules for it, and if they're breaking them then obviously you gotta protect your interests; I've honestly no idea what any of the projects are.

Or I guess an option could be for people who wanna use the API to have them register where they'll be hosting stuff so you know what's what.

BisonHero

@alexw00d: People are still free to use the API in non-harmful ways, but it sounds like there are way too many sketchy places (pretty much created and run by scripts) scraping the entire Giant Bomb site and basically mirroring it on some other domain, but that domain owner is getting all of the ad revenue for banner ads.

For example (don't actually go to the site in question): http://www.giantbomb.com/forums/bug-reporting-33/what-is-giantbomb-dot-gamefoxy-dot-com-1775113/

So people can keep doing their cool API projects, but it sounds like they'll just have to check with edgework regarding what hosting solutions they can look into that aren't being abused too much by bots.

ZombiePie

@alexw00d: A college project would most likely NOT use an enterprise-grade cloud computing platform. As you can see here, a brief check on the clients that are utilizing the API has revealed some startling results. As it stands, there are a number of API users that are doing two to three times the requests to the API that Google does, and that is causing its own set of strains. When the engineers have done IP bans, the sources for a majority of those accounts have shown the same pattern: the IPs trace to a third-party cloud server run by Amazon or Microsoft.

As it stands, all of the user API requests that we are aware of are not doing this, but this is a yelp to see if there are legitimate API users that utilize an enterprise-grade cloud computing platform.

chaser324

#8  Edited By chaser324  Moderator

@zombiepie said:

@alexw00d: A college project would most likely NOT use an enterprise-grade cloud computing platform.

I think describing things like EC2, Azure, Rackspace, etc. as purely "enterprise-grade" is pretty inaccurate. Most cloud hosting platforms out there scale from very low performance instances at cheap (or even free) pricing all the way up to supporting big load-balanced setups with many very high performance instances that can get very expensive. In the case of student projects in particular, I know that AWS and a few other providers have programs specifically for student use in educational applications.

szlifier

I hosted QLCrew.com on EC2 for a year for free and they still have that program. I moved to alwaysdata.com hosting service now and use the API from there.

I don't make a lot of requests to the API, but blocking that would be a bummer as I think about going back to EC2.

deactivated-5a1a3d3c6820c

@zombiepie said:

@alexw00d: A college project would most likely NOT use an enterprise-grade cloud computing platform. As you can see here, a brief check on the clients that are utilizing the API has revealed some startling results. As it stands, there are a number of API users that are doing two to three times the requests to the API that Google does, and that is causing its own set of strains. When the engineers have done IP bans, the sources for a majority of those accounts have shown the same pattern: the IPs trace to a third-party cloud server run by Amazon or Microsoft.

As it stands, all of the user API requests that we are aware of are not doing this, but this is a yelp to see if there are legitimate API users that utilize an enterprise-grade cloud computing platform.

I have to admit, this is a pretty bizarre post for a web engineer to make.

chaser324

#11 chaser324  Moderator

@khann: ZombiePie isn't an engineer.

rick

OK, we are going to do this. This past weekend was a shit show of dipshits messing with us from various webhosting providers. If you're on one, contact me and give me your particular CIDR (not the webhost's); in most cases it should be a /32 or maybe a /31. This will happen today, July 20th.

subkamran

#14  Edited By subkamran

My site is hosted on Azure (keeptrackofmygames.com), so yes, this affected me. Sent a PM to resolve it; my background jobs rely on the API. However, I'm a good citizen and I only request deltas (i.e. games/releases changed since the last run date), and I rate limit my requests myself so I don't pound the API (1 req/2s usually). A while ago I removed the coupling between my web site itself and GiantBomb, otherwise this would have really screwed me.

Again, I've said this before: if it's possible, please send us emails. My API key is tied to my account, which has an email; get a query going to grab the emails of API devs and send a mass email :) I only found this out yesterday.

Even better, if you really have to do IP blocking, let us allow IPs in an API dashboard or something. Then you can track/revoke as necessary. I thought the whole point of an API key was so you could prevent this kind of thing. I guess it depends on your hardware; at work the hardware we use can block requests before they hit the web servers, so there would never be a case where someone with an API key could DDoS the site, as long as you blocked at the hardware/load balancer level.
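
For anyone wondering what "rate limit my requests myself" might look like on the client side, here is a minimal sketch in Python. It is not the poster's actual code. The `Throttle` class and its parameter names are made up for illustration, and the 2-second default simply mirrors the "1 req/2s" figure mentioned above.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time still owed before the minimum interval has elapsed.
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A background job would call `throttle.wait()` right before each API request, so bursts get spread out no matter how fast the rest of the job runs.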

clidus

I am also fairly unhappy this block was put in place with no notice to developers (outside scattered messages in this forum, which is otherwise ignored by the engineering team).

The message being returned from the API before I was unblocked also stated quite clearly that Giant Bomb doesn't allow traffic from Amazon, Azure, Rackspace, etc. unless you are a legitimate business partner of CBSi. This restriction seems utterly insane. EC2 and the like are not just for businesses...

I've really lost faith in the utility of an API that hasn't been updated in years and is now being so restrictively blocked. I guess I will start to look elsewhere...

villainy

Was this stickied at any point? A ban on all of the biggest cloud providers seems completely backwards for any site exposing an API, especially with so much of the content returned by the API being community driven (i.e. the wiki). Was the only option really to swing the biggest hammer you can? Rate limiting, disabling API keys, automatic time-based bans: are none of these feasible? I fear for the infrastructure of Giant Bomb.
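
To make the alternatives named above a bit more concrete, here is a rough sketch of per-key rate limiting combined with an automatic time-based ban. Nothing here reflects how Giant Bomb's servers actually work. The `KeyLimiter` class and the numbers (200 requests per hour, 10-minute ban) are assumptions picked purely for illustration.

```python
import time
from collections import defaultdict, deque


class KeyLimiter:
    """Sliding-window limit per API key, with a temporary ban on overflow."""

    def __init__(self, max_requests=200, window=3600.0, ban_seconds=600.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.ban_seconds = ban_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # api_key -> timestamps of recent requests
        self._banned_until = {}          # api_key -> time at which the ban expires

    def allow(self, api_key):
        """Return True if this request may proceed, False if it is rejected."""
        now = self._clock()
        until = self._banned_until.get(api_key)
        if until is not None and now < until:
            return False
        hits = self._hits[api_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Over the limit: reject and start a time-based ban for this key.
            self._banned_until[api_key] = now + self.ban_seconds
            return False
        hits.append(now)
        return True
```

Note that a check like this needs the API key, so it can only run once the request has reached application code.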

sigomatix

#17  Edited By sigomatix

It would be interesting to know why the API key is not used for blocking instead of blocking cloud providers' IP ranges... I'm confused...

gkhan

@edgework: If it's affecting real users of the site, of course you should do everything you can to stop that stuff. I think it would be a good idea to implement some sort of whitelist system for responsible users, though, so things like Quick Look Crew could operate out of services like EC2, either based on API key or whitelisting individual IPs.

Amazon EC2, in particular, is a fairly common choice for hosting any kind of web service (and they even include a free year); it would be a shame to ban it completely. Let responsible users have some way to use the API.

euantor

#19  Edited By euantor

@sigomatix: Blocking the API key requires processing the request, looking the key up from somewhere (e.g. a database or cache), performing some logic, then closing the request. Blocking an IP can be done at the firewall level so that it never hits the server and never needs further processing. The latter is obviously faster and takes less infrastructure.

sigomatix

@euantor thanks for the explanation

rick

@clidus: Webhosting providers are blocked because 90+% of the traffic from them to us is malicious. Anyone can spin up an EC2 instance with a stolen credit card or a prepaid card and then go wild. It's very common for websites to block EC2, Azure, etc.

@gkhan: There is a whitelist. That's why we ask you to give us the CIDR for your servers. Everyone who asks gets one.

@sigomatix: We can only whitelist by CIDR because this block happens before you even get into the site. There's no database access for us to look anything up (like an API key) to see if you're OK. Only the CIDR is available at the point where requests are blocked or not.

@clidus: Regarding notice, this thread and others in the proper forums are the notice. Are there other places you think this notice should be?
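
As a rough illustration of a decision that uses nothing but the client address, here is a sketch of a CIDR block list with whitelisted exceptions. This is not the site's real configuration. The ranges are the reserved documentation networks from RFC 5737, and `BLOCKED`, `ALLOWED` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Hypothetical ranges for illustration only (RFC 5737 documentation networks).
BLOCKED = [ipaddress.ip_network(c) for c in ("198.51.100.0/24", "203.0.113.0/24")]
# Individual hosts inside a blocked range that have asked for an exception.
ALLOWED = [ipaddress.ip_network(c) for c in ("203.0.113.7/32",)]


def is_allowed(client_ip: str) -> bool:
    """Decide from the source address alone; no API key or database involved."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in ALLOWED):
        return True                      # whitelisted exception wins
    return not any(ip in net for net in BLOCKED)
```

With these example lists, `is_allowed("203.0.113.7")` passes because of its /32 exception, while its neighbour `203.0.113.8` is rejected along with the rest of the /24.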

sigomatix

#22  Edited By sigomatix

@edgework: could you explain what a CIDR is, or point me to a link that shows how to get one on Azure? The only thing I've managed to get is a reserved IP for my cloud service, so all traffic will go through this IP. Is that fine?

rick

@sigomatix: a CIDR is a range of IP addresses. 1.2.3.4/32 is a single IP address; 1.2.3.0/24 is a 256-address range from 1.2.3.0 -> 1.2.3.255.

More Info Here

So you'll probably provide your "reserved IP"/32
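
If you want to check the notation yourself, Python's standard `ipaddress` module can do it. This is just a small sketch using the example addresses from the post above. The `to_cidr` helper is a made-up name for turning one reserved IP into the /32 form.

```python
import ipaddress

single = ipaddress.ip_network("1.2.3.4/32")   # exactly one host
block = ipaddress.ip_network("1.2.3.0/24")    # everything sharing the first 24 bits

print(single.num_addresses)                   # size of the /32
print(block.num_addresses)                    # size of the /24
print(block[0], block[-1])                    # first and last address in the /24

# Membership test: is a given address inside the range?
print(ipaddress.ip_address("1.2.3.77") in block)


def to_cidr(ip: str) -> str:
    """Write a single IP address as a CIDR string (a /32 for IPv4)."""
    return str(ipaddress.ip_network(ip))
```

So for a single reserved IP, `to_cidr` just appends the /32 that the whitelist request needs.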

clidus

#25  Edited By clidus

@edgework: Putting some information on the IP bans on http://www.giantbomb.com/api/ would be a good start...

I don't think you should reasonably expect API users to religiously read a message board that is so rarely commented on by staff.

expensiveham

This is a bad solution.

jslack

#28  Edited By jslack

@expensiveham: @edgework: @clidus: As far as the information on the API/IP bans goes, totally agree with you. Historically we have been bad at relaying information or updating API documentation pages. Not that it's a good excuse, but we tend to deal with the problem first as a priority, and changes are constantly being made, making it difficult to finalize implementations. We should do better than that - after dealing with high priority issues. Currently, due to recent events we will be doing more work on the API, and I will make it a point to update the documentation pertaining to important changes in policy or usage.

As far as it being a bad solution - as @edgework said, what's a better solution? I agree that it sucks, I get it. Unfortunately, GiantBomb is a pretty well known site for its API, and it can get abused by people who violate the terms. The huge majority of unique users who use our API are awesome.

I think everyone who works on the site really cares about the API. We want to make it better with you guys.

clidus

@jslack: Thank you. It's great to see this problem acknowledged and hope things will improve in the future.

lastcontract

  1. As @subkamran suggested - allow users with API keys to apply IP addresses to a white list.

  2. You need a DMZ server to establish these connections - pre-approve them - and run diagnostics and tracing to detect abuse. Then through a validated learning model you take what you've learned through tracing to determine odd or abusive behaviors. You take these and build conditions that can be used as flags. Use machine code to test / track conditions and then flag API Keys / accounts that are potential abusers. Rate limit them even further. If the rule is 100% clear that the behavior is abusive you can permanently ban the API Key.

  3. For your more generic API requests (plural catch-alls like "games") you can route users back to a static xml / json file. This would allow you to take advantage of a CDN and skip the database calls altogether.
    1. As a matter of fact, that's what my website currently plans to do and has something like this set up with Cloudflare. When an xml / json file updates, we simply make a curl request with the URL of the file to flush and re-cache.

  4. Build in more filtering options for the detailed requests. As far as I can tell with a few days of using the API there is no reliable way to only grab recently updated items since "2015-10-07 00:00:000" for example. This will limit payload and I will have fewer API requests to make. "What's been updated in games", "What's been updated in companies", "What's been updated in people", etc etc etc.

  5. Move away from API keys altogether. Use OAuth to validate an account and establish a trusted connection with the end IP.

Sorry I don't have more suggestions - late night.
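
The "updated since" filter from point 4 could be sketched roughly as below. This is only an illustration. The field name `date_last_updated`, the timestamp format and the record shape are all assumptions, not something confirmed in this thread.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"   # assumed format, e.g. "2015-10-07 00:00:00"


def updated_since(items, since, field="date_last_updated"):
    """Return only the records whose `field` is at or after the `since` timestamp."""
    cutoff = datetime.strptime(since, TIMESTAMP_FORMAT)
    return [
        item for item in items
        if datetime.strptime(item[field], TIMESTAMP_FORMAT) >= cutoff
    ]
```

A client that remembers the time of its last successful run could pass that as `since` and only ever pull the changed records.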

lastcontract

Actually I'm going to add one more point to my above comments. Add a PAID API option. The fact of the matter is you get what you pay for. And none of us are really paying for all this access. And a lot of us want more flexibility, more requests, IP whitelisting, or have unique situations.

Imagine paying $10 a month and updates and changes to the GiantBomb API database are pushed via notification to your app in real-time - with all the updated details you need to know (at least the proper API urls to call for full details).

You should get paid too. It can be super minimal - to cover the costs of improving the architecture & integrity of the API, or it can be super expensive - to cover the costs of hiring a dedicated programmer, whatever. I guess this is an option that only you can evaluate - whether there is enough demand and what the price points are.