API legal use question
@erickb: Hi. One of my various hobbies is messing around with the Giant Bomb API (see: Giant Gopher and this blog post). I just saw your post this morning and I'd like to help if I can though I admit I'm a little confused by your use case.
I checked out your PDF and it seems to me that, since it's a static document with no dynamic content, you might be better served by just including hyperlinks to the appropriate pages in the GB WIKI. Alternatively, you might consider creating an appendix that contains brief game summaries quoted from the WIKI.
However, both hyperlinking and adding an appendix have drawbacks. The appendix in particular will balloon the size of your PDF, making it harder to print, which I'm guessing is a primary concern considering the document is provided as a PDF. It's also worth considering that the user of the PDF may be better served and become more informed by a self-directed Google search than they would be by having their attention funneled to the GB WIKI.
All that being said, it might be possible to use the API in sort of a "pre-process" manner to help generate the hyperlinks and/or appendix mentioned above. Though even then the effort wouldn't be without difficulties.
It's easy enough to scrape a list of games from your PDF. It's much harder to correlate those games with entries in the WIKI. We can use the GB Search API to search for the game by name and get a list of candidate entries, but it's important to realize that games often don't have a canonical name. This means that the game name used in advertising may differ from the name used in your PDF, which may differ from the name used in the GB WIKI. There are also no guarantees about the order in which the search API will return its results. This means that it's not sufficient to just use the first result of the search, so hand filtering will be required.
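To give a rough idea of what I mean by tooling that helps with the hand filtering, here is a minimal sketch in Python. It does not call the API at all: it assumes you already have a list of candidate names back from a search, and the function names (normalize, rank_candidates, needs_review) and the threshold values are just things I made up for illustration. It ranks the candidates by string similarity and flags anything ambiguous so a human can look at it.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop punctuation so cosmetic differences matter less."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def rank_candidates(pdf_name, candidates):
    """Return (score, candidate) pairs sorted best match first.

    `candidates` is assumed to be the list of names returned by a search;
    the order they arrive in is ignored on purpose.
    """
    target = normalize(pdf_name)
    scored = [
        (SequenceMatcher(None, target, normalize(c)).ratio(), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def needs_review(ranked, threshold=0.9, margin=0.1):
    """Flag a match for hand filtering (threshold and margin are guesses).

    Review is needed when there are no candidates, the best score is weak,
    or the runner-up is too close to the best score to call.
    """
    if not ranked:
        return True
    best = ranked[0][0]
    if best < threshold:
        return True
    if len(ranked) > 1 and best - ranked[1][0] < margin:
        return True
    return False


# Hypothetical candidate names, standing in for real search results.
ranked = rank_candidates(
    "The Legend of Zelda: Ocarina of Time",
    [
        "Zelda II: The Adventure of Link",
        "The Legend of Zelda: Ocarina of Time",
        "Ocarina of Time 3D",
    ],
)
```

Anything that needs_review flags goes into a list for a person to resolve, and the confirmed pairs can be saved so the matching only has to be done once per game.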
Once we have the summaries for the games we are going to quickly find other problems. The summaries will sometimes contain weird HTML markup, contain jokes, have odd Unicode problems, and/or will just be plain low quality.
Given all that, don't lose hope. Though I don't think a fully automated solution is in the cards, I do think it's feasible to create some tools to help generate the hyperlinks/appendix, provided you are willing to do some judicious editing and are ready to take responsibility for the accuracy of the data.
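The markup and Unicode problems, at least, can be partly handled by a cleanup pass. Below is a minimal sketch using only the Python standard library. The function name clean_summary is my own invention, and the sample input is made up rather than taken from the WIKI. It strips tags, decodes entities, normalizes Unicode, and collapses whitespace. It can't do anything about jokes or low quality writing, so the editing step is still on you.

```python
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_summary(raw):
    """Strip tags, decode entities, normalize Unicode, tidy whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered after the last tag
    text = "".join(parser.parts)
    # NFKC folds compatibility characters (e.g. non-breaking spaces)
    # into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


# Made-up sample input for illustration.
cleaned = clean_summary("<p>A  <b>classic</b>&nbsp;platformer &amp; more.</p>")
```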
One last thing. I don't know a lot about PDFs but I do know that they provide some scripting capabilities. It would be a "bad idea" to try and use them to query the API. The biggest problem with this approach would be that you would have to embed your API key in the PDF. This would make it ripe for extraction and abuse.
Anyhoo, if you want to explain a bit more about your plans or if you have any questions I'd be happy to help.
@erickb, that is awesome, and I love to see it. If you need help with the programming, let me know.
@paulwgraham: I guess I left out the part where we get much, much more data than we can fit in our PDF booklet, so we are building an online version which will have hundreds of games in it (new ones added every month), hence the thought of using the API. Otherwise, yeah, I would just manually link the digital PDF to summary pages on the wiki. Thanks so much for your reply, and sorry I missed that important piece in my post.
@erickb: Cool, cool, cool. It seems like you have it well covered but my offer of help still stands. It's not too bad but the GB API can be quirky. If you run into any hiccups there is a decent chance I've run into them too so ping me and I'll be happy to share (what little) I know.
While it's not ready for public release, our site is live and I wanted to share it with you all. Some of the game names we had didn't match the ones in the API, so we are working through those and matching them manually, but it's a pretty easy process and once done it doesn't have to be done again. Thanks for all the help. http://gametherapy.org/