Here is a list of the potential changes to SVMTrainer that were suggested to me during this weekend’s conference. Searcher: Implement conditions on acceptable web document sizes to optimize document retrieval time. Try using a small initial search as a seed to get other search terms and expand the diversity of my training set –…
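The first suggestion, limiting acceptable document sizes, might look something like the sketch below. It is a minimal illustration, not code from SVMTrainer: the class `SizeFilter` and its methods are hypothetical names. Letting pages with no advertised `Content-Length` through, while still capping how much of them is read, is also an assumption. It uses only `java.net.HttpURLConnection` to send an HTTP HEAD request, plus a plain byte cap on the response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper for the "acceptable document size" idea; not SVMTrainer code. */
final class SizeFilter {
    private SizeFilter() {}

    /** Decide from a Content-Length value (-1 means the server did not send one). */
    static boolean shouldFetch(long contentLength, long maxBytes) {
        // Assumption: unknown sizes are allowed here; readCapped() still bounds them.
        return contentLength < 0 || contentLength <= maxBytes;
    }

    /** Ask for headers only (HTTP HEAD) and return the advertised length, or -1. */
    static long probeLength(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }

    /** Read at most maxBytes from the stream and ignore anything past the cap. */
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = maxBytes;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                break; // end of stream before reaching the cap
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }
}
```

A Searcher could call `probeLength` first and skip any URL for which `shouldFetch` returns false; servers that omit the header are still bounded by `readCapped`.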
Focusing
The last month has taken me through a couple of choices about how to focus this project. First, I attempted to design a ‘version 2’ that would use project files. The idea was to give a project a group of categories to use, as well as persistent result sets and a dictionary that can be…
Humble Beginnings
I ran my first couple of training sets today. I must confess, the results are not pretty. Let’s start with the summary. The training set for the text categorization example given by Joachims contains 2000 weighted example vectors. The precision of the resulting model, as estimated by svm_learn, is 93.07%. My first training set…
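svm_learn reads its training examples from a plain-text file in SVMlight's sparse format: one example per line, a target of +1 or -1, then `feature:value` pairs with integer feature ids (starting at 1) in increasing order, optionally followed by `# info`. Below is a minimal sketch of turning one weighted example vector into such a line. `SvmLightLine.format` is a hypothetical helper, not necessarily how these training sets were built, and dropping zero weights is a choice that relies on the format being sparse.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: formats one example vector as an SVMlight training line. */
final class SvmLightLine {
    private SvmLightLine() {}

    /**
     * @param label   +1 for a positive example, -1 for a negative one
     * @param weights feature id (1-based) mapped to its weight
     * @param info    optional trailing comment, or null
     */
    static String format(int label, Map<Integer, Double> weights, String info) {
        if (label != 1 && label != -1) {
            throw new IllegalArgumentException("label must be +1 or -1");
        }
        StringBuilder sb = new StringBuilder(label > 0 ? "+1" : "-1");
        // TreeMap iterates keys in ascending order, as SVMlight requires.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(weights).entrySet()) {
            int id = e.getKey();
            double w = e.getValue();
            if (id < 1) {
                throw new IllegalArgumentException("feature ids start at 1");
            }
            if (w == 0.0) {
                continue; // sparse format: zero weights are simply left out
            }
            // toPlainString avoids scientific notation such as 1.0E-5.
            sb.append(' ').append(id).append(':')
              .append(BigDecimal.valueOf(w).toPlainString());
        }
        if (info != null) {
            sb.append(" # ").append(info);
        }
        return sb.toString();
    }
}
```

Writing one such line per document, positives and negatives together, gives a file that can be passed to svm_learn as its training set.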
Search Limits
Today I learned about some under-documented limits on Google’s AJAX Search API. While working on my Searcher class (which will eventually generate training sets for the SVM), I had Java print the first 50 page titles that Google returned. Every time I ran the program, I got a JSONException after 28 results. Upon…
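The excerpt does not say why the exception appears after 28 results, so the sketch below makes no claim about the cause. It only shows one defensive way to page through results: stop collecting when a page request fails or comes back empty, and keep the titles gathered so far instead of letting the exception end the run. `PageFetcher` and `PagedSearch` are hypothetical names; a real Searcher would build each request and parse the JSON reply inside `fetch`, and a JSONException thrown there would be caught by the loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: one call returns the titles on a single page of results. */
interface PageFetcher {
    List<String> fetch(int start, int pageSize) throws Exception;
}

final class PagedSearch {
    private PagedSearch() {}

    /** Collect up to `wanted` titles, stopping early if a page fails or is empty. */
    static List<String> collectTitles(PageFetcher fetcher, int wanted, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> titles = new ArrayList<>();
        for (int start = 0; titles.size() < wanted; start += pageSize) {
            List<String> page;
            try {
                page = fetcher.fetch(start, pageSize);
            } catch (Exception e) {
                // The request or its parsing failed at this offset: keep what we have.
                break;
            }
            if (page.isEmpty()) {
                break; // the service has no more results
            }
            for (String title : page) {
                if (titles.size() == wanted) {
                    break;
                }
                titles.add(title);
            }
        }
        return titles;
    }
}
```

With this structure, asking for 50 titles from a source that fails after 28 returns those 28 rather than none, which at least makes the limit visible in the output.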
Why JSON?
Last night, while doing research, I wondered why the Google AJAX Search API uses JSON. I had never even heard of JSON (JavaScript Object Notation). I fully expected the AJAX API to use XML… it is Asynchronous JavaScript And XML, after all. But, at least for the RESTful interface (another term I’d never heard)…