More Useful SlenderT
I think my SlenderT library is starting to get useful:
- I've run quite a few benchmarks on it
- I've tuned the load so that it's pretty fast
- I've implemented querying
- I've played with a fairly large database and found it to be a very expressive tool
- I've written some documentation that can be found at "GitHub":http://github.com/davidrichards/slender_t
Some quick examples from the documentation:
>> db = SlenderT.load('spec/fixtures/business_triples.csv')
>> db.find('BSC', 'name', nil)
=> [["BSC", "name", "Bear Stearns"]]
That tells us that BSC means Bear Stearns. The next query tells us whom Bear Stearns is known to have contributed to recently, and how much:
>> val = db.query(['?contribution', 'contributor', 'BSC'],
?> ['?contribution', 'recipient', '?recipient'],
?> ['?contribution', 'amount', '?dollars'])
=> [{"?contribution"=>"contrib285", "?dollars"=>30700.0, "?recipient"=>"Orrin Hatch"},
    {"?contribution"=>"contrib284", "?dollars"=>168335.0, "?recipient"=>"Hillary Rodham Clinton"},
    {"?contribution"=>"contrib287", "?dollars"=>5600.0, "?recipient"=>"Christopher Shays"},
    {"?contribution"=>"contrib288", "?dollars"=>205100.0, "?recipient"=>"Christopher Dodd"},
    {"?contribution"=>"contrib290", "?dollars"=>17300.0, "?recipient"=>"Frank Lautenberg"},
    {"?contribution"=>"contrib286", "?dollars"=>5000.0, "?recipient"=>"Barney Frank"},
    {"?contribution"=>"contrib289", "?dollars"=>13000.0, "?recipient"=>"Michael Dean Crapo"},
    {"?contribution"=>"contrib294", "?dollars"=>4600.0, "?recipient"=>"Pete Sessions"},
    {"?contribution"=>"contrib295", "?dollars"=>5000.0, "?recipient"=>"Paul E. Kanjorski"},
    {"?contribution"=>"contrib292", "?dollars"=>6600.0, "?recipient"=>"Nita Lowey"},
    {"?contribution"=>"contrib293", "?dollars"=>5000.0, "?recipient"=>"Deborah Pryce"},
    {"?contribution"=>"contrib291", "?dollars"=>102260.0, "?recipient"=>"Joe Lieberman"}]
>> val.size
=> 12
It'll get some more lovin', but it's plenty good for this week's deliverables.
New Statisticus 1
So, I spoke at URUG last night, and that was good motivation to clean up some code. Statisticus is doing what I think it should, so here's a quick rundown.
h2. Basic Interface to R
To do something that R already does:
stats_class :choose
Choose.call(49,6) # => 13983816.0
This example relies on R already understanding choose, which takes two parameters, n and x. Here n is the number of items to choose from, x is how many of them are chosen, and the return value is the number of distinct combinations when order doesn't matter. So, for a lottery that draws 6 balls from 49 possible numbers, there is a 1 in 13,983,816 chance of picking the winning combination.
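To see where that number comes from without R, here is a small plain-Ruby sketch of the same calculation. The choose method below is my own stand-in, not part of Statisticus, and it returns an Integer where R returns a float.

```ruby
# Binomial coefficient: the number of ways to pick k items out of n,
# ignoring order. A stand-in for R's choose, written for illustration.
def choose(n, k)
  return 0 if k < 0 || k > n

  k = [k, n - k].min # choose(n, k) == choose(n, n - k), so use the smaller k
  # After step i the accumulator holds choose(n - k + i, i), so each
  # integer division is exact.
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

puts choose(49, 6) # prints 13983816
```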
The stats_class method is just sugar for:
class Choose
  include Statisticus
end
In other words, stats_class :choose defines the same class as the code above.
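If you're wondering how sugar like this can be built, here is one possible sketch. None of it is taken from the Statisticus source: FakeStatisticus, r_function_name, and sketch_stats_class are hypothetical names, and the stand-in mixin only describes the R call it would make instead of talking to R.

```ruby
# Hypothetical stand-in for the Statisticus mixin. It records the call
# that would be forwarded to R rather than running it.
module FakeStatisticus
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level entry point, shaped like Choose.call(49, 6).
    def call(*args)
      new.process(*args)
    end

    # MyObj -> "my_obj": the R function name this class would map to.
    def r_function_name
      name.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
    end
  end

  # Default behaviour: describe the R call instead of executing it.
  def process(*args)
    "#{self.class.r_function_name}(#{args.join(', ')})"
  end
end

# Hypothetical version of the sugar: define a class named after the
# symbol and mix the module in.
def sketch_stats_class(name)
  class_name = name.to_s.split('_').map(&:capitalize).join
  Object.const_set(class_name, Class.new { include FakeStatisticus })
end

sketch_stats_class :my_obj
puts MyObj.call(49, 6) # prints my_obj(49, 6)
```

The only real trick is Object.const_set, which gives the anonymous class a constant name so it can be used like any hand-written class.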
To use something I've defined locally, I write an R lib (my_obj.r):
my_obj <- function(n,x) choose(n,x)
This is a trivial example. It defines a function called my_obj in the R runtime. Now, with the same syntax, I can write something like:
stats_class :my_obj
MyObj.call(49,6) # => 13983816.0
The example only passes its arguments through to the R function choose, but the approach is pretty powerful. If you keep your R code in one file per function, you can just dump those files in a subdirectory or somewhere in ~/.statisticus, and Statisticus will slurp them up and use them automatically.
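As a sketch of what that kind of discovery could look like (an assumption on my part, not the actual Statisticus loader), the snippet below walks a directory for .r files and maps each file name to the class name used in these examples. The helper names class_name_for and discover_r_files are hypothetical, and a temporary directory stands in for ~/.statisticus.

```ruby
require 'tmpdir'
require 'fileutils'

# Turn "my_obj.r" into "MyObj", the naming pattern the examples follow.
def class_name_for(path)
  File.basename(path, '.r').split('_').map(&:capitalize).join
end

# Collect every .r file under dir (including subdirectories), keyed by
# the class name it would map to.
def discover_r_files(dir)
  Dir.glob(File.join(dir, '**', '*.r')).sort.to_h do |path|
    [class_name_for(path), path]
  end
end

# A temporary directory stands in for a real library directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'my_obj.r'), "my_obj <- function(n,x) choose(n,x)\n")
  FileUtils.mkdir_p(File.join(dir, 'lib'))
  File.write(File.join(dir, 'lib', 'some_code.r'), "anything_else <- function(x) x\n")

  discover_r_files(dir).each do |name, path|
    puts "#{name} <- #{File.basename(path)}"
  end
end
```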
Now, if you want to do something more interesting with an R library, create a file, some_code.r, and put this in it:
anything_else <- function(x) x
Now, you'll need a smarter class to handle the underlying code:
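class SomeCode
  include Statisticus # assumed: the same mixin the Choose example uses
  # process is the default entry point; r refers to the R runtime
  def process(x)
    r.anything_else(x)
  end
end
SomeCode.call(123)
# => 123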
This introduces an interesting concept. The process method is the default method (coming from TeguGears) for running a chunk of code. The r method is available inside it to refer to the R runtime. Any code can be sent to or taken from the R runtime. A single process method can also chain several steps, which sometimes makes sense. This is a useful approach if you have an R file with many functions in it, and you want to access each one with its own Ruby class.
There are a few more tools that may be generally useful:
- stats: a command-line program for starting Irb with Statisticus running in it
- calc/Calc.call('...'): sugar for sending something directly to the R runtime.
Statisticus is built on top of TeguGears, which means you have or can expect:
- memoized method calls
- composable method calls
- a thread pool for running concurrent code
- each method implementing an observable pattern, so that method results can be broadcast across threads, processes, and machines
- a messaging backbone for distributed code
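To make the first two items concrete, here is a toy sketch of memoized and composable call-style classes in plain Ruby. It is only an illustration of the ideas: MemoCallable, its cache method, and the >> operator are hypothetical and are not how TeguGears is written or what its API looks like.

```ruby
# Hypothetical stand-in for the memoizing and composing behaviour;
# NOT TeguGears itself.
module MemoCallable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # One cache per class, keyed by the argument list.
    def cache
      @cache ||= {}
    end

    # Class-level entry point: run process once per distinct argument list.
    def call(*args)
      cache.fetch(args) { cache[args] = new.process(*args) }
    end

    # Compose two callables: (a >> b).call(x) runs a, then feeds the
    # result to b.
    def >>(other)
      ->(*args) { other.call(call(*args)) }
    end
  end
end

class Square
  include MemoCallable

  def process(x)
    x * x
  end
end

class Increment
  include MemoCallable

  def process(x)
    x + 1
  end
end

pipeline = Square >> Increment
puts pipeline.call(4)  # 4 squared, plus one: prints 17
puts pipeline.call(4)  # same answer, served from the caches
puts Square.cache.size # prints 1, since only [4] was ever computed
```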
You should be able to install Statisticus with:
sudo gem install davidrichards-statisticus
Dependencies are R, RSRuby, and TeguGears.
Anyway, let me know of any questions or problems you may have. I'd be happy to work out more examples. Many examples are being included in Panorama, which should be more informative than this introduction.