Erlang as a System Monitor

Posted by David Richards Fri, 06 Mar 2009 09:08:00 GMT

I have this wild and crazy idea that I just may do if someone else wants to pair program with me. Maybe you can tell from this blog that I can get quite scatter-brained, working on a lot of projects, improving each incrementally. This isn't a project I'd start without some real commitment to following through and finishing what I started.

The idea is to grease the skids a little in my brain regarding Erlang and put together a file system management tool that would really inform the user about the kind of content he has on his system in a pragmatic way. I thought I'd have these general elements in place:

  • A full-text search engine that is fast and concurrent
  • A dictionary of meta information on each file
  • A classification system that adds semantic meaning to files, like project name, language, development project, installed library, active project, active when, downloaded?, original URI, etc.
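To make these elements a bit more concrete, here is a minimal sketch of the data shapes involved: an inverted index for the full-text search, plus a per-file dictionary of meta information. It is written in Ruby rather than Erlang purely for illustration. The `FileCatalog` class, its methods, and the metadata keys are all hypothetical names I made up, not part of any existing tool, and everything lives in memory with no concurrency at all.

```ruby
require "set"

# Hypothetical sketch: FileCatalog and its method names are illustrative only.
class FileCatalog
  def initialize
    @index = Hash.new { |hash, term| hash[term] = Set.new } # term => paths
    @meta  = {}                                             # path => metadata hash
  end

  # Index a file's text and record its classification metadata.
  def add(path, text, meta = {})
    tokenize(text).each { |term| @index[term] << path }
    @meta[path] = meta
  end

  # Paths whose text contains every term in the query (AND semantics).
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |term| @index.fetch(term, Set.new) }.reduce(:&).to_a.sort
  end

  # Paths whose metadata matches every key/value pair in criteria.
  def where(criteria)
    @meta.select { |_path, meta| criteria.all? { |k, v| meta[k] == v } }.keys.sort
  end

  private

  # Lowercase the text and split it into unique alphanumeric terms.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

catalog = FileCatalog.new
catalog.add("src/graph.rb", "Graph search in Ruby", { language: "ruby", active: true })
catalog.search("graph search")       # paths containing both terms
catalog.where({ language: "ruby" })  # paths classified as Ruby
```

A real version would need incremental updates, stemming, and some way to infer the metadata rather than have it handed in, which is where the classification work comes in.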

It's kind of a combination of several machine learning skills with Erlang. Why Erlang? Because I want to see how portable I can make it, and how easily it can integrate with other systems. I also want to demonstrate how quickly and non-intrusively I can make a heavy-lifting app work with Erlang. It would be fun, useful, and maybe not too much work.

I think we can use Joe Armstrong's full-text search engine as a starting point. Also, the command-line tool lsof is a very powerful way to see what's happening on a system, such as which files are currently being downloaded.
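As a rough idea of how lsof could feed such a tool, here is a small Ruby sketch that parses the machine-readable output of `lsof -F pcn`, where each line carries one field tagged by its first character (p for process ID, c for command, f for file descriptor, n for file name). The helper name `open_files_by_process` and the sample data are made up for illustration, and the sketch assumes that field layout as described in the lsof man page.

```ruby
# Hypothetical helper: groups the file names from `lsof -F pcn` style output
# by the process that has them open.
def open_files_by_process(lsof_output)
  processes = []
  lsof_output.each_line do |line|
    line = line.chomp
    next if line.empty?
    tag = line[0]
    value = line[1..-1]
    case tag
    when "p" then processes << { pid: value.to_i, command: nil, files: [] }
    when "c" then processes.last[:command] = value unless processes.empty?
    when "n" then processes.last[:files] << value unless processes.empty?
    end # other tags, such as "f" (file descriptor), are ignored here
  end
  processes
end

# Made-up sample in the same field-per-line shape.
SAMPLE = <<~LSOF
  p101
  cfirefox
  f42
  n/home/me/Downloads/big.iso.part
  p202
  cruby
  fcwd
  n/home/me/projects/graphs
LSOF

open_files_by_process(SAMPLE).each do |process|
  puts "#{process[:command]} (#{process[:pid]}): #{process[:files].join(', ')}"
end
```

Against a live system the same helper could be fed with `open_files_by_process(%x(lsof -F pcn))`, assuming lsof is installed and on the path.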

Anyway, this is an open invitation, if it sounds good to anyone.

What Would a Good Graph Database Look Like?

Posted by David Richards Sat, 21 Feb 2009 05:32:00 GMT

I've been working a lot with graphs, using various graph libraries for Ruby. I like these, and I can get a lot done with them. However, none of the Ruby graph libraries have optimization features baked in, and none of them provide transparent persistence. So I've been Googling around to see what's out there. There are some pretty neat solutions that may be the way to go. As I look at the adopt/build decision for my uses, I'll use the following criteria to guide my search. Some of these ideas came directly from Renzo Angles and Claudio Gutierrez's presentation, "Querying from a Graph Database Perspective: the case of RDF":http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.ciw.cl%2Fmaterial%2Firw-2005%2F2005-irw-gutierrez.pdf&ei=mx-fSdiZFor2sAOn2P3MCQ&usg=AFQjCNFKiV0woNWvkOG91dtF7gOh2AlzNg&sig2=LP9byU_vM92oU5lz05vm5A

So, a good graph database should probably offer:

  • path queries (Are nodes a and b related? What is the shortest route between a and b? What is the influence of node a?)
  • distance queries (What is the distance between nodes a and b?)
  • pattern matching (Where do these patterns appear? How much of a pattern is matched?)
  • adjacency queries (Which nodes are no more than x degrees of separation from a given vertex? Which edges fall within the same range?)
  • degree queries (Which nodes have at least/at most x incoming edges, outgoing edges, or edges in total?)
  • concurrency (Shards well in a cluster)
  • indexing isomorphic subgraphs (fast lookup for some of the above-mentioned queries)
  • adjacency matrices for the whole graph or subgraphs (transparent use of the graph as a graph, or as a matrix)
  • a wide range of graph types (the "graph database":http://amalfi.dis.unina.it/graph/ is a dataset of graphs, with 84 distinct types of graphs available)
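To pin down what a few of these queries mean, here is a minimal in-memory sketch in Ruby. It covers the path, distance, adjacency, and degree queries plus the adjacency matrix, using a plain breadth-first search over an unweighted directed graph. The `SketchGraph` class and its method names are hypothetical and do not come from any existing Ruby graph library, and there is no persistence, indexing, or optimization here, which is exactly the gap a real graph database would need to fill.

```ruby
require "set"

# Hypothetical in-memory directed graph; all names are illustrative only.
class SketchGraph
  def initialize
    @nodes = Set.new
    @out = Hash.new { |h, k| h[k] = Set.new } # node => successors
    @in  = Hash.new { |h, k| h[k] = Set.new } # node => predecessors
  end

  def add_edge(from, to)
    @nodes << from << to
    @out[from] << to
    @in[to] << from
    self
  end

  def nodes
    @nodes.to_a
  end

  # Breadth-first search from start; returns { node => hops from start }.
  def distances_from(start, max_hops = Float::INFINITY)
    dist = { start => 0 }
    queue = [start]
    until queue.empty?
      node = queue.shift
      next if dist[node] >= max_hops
      @out.fetch(node, []).each do |succ|
        next if dist.key?(succ)
        dist[succ] = dist[node] + 1
        queue << succ
      end
    end
    dist
  end

  # Path query: is b reachable from a?
  def related?(a, b)
    distances_from(a).key?(b)
  end

  # Distance query: hops along a shortest route, or nil if unreachable.
  def distance(a, b)
    distances_from(a)[b]
  end

  # Path query: one shortest route from a to b as a node list, or nil.
  def shortest_path(a, b)
    parent = { a => nil }
    queue = [a]
    until queue.empty?
      node = queue.shift
      break if node == b
      @out.fetch(node, []).each do |succ|
        next if parent.key?(succ)
        parent[succ] = node
        queue << succ
      end
    end
    return nil unless parent.key?(b)
    path = [b]
    path.unshift(parent[path.first]) while parent[path.first]
    path
  end

  # Adjacency query: nodes within x hops of node, excluding node itself.
  def within(node, x)
    distances_from(node, x).keys - [node]
  end

  # Degree query: nodes with at least min edges in the given direction.
  def degree_at_least(min, direction: :out)
    table = direction == :in ? @in : @out
    nodes.select { |n| table.fetch(n, []).size >= min }
  end

  # Adjacency matrix as nested arrays; rows and columns follow nodes order.
  def adjacency_matrix
    order = nodes
    order.map { |r| order.map { |c| @out.fetch(r, []).include?(c) ? 1 : 0 } }
  end
end

graph = SketchGraph.new
graph.add_edge(:a, :b).add_edge(:b, :c).add_edge(:a, :d)
graph.related?(:a, :c)  # path query
graph.within(:a, 1)     # adjacency query
```

Every query here walks the graph from scratch, so the cost grows with the size of the graph on each call. Pattern matching, subgraph indexing, and sharding are left out entirely.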

So, this is really two problems:

  • finding/creating a language that expresses these queries
  • implementing these ideas efficiently in the graph

I like Neo4j and Neo4j.rb, and I'd like to run a couple of use cases through them first. There are some interesting resources "here":http://github.com/andreasronge/neo4j/tree/master. I really liked the "slides found here":http://jaikoo.com/assets/presentations/neo4j.pdf

If that doesn't work out, I want to take another look at some RDF solutions. At the end of the day, I may be interested in building an Erlang-based solution if all else fails.