Ted Leung on the air: Open Source, Java, Python, and ...
Wed, 14 May 2003
Maybe the future of the semantic web is LSI
Stefano did what I did last night and dug into all the papers on Latent Semantic Indexing (LSI). Actually, he did more than I did, because he got the papers and the math. But I'm not sure if he got something that I got: LSI is patented. So all that cool multidimensional linear algebra cannot be used in open source software. Does that kill the semantic web? Of course not. But it's going to prevent community development of systems that use LSI.
[00:44] | [computers/internet] | 5 Comments
First of all, LSI is simply based on the concept of spectral decomposition, something that has been around for centuries; you can't patent that. Period. Also, you can't patent how I create a matrix of data, and if you do, there is always another way of doing it that can be equivalent (or even potentially better). The patent covers ONE possible way of using that math. I'll simply implement another one, show it's better, and route around the patent obstacle (which shows why patents on algorithms will never work unless they are extremely precise, as with RSA public-key encryption or LZW encoding).

For example, the patent DOES NOT cover the use of the underlying markup structure of text. This has been proven effective by Google. I'm currently writing the math to do that. It's hard as hell (have you ever applied spectral decomposition to a hyper-hyper-dimensional matrix?) and I can't find literature on it, so this means I'm doing something new and innovative. The patent system was designed exactly for that: to stop blatant stealing but foster innovation. I'm doing exactly that.

Even more: the patent was filed in 1983, so it won't take long to expire anyway.

Moreover, US patent policies allow patented techniques to be used for research. If you don't use the software commercially, there is no problem in releasing it as open source. And if you want to use it commercially, you'll have to ask the patent holder for a license (just as it's not illegal to redistribute GIFs in our open source packages). Nothing new under the sun: you can bet your ass that Microsoft will do the same with Mono as soon as it emerges as a potential problem for their .NET marketing.

And if the above was not enough, remember that only the US has to comply with US patent restrictions. The world is much bigger and more diverse than the US.
Posted by Stefano at Wed May 14 07:54:04 2003
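Stefano's point is that LSI rests on a standard spectral decomposition (the singular value decomposition) of a term-document matrix. The following is a minimal, hypothetical sketch of that math only, not the patented system or anyone's actual implementation. It assumes raw term counts with no weighting, a made-up four-document corpus, and power iteration on A^T A in place of a library SVD routine. It keeps the top k singular directions and compares documents and a query by cosine similarity in that reduced space.

```python
import math
import random
import re


def tokenize(text):
    # Lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    row = {t: i for i, t in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for t in tokenize(doc):
            a[row[t]][j] += 1.0
    return vocab, a


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [v / n for v in x] if n > 0 else x


def top_singular_pairs(a, k, iters=500, seed=0):
    """Approximate the top-k (v_i, sigma_i) of A by power iteration on A^T A.

    Each new vector is re-orthogonalized against those already found, so the
    i-th result tends to the i-th eigenvector of A^T A (the right singular
    vector v_i), with sigma_i = sqrt(eigenvalue).
    """
    n = len(a[0])
    gram = [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _step in range(iters):
            w = [dot(g, v) for g in gram]
            for prev, _sigma in pairs:  # strip components along earlier vectors
                c = dot(w, prev)
                w = [wi - c * pi for wi, pi in zip(w, prev)]
            v = normalize(w)
        eigval = dot(v, [dot(g, v) for g in gram])
        if eigval <= 1e-12:
            break  # A has rank < k
        pairs.append((v, math.sqrt(eigval)))
    return pairs


def lsi_basis(a, k):
    # Left singular vectors u_i = A v_i / sigma_i span the latent "concept" space.
    return [[dot(r, v) / s for r in a] for v, s in top_singular_pairs(a, k)]


def project(basis, term_vector):
    # Coordinates U_k^T x of a term-space vector x.
    return [dot(u, term_vector) for u in basis]


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


# Hypothetical toy corpus: two "computer" documents and two "graph" documents.
docs = [
    "human computer interface",
    "user interface computer system",
    "graph trees",
    "paths trees graph minors",
]
vocab, a = term_document_matrix(docs)
basis = lsi_basis(a, k=2)
doc_vecs = [project(basis, [row[j] for row in a]) for j in range(len(docs))]
q_vec = project(basis, [1.0 if t == "user" else 0.0 for t in vocab])
ranking = sorted(range(len(docs)), key=lambda j: cosine(q_vec, doc_vecs[j]), reverse=True)
```

In this toy run the query "user" lands next to the first document even though that document never contains the word, because both project onto the same latent direction; a literal term match would score it zero. Real LSI systems typically apply a term weighting such as tf-idf or log-entropy before the decomposition and use an optimized SVD; both are left out here to keep the linear algebra visible.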

I whipped up a quick implementation of something like LSI (see http://www.perl.com/pub/a/2003/02/19/engine.html). The (very rough) code is on my website.
Posted by Nick at Thu May 15 05:58:46 2003
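Nick's link points to a perl.com article on building a vector-space search engine. A minimal Python sketch of that general idea, assuming a naive regex tokenizer, a tiny hypothetical stop list, raw term counts, and no stemming (the article's own Perl code may well differ): each document and the query become term-count vectors, and results are ranked by the cosine of the angle between them.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def term_vector(text):
    # Raw term counts after lowercasing and dropping stop words (no stemming).
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(x, y):
    # Counter returns 0 for missing keys, so only shared terms contribute.
    dot = sum(count * y[t] for t, count in x.items())
    nx = math.sqrt(sum(c * c for c in x.values()))
    ny = math.sqrt(sum(c * c for c in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def search(docs, query):
    """Return (score, doc_index) pairs with a nonzero score, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), j) for j, d in enumerate(docs)]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


docs = [
    "human computer interface",
    "a user interface for the computer system",
    "graph of trees",
]
results = search(docs, "user interface")
```

Here `results` ranks document 1 (cosine about 0.71) ahead of document 0 (about 0.41) and leaves out document 2. A query for just "user" would return only document 1: plain vector-space matching needs a literal shared term, which is the limitation LSI's reduced-dimension space is meant to address.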

Some things you might be interested in:

Maciej Ceglowski, through his work at NITLE (nitle.org), has developed an LSI search tool. You can read about it at
http://javelina.cet.middlebury.edu/lsa/out/lsa_intro.htm

There's also a prototype accessible somewhere from there. This approach has all the limitations of LSI: it doesn't scale beyond a few thousand docs. And of course, there's the patent.

He and the fellow he works with have developed an alternative, which is described here. http://www.idlewords.com/weblog.04.2003.html#157
I don't know of any implementations.

And of course, if you haven't yet, you need to check out Waypath (waypath.com), which implements a high-precision semantic search using something completely different from LSI: scalable, portable, mergeable, the list goes on.
Posted by Steven Nieker at Fri May 16 21:18:22 2003

About

I work at the Open Source Applications Foundation (OSAF).
The opinions expressed here are entirely my own, not those of my employer.

This work is licensed under a Creative Commons License.
