LWN.net Weekly Edition for June 2, 2005
Red Hat's directory server
Managing large networks is a challenging task in a number of ways. One of those challenges is dealing with user information throughout a large institution. A single system can keep that information in /etc/passwd, and a small network can rely on tools like rsync or NIS. When the scale of the network gets large enough, however, and a sufficient number of levels of politics gets in the way, simple tools will no longer do the job in an easy or reliable manner. There comes a point where this information needs to live in a central database and be made available as needed across the network.
The larger proprietary software vendors - Microsoft, Sun, Novell, etc. - have long offered directory server products aimed at large network ("enterprise") deployment. These products not only make basic user information available network-wide; they can also be used to distribute a wider array of information. Directory servers are a useful and necessary tool, and the competition in this area is fierce.
Red Hat has set itself up to compete directly with the other "enterprise" software companies. To that end, Red Hat has put together a number of valuable products and services, but, so far, it has not been able to offer a directory server as part of its solution. That gap in Red Hat's offerings has increasingly looked like a liability, especially as Novell increases its efforts to compete in the same space. So Red Hat needed a directory server. It found one, some time ago, when it acquired many of the remaining bits of Netscape from AOL. Since the acquisition, however, little has been heard about the former Netscape's offerings.
Until now. On June 1, Red Hat announced the availability of its directory server product. The (now) Red Hat Directory Server is fast, with an impressive array of capabilities; for the full list, see the product sheet [PDF]. The directory server product is sold like Red Hat Enterprise Linux: by subscription. Pricing is not yet available.
The Red Hat Directory Server also resembles RHEL in another way: it has a Fedora equivalent. The Fedora Directory Server Project is where the development work will be done; the site offers source, documentation, mailing lists, etc. It is, in other words, just another free software development project.
At the Fedora site, one can see that, in fact, not all of the directory server code has been released - yet. The server itself is available under a special GPL+Exception license. The code is generally governed by the terms of the GPL, with the exception that plugin modules can remain proprietary. Those modules, however, must restrict themselves to a carefully-specified set of interfaces; anything linking to any other part of the server can only be distributed under the GPL. Other parts of the system - the management console and admin server components - remain non-free, though they are available in binary format. Red Hat plans to free that code as well, but some work is involved; those components are written in Java, and do not play well with the free Java implementations.
The Fedora project has some ambitious goals; the best description of what they have in mind can be found in Christopher Blizzard's weblog. The project claims to want to bring in outside developers, and to make them "feel that they are equals." Given all that the directory server hackers want to do, they will almost certainly need some help from outside. Consider this:
To some readers, this vision sounds like the Windows registry - except that it's a nightmare, monster central registry for thousands of users. The "everything lives in the directory server" approach clearly will not be for everyone. But, for people wanting to create a single, integrated environment across a large organization, this vision will have some appeal. It is truly a view of the network as a single, large computer, with a minimum of boundaries. It promises to reduce the cost of administering large numbers of systems. One can see why Red Hat thinks it needs to go in this direction to remain competitive in the future.
High-end directory servers have, so far, been the domain of expensive, proprietary software. The freeing of the Netscape server, if handled well, could bring an end to that era. So this move by Red Hat is important, and deserving of support. High-quality free infrastructure is a good thing.
A survey of RSS aggregators
Over the years, the proliferation of news sites, weblogs and other sites with daily updates has made it nearly impossible for the average user to visit every site of interest in a timely fashion. For those of us who want or need to keep informed on a variety of topics, RSS, RDF and Atom feeds have become a nearly indispensable tool to skim the headlines for many sites at once without having to spend more than an hour per day clicking through bookmarks. However, this raises the question of how to manage news feeds effectively.
There are a fair number of RSS aggregator projects on Freshmeat, but we decided to limit our scope to applications that are fairly mature, have been updated recently (many RSS aggregator projects listed on Freshmeat have not been updated in years) and run on the desktop. In particular, we were looking for aggregators that handle a large number of feeds, make it easy to manage feeds and integrate well with the Linux desktop and the average user's workflow.
For some time now, this writer has used the Bloglines service to browse RSS feeds. For this article, the feed list from Bloglines, containing about 130 RSS/RDF and Atom feeds, was exported as an OPML file and imported into each of the aggregators to see how they performed.
RSSOwl
The first aggregator we'll look at is RSSOwl. This aggregator is written in
Java, using the SWT graphic library. RSSOwl has a fairly flexible
interface, and opens up tabs for each new feed that the user opens from the
list of "favorites."
There are a few interesting features in RSSOwl. First, RSSOwl has an export feature, which can be used to export a feed or individual article to PDF, Rich Text (RTF) or HTML. This might be handy for saving feeds and entries for later. RSSOwl also supports AmphetaRate, a centralized ratings service for rating articles found in news feeds.
Oddly, it seems to display feeds as plain text rather than rendering the HTML. We're not sure if this is a glitch in RSSOwl or if we missed a step in setting it up. Otherwise, RSSOwl's performance was very good, and it handled a large number of feeds without any problems.
Snownews
The Snownews
aggregator is unique in this list, because it's not a graphical
application. Snownews is a console-based feed-reader that uses ncurses, and
is a fairly straightforward application with few frills.
Snownews does not support OPML directly, but there is an "opml2snow" script that comes with Snownews to convert OPML into the format that Snownews likes. It's a little more of a hassle than the easy-import offered by other readers, but it gets the job done. Snownews displays headlines and feeds inline. To follow the feed URL, one must use an external browser. It works fairly well with GUI browsers, but works best (at least in this writer's opinion) with a text-mode browser like w3m or Lynx.
It's probably not going to be the first choice for most users, but those who prefer browsing in w3m or other text-mode browsers should definitely check it out.
Liferea
One reader that seems to be getting a lot of attention at the moment is the
Linux Feed Reader, Liferea. This is a
nicely-designed newsreader that's easy to use. It imported our OPML file
with no problems, and gives the user the option of rendering HTML with
Mozilla or GtkHTML2. It spawns an external browser for full articles rather
than displaying them within the Liferea window. This works well if you
prefer to browse content in Firefox, Epiphany or another browser, but we
would like it if Liferea would give the option of displaying the entire
article inside Liferea itself.
One interesting feature with Liferea is the ability to create a new feed from a Feedster search. This can be quite handy if you're interested in finding feeds on a specific topic from a variety of sources.
If one wishes to be alerted, or interrupted, with updates from subscribed feeds, Liferea has a feature that will pop up a notification window at regular intervals with new headlines. We enabled this feature briefly, but turned it off after an hour or so, finding it quite distracting.
We also found Liferea to be a bit less than stable, at least the 0.9.0 release that is available in Ubuntu Hoary. Liferea crashed a few times when doing something as simple as deleting a feed. Overall, its performance was quite good, and the interface is excellent -- but it might need to stabilize a bit before being our first choice of the available aggregators.
Blam
Blam is an aggregator
written in C# using Mono and GTK#. It's a little more basic than Liferea or
Snownews, but it serves well as a basic newsreader. Headlines and summaries
are displayed within Blam, but it requires an external browser to follow
links.
At first, Blam would not import the OPML from Bloglines. We tried subscribing a few feeds manually and then exporting Blam's list to OPML to find out what was different. The difference was that Bloglines uses "title" for the name of each feed, and Blam expects "text" -- after doing a quick search and replace in Vim, changing "title" to "text," Blam imported the list of feeds just fine.
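For readers who would rather not edit the file by hand, the same fix can be sketched in a few lines of Python. This is only an illustration of the workaround described above, not anything shipped with Blam or Bloglines; the function name add_text_attributes and the sample feed are made up for the example. Unlike a blanket search and replace, it copies "title" into "text" on outline entries only and leaves the title element in the OPML head alone.

```python
import xml.etree.ElementTree as ET


def add_text_attributes(opml_source: str) -> str:
    """Give every <outline> that has a 'title' but no 'text' a matching 'text'."""
    root = ET.fromstring(opml_source)
    for outline in root.iter("outline"):
        if "text" not in outline.attrib and "title" in outline.attrib:
            # Keep the original 'title' and add the attribute Blam looks for.
            outline.set("text", outline.get("title"))
    return ET.tostring(root, encoding="unicode")


# Hypothetical export containing a single made-up feed.
sample = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline title="Example Feed" type="rss" xmlUrl="http://example.org/feed.rss"/>
  </body>
</opml>"""

converted = add_text_attributes(sample)
print(converted)
```

Keeping the original "title" attribute alongside the new "text" means the converted file should still load in readers that expect the Bloglines form.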
Blam is a good choice for users who want a very basic newsreader that's fast and light.
Akregator
KDE users are probably already familiar with Akregator. This reader uses
KHTML to display full articles in tabs within the Akregator interface, at
least by default. Akregator can also be configured to use an external
browser for those who prefer Firefox or another browser to
Konqueror/KHTML.
For users who prefer Konqueror for Web browsing, Akregator is an excellent choice. Konqueror auto-discovers feeds on pages, and makes it easy to add those feed subscriptions to Akregator. Akregator has fewer frills than Liferea or RSSOwl, but it integrates very well with KDE and performs well.
Firefox and Thunderbird
We should also mention Firefox and Thunderbird. While not dedicated aggregators, both applications allow users to read and manage news feeds. However, they lack a number of features that many users would want, at least natively. The advantage of using Firefox as an aggregator is that Firefox makes it very easy to create a "Live Bookmark" to subscribe to feeds, when the browser discovers the feed in a page.
If Firefox doesn't detect the feed, that complicates things greatly. Firefox supports adding a bookmark manually, but does not support adding a feed manually. The Live Bookmark also doesn't allow the user to preview the content or full text, just the headlines from a feed. Firefox doesn't support importing OPML files natively, so users with large subscription lists would have to go through a lot of work to re-subscribe to sites using Firefox.
Of course, it is possible to extend Firefox's capabilities with
extensions. We tried the Sage
extension with Firefox, and were quite pleased with it. The Sage extension
adds a sidebar to Firefox much like the Bookmarks and History
sidebars. There are two panes in the sidebar, a list of subscriptions and
a lower pane that lists headlines from the selected feed.
The integration with Firefox makes it a convenient aggregator for those of us who use Firefox exclusively or extensively. Sage had no problem importing the OPML list exported from Bloglines, and its performance was quite acceptable. There are a number of other news reading extensions for Firefox for those who are interested.
Thunderbird, by itself, is also limited in its abilities to import and manage feeds. For users who spend a lot of time in their e-mail client, and who have a fairly limited number of feeds, it would work well -- but this writer would not like to have to import 100 or more feeds using the "Manage Subscription" dialog for Thunderbird. The advantage to using Thunderbird for feeds is the ability to mail links from subscribed feeds.
We found the Forumzilla extension for Thunderbird, which adds OPML import and other features to Thunderbird. Unfortunately, it consistently crashed Thunderbird when trying to import the OPML exported from Bloglines.
Summary
After spending time with each of these aggregators, this writer prefers Liferea and Sage, though any of the aggregators would do in a pinch. Given the variety and maturity of the various options, Linux users should not have much trouble finding an aggregator that works well for them.
IP Software Compliance Tools -- Who Needs Them and Why?
When Black Duck Software first made available its software compliance tool, ProtexIP, about a year ago, the typical first reaction was to view it as a response to SCO's lawsuit.
Now there is a second such product, Palamida's IP Amplifier, and it's clear there is a market for such products. Cisco, for one, has just signed on with Palamida. Who really needs products like this, and why? And is there a difference between them?
Who Needs Software Compliance Tools?
Now that Free and Open Source software has hit the mainstream of the enterprise, businesses need to be certain that they are not taking on legal liabilities with the code. There are many licenses, and making sure a company is abiding by them all is complex. That's one reason you are hearing so many voices calling for simplifying and settling on fewer licenses. But it goes deeper than that.
"Everyone who distributes software should know what goes into it," attorney Lawrence Rosen explains. "And almost everyone who distributes software wants to comply with the relevant licenses. Most reputable software-based businesses recognize that playing fast-and-loose with copyright claims isn't worthwhile."
While most businesses today are pleased to adopt and incorporate open source products into their products and services, they want to know what licenses apply so that they can comply with the terms.
"That's what Black Duck and Palamida make possible," Rosen adds. "A distributor or user can know what open source software is in its own software and act accordingly, early in the cycle. It's now possible to evaluate license compatibility for specific component sets and plan appropriate combinations for use in products to be developed."
Unfortunately, developers sometimes use GPL code (or other licensed FOSS code) without telling management, thinking it's public domain. It isn't. And with outsourcing, sometimes developers are in other countries that may have more relaxed views on copyright and this can cause problems. So when developers let things happen they shouldn't (such as making unauthorized copies or derivative works), companies have an automated way to catch some of that and react appropriately before much bigger problems can develop.
Software practices are also changing. Application development today is becoming more like an assembly line, more a matter of assembling bits of code from open source projects and from outsourced firms and incorporating them into proprietary products than handcrafting 100% custom software. This isn't a bad thing, because it makes it possible to avoid having to reinvent the wheel -- one of the advantages of Open Source -- but it also means that checking on license terms and making sure you are complying with them all is vital to the process.
And there is no doubt that enforcement of the GPL is increasing, as Fortinet learned recently when a German court banned its U.K. subsidiary from further distribution of its firewall and antivirus products until it complied with the GPL, which it promptly did.
Then there is the Sarbanes-Oxley Act [PDF], and its requirements for IT audits.
"The SEC's new rules on heightened corporate responsibility for public company reporting known as Sarbanes-Oxley require public companies to abide by internal procedures that are sufficient to provide reasonable assurance that the financial and non-financial information required to be disclosed in its periodic and current reports is accurate," says Karen Copenhaver, executive vice president and general counsel for Black Duck Software.
"Specifically, Sarbanes creates two new corporate governance requirements: assessment of internal controls over financial reporting (required by section 404 of the Act), and heightened corporate responsibility for financial reports (required by section 302 of the Act). It would be hard to overestimate the burden that compliance with these new rules has placed on public companies in the first few years since their enactment.
"Even before Sarbanes, public companies were required to address intellectual property matters in their current and periodic reports. A reporting company traditionally discloses the importance of its intellectual property assets to the company's business and any third-party intellectual property encumbrances on the company's ability to conduct its business. To the extent that a failure to identify or comply with third party license obligations has an effect on the accuracy of any of this information, public companies will be concerned about compliance with their obligations under Sarbanes."
Obviously, Sarbanes-Oxley has upped the ante considerably. But most businesses and developers want to do the right thing anyway, apart from outside pressures. The tools don't set policy for a company, but they surely make it easier to make sure policies are observed.
What Do the Tools Offer?
Before automated software compliance tools were available, due diligence in checking software for infringing code was done by assigning the tedious task to senior software programmers in the company, who, together with lawyers, laboriously looked through the code. The problem with such a system, aside from the time it required and the drudgery, is that no one person knows all the Free and Open Source projects available by sight, let alone all the proprietary products you are not allowed to see without complex legal arrangements.
Automated systems are an obvious answer. What they provide is a Google-like collection of code. They've collected it all for you. Both tools scan for copyright infringement and can spot more than verbatim matches. But they do more than scan. Palamida says its IP Amplifier product automatically detects, manages and reports on the third party, commercial and open source components that may exist in a customer's software code base. It consists of two key modules -- the Compliance Library and the Detector. Using an automated collection system, the Compliance Library contains billions of source code snippets and millions of files of the most commonly used open source projects found in the market.
Palamida: "The Palamida IP Amplifier uses three different types of technologies to automate detection: source code fingerprinting, file digest matching, and, for Java files, namespace matching. This means the software is able to conduct both source code and binary code analysis. So for companies whose developers download whole libraries, compiled code, XML files, icons, text files, and include those resources into their code base, the software will still detect their usage even though their source code is not available and even if we do not have the components listed in our database."
Next, there is a "layer of analysis that is beyond just code matching for reduction of false positives. We call this technology CodeRank. CodeRank looks at the code matches and evaluates the results on multiple levels, including uniqueness, coverage and clustering. How unique is that match to what is in the Palamida database? How much of a customer file matches a file in Palamida's database? How dense are the matches -- do they look like a continuous cut and paste or does it look like two engineers coded against the same API?"
After their software evaluates the code matches, Palamida assigns a CodeRank number to the matches; the higher the CodeRank number the higher the chances of copying. In the scan results, users will see a list of all code that has matches and a list of all the third party products that they most likely came from, with the most likely on top.
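Palamida has not published how its detection or CodeRank scoring works, so the following is only a rough sketch of the general ideas named above: a whole-file digest for exact copies, hashed windows of normalized lines as snippet fingerprints, and a simple coverage ratio. The window size, the normalization rule, the function names and the sample files are all assumptions made for this illustration.

```python
import hashlib

WINDOW = 3  # consecutive normalized lines per fingerprint (arbitrary choice)


def file_digest(text: str) -> str:
    """Digest of the whole file, which only catches byte-for-byte copies."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize(text: str) -> list:
    """Remove all whitespace and blank lines so re-indentation does not hide a match."""
    return ["".join(line.split()) for line in text.splitlines() if line.strip()]


def fingerprints(text: str) -> list:
    """Hash every run of WINDOW consecutive normalized lines."""
    lines = normalize(text)
    return [
        hashlib.sha256("\n".join(lines[i:i + WINDOW]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - WINDOW + 1)
    ]


def coverage(candidate: str, known: str) -> float:
    """Fraction of the candidate's fingerprints that also occur in the known file."""
    cand = fingerprints(candidate)
    if not cand:
        return 0.0
    known_set = set(fingerprints(known))
    return sum(fp in known_set for fp in cand) / len(cand)


# Made-up "library" file standing in for an entry in a code database.
known_file = """int add(int a, int b)
{
    return a + b;
}
int sub(int a, int b)
{
    return a - b;
}
"""

# Made-up customer file: the first function re-indented, then unrelated code.
customer_file = """int add(int a, int b)
{
  return a + b;
}
void hello(void)
{
    puts("hi");
}
"""

print(file_digest(customer_file) == file_digest(known_file))
print(coverage(customer_file, known_file))
```

With these two sample files the whole-file digests differ, yet two of the customer file's six windows still match the known file, for a coverage of one third. A real tool would presumably also weigh how unique and how clustered the matches are, as the CodeRank description suggests.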
Reports identify all components that include open source and list their licenses, text and license information, in addition to the CodeRank. All the information and data can be exported in XML format, allowing users to create custom reports, and is also available as HTML reports.
Black Duck too offers a great deal more than just code scanning. Black Duck's Copenhaver: "We do more than just scan code. Our product provides a full suite of services covering project planning, code analysis and detection, license analysis and management, auditing and archival capabilities for the complete life cycle of software projects.
"From an open source perspective," Copenhaver adds, "we help developers manage the origins and obligations of code that they use so they can meet the expectations of the industry and community. But everything we do works for both open source and proprietary or commercial code. Users can add code prints and licenses into the system to manage their internal proprietary code along with open source.
"Our product helps people manage the introduction of licensed materials into their code bases, understand the obligations associated with that code (and combinations of components from different sources), provide an environment for controlled remediation of issues that arise and create an archivable record of the actions that were taken by the team along the way. Our products are designed to bring together developers, lawyers and business decision makers into a collaborative environment."
Black Duck offers an analysis 'engine' that processes licenses at a detailed level and alerts users to license conflicts and obligations of both software source and binary components and their combinations. The ProtexIP Knowledgebase contains detailed breakdowns of 500+ software licenses for automated comparison of license terms and notification of collective obligations, and the data is remotely updated frequently with new licenses as they come to market. Black Duck recently added what it calls Custom Code Prints, which give ProtexIP support for proprietary source code.
Palamida claims a database of 40,000 of the most commonly used OSS projects and their associated licenses, monitoring more than 38 million open source files and billions of source code snippets. The Knowledge Base also contains all pertinent information regarding the open source projects: name, version number, project name, licensor, licensor information (when available), license, license text, and project URL, all using an automated collection toolset that incorporates information on all the new projects released on the major OSS repositories for real time updates.
The Palamida database takes up less than 10 GB of disk space, thanks to a compression algorithm, and it's all kept on a customer's own servers, behind their firewall. Its code is written in Java. IP Amplifier can be configured to search daily or weekly and has a set of configuration tools to integrate it into build systems.
Are There Any Differences?
The biggest differentiator is cost. IP Amplifier 3.0 is licensed on an annual subscription basis, for an unlimited number of users, at prices that begin at $50,000 and go up to $250,000 per year, depending on the customer's development environment. There is a 30-day Free Trial offer.
Black Duck now offers two options. You can pay an annual licensing fee for its multiuser ProtexIP product, at $25,000 per year, and then add additional charges based on the amount of code you have. Or, you can use their new hosted ProtexIP/OnDemand product, an online system for a single user, single project, 90-day sessions, for which you pay based on the amount of code you wish to scan. It costs $3,000 for 10 MB of code and costs scale up to $25,000 for 100 MB. A company thinking of acquiring another might wish to use the online tool, rather than purchase the more costly version.
Both products still require human analysis, naturally. There can be false matches, if two independent developers happen to write software that is very much the same, even if there has been no copying, just because there are only so many ways of writing the same instruction. Both tools provide not only identical matches but also flag similarities in your source code to others' programs that are worth your further investigation and list issues for review. It's important to realize, however, that the tools scan and analyze copyright issues and licensing issues, not patent infringement. That is an entirely separate ballgame. But for what they are designed to do, unquestionably they have simplified, organized, and improved the due diligence process.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: A look at Tor; New vulnerabilities in apache, gxine, ImageMagick, and mailutils.
- Kernel: Time to remove LSM?; Negative file offsets; The realtime preemption debate begins.
- Distributions: KANOTIX - The Knoppix Improved; New: Pentoo, Navyn OS
- Development: Anyterm: A Terminal Anywhere, new versions of Mailman, popa3d, Knettools, OpenSSH, Twisted, SSL-Explorer, Caravel CMS, PythonCAD, Eman, KDE, mnemo, XCircuit, HLA Adventure, OpenEMR, BEAST/BSE, Mozilla Deer Park, SBCL, Bochs, OProfile.
- Press: Greg K-H on Kernel development, Crackers target Phishing sites, Stallman on Nokia's patent grant, PalmSource promotes Linux, the Rexx scripting language, UML tools, EU funds open-source.
- Announcements: Nokia Donates to GNOME Foundation, analogs of Windows software in Linux, Open source and the commoditization of software, Debian Day at LinuxTag 2005, Europython 2005 update, Fedora Talk at USC, IEEE Web Services CFP, Embedded Technology 2005, Yokohama.
- Letters: The gift of volunteering; Brooks v. Lyons.