Question

My company is heavily invested in the MS BI Stack (SQL Server Reporting Services, -Analysis Services and -Integration Services), but I want to have a look at what the seemingly most talked about open-source alternative Pentaho is like.

I've installed a version, and I got it up and running quite painlessly. So that's good. But I haven't really the time to start using it for actual work to get a thorough understanding of the package.

Have any of you got any insights into what are the pros and cons of Pentaho vs MS BI, or any links to such comparisons?

Much appreciated!

Was it helpful?

Solution

I reviewed multiple Bi stacks while on a path to get off of Business Objects. A lot of my comments are preference. Both tool sets are excellent. Some things are how I prefer chocolate fudge brownie ice cream over plain chocolate.

Pentaho has some really smart guys working with them but Microsoft has been on a well funded and well planned path. Keep in mind MS are still the underdogs in the database market. Oracle is king here. To be competitive MS has been giving away a lot of goodies when you buy the database and have been forced to reinvent their platform a couple of times. I know this is not about the database, but the DB battle has cause MS to give away a lot in order to add value to their stack.

1.) Platform
SQL server doesn't run on Unix or Linux so they are automatically excluded from this market. Windows is about the same price as some versions or Unix now. Windows is pretty cheap and runs faily well now. It gives me about as much trouble as Linux.

2.) OLAP
Analysis services was reinvented in 2005 (current is 2008) over the 2000 version. It is an order of magnatude more powerful over 2000. The pentaho (Mondrian) is not as fast once you get big. It also has few features. It is pretty good but there are less in the way of tools. Both support Excel as the platform which is esscential. The MS version is more robust.

3.) ETL
MS - DTS has been replaced with SSIS. Again, order of magnatude increase in speed, power, and ability. It controls any and all data movement or program control. If it can't do it you can write a script in Powershell. On par with Informatica in the 2008 release. Pentaho - Much better than is used to be. Not as fast as I would like but I can do just about everything I want to do.

4.) dashboard
Pentaho has improved this. It is sort of uncomfortable and unfriendly to develop but there is really not a real equiv for MS.

5.) reports
MS reports is really powerful but not all that hard to use. I like it now but hated it at first, until I got to know it a little better. I had been using crystal reports and the MS report builder is much more powerful. It is easy to do hard things in MS, but a little harder to do easy things. Pentaho is a little clumsy. I didn't like it at all but you might. I found it to be overly complex. I wish it was either more like the Crystal report builder or the MS report builder but it is jasper like. I find is to be hard. That may be a preference.

6.) ad hoc
MS - this was the real winner for me. I tested it with my users an they instantly in love with the MS user report builder. What made the difference was how it was not just easy to use, but also productive. Pentaho - is good but pretty old school. It uses the more typical wizard based model and has powerful tools but I hate it. It is an excellent tool for what it is, but we have moved on from this style and no one wants to go back. Same problem I had with logiXML. The interface worked well for what it was but is not really much of a change from what we used 12 years. http://wiki.pentaho.com/display/PRESALESPORTAL/Methods+of+Interactive+Reporting

There are some experienced people out there that can make Pentaho really run well, I just found the MS suite to be more productive.

OTHER TIPS

Warning -- there are numerous sites out there listing the numerous deficiencies, bugs, and annoyances with SSIS. Not sure why SSIS came out on top with the post -- but before you bet your project on it, look at what people have to say in the blogosphere. From my experience its about 20:1 ranting about how horrible SSIS is to work with--I can concur as well, currently looking for any alternative.

Great information here? I have not tried Pentaho but and planning on checking it out. I am a seasoned MS BI consultant, using it since 1998. SSIS is very fast and very powerful but the criticisms are spot on. I found the following issues with SSIS:

(1) It is hard to debug, you get cryptic errors that may not give you any hint about what and where the problem really is.

(2) Per a prior comment, it is the shittiest development environment ever! I have no clue what they are thinking.

(a) Create a table with a 100 or more columns and put a merge join on it. Now go back in and try to make an update to the merge join (like pull a new column through). It can take several minutes, even on the fastest machine after you click ok on the merge join to save your change. I have a huge dataflow with lots of wide records and many merge joins. Adding one column to the dataflow takes more than half a day. I update a merge join and then have to go do something else and check back 5-10 minutes later to see if it is done. Microsoft's response to this is to break up your package into multiple packages, place the data in a table or binary between them. Well if you are going to disk between all the steps, you may was well do the whole thing in SQL! One of the main purposes of an ETL tool is to all this stuff in memory and avoid disk I/O.

(b) The designer outright crashes sometimes, losing all your work since last save (I do ctrl-S in my sleep now because of this)

(c) I had to figure out a hack and generate SSIS package XML in Excel for wide records. I have a Healthcare client where 600+ column records are commonplace. If you try to define a file format with 600 columns in SSIS, you have to type every single column in one at a time!!! Even MS access allows you to cut and paste a layout from a spreadsheet into a file layout, but not SSIS. So I had to generate the XML from the layout and paste the XML code into the right place in the package. Ugly way to do it but it saved entire days of work and lots of errors.

(d) Similar to (c), if you need to trim all your columns and you have say 600+ of them, guess what? In the derived column component, you have to type trim(column1) 600+ times! I now do all simple transforms like this in the SQL query to get the data, since that can easily be generated from an Excel sheet.

(e) There are many quirky things, components that turn invisible, sometimes you open the package and all the components are completely re-arranged incoherently.

(f) The FTP feature, possibly one of the most common things you need in ETL, is weak and only supports plain vanilla FTP which nobody uses. Everyone these days uses SFTP, FTPS, https, etc... So almost every implementation requires using a 3rd party commend line driven file transfer app the package has to call.

(g) Trying to CYA, similar to the ridiculous security in Windows Vista, Microsoft has made it exceedingly difficult to actually promote an SSIS package from one environment to another. It defaults to this stupid thing of "encrypting sensitive information with user key" security which means it must run under the same account in the environment you are moving it to as the environment you developed it, something that is rarely the case. There are better ways to configure but it always try to revert to this completely useless security protection.

(h) Lastly most of these problems are now in there 3rd version, clearly indicating Microsoft has no plan to fix them.

(i) Debugging is not nearly as easy as other languages.

SSIS still has a great many benefits, but not without some serious pain.

I started using MS Reporting Services many years ago and just love it. I've not tried Penaho's reporting solution so I can't comment on it. Nor have I tried either Analysis Services or Pentaho's alternative.

Recently I needed an ETL solution and being familiar with MSSQL and MSRS it seemed obvious that I would review and probably choose MS Integration Service. But for me, MSIS was awful. Mostly because it was not intuitive. After spending a couple of days trying to learn the tool I decided to look for an alternative and came across Pentaho Data Integration, formerly known as Kettle. I had it up and running within minutes and immediately created my first transformation. It just works.

Admittedly my needs are fairly simple but performance has been great and the community seems very helpful.

I have used SSIS and Pentaho Kettle, and I would highly recommend using Pentaho Kettle for your ETL tool instead of SSIS.

My reasons: -the flow of SSIS is task to task. Kettle makes you think about rows of data flowing through the system. Kettle's approach seems much more intuitive to me. -SSIS is poorly documented. This happens. But there seems to be a lot of nook-and-cranny clicking and setting of variables. Very complex. Pentaho has a community forum which is quite helpful. -I trust Pentaho to integrate with multiple types of databases, including SQL Server. You can also use JDBC which is nice. Also, I've used it to go between SQL Server and Oracle on one side and Vertica on the other. It has a bulk loader available for it on Vertica. That's quite nice. -I have found it very, very hard relatively speaking to get a SSIS package to run on a server. It just wasn't worth my time. -I found it quite easy for Pentaho to mail a warning or error message to a person or list of people. -Pentaho allows tasks to be done in JavaScript for things that need some logic. Simple and easily done with a language most of us have come across.

I can't offer any input on the MS BI Stack but at the most recent Barcamp Orlando, the folks from Pentaho were there and spoke about their products and it was an extremely impressive demo.

The fact that it's an Open Source project that you can extend yourself as well as a paid package for really good service leaves you with a lot of options. They demonstrated some paid work they did for a client and they definitely wow'd the crowd.

I also had a chance to chat a little bit with a developer working on the data warehousing side of things for Pentaho and he was extremely sharp and was very open to suggestions and had no problems answering any questions.

So as far as a company goes, Pentaho really impressed me with both their work and how friendly and approachable all of their developers were.

a couple of points to add

  • Although there is a window version of all Pentaho tools the setup in windows is onerous. Pentaho (especially the server start and stop which is separate from the GUI tool) is typically used in Linux, not windows shop, and there is steep learning curve going from Windows to Linux.
  • any tool has a learning curve when you shift to it. when you get used to always clicking OK and refreshing metadata when you have problems, SSIS isn't that bad. Pentaho can be flaky, too.

Tool questions need to be addressed in terms of larger cultural questions - what kind of shops use open source tools? in my experience i've found that althsough Microsoft shops seem more rigid, when you have trouble with a connection string in a Microsoft shop you can get help.. in Pentaho and Linux shops its more DYI.

BTW, watch out for Pentaho sales guys doing demos - all the things they show are a lot harder to get working than it seems! :)

If you are looking for a robust, low cost alternative to the big boys LogiXML has dashboarding and ad hoc reporting on a .NET platform. We've been using them since late 2006 when Pentaho was just starting, but I haven't looked at it in awhile.

I recently tried pentaho open source BI. I found it to be extremely clumsy. It was not very intuitive and development time took much longer.

It is quite different from either Oracle or ms BI solutions. Maybe the enterprise edition is better.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top