“Beginner” distributed processing project
22-08-2019
Question
For the longest time I've been interested in building a cluster of heterogeneous nodes as a home supercomputer, since I'm very interested in doing AI research.
However, the issue is that even though I have a myriad of hardware (2x dual quad rack-mount servers, 8x GTX 285 GPUs, 6x PS3s, 2x hacked Xbox 360s (they can run Linux), and access to tonnes of common PCs as well as a few workstations), I have no large data set that needs to be crunched, nor any software that I can run distributed. I have messed with distributed code compiling, but at best it's taken my kernel builds from 10 minutes (at worst) down to 30 seconds (and I think 20 of those seconds are just setup).
So where should I start? I have a decent understanding of Obj-C/C/C++, so it shouldn't be too hard to write something, but what should I write?
Solution
If you want data to crunch, there's plenty out there:
- A range of data mining and knowledge discovery datasets
- A variety of scraped and/or scrapable datasets
- The Comprehensive Knowledge Archive Network's list of data packages
- A collection of large health datasets
As for "what should I build", the real question is, what interests you?
OTHER TIPS
Well, I think it's best to first determine which subset of your available hardware you'll be developing your application for. Software for the PS3 needs special attention and will require separate development from something built to run on typical Linux servers.
You may also need to do some research on how you could develop an application for the 360; I'm not sure if it'd really give you what you're looking for to be honest.
Once you've decided on the subset of hardware you'll develop for, it would be good to start with some basic development to ensure you can get a foundation put together that enables communication between nodes. With a solid foundation you'll be able to expand your code to support a variety of distributed projects.
I hope I'm understanding your question correctly!
Cheers