Question

Here's the scenario: I have to run a clustering algorithm over 10000 data points. I have precomputed the distances between the data points and stored them in a file. Since Python is slow at I/O-intensive tasks, I am writing the clustering algorithm in C++. The main issue is that the clustering algorithm will run several times, and I have to switch between the Python code and the C++ code. Something like this:

Read Distances from text_file (C++)
Run Clustering Algorithm (C++)

Use the result of this algorithm in main python code

Run clustering algorithm again (C++)

I don't want to read the distance file again and again, since it has over 500 million entries and reading it already takes around 17 seconds. I am looking for something like pausing the execution of the C++ code and resuming it when needed. How could this be achieved?


Solution

Just an idea:

Could you run the C++ part of your program from within your main Python program? You can do that by looking at the answers to this question: [Calling an external command in Python]. You can use the Adapter design pattern to pre-process the output of your C++ program so that it becomes compatible with the data structures used in your main Python program, and vice versa.
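To make the idea a bit more concrete, here is a minimal sketch of one way it could look, under some assumptions that are not in the question: the C++ program is started once, loads the distances once, and then answers one request per line on stdin with one line of space-separated cluster labels on stdout. The names `ClusteringWorker`, `cluster`, and `WORKER_SOURCE` are made up for illustration, and a tiny Python script stands in for the compiled C++ binary so that the example runs by itself.

```python
import subprocess
import sys

# Stand-in for the compiled C++ program (hypothetical protocol, assumed here):
# it does its expensive load once, then answers one request per input line.
WORKER_SOURCE = r"""
import sys
distances = [1.0, 2.0, 3.0]  # pretend this is the slow one-time file read
for line in iter(sys.stdin.readline, ""):
    k = int(line)  # request: number of clusters
    labels = [i % k for i in range(len(distances))]  # dummy "clustering"
    print(" ".join(map(str, labels)), flush=True)
"""


class ClusteringWorker:
    """Adapter: hides the line protocol and returns plain Python lists."""

    def __init__(self, command):
        # Start the child once; it stays alive between calls to cluster().
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes in text mode
        )

    def cluster(self, k):
        # Send one request line, then block until one reply line arrives.
        self.proc.stdin.write(f"{k}\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline()
        if not reply:
            raise RuntimeError("worker exited unexpectedly")
        return [int(token) for token in reply.split()]

    def close(self):
        # Closing stdin signals end of input; the worker then exits.
        self.proc.stdin.close()
        return self.proc.wait()


if __name__ == "__main__":
    # With a real binary this would be something like ["./clustering"].
    worker = ClusteringWorker([sys.executable, "-c", WORKER_SOURCE])
    print(worker.cluster(2))  # first run
    print(worker.cluster(3))  # second run, no reload of the distances
    worker.close()
```

The C++ side would need a matching loop: read the file once, then read a line from `std::cin`, run the clustering, print the labels, and flush `std::cout` after every reply so the Python side does not block waiting for buffered output.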

Licensed under: CC-BY-SA with attribution