To compare stochastic algorithms, you typically run each of them multiple times with different random seeds, so the output you obtain is a sample of a random variable. You can then assess whether one algorithm is better than another by performing a statistical hypothesis test on the resulting samples (ANOVA or Kruskal-Wallis for multiple comparisons, a t-test or Mann-Whitney U test for pairwise comparisons). If the resulting p-value is below your chosen significance threshold (typically 0.05, or lower, e.g. 0.01, for more rigorous claims), you reject the null hypothesis H0 that the results are equal, e.g. with respect to their means. You then conclude that the results differ and, further, that the algorithm with the better average performance is the one to choose (assuming average performance is what you care about; "better" usually has many dimensions).
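To make this concrete, here is a minimal sketch in Python of the pairwise case: a hand-rolled two-sided Mann-Whitney U test using the normal approximation (reasonable for ~20+ runs per algorithm; I average ranks for ties but skip the tie correction in the variance, so treat it as illustrative, not production statistics). The two "algorithms" are simulated by drawing Gaussian scores; in practice the lists would hold your per-seed results.

```python
import math
import random

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test, normal approximation (no tie correction)."""
    n1, n2 = len(xs), len(ys)
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    # Assign ranks, averaging over groups of tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    r1 = sum(r for r, (_, grp) in zip(ranks, combined) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p

rng = random.Random(0)
a = [rng.gauss(10.0, 1.0) for _ in range(30)]  # 30 runs of "algorithm A"
b = [rng.gauss(11.0, 1.0) for _ in range(30)]  # 30 runs of "algorithm B"
u, p = mann_whitney_u(a, b)
print(f"U = {u:.1f}, p = {p:.3g}")  # here p < 0.05: reject H0 of equal distributions
```

In real use you would call `scipy.stats.mannwhitneyu(a, b)` instead of rolling your own; the point is only that the per-seed results become the samples the test operates on.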
One thing in your comments made me wonder:
> If I run the GA algorithm multiple times with the same seed for initial solution, the result will be completely different
I think there is an error in your code. To obtain exactly the same result, you need to use the same random object for every random decision made inside your algorithm. Somewhere in your code you probably create a `new Random()` instead of reusing the one you initially created with the given seed. Another possible cause is parallel processing of parts that draw random numbers: you can never guarantee that your threads execute in the same order, so in one run thread 1 gets the first random number and thread 2 the second, while in another run thread 2 executes first and takes the first number.
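The single-Random-object point can be sketched like this (Python here for brevity; the same pitfall exists with Java's or C#'s `new Random()`). The `ga_step` function is a hypothetical stand-in for any random decision inside your GA; the broken variant seeds a `Random` and then ignores it, drawing each number from a fresh, OS-seeded instance instead.

```python
import random

def ga_step(rng):
    """Stand-in for one random decision inside the GA (hypothetical)."""
    return rng.random()

def run_correct(seed):
    rng = random.Random(seed)  # one seeded Random object for the whole run
    return [ga_step(rng) for _ in range(5)]

def run_broken(seed):
    random.Random(seed)  # the seed is effectively thrown away here:
    # each step gets a fresh, OS-seeded Random, so runs are not reproducible
    return [ga_step(random.Random()) for _ in range(5)]

print(run_correct(42) == run_correct(42))  # True: same seed, same trajectory
print(run_broken(42) == run_broken(42))    # almost surely False
```

If you grep your code for every place a random number generator is constructed and find more than the one you seed at start-up, that is where the non-determinism comes from.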