Question

I'm trying to write an algorithm to estimate the mass of objects that I know the system of.

My data takes the form of x and y points, so I could represent each object either as the raw x and y points or as a distribution, using the mean and standard deviation of the x and y points. Which is better would likely depend on the parameters of the algorithm.

I don't need a classifier, I'm looking for a number value estimation.

e.g., x values: {1,2,3,...}, y values: {1,2,3,...} -> mass: 5, or x values:{2 (mean), 1 (std)} y: {2,1} -> 5

I'm pretty new to machine learning. A classifier doesn't seem like the right way to approach this, and the regression learning algorithms I've looked up seem to estimate parameters, not results.

I'm also planning on doing this in Python, but I don't need a package; a general algorithm should put me on the right track.

Edit in response to blubb

My data is given in the form of a set of x points, a set of y points, and a mass. e.g.,

x values   |   y values   | mass
--------------------------------
1 2 3 4    |   1 2 3 4    | 6.7
2 3 4 5    |   2 3 4 5    | 7.9

And I would receive an input, like:

x values   |   y values
-----------------------
5 6 7      |   8 9 10

Another way of representing it (which may be smarter in terms of a vector space) would be to represent the values by their means and standard deviations, so my training data would become:

x mean | x std | y mean | y std | mass
--------------------------------------
2.5    | 1     | 2.5    | 1     | 6.7
3.5    | 1     | 3.5    | 1     | 7.9

These are obviously not the real values, but representative examples. (All values are floats)
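A minimal sketch of that conversion, assuming each training example is a pair of x and y lists plus a mass. The helper name `summarize` is made up for illustration, and `statistics.stdev` (the sample standard deviation) is only one possible choice; `statistics.pstdev` would give the population version instead.

```python
from statistics import mean, stdev


def summarize(xs, ys):
    """Collapse variable-length point lists into a fixed-length feature row.

    Returns (x mean, x std, y mean, y std). Uses the sample standard
    deviation, which is an assumption; each list needs at least two points.
    """
    return (mean(xs), stdev(xs), mean(ys), stdev(ys))


# Representative training data in the same shape as the table above.
raw = [
    ([1, 2, 3, 4], [1, 2, 3, 4], 6.7),
    ([2, 3, 4, 5], [2, 3, 4, 5], 7.9),
]

# Each row becomes (features, mass), so every example has the same length
# no matter how many points the object had.
training = [(summarize(xs, ys), mass) for xs, ys, mass in raw]
```

The point of doing this is that inputs with different numbers of points (like the three-point query above) end up as vectors of the same size.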


Solution

You're looking to estimate a function f: R² -> R, so regression is the family of methods you should be looking into. Which kind of regression, however, depends largely on the relationship between (x, y) and mass.

Generally described, a regression method defines a cost function c: R² x F -> R+ and a set F of functions to choose from. Often the set F is infinite and parametrized in some form. This leaves most regression methods with the problem of estimating the parameters that determine the optimal f (what you referred to as 'estimating parameters').

In order to determine which regression method is most suitable, you'll have to find out the following things:

  • what is a meaningful cost function c?
  • how to choose the set F of functions?

For example, linear regression chooses the linear least squares cost function and defines F to be the set of all linear functions f: R² -> R. This may or may not be what you want, depending on your setup.
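To make the linear case concrete, here is a rough sketch in plain Python. It assumes each object has already been reduced to a single (x, y) pair, and it fits f(x, y) = a*x + b*y + c, that is, a linear function plus an intercept. The names `fit_linear`, `predict`, and `cost` are made up for illustration, and the data is invented (generated from mass = 2x + 3y + 1), not taken from the question.

```python
def fit_linear(points, masses):
    """Least-squares fit of mass ~ a*x + b*y + c via the normal equations."""
    rows = [(x, y, 1.0) for x, y in points]  # the constant 1.0 carries c
    n = 3
    # Augmented normal-equation matrix [A^T A | A^T m].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * m for r, m in zip(rows, masses))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate data: points must not all be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(params, x, y):
    """Evaluate the chosen f from F at a new input; this is the 'result'."""
    a, b, c = params
    return a * x + b * y + c


def cost(params, points, masses):
    """Sum of squared residuals: the quantity the fit minimizes."""
    return sum((predict(params, x, y) - m) ** 2
               for (x, y), m in zip(points, masses))


# Invented training data, for illustration only.
points = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
masses = [6, 8, 9, 13, 14]

params = fit_linear(points, masses)
estimate = predict(params, 4, 4)  # mass estimate for an unseen (x, y)
```

Estimating the parameters (a, b, c) is only the training step; calling `predict` with those parameters is what yields the number you are after. If the true relationship is not linear, this particular F will fit poorly and a different set of functions would be needed.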

Therefore, explaining the experimental setup under which the triplets (x, y, mass) can be determined might help to shed some light on this.

Licensed under: CC-BY-SA with attribution