Question

I have a set of sentences giving conversion ratios, such as

  • 10,000 something for ∫1
  • ∫1 for 10k SMTH
  • 1200 Something for ∫0.1
  • Selling 3000 Smth for 3∫

All of these sentences give the exchange ratio between the fictional currency "something" (SMTH) and the fictional unit ∫ (INTEGRAL). I need a way to extract the conversion ratio between these two units. The difficulty is that numbers can be formatted in different ways (10,000, 10000 or 10k), units can be spelled differently (something, SMTH, with varying capitalization), the order of the units varies ("x SMTH for ∫y" or "∫y for x SMTH"), and the ∫ symbol can come before or after the number (∫x or x∫).

TL;DR: Convert the above strings into mathematical relationships while handling many different formats.
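For the four example sentences, one concrete form of that relationship is the number of SMTH per ∫. Writing $a_S$ for the SMTH amount and $a_I$ for the ∫ amount (symbols chosen here for illustration), the word order of the sentence no longer matters; the four examples, in order, work out as:

```latex
r = \frac{a_S}{a_I}
\qquad
\frac{10\,000}{1} = 10\,000,\quad
\frac{10\,000}{1} = 10\,000,\quad
\frac{1200}{0.1} = 12\,000,\quad
\frac{3000}{3} = 1000
```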

I know this is a lot to ask, and it is quite complicated. If a similar question already exists, I would gladly look at it.

What language, you ask? Preferably PHP or JS, but pseudo-code is a good start.

EDIT:

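function parseSentence(sentence) {
  var integral, something;
  var val = sentence
    .replace(/,/g, "")                                  // "10,000" -> "10000"
    .replace(/(\d+(?:\.\d+)?)\s*(million|k|m)\b/gi, function (match, num, suffix) {
      // "10k" -> "10000", "2 million" -> "2000000"
      return String(parseFloat(num) * (suffix.toLowerCase() === "k" ? 1000 : 1000000));
    })
    .replace(/\s*(something|smth)\b/gi, "SMTH");        // "3000 Smth" -> "3000SMTH"
  var words = val.split(" ");
  for (var i = 0; i < words.length; i++) {
    if (words[i].indexOf("∫") !== -1) {                // the ∫ amount, "∫1" or "3∫"
      integral = words[i].replace("∫", "");
    } else if (words[i].indexOf("SMTH") !== -1) {      // the SMTH amount, "10000SMTH"
      something = words[i].replace("SMTH", "");
    }
  }
  return { integral: parseFloat(integral), something: parseFloat(something) };
}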

Simplified JavaScript/pseudo-code


Solution

All of your examples separate the two amounts with "for", so there aren't many combinations to handle. Keep a list of words that identify each currency and a regular expression that matches numbers; the word "for" then divides each phrase into a left side and a right side. To process each phrase, run the following pseudo-code:

for each word:
    if it's a known currency identifier
        Store what is the currency
    else if it's a number
        Store the value
    else if it's the "for" word
        Change side
    end if
end for

After this loop finishes, you'll have a data structure recording which currency is on each side and its amount.
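The loop above maps fairly directly onto JavaScript. This is one possible reading of it, not code from the answer: the identifier table CURRENCY_WORDS, the suffix table SUFFIXES and the function name parsePhrase are all assumptions. Because the question writes the symbol attached to the number (∫1, 3∫), the sketch first pads "∫" with spaces so the symbol and the number become separate words.

```javascript
// Hypothetical identifier table: each spelling of a currency, lowercased,
// mapped to a canonical code. Extend it with any other spellings you meet.
var CURRENCY_WORDS = { something: "SMTH", smth: "SMTH", "∫": "INTEGRAL" };
var SUFFIXES = { k: 1e3, m: 1e6, million: 1e6 };

// Returns [leftSide, rightSide], each { currency, amount }
function parsePhrase(phrase) {
  // Drop thousands separators and pad "∫" with spaces ("∫1" -> "∫ 1",
  // "3∫" -> "3 ∫") so every word is a currency, a number or "for"
  var words = phrase.replace(/,/g, "").replace(/∫/g, " ∫ ").trim().split(/\s+/);
  var sides = [{}, {}];
  var side = 0;

  words.forEach(function (word) {
    var lower = word.toLowerCase();
    var number = /^(\d+(?:\.\d+)?)(million|k|m)?$/.exec(lower);

    if (Object.prototype.hasOwnProperty.call(CURRENCY_WORDS, lower)) {
      sides[side].currency = CURRENCY_WORDS[lower]; // known currency identifier
    } else if (number) {
      // a number, with an optional suffix such as "k"
      sides[side].amount = parseFloat(number[1]) * (number[2] ? SUFFIXES[number[2]] : 1);
    } else if (lower === "for") {
      side = 1; // change side
    }
    // any other word (e.g. "Selling") is ignored
  });

  return sides;
}
```

For "∫1 for 10k SMTH", side 0 gets currency "INTEGRAL" with amount 1 and side 1 gets currency "SMTH" with amount 10000; dividing the SMTH amount by the INTEGRAL amount then gives the ratio whichever side each currency appeared on.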

OTHER TIPS

I took a stab at implementing something along these lines. As others have mentioned, every sentence follows the pattern "[currency] for [currency]", which is easy to match. Take a look below; it's fairly well documented.

/**
 * Parse an amount with currency "[symbol (optional)][amount][postfix (optional)] [currency (optional)]"
 * A symbol written after the number ("3∫") is not captured as the symbol.
 * @param  {String} str Currency string e.g. "$100k dollars", "$100million", "100billion euro", "10,000 something"
 * @return {Array}      [symbol, amount, factor, unit]; absent parts are undefined
 */
function parseCurrency(str) {
    // Remove thousands separators first; otherwise "10,000" is read as 10
    str = str.replace(/,/g, "");

    var match = /([^0-9\.]+)?([0-9\.]+)(\w+)?(?:\s+(\w+))?/.exec(str);

    if (!match) throw new Error("Bad currency input: " + str);

    var symbol = match[1], // €, $, £
        amount = match[2], // 100, 200
        factor = match[3], // k, million i.e. 100k, 100million
        unit = match[4];   // euro, pound

    return [symbol, amount, factor, unit];
}

/**
 * Takes in a rate in the form of "[currency] for [currency]"
 * @param  {String} str "[currency] for [currency]"
 * @return {Number}     Left-hand amount divided by right-hand amount
 */
function parseRate(str) {
    // Split on "for" as a separate word (any case), so words that merely
    // contain "for", such as "California", are not split
    var currencies = str.split(/\s+for\s+/i).map(function(amount) {
        return parseCurrency(amount.trim());
    });

    if (currencies.length !== 2) throw new Error("Bad rate input: " + str);

    // Calculate the rate:
    // put the "[currency] for" amount over the "for [currency]" amount
    var base = expandPostfix(currencies[0][1], currencies[0][2]),
        exchangeTo = expandPostfix(currencies[1][1], currencies[1][2]);

    return base / exchangeTo;
}

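/**
 * Expand a number postfix
 * @param  {String|Number} num     Amount, e.g. "100" as captured by parseCurrency
 * @param  {String}        postfix Postfix such as "k", "m", "billion" (optional)
 * @return {Number}                Expanded number
 */
function expandPostfix(num, postfix) {
    var factors = {
        k: 1000,
        m: 1000000,
        million: 1000000,
        billion: 1000000000
    };

    // Match postfixes case-insensitively ("10K", "5M"); a missing or
    // unknown postfix leaves the number unchanged
    var factor = postfix ? factors[postfix.toLowerCase()] : undefined;
    return parseFloat(num) * (factor || 1);
}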

parseRate("1 euro for 3 pound"); // 0.333
parseRate("10000 something for ∫1"); // 10000
parseRate("1200 Something for ∫0.1"); // 12000
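Because parseRate divides the left amount by the right one, its result depends on word order: "∫1 for 10k SMTH" yields 0.0001, while "10000 something for ∫1" yields 10000. If you want one SMTH-per-∫ figure for every sentence in the question, one option is to orient by unit instead of by position. The sketch below is self-contained and hypothetical (the names amountOf and smthPerIntegral are made up for illustration); it assumes exactly one side contains the "∫" symbol and that the other side is SMTH.

```javascript
// Multipliers for suffixes written directly after the digits ("10k").
// Only "k" appears in the question; the others are assumptions.
var FACTORS = { k: 1e3, m: 1e6, million: 1e6, billion: 1e9 };

// Hypothetical helper: first number in a string, commas removed, suffix expanded
function amountOf(part) {
  var match = /(\d+(?:\.\d+)?)(million|billion|k|m)?\b/i.exec(part.replace(/,/g, ""));
  if (!match) throw new Error("No amount in: " + part);
  var factor = match[2] ? FACTORS[match[2].toLowerCase()] : 1;
  return parseFloat(match[1]) * factor;
}

// Hypothetical wrapper: how many SMTH one ∫ buys, whatever the word order.
// Assumes exactly one side contains the "∫" symbol and the other side is SMTH.
function smthPerIntegral(str) {
  var parts = str.split(/\s+for\s+/i);
  if (parts.length !== 2) throw new Error("Bad rate input: " + str);

  // The side carrying "∫" is the INTEGRAL amount (the denominator)
  var integralFirst = parts[0].indexOf("∫") !== -1;
  var integral = amountOf(integralFirst ? parts[0] : parts[1]);
  var smth = amountOf(integralFirst ? parts[1] : parts[0]);
  return smth / integral;
}
```

With this wrapper, both "∫1 for 10k SMTH" and "10,000 something for ∫1" come out as 10000, and "Selling 3000 Smth for 3∫" as 1000.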
Licensed under: CC-BY-SA with attribution