Hive integration with Mahout for recommendation

https://stackoverflow.com//questions/22050460

21-12-2019

Question

I want to use Mahout with Hive: I will fetch the data from Hive, populate a DataModel with it, and use Mahout for recommendation. Is this possible? So far I have only seen Mahout work with files.

1) How do I load data into Mahout from a Hive table?
2) Is there any other way to use Mahout recommendation with Hive or other sources?

Here I have the result of a Hive JDBC query and I want to populate a DataModel in Mahout with it. How do I populate it?

I want to use the database result instead of reading from a file for Mahout recommendation. For example:

Hive:

    import java.sql.SQLException;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.sql.DriverManager;

    public class HiveJdbcClient {
      private static String driverName = "org.apache.hive.jdbc.HiveDriver";

      /**
       * @param args
       * @throws SQLException
       */
      public static void main(String[] args) throws SQLException {
        try {
          Class.forName(driverName);
        } catch (ClassNotFoundException e) {
          // TODO Auto-generated catch block
          e.printStackTrace();
          System.exit(1);
        }
        //replace "hive" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");
        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
          System.out.println(res.getString(1));
        }
        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
          System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // load data into table
        // NOTE: filepath has to be local to the hive server
        // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
        String filepath = "/tmp/a.txt";
        sql = "load data local inpath '" + filepath + "' into table " + tableName;
        System.out.println("Running: " + sql);
        stmt.execute(sql);

        // select * query
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
          System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
        }

        // regular hive query
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
          System.out.println(res.getString(1));
        }
      }
    }

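Mahout:

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class MahoutFileRecommender
    {
        public static void main(String[] args) throws IOException, TasteException
        {
            // Create a data source from the CSV file
            File userPreferencesFile = new File("data/dataset1.csv");
            DataModel dataModel = new FileDataModel(userPreferencesFile);

            // Compare users by Pearson correlation and keep the 2 nearest neighbours
            UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
            UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(2, userSimilarity, dataModel);

            // Create a generic user based recommender with the dataModel, the userNeighborhood and the userSimilarity
            Recommender genericRecommender = new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);

            // Recommend 5 items for each user
            for (LongPrimitiveIterator iterator = dataModel.getUserIDs(); iterator.hasNext();)
            {
                long userId = iterator.nextLong();

                // Generate a list of 5 recommendations for the user
                List<RecommendedItem> itemRecommendations = genericRecommender.recommend(userId, 5);

                System.out.format("User Id: %d%n", userId);

                if (itemRecommendations.isEmpty())
                {
                    System.out.println("No recommendations for this user.");
                }
                else
                {
                    // Display the list of recommendations
                    for (RecommendedItem recommendedItem : itemRecommendations)
                    {
                        System.out.format("Recommended Item Id %d. Strength of the preference: %f%n", recommendedItem.getItemID(), recommendedItem.getValue());
                    }
                }
            }
        }
    }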

Solution

Mahout version 0.9 provides DataModel implementations (data sources) for JDBC-compliant databases such as MySQL, Oracle and PostgreSQL, for NoSQL databases such as MongoDB, HBase and Cassandra, and for file-system-based sources like the one you mentioned.

As of this version, Hive is not a 100% SQL-standard database, so the MySQLJDBCDataModel and SQL92JDBCDataModel data models are not appropriate for Hive tables: the SQL syntax differs considerably from that of JDBC-compliant databases.

For your first question, you may want to extend AbstractJDBCDataModel and override the constructor so that it passes in the Hive DataSource and Hive-specific SQL queries for preference, preference time, user, all users, etc., similar to the ones listed in the AbstractJDBCDataModel constructor.
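As a rough illustration of the Hive-specific queries mentioned above, the sketch below builds the four query kinds the answer names (preference, preference time, user, all users) for a single table. The class `HiveTasteQueries`, its field names, and the table and column names are all hypothetical and not part of Mahout; the exact result columns each query must return should be checked against the AbstractJDBCDataModel constructor documentation before the strings are passed in.

```java
/**
 * Hypothetical holder for Hive-flavoured query strings (an assumption for
 * illustration, not a Mahout class). The strings are meant to be handed to
 * the constructor of a subclass of AbstractJDBCDataModel.
 */
public final class HiveTasteQueries {
    public final String preference;      // one preference value for a (user, item) pair
    public final String preferenceTime;  // timestamp of that preference
    public final String user;            // all preferences of one user
    public final String allUsers;        // all preferences of all users

    public HiveTasteQueries(String table, String userCol, String itemCol,
                            String prefCol, String timeCol) {
        // Shared WHERE clause for lookups keyed by user and item.
        String byUserAndItem = " WHERE " + userCol + "=? AND " + itemCol + "=?";
        // Shared projection of (user, item, preference) rows.
        String triple = "SELECT " + userCol + ", " + itemCol + ", " + prefCol
                + " FROM " + table;

        preference = "SELECT " + prefCol + " FROM " + table + byUserAndItem;
        preferenceTime = "SELECT " + timeCol + " FROM " + table + byUserAndItem;
        user = triple + " WHERE " + userCol + "=? ORDER BY " + itemCol;
        allUsers = triple + " ORDER BY " + userCol + ", " + itemCol;
    }

    public static void main(String[] args) {
        // Table and column names below are made up for the example.
        HiveTasteQueries q =
                new HiveTasteQueries("ratings", "user_id", "item_id", "rating", "rated_at");
        System.out.println(q.preference);
        System.out.println(q.preferenceTime);
        System.out.println(q.user);
        System.out.println(q.allUsers);
    }
}
```

The `?` placeholders assume the queries are executed as JDBC prepared statements. The remaining queries covered by the answer's "etc." would follow the same pattern.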

For your second question, the approach above is fine if you are using the non-distributed algorithms (the Taste algorithms). If you use a distributed algorithm, Mahout can run on Hadoop, sourcing the HDFS files that back the Hive table. Please see here for details on running Mahout on Hadoop.
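One practical detail for the distributed route: the Hive example in the question loads a ctrl-A separated file, while its Mahout example reads comma-separated lines. If the HDFS files behind the Hive table use ctrl-A as the delimiter, they may need re-delimiting before Mahout reads them. The sketch below shows that conversion line by line, assuming three fields per row (user ID, item ID, preference); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns ctrl-A separated Hive text rows into CSV rows. */
public final class HiveLineToCsv {
    // ctrl-A, the separator used by the file in the question's Hive example.
    private static final String HIVE_DELIMITER = "\u0001";
    // Hive writes NULL values in text files as the two characters \N.
    private static final String HIVE_NULL = "\\N";

    private HiveLineToCsv() {}

    /**
     * Converts one row to "user,item,preference".
     * Returns null when the row does not have exactly three usable fields.
     */
    public static String toCsv(String hiveLine) {
        String[] fields = hiveLine.split(HIVE_DELIMITER, -1);
        if (fields.length != 3) {
            return null;
        }
        for (String field : fields) {
            if (field.isEmpty() || field.equals(HIVE_NULL)) {
                return null;
            }
        }
        return String.join(",", fields);
    }

    /** Converts many rows, silently dropping the ones toCsv rejects. */
    public static List<String> convert(List<String> hiveLines) {
        List<String> csvLines = new ArrayList<>();
        for (String line : hiveLines) {
            String csv = toCsv(line);
            if (csv != null) {
                csvLines.add(csv);
            }
        }
        return csvLines;
    }

    public static void main(String[] args) {
        String row = "1" + HIVE_DELIMITER + "10" + HIVE_DELIMITER + "4.5";
        System.out.println(toCsv(row)); // prints 1,10,4.5
    }
}
```

Declaring the Hive table with `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` would avoid the conversion step altogether, since the backing files would then already be comma-separated.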

Licensed under: CC-BY-SA with attribution
