Machine Learning on Heroku with PredictionIO
Last week at the TrailheaDX Salesforce Dev Conference we launched the DreamHouse sample application to showcase the Salesforce App Cloud and numerous possible integrations. I built an integration with the open source PredictionIO Machine Learning framework. In DreamHouse, machine learning powers a real estate recommendation engine that learns from users with similar favorites. Check out a demo and get the source.
For the DreamHouse PredictionIO integration to work, I needed to get the PredictionIO service running on Heroku. Since PredictionIO is a Scala app and Heroku supports Scala, everything worked great! Here are the steps to get PredictionIO up and running on Heroku.
First you will need a PredictionIO event server, with an app defined in it:
- Create an app:
  heroku run -a <APP NAME> console app new <A PIO APP NAME>
- List apps:
  heroku run -a <APP NAME> console app list
- Check out the source and local dev instructions for the event server.
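Now that you have an event server and app, load some sample event data. The commands below rely on bash features (brace expansion and $RANDOM) and create 5 users, 50 items, and 100 view events on randomly chosen items. The access key is the one generated for the PredictionIO app you created above: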
export ACCESS_KEY=<YOUR ACCESS KEY>
export URL=http://<YOUR HEROKU APP NAME>.herokuapp.com
for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done
for i in {1..50}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done
for j in {1..20}; do for i in {1..5}; do curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\", \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 ) + 1 ))\" }"; done; done
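The nested backslash escaping in those loops is easy to get wrong. As a rough sketch, the same three JSON bodies can be built with printf instead. The function names here (set_user_event, set_item_event, view_event, post_event) are my own illustrative helpers, not part of PredictionIO, and post_event assumes URL and ACCESS_KEY are exported as shown above.

```shell
# Illustrative helpers (not part of PredictionIO): build event JSON with printf.
# The format strings are single-quoted, so "$set" stays literal and no
# backslash escaping is needed.

# $1 = user id, e.g. u1
set_user_event() {
  printf '{ "event" : "$set", "entityType" : "user", "entityId" : "%s" }' "$1"
}

# $1 = item id, e.g. i1 (categories are fixed to c1 and c2, as in the loops above)
set_item_event() {
  printf '{ "event" : "$set", "entityType" : "item", "entityId" : "%s", "properties" : { "categories" : ["c1", "c2"] } }' "$1"
}

# $1 = user id, $2 = id of the item that user viewed
view_event() {
  printf '{ "event" : "view", "entityType" : "user", "entityId" : "%s", "targetEntityType" : "item", "targetEntityId" : "%s" }' "$1" "$2"
}

# $1 = JSON body; POSTs it to the event server.
# Assumes URL and ACCESS_KEY are set in the environment.
post_event() {
  curl -i -X POST "$URL/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" -d "$1"
}
```

With these defined, the first loop could be written as: for i in {1..5}; do post_event "$(set_user_event "u$i")"; done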
Check out the demo data:
http://<YOUR HEROKU APP NAME>.herokuapp.com/events.json?accessKey=<YOUR APP ACCESS KEY>&limit=-1
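If you would rather fetch that URL from a terminal, keep in mind that it contains ? and &, which the shell treats specially unless the URL is quoted. Here is a small sketch; events_url is a hypothetical helper name of mine, and the limit defaults to -1 as in the URL above.

```shell
# Hypothetical helper: print the events URL for an app.
# $1 = Heroku app name, $2 = access key, $3 = optional limit (default -1)
events_url() {
  printf 'http://%s.herokuapp.com/events.json?accessKey=%s&limit=%s' \
    "$1" "$2" "${3:--1}"
}
```

For example: curl "$(events_url "<YOUR HEROKU APP NAME>" "$ACCESS_KEY")". The double quotes around the command substitution keep the & from being interpreted by the shell.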
Now you need an engine that learns from a set of training data and can then make predictions. With PredictionIO you can use any algorithm you want, but Spark MLlib is often a great choice. For this simple example I’m just using single-node Spark and Postgres, but the underlying data source and ML engine can be anything.
This example is based on PredictionIO’s Recommendation Template, so it uses Spark MLlib’s Alternating Least Squares (ALS) algorithm. To deploy it on Heroku, follow these steps:
- Attach your PredictionIO Event Server’s Postgres:
  heroku addons:attach <YOUR-ADDON-ID> -a <YOUR HEROKU APP NAME>
  Note: You can find <YOUR-ADDON-ID> by running:
  heroku addons -a <YOUR EVENT SERVER HEROKU APP NAME>
- Train the app:
  heroku run -a <YOUR HEROKU APP NAME> train
- Restart the app to load the newly trained model:
  heroku restart -a <YOUR HEROKU APP NAME>
- Check the status of your engine:
  http://<YOUR HEROKU APP NAME>.herokuapp.com
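Since you will repeat the train and restart steps every time you load new events, it can be handy to chain them. This is only a sketch: retrain is a hypothetical function name, and the HEROKU variable is an override I added so the commands can be previewed without running them.

```shell
# Hypothetical helper: train the engine, then restart it.
# $1 = Heroku app name of the engine.
# HEROKU is an illustrative override; set HEROKU=echo to print the
# commands instead of running them.
retrain() {
  app=$1
  "${HEROKU:-heroku}" run -a "$app" train &&
    "${HEROKU:-heroku}" restart -a "$app"
}
```

Running HEROKU=echo retrain my-engine prints the two commands as a dry run. The restart only happens if the heroku run command itself reports success; whether a failed training job makes it report failure may depend on your CLI version and flags, so it is still worth reading the training output.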
Now you can check out the recommendations for an item (must be an item that has events):
curl -H "Content-Type: application/json" -d '{ "items": ["i11"], "num": 4 }' http://<YOUR HEROKU APP NAME>.herokuapp.com/queries.json
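To try several items without retyping the JSON, the query can be wrapped in a couple of small functions. This is a sketch under my own naming: query_body and recommend are hypothetical helpers, and ENGINE_URL is an assumed variable holding the engine's base URL (for example http://<YOUR HEROKU APP NAME>.herokuapp.com).

```shell
# Hypothetical helper: build the query JSON.
# $1 = item id, $2 = number of recommendations to return
query_body() {
  printf '{ "items": ["%s"], "num": %d }' "$1" "$2"
}

# Hypothetical helper: send the query to the engine.
# Assumes ENGINE_URL is set; $2 defaults to 4 as in the example above.
recommend() {
  curl -H "Content-Type: application/json" \
    -d "$(query_body "$1" "${2:-4}")" "$ENGINE_URL/queries.json"
}
```

For example, recommend i11 sends the same request as the curl command above, and recommend i12 10 asks for ten recommendations for item i12.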
Check out the source and local dev instructions for this example engine.
Let me know if you have any questions or problems. Happy ML’ing!