A closer look at Apache Mahout (Java recommendation library)

There have been a lot of changes to the Mahout library since it was first introduced through the Apache Software Foundation back in 2009.
I first looked at this project through the excellent tutorial on classifying and recommending Seinfeld episodes (perhaps not the easiest set of episodes to tell apart, keeping in mind it was a show that prided itself on being “about nothing”). It really showed the strength of the suite of libraries and algorithms, which in the current release can be ranked and compared for performance and relevance more easily than ever.
Unfortunately, the folks involved in that project received a takedown notice requiring them to stop providing the full Seinfeld episode scripts that were being used to classify the episodes.
Looking for an interesting alternative dataset, I’ve looked through the following commonly used public data sources:
- Million Song Dataset: http://millionsongdataset.com/
- MusicBrainz: https://musicbrainz.org/
- MovieLens: https://movielens.org/
- Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia
- TVDB: https://www.thetvdb.com/api-information
- IMDB: https://imdb-api.com/
It seems I’ve settled on just “” for now, but I will likely look at unique ways to combine all of the above at a later date.
The scope of the problem will start out fairly modest: given a list of 10 users with very distinct tastes, can we recommend relevant new music (either a specific song or an artist) that they’re likely to enjoy? It’s not a new problem area, but I do feel there hasn’t been much innovation in this space for a while. It has gone from the early days of WebJay, RACOFI & MyStrands, to the “middle ages” of Songza, LastFM & Pandora, and now into the modern era, where Apple’s iTunes/Music offerings, Spotify & YouTube have pretty much a three-way oligopoly on both ad-supported streaming and paid digital music (whether pay-to-download/own or subscription-to-stream).
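The problem above is classic user-based collaborative filtering, which Mahout’s Taste API expresses by combining a DataModel, a UserSimilarity, a UserNeighborhood and a recommender such as GenericUserBasedRecommender. To show the idea without pulling in Mahout itself, here is a minimal from-scratch sketch. It is not Mahout’s API: the class name, the choice of cosine similarity and the toy ratings are all assumptions for illustration.

```java
import java.util.*;

/** Minimal from-scratch sketch of user-based collaborative filtering (NOT Mahout's API). */
class UserBasedSketch {

    /** Cosine similarity of two rating vectors, treating unrated items as 0. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other; // only co-rated items add to the dot product
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /**
     * Takes the k users most similar to `user`, predicts a rating for each item
     * `user` has not rated (similarity-weighted average of the neighbours' ratings),
     * and returns up to `howMany` item IDs, best first.
     */
    static List<String> recommend(Map<String, Map<String, Double>> ratings,
                                  String user, int k, int howMany) {
        Map<String, Double> mine = ratings.get(user);

        // Rank every other user by similarity to the target user, most similar first
        List<String> others = new ArrayList<>(ratings.keySet());
        others.remove(user);
        others.sort((u, v) -> Double.compare(cosine(mine, ratings.get(v)), cosine(mine, ratings.get(u))));
        List<String> neighbours = others.subList(0, Math.min(k, others.size()));

        Map<String, Double> weighted = new HashMap<>();
        Map<String, Double> simSums = new HashMap<>();
        for (String n : neighbours) {
            double sim = cosine(mine, ratings.get(n));
            if (sim <= 0) continue; // ignore dissimilar neighbours
            for (Map.Entry<String, Double> e : ratings.get(n).entrySet()) {
                if (mine.containsKey(e.getKey())) continue; // only recommend new items
                weighted.merge(e.getKey(), sim * e.getValue(), Double::sum);
                simSums.merge(e.getKey(), sim, Double::sum);
            }
        }

        // Sort candidate items by predicted rating, highest first
        List<String> items = new ArrayList<>(weighted.keySet());
        items.sort((x, y) -> Double.compare(weighted.get(y) / simSums.get(y),
                                            weighted.get(x) / simSums.get(x)));
        return items.subList(0, Math.min(howMany, items.size()));
    }

    public static void main(String[] args) {
        // Hypothetical toy ratings (1-5) for three of the ten users
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("songA", 5.0, "songB", 4.0));
        ratings.put("bob", Map.of("songA", 5.0, "songB", 4.0, "songC", 5.0));
        ratings.put("carol", Map.of("songA", 1.0, "songD", 3.0));
        System.out.println(recommend(ratings, "alice", 2, 3));
    }
}
```

With these toy ratings, alice is most similar to bob, whose unseen songC gets a predicted rating of 5, ahead of carol’s songD at 3. Real recommenders typically add normalisation, minimum-overlap thresholds and evaluation on held-out data on top of this basic scheme.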
The premise of this effort will be, given minimal inputs about a user, try to chart out
Taking over a large-scale Adobe Experience Manager (AEM) project


This week, we traveled to Toronto to negotiate with our primary Content Management System (CMS) vendor, Razorfish. We all know what the Agile Manifesto says about “Customer collaboration over contract negotiation”, so don’t get me wrong: I’ve never been a huge proponent of contract negotiations, but even I’ll admit that sometimes they seem like a bit of a “necessary evil”. This is particularly true when certain vendors aren’t willing to “play ball” or “come to the table” to collaborate and find mutually beneficial solutions (I won’t name names right now, but HINT: it’s definitely not Razorfish!).
ALC started working with Razorfish back in 2015 (when they were still called Nurun, before the larger rival digital interactive agency Razorfish acquired them) on the “Corporate CMS re-design project”, which aimed to upgrade and migrate all of the content on the “corporate.playsphere.ca” sub-domain to a modern enterprise CMS, namely Adobe Experience Manager (AEM), which ALC has chosen as its corporate CMS. Then, in November 2016, the even larger rival firm Sapient negotiated an agreement with Razorfish to merge, creating a sort of “super digital interactive agency”. (UPDATE 2020-02-18: and, at a perhaps borderline “disturbing” scale, Publicis then acquired the Sapient-Razorfish (SRF for short) conglomerate, creating a Frankenstein’s monster of a digital interactive agency with few significant rivals left for the big contracts; this entity is now called “Publicis-Sapient-Razorfish”.)
Since then, quite a bit has changed. There were a number of “pillars” in the Darwin programme (a collection of projects), which made it the single largest undertaking in ALC’s history according to most of the people in the know I’ve spoken to, coming in at a whopping ~$35 million total estimated cost. Whereas AEM was initially selected to replace only the “Corporate” part of PlaySphere, it has since been selected to replace the entire PlaySphere system (particularly the front-end portions) and to provide a number of vendor integrations. This is because only the “Corporate takeover” portion of the Darwin programme was actually on track.
The Darwin portfolio of projects rather ambitiously aims to simultaneously rejuvenate and completely replace our legacy PlaySphere system and a very large number of its vendor integrations, alongside our Retail systems, part of our Call Center technologies, and a number of other supporting systems that are expected to get small updates. Of particular contention is the myriad of third-party APIs, each provided by a vendor partner, that we need to support, integrate and consolidate in a number of ways. The aim is to reduce the total number of vendor touchpoints (vendors we need to contract with), Software-as-a-Service (SaaS) providers, and other similar agencies/consultancies we need to work with and/or get support from.
As I shared with my team, I’m happy to report it was a very productive, jam-packed 2-day trip. Most importantly, they’ve tentatively agreed to move the code repository for the Darwin project’s new AEM-based ALC.ca (the replacement for our prior PlaySphere system) from their internal Stash (Bitbucket Server) instance, which runs within their network and is where our entire codebase currently lives, over to our own Bitbucket Cloud instance. They will also stop using their JIRA Server instance as the “source of truth” for all tickets and issues, and instead use our JIRA Cloud instance, which we have been using for over a year now. We’re aiming to get these cut-overs done by mid-to-late May at the latest. It will be a lot of work to first test out “mirroring the repo” between instances, and then to export and import all the JIRA issues. All agreed, though, that to pull this off at least one additional trip would be recommended, bringing more of the team up to Toronto next time to “observe their day-to-day Agile methodologies” and decide which pieces of that we may want to bring into our new team as it grows. Agile will be a new thing at ALC in general, so I want to be really certain we “get it right” (and yes, I realize there is no perfect combination right out of the gate; rather, we need to just start somewhere and regularly evolve/tweak it as we go). However, to still have some kind of plan in place, I always love referencing this meme:

After what I’ve seen so far, and based on my past experience at other companies, we will likely end up adopting some kind of hybrid of Scrum and Kanban. I’m hearing the term “Scrumban” more and more, so we’ll see how that goes. Scrum seems like a great fit for project implementations, while Kanban seems like the no-brainer choice for all our enhancements, bug fixes, and “keep-the-lights-on” (KTLO) types of development activities.
With our current plan, it’s looking like we’ll finally launch sometime in September 2017, one full year behind schedule (and that’s without true Agile so far; it has been more of an “incremental Waterfall” approach, mostly due to vendor limitations and nothing being spelled out in our contracts about how we want and/or expect our partners to work). I’ve been told the September date is not negotiable and can’t slip no matter what, but I’ve also heard the famous line “September is a long month” (an inside joke reflecting the FUD). I will do what I can to ensure such a large set of initiatives and projects never needs to be cobbled together again; instead, hopefully we can just do a great job maintaining this new AEM platform, so that all we need are small feature-delivery sprints and minor projects.
Leading up to September 2017, we will be collaborating heavily with the SapientRazorfish team, bolstering our current team of 5 with their 15+ active Developers (although that number will taper down, eventually leaving only a few of them in a support role for at least a year after go-live, during what we call the “warranty period”). The plan from our launch date onwards is for our team to slowly but surely ramp up to full capacity, so that we can support, entirely by ourselves, the web application and Mobile App webview integrations built within AEM, and continue to build on them with various other business projects, enhancements, internally driven innovations, etc. It will be an interesting challenge, and we’ll see how it goes.
UPDATE (2017-09-17): We finally launched the darn thing, and it wasn’t even on the last possible day of the month as many expected! What a whirlwind the past nearly two and a half years have been (it feels like I’ve done about 3-4 years’ worth of work myself, and I’m certain that if you add up all the person-hours on this project, including OT and the “extra efforts” that went into getting this beast across the finish line, you’d arrive at something like 100+ years of life force spent). But I can finally show off the new look of the webapp:

An example of authoring in AEM, choosing which Components are allowed within a given Static Template:

Feeling lucky? Give it a try yourself now, at https://www.alc.ca
Integrate PHP based SOAP RPS game server in another language, JAVA desktop GUI
Now that we have a Web Service ready to consume (even though it is SOAP-based), it should be fairly easy to call it from Java, which also means that, with a little bit of effort, it should be possible to create a desktop GUI.
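As a starting point for the desktop client, here is a minimal sketch that builds (and optionally sends) a SOAP 1.1 request to the game’s endpoint using only the JDK’s java.net.http client (Java 11+). The "urn:rps" namespace, the SOAPAction value and the body wrapped around each operation name are assumptions; the real values are whatever the service’s WSDL declares. A JAX-WS tool such as wsimport could instead generate typed stubs straight from the WSDL.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RpsSoapClient {
    static final String ENDPOINT = "http://bcmoney-mobiletv.com/widgets/games/rps/soap";
    // Hypothetical namespace: the real one is declared in the service's WSDL
    static final String NS = "urn:rps";

    /** Wraps an operation (e.g. ServerTime, GameScore, GamePlay) in a SOAP 1.1 envelope. */
    static String envelope(String operation, String innerXml) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:rps=\"" + NS + "\">"
            + "<soap:Body><rps:" + operation + ">" + innerXml + "</rps:" + operation + ">"
            + "</soap:Body></soap:Envelope>";
    }

    /** Builds (but does not send) the HTTP POST that carries the envelope. */
    static HttpRequest buildRequest(String operation, String innerXml) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("Content-Type", "text/xml; charset=utf-8") // SOAP 1.1 content type
            .header("SOAPAction", "\"" + NS + "#" + operation + "\"") // hypothetical action URI
            .POST(HttpRequest.BodyPublishers.ofString(envelope(operation, innerXml)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildRequest("GamePlay", "<choice>PAPER</choice>");
        if (args.length > 0 && args[0].equals("--send")) {
            // Only hits the network when explicitly asked to
            HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode());
            System.out.println(resp.body());
        } else {
            System.out.println(req.method() + " " + req.uri());
        }
    }
}
```

In a Swing GUI, a call like this should run off the Event Dispatch Thread (for example inside a SwingWorker) so the window stays responsive while waiting on the server.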
For more on creating a Java-based SOAP server to have an all-Java version of this solution, see:
https://netbeans.org/kb/docs/websvc/jax-ws.html
See the other parts here:
- Part 1: Creating a basic Rock Paper Scissors game in PHP
- Part 2: Moving a command-line PHP Rock Paper Scissors game to the Browser
- Part 3: Exposing your PHP Rock Paper Scissors game via SOAP Web Services
- Part 4: Integrate PHP based SOAP RPS game server in another language, JAVA desktop GUI <– you are here
Exposing your PHP Rock Paper Scissors game via SOAP Web Services
So why SOAP? Well, quite simply, this was what I used back when I was first working through PHP and learning the hard way how to expose some server-side “business logic” as a Web Service. This was over a decade ago (late 2006 through 2007), when Google’s SOAP Search API had yet to be phased out and was still one of the biggest reference implementations of SOA principles. In truth, I’d almost never use SOAP to create Web Services from scratch anymore, but it does have a few minor advantages in niche cases thanks to WS-Security, SAML integration, well-formed response guarantees via schema validation, contract-first development for more stability amongst separate inter-dependent departments/organizations, etc.
I decided to dust this example off simply to have a historical reference of traditional SOA while moving a number of legacy APIs I support to a RESTful architecture (since REST is the clear winner, and most APIs I need to work with today are REST-based anyway).
So this example will serve as a useful tool to go back and review every now and then, especially if your consulting ever takes you to an enterprise gig that still uses legacy SOA technologies (and yes, despite REST having taken over some 6 years ago, followed shortly by JSON replacing XML, there are still a decent number of SOAP-based WS deployments out there at big companies: the kind of big enterprises with big budgets at stake and/or political reasons to keep their legacy technology stacks running). REST/JSON may have won the internet thanks to its simplicity, but there’s something to be said for what the various organizations that came up with SOAP and the WS-* stack had in mind (aside from complexity and lucrative implementation contracts/tooling sales), namely robustness and predictability. Take Netflix and YouTube, for example, two of the most frequently called APIs on the web. Both are “RESTful-ish”, but each takes its own liberties with Fielding’s original REST thesis in unique ways, particularly around Auth mechanisms and the Usage Policies required to work with the data, DRM, Advertising, somewhat creative usage-restricting API metering, and/or pay-per-use schemes that come into play as soon as you want to do anything serious with the data. Both have also subjected their developer communities to significant amounts of non-backwards-compatible, disruptive versioning, changes and feature deprecation.
Endpoint
The following endpoint is where you can send requests to invoke, and get responses from, the specific operations exposed by the SOAP Web Service:
http://bcmoney-mobiletv.com/widgets/games/rps/soap
In our case, these are just a ServerTime check, a GameScore check, and GamePlay to initiate a game (but lots of other operations could potentially be added, such as Leaderboard tracking by username or region, Multiplayer game listings to show available competitors, etc).
Web Service Description Language (WSDL)
This is the “contract” part of SOAP that tells SOAP clients how to interact with our Web Service. In general, you can reach the WSDL of a SOAP-based Web Service by appending ?wsdl to its endpoint URL:
http://bcmoney-mobiletv.com/widgets/games/rps/soap?wsdl
A useful tool for validating your WSDL is the W3C WSDL Validator service.
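Since the WSDL is the machine-readable contract, a client can discover the available operations by reading it. This minimal sketch uses the JDK’s built-in DOM parser to list the operation names declared under each WSDL 1.1 portType; the inline WSDL is a hypothetical, heavily trimmed fragment, not this service’s actual contract.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WsdlOperations {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"; // WSDL 1.1 namespace

    /** Returns the names of the operations declared in every portType (binding entries are skipped). */
    static List<String> operationNames(String wsdlXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // guard against XXE
        Document doc = f.newDocumentBuilder()
            .parse(new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));

        List<String> names = new ArrayList<>();
        NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
        for (int i = 0; i < portTypes.getLength(); i++) {
            NodeList ops = ((Element) portTypes.item(i)).getElementsByTagNameNS(WSDL_NS, "operation");
            for (int j = 0; j < ops.getLength(); j++) {
                names.add(((Element) ops.item(j)).getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, trimmed WSDL 1.1 fragment (messages, binding and service omitted)
        String wsdl = "<definitions xmlns=\"" + WSDL_NS + "\" name=\"RPS\">"
            + "<portType name=\"RpsPortType\">"
            + "<operation name=\"ServerTime\"/><operation name=\"GameScore\"/><operation name=\"GamePlay\"/>"
            + "</portType></definitions>";
        System.out.println(operationNames(wsdl));
    }
}
```

Against a live service you could pass the ?wsdl URL to DocumentBuilder.parse(String uri) instead of parsing an inline string.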
XML Schema (XSD)
I’m a big fan of keeping the same basic format between requests and responses, rather than using vastly different request and response formats as many Web Services out there do. Different formats just create unnecessary work writing (or annotating/generating) distinct XML parsers.
Request format:
<rps>
  <game id="1234">
    <player id="1234#p1">
      <choice>PAPER</choice>
    </player>
    <player id="1234#p2">
      <choice>SCISSORS</choice>
    </player>
  </game>
</rps>
Response format:
<rps>
  <game id="1234">
    <player id="1234#p1" outcome="WON">
      <wins>5</wins>
      <losses>4</losses>
      <draws>2</draws>
    </player>
    <player id="1234#p2" outcome="LOST">
      <wins>4</wins>
      <losses>5</losses>
      <draws>2</draws>
    </player>
  </game>
</rps>
Notice how only the contents of each player element change between request and response (the choice is replaced by an outcome attribute and running win/loss/draw totals), so the two formats are virtually identical.
To more easily visualize a WSDL’s available operations and the data formats within each operation’s response, check out the XML Grid – XSD/WSDL Viewer service.
Check out some of the classic SOAP API examples that are still around today, including Amazon’s Product Advertising API (WSDL), which I’ve previously used as a product data source in my XmasListz Facebook App, or the WebServiceX Currency Exchange API (WSDL), which was used in my post on how to work with SOAP in JavaScript & jQuery.
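Because requests and responses share the same rps/game/player shape, one parser can read both. This is a minimal sketch using the JDK’s DOM API; the class and field names are hypothetical, and it assumes exactly the element and attribute names shown in the two formats above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RpsMessageParser {

    /** One <player> element; fields absent from a given message are null ("" for outcome). */
    static final class Player {
        final String id, choice, outcome;
        final Integer wins, losses, draws;

        Player(Element p) {
            id = p.getAttribute("id");
            outcome = p.getAttribute("outcome"); // getAttribute returns "" when missing (requests)
            choice = text(p, "choice");           // only present in requests
            wins = number(p, "wins");             // stats only present in responses
            losses = number(p, "losses");
            draws = number(p, "draws");
        }
    }

    /** Trimmed text of the first descendant named `tag`, or null if there is none. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static Integer number(Element parent, String tag) {
        String t = text(parent, tag);
        return t == null ? null : Integer.valueOf(t);
    }

    /** Parses either a request or a response, since both share the same structure. */
    static List<Player> parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPEs to avoid XXE when parsing data received over the network
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("player");
        List<Player> players = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            players.add(new Player((Element) nodes.item(i)));
        }
        return players;
    }

    public static void main(String[] args) throws Exception {
        String response = "<rps><game id=\"1234\">"
            + "<player id=\"1234#p1\" outcome=\"WON\"><wins>5</wins><losses>4</losses><draws>2</draws></player>"
            + "<player id=\"1234#p2\" outcome=\"LOST\"><wins>4</wins><losses>5</losses><draws>2</draws></player>"
            + "</game></rps>";
        for (Player p : parse(response)) {
            System.out.println(p.id + " " + p.outcome + " W" + p.wins + " L" + p.losses + " D" + p.draws);
        }
    }
}
```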
See the other parts here:
- Part 1: Creating a basic Rock Paper Scissors game in PHP
- Part 2: Moving a command-line PHP Rock Paper Scissors game to the Browser
- Part 3: Exposing your PHP Rock Paper Scissors game via SOAP Web Services <– you are here
- Part 4: Integrate PHP based SOAP RPS game server in another language, JAVA desktop GUI
Raspberry Pi – Alexa Pi experiment

Turn your Raspberry Pi into a fully functioning Alexa (either by literally calling Amazon’s Alexa APIs, or by calling a variety of services in specialized areas as a stand-in).
BC$ = Behavior, Content, Money

The goal of the BC$ project is to raise awareness and make changes with respect to the three pillars of information freedom: Behavior (pursuit of interests and passions), Content (sharing/exchanging ideas in various formats) and Money (fairness and accessibility). It aims to bring to light the fact that:
1. We regularly hand over our browser histories, search histories and daily online activities to companies that want our money, or that want to benefit from our use of their services through lucrative ad deals or sales of personal information.
2. We create and/or consume interesting content on their services, but we aren't adequately rewarded for our creative efforts or loyalty.
3. We pay money to be connected online (and possibly also over mobile), yet we lose both time and money by allowing companies to market to us with unsolicited advertisements, irrelevant product offers and unfairly structured service pricing plans.