Features vs attributes, classes vs labels

Submitted by Xilodyne on Sun, 01/15/2017 - 10:35

While recently reviewing the Naïve Bayes Java routine that I wrote last summer, I realized that I had mixed up a number of data and method definitions involving attributes, features, labels, classes, training and prediction.  Basing my routine on the description given in Wikipedia, which describes features associated with classes, while at the same time trying to translate Python's sklearn, which uses features and labels, into Java led to the mess.  Si

PKL to ARFF

Submitted by Xilodyne on Sat, 12/10/2016 - 12:16

Java source code for converting PKL files to ARFF is at the bottom of this blog post.  The process is:  convert the PKL file, using the Jython pickle API, to a text file layout that matches the structure expected by the Weka TextDirectoryLoader; run the Weka TextDirectoryLoader routine; then write the result out to ARFF.
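As a rough illustration of the first step, the sketch below writes already-unpickled texts into the layout that TextDirectoryLoader reads: one subdirectory per class label, with one text file per document inside it. The class name `TextDirectoryLayoutSketch`, the method `writeLayout` and the `doc<i>.txt` file naming are hypothetical choices for this example, not the code from the post, and the Jython unpickling itself is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lays out texts as <root>/<label>/doc<i>.txt.
class TextDirectoryLayoutSketch {

    // texts.get(i) is the document body, labels.get(i) is its class label.
    static List<Path> writeLayout(Path root, List<String> texts, List<String> labels) {
        if (texts.size() != labels.size()) {
            throw new IllegalArgumentException("texts and labels must have the same size");
        }
        List<Path> written = new ArrayList<>();
        try {
            for (int i = 0; i < texts.size(); i++) {
                // One subdirectory per class label.
                Path classDir = root.resolve(labels.get(i));
                Files.createDirectories(classDir);
                // One file per document (file name is an assumption).
                Path file = classDir.resolve("doc" + i + ".txt");
                Files.write(file, texts.get(i).getBytes(StandardCharsets.UTF_8));
                written.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

From there, Weka's TextDirectoryLoader can be pointed at the root directory and the loaded dataset written out as ARFF (for example with Weka's ArffSaver). Those calls are left out here because they need the Weka jar on the classpath.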

Validating cross-platform results with Weka - a beginning

Submitted by Xilodyne on Sun, 11/06/2016 - 11:42

Having recently started work on the Udacity MiniProject #1 from the Intro to Machine Learning course, what again started as a simple verification that all the Python code and libraries worked ended up being an interesting dive into handling text data and validating results.  The MiniProject uses a subset of the Enron email corpus to determine the accuracy of email author identification.  (The Enron corpu

Java implementation of the Udacity Intro to Machine Learning - Gaussian NB Terrain Data

Submitted by Xilodyne on Mon, 10/17/2016 - 10:38

Having previously completed the Naïve Bayes and Gaussian Naïve Bayes implementations in Java, it was just a matter of figuring out how to match the Udacity Intro to Machine Learning Python logic and chart the data.  It was a good learning experience, and the Java results are similar to the Python ones.  The Java code is here.

Gaussian Naive Bayes

Submitted by Xilodyne on Sun, 09/18/2016 - 11:02

Confronted with implementing a Gaussian Naïve Bayes classifier, I first needed to understand (and implement) the classification and prediction steps of a plain Naïve Bayes.  I found that most of the machine learning frameworks, while implementing some form of the algorithm, never explained why they made certain coding decisions, nor offered obvious ways of testing that classification / prediction is consistent with the formula.  I ended up writing code to implement the Male/Female Drew examples as explained by Professor Eamonn Keogh at UC Riverside,
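To make the "consistent with the formula" check concrete, here is a minimal sketch of a single-feature Naïve Bayes score, P(feature | label) * P(label), computed from raw counts. The class `NaiveBayesSketch`, its methods and the eight-row name/sex table are assumptions made up for this illustration in the spirit of a Drew-style example; they are not the post's code or Professor Keogh's exact data.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical single-feature Naive Bayes built from counts.
class NaiveBayesSketch {

    // Unnormalized posterior: P(feature | label) * P(label).
    static double score(String[] features, String[] labels, String feature, String label) {
        int labelCount = 0;
        int jointCount = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(label)) {
                labelCount++;
                if (features[i].equals(feature)) {
                    jointCount++;
                }
            }
        }
        if (labelCount == 0) {
            return 0.0;
        }
        double prior = (double) labelCount / labels.length;
        double likelihood = (double) jointCount / labelCount;
        return likelihood * prior;
    }

    // Returns the label with the highest score (first in sorted order on ties).
    static String predict(String[] features, String[] labels, String feature) {
        String best = null;
        double bestScore = -1.0;
        for (String label : new TreeSet<>(Arrays.asList(labels))) {
            double s = score(features, labels, feature, label);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data, assumed for illustration only.
        String[] names = {"Drew", "Claudia", "Drew", "Drew", "Alberto", "Karin", "Nina", "Sergio"};
        String[] sexes = {"male", "female", "female", "female", "male", "female", "female", "male"};
        System.out.println(predict(names, sexes, "Drew")); // prints "female"
    }
}
```

With this toy table the hand calculation is easy to verify: the male score is (1/3) * (3/8) = 0.125 and the female score is (2/5) * (5/8) = 0.25, so the code and the formula agree on "female".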

Java machine learning in a python world

Submitted by Xilodyne on Sun, 07/17/2016 - 09:45

Having dived into my first Udacity machine learning introductory course in May 2016, I was suddenly confronted with a complete Python machine learning ecosystem.  It is difficult enough overcoming Python's propensity for not defining anything beforehand, which means digging through a ton of documentation, library code, or testing to figure out the structure of returned variables.  But on top of that are the numpy, sklearn, matplotlib and pylab Python libraries that the Udacity courses are leveraging, which also hav