Machine Learning in PyCVF

In PyCVF, machine learning algorithms are organized into different directories according to the task you want to perform.

Due to historical reasons in the code development, the algorithms at this level are not treated as nodes, and thus the binding with filenames for saving and so on still has to be defined for each category of learning algorithm.

Here, as much as possible, we would like to see incremental algorithms and efficient bindings with external libraries multiply.

Decision

Classification

For the moment, the principal classifier implemented is the weka_bridge. Work is ongoing on svm_multiclass, k-NN classification, and k-NN combined with other classification models.

pycvf.stats.CLS.weka_bridge.StatModel

from weka.core.converters.ConverterUtils import DataSource
from weka.core import Instances

data1 = DataSource.read("filename.csv")   # also reads .arff / .xrff
classifier = Classifier()                  # instantiate the chosen Weka classifier
classifier.buildClassifier(train)
for x in range(eval.numInstances()):
    ...

alias of WekaModel

pycvf.stats.CLS.svm_multiclass.StatModel
alias of SVMLin_SVMModel
class pycvf.stats.CLS.knn.StatModel(k)

This model predicts the class of a point based on the nearest neighbors of that point.

cpu_cost(*args, **kwargs)
dump(file_)
static load(file_, *args, **kwargs)
memory_cost(*args, **kwargs)
predict(A, AN=None, B=None, BN=None, log=False)
Predict the label according to the $k$ closest neighbors
random_improve(value, amount=0.5, prec=1)
train(train_data, labels, database)
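The k-NN interface above can be illustrated with a minimal, self-contained sketch in plain NumPy. This is not the PyCVF implementation: the class name `KNNSketch` is hypothetical, the `train`/`predict` methods only mirror the StatModel names above in simplified form (no `database` argument, no `AN`/`B`/`BN` parameters), and Euclidean distance with majority voting is assumed.

```python
import numpy as np

class KNNSketch:
    """Minimal k-nearest-neighbor classifier mirroring the StatModel API."""

    def __init__(self, k):
        self.k = k

    def train(self, train_data, labels):
        # Simply memorize the training set; k-NN is a lazy learner.
        self.data = np.asarray(train_data, dtype=float)
        self.labels = np.asarray(labels)
        return self

    def predict(self, A):
        # Euclidean distance from the query point to every training point.
        dists = np.linalg.norm(self.data - np.asarray(A, dtype=float), axis=1)
        # Majority vote among the k closest neighbors.
        nearest = self.labels[np.argsort(dists)[:self.k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

model = KNNSketch(k=3).train([[0, 0], [0, 1], [5, 5], [6, 5]], [0, 0, 1, 1])
print(model.predict([5.5, 5.0]))  # → 1
```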
pycvf.stats.CLS.knn_plus_cls.StatModel
alias of knn_plus_cls

Dimension Reduction

The principal dimension-reduction methods implemented so far are PCA, NMF, and bagwords (bag of words).
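As a reminder of what the first of these methods computes, here is a generic SVD-based PCA sketch in NumPy. It is not the PyCVF API; the function name `pca` and its signature are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (SVD-based sketch)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each sample in the reduced space, plus the basis.
    return (X - mean) @ components.T, components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, comps = pca(X, 2)
print(Z.shape)      # (100, 2)
print(comps.shape)  # (2, 5)
```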

Clusterers

The only clusterer linked so far is greedyRSC; projects for adding Weka and Orange clusterers are under way.

pycvf.stats.CLU.greedyRSC.StatModel
alias of GreedyRSClusterer
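GreedyRSClusterer itself is not documented here. Purely as an illustration of what a clusterer in this category does (assigning points to groups and producing centroids), here is a minimal k-means sketch in NumPy; it is not the greedyRSC algorithm, and the function name `kmeans` is a hypothetical stand-in.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means sketch: returns cluster labels and centroids."""
    X = np.asarray(X, dtype=float)
    # Naive deterministic init for this sketch: spread centroids across the data.
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs should yield two clusters.
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2)) * 5])
labels, cents = kmeans(X, 2)
print(len(set(labels)))  # → 2
```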

Density Estimation

All density estimators also provide a sampling method that allows drawing samples according to the distribution that has been learned. So far we integrate histogram-based density estimators, Parzen density estimators, and GMM density estimators.
class pycvf.stats.DE.SPECS.ref.StatModel(*args, **kwargs)

A Statistical Model used for Probability Density Estimation

cpu_cost(*args, **kwargs)
(optional) Return a dictionary of integers specifying how costly it is to train/query/sample the model, given its current state.
dump(file_)
Saves the model by serializing it into a file.
eval(obs, log=False)
Return the probability density estimate for the query observation.
If log is enabled, returns the log probability density estimate.
get_as_vector()
Return a vector describing the parameters of the model that has been trained.
static load(file_, *args, **kwargs)
Loads a model by deserializing it from a file.
memory_cost(*args, **kwargs)
Specifies the amount of memory used by the model.
random_improve(value, amount=0.5, prec=1)
(OPTIONAL) Use the statistical estimate to modify value by amount in a direction that would increase its probability estimate.
sample(numsamples=10)
Draws samples according to the learned probability distribution.
train(obs, obsN=False, online=False)
Train by taking into account some positive and, optionally, some negative observations. If online is True, previous observations must remain in memory.
class pycvf.stats.DE.histogram.StatModel(bins, base, delta)
cpu_cost(*args, **kwargs)
dump(file_)
eval(obs, log=False)
get_as_vector()
static load(file_, *args, **kwargs)
manysamples(numsamples)
memory_cost(*args, **kwargs)
onesample()
online_train(*args, **kwargs)
push_histogram(histo)
random_improve(value, amount=0.5, prec=1)
sample(numsamples=10)
subscale(p, scale)
subscale_histo(scale)
train(obs, obsN=False, online=False)
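The pairing of eval() (density estimation) and sample() (drawing from the learned distribution) can be illustrated with a minimal 1-D histogram estimator. This is a sketch, not the PyCVF implementation: the class name `HistogramDE` is hypothetical, and only the `bins`, `base`, `delta` constructor arguments and the `train`/`eval`/`sample` method names mirror the StatModel interface above.

```python
import numpy as np

class HistogramDE:
    """Minimal 1-D histogram density estimator with eval() and sample()."""

    def __init__(self, bins, base, delta):
        # bins equal-width cells covering [base, base + bins * delta).
        self.bins, self.base, self.delta = bins, base, delta
        self.counts = np.zeros(bins)

    def train(self, obs):
        # Accumulate observation counts per bin.
        idx = ((np.asarray(obs) - self.base) / self.delta).astype(int)
        np.add.at(self.counts, np.clip(idx, 0, self.bins - 1), 1)

    def eval(self, x, log=False):
        # Normalized so the density integrates to 1 over the support.
        i = int(np.clip((x - self.base) / self.delta, 0, self.bins - 1))
        p = self.counts[i] / (self.counts.sum() * self.delta)
        return np.log(p) if log else p

    def sample(self, numsamples=10):
        # Pick bins proportionally to their mass, then jitter within the bin.
        probs = self.counts / self.counts.sum()
        i = np.random.choice(self.bins, size=numsamples, p=probs)
        return self.base + (i + np.random.rand(numsamples)) * self.delta

de = HistogramDE(bins=10, base=0.0, delta=0.1)
de.train(np.random.rand(1000))
# Density integrated over the bin centres should be 1.
print(round(sum(de.eval(x) * 0.1 for x in np.arange(0.05, 1.0, 0.1)), 6))  # → 1.0
```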
pycvf.stats.DE.parzen.StatModel
alias of ParzenModel

Multiple Instance Learning