Machine Learning in PyCVF
In PyCVF, machine-learning algorithms are organized into separate directories according to the task to be performed.
For historical reasons in the development of the code, the algorithms at this level are not treated as nodes, and so the binding
with filenames for saving, and so on, still has to be defined for each category of learning algorithm.
Wherever possible, we would like to see incremental algorithms and efficient bindings with external libraries multiply here.
Decision
Classification
For the moment the principal classifier implemented is the weka_bridge. Work is ongoing on svm_multiclass, kNN classification, and kNN combined with another classification model.
-
pycvf.stats.CLS.weka_bridge.StatModel
- alias of WekaModel

The bridge relies on the Weka Java API, whose typical usage is sketched below (the variable names and the generic Classifier are illustrative):

from weka.core.converters.ConverterUtils import DataSource
from weka.core import Instances
train = DataSource.read("filename.csv")  # .arff and .xrff files are also supported
classifier = Classifier()                # any concrete Weka classifier
classifier.buildClassifier(train)
for x in range(test_data.numInstances()):
    prediction = classifier.classifyInstance(test_data.instance(x))
-
pycvf.stats.CLS.svm_multiclass.StatModel
- alias of SVMLin_SVMModel
-
class pycvf.stats.CLS.knn.StatModel(k)
This model predicts the class of a point according to the nearest neighbors of that point.
-
cpu_cost(*args, **kwargs)
-
dump(file_)
-
static load(file_, *args, **kwargs)
-
memory_cost(*args, **kwargs)
-
predict(A, AN=None, B=None, BN=None, log=False)
- Predict the label according to the k closest neighbors
-
random_improve(value, amount=0.5, prec=1)
-
train(train_data, labels, database)
-
pycvf.stats.CLS.knn_plus_cls.StatModel
- alias of knn_plus_cls
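As a rough illustration of what the knn StatModel computes (a pure-NumPy sketch; the actual PyCVF class additionally handles databases, cost reporting, and serialization, and `knn_predict` is a name invented here):

```python
import numpy as np

def knn_predict(train_data, labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(train_data - query, axis=1)
    # indices of the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # majority vote among their labels
    votes = np.bincount(labels[nearest])
    return int(np.argmax(votes))

train_data = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
labels = np.array([0, 0, 1, 1])
print(knn_predict(train_data, labels, np.array([0.95, 1.0]), k=3))  # -> 1
```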
Dimension Reduction
The principal dimension-reduction methods implemented so far are PCA, NMF, and bag-of-words (bagwords).
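For instance, the core computation of a PCA reduction can be sketched as follows (plain NumPy; the PyCVF module wraps an equivalent computation behind its StatModel interface, and `pca_reduce` is a name invented for this sketch):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its n_components principal axes."""
    Xc = X - X.mean(axis=0)                        # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # coordinates in the principal subspace

X = np.random.RandomState(0).randn(100, 5)
Y = pca_reduce(X, 2)
print(Y.shape)  # -> (100, 2)
```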
Clusterers
The only clusterer bound so far is greedyRSC; projects for adding the Weka and Orange clusterers are on the way.
-
pycvf.stats.CLU.greedyRSC.StatModel
- alias of GreedyRSClusterer
Density Estimation
All density estimators also provide a sampling method that allows drawing samples
according to the distribution that has been learned.
So far we integrate histogram-based density estimators, Parzen density estimators,
and GMM density estimators.
-
class pycvf.stats.DE.SPECS.ref.StatModel(*args, **kwargs)
A Statistical Model used for Probability Density Estimation
-
cpu_cost(*args, **kwargs)
- (optional)
Return a dictionary of integers specifying how costly it is to train/query/sample
the model, according to its current state.
-
dump(file_)
- Save the model by serializing it into a file.
-
eval(obs, log=False)
- Return the probability density estimate for the query observation.
- If log is enabled, return the log probability density estimate.
-
get_as_vector()
- Return a vector describing the parameters of the model that has been trained.
-
static load(file_, *args, **kwargs)
- Load a model by deserializing it from a file.
-
memory_cost(*args, **kwargs)
- Specify the amount of memory used by the model.
-
random_improve(value, amount=0.5, prec=1)
- (OPTIONAL)
Use the statistical estimate to modify value by amount in
a direction that would increase its probability estimate.
-
sample(numsamples=10)
- Draw samples according to the learned probability distribution.
-
train(obs, obsN=False, online=False)
- Train by taking into account some positive and, optionally, some negative observations.
If online is equal to True, previous observations must remain in memory.
-
class pycvf.stats.DE.histogram.StatModel(bins, base, delta)
-
cpu_cost(*args, **kwargs)
-
dump(file_)
-
eval(obs, log=False)
-
get_as_vector()
-
static load(file_, *args, **kwargs)
-
manysamples(numsamples)
-
memory_cost(*args, **kwargs)
-
onesample()
-
online_train(*args, **kwargs)
-
push_histogram(histo)
-
random_improve(value, amount=0.5, prec=1)
-
sample(numsamples=10)
-
subscale(p, scale)
-
subscale_histo(scale)
-
train(obs, obsN=False, online=False)
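The train/eval/sample cycle of a histogram estimator can be illustrated with a minimal 1-D sketch (NumPy only; the `Histogram1D` class below is invented for illustration — the real StatModel's bin handling, parameters, and online mode are richer):

```python
import numpy as np

class Histogram1D:
    """Minimal 1-D histogram density estimator with eval() and sample()."""
    def __init__(self, bins=10):
        self.bins = bins

    def train(self, obs):
        self.counts, self.edges = np.histogram(obs, bins=self.bins)
        self.width = self.edges[1] - self.edges[0]
        self.probs = self.counts / self.counts.sum()   # per-bin mass

    def eval(self, x, log=False):
        # locate the bin containing x, clamped to the histogram's support
        i = np.clip(np.searchsorted(self.edges, x, side="right") - 1, 0, self.bins - 1)
        density = self.probs[i] / self.width
        return np.log(density) if log else density

    def sample(self, numsamples=10):
        rng = np.random.default_rng(0)
        # pick bins proportionally to their mass, then draw uniformly inside each bin
        idx = rng.choice(self.bins, size=numsamples, p=self.probs)
        return self.edges[idx] + rng.random(numsamples) * self.width

h = Histogram1D(bins=5)
h.train(np.random.RandomState(1).randn(1000))
print(h.sample(3))
```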
-
pycvf.stats.DE.parzen.StatModel
- alias of ParzenModel
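A Parzen-window estimate of the kind ParzenModel provides places a kernel at each observation and averages them; a 1-D Gaussian-kernel sketch (the function name and bandwidth are illustrative, not part of the PyCVF API):

```python
import numpy as np

def parzen_eval(obs, x, h=0.5):
    """Parzen-window density estimate at x with a Gaussian kernel of bandwidth h."""
    # average of Gaussian bumps centered on each observation
    z = (x - obs) / h
    return np.mean(np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi)))

obs = np.array([0.0, 0.2, 1.0])
print(parzen_eval(obs, 0.1))
```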
Multiple Instance Learning