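I am currently working on a problem: classifying images with Bayesian networks. I have tried pomegranate, pgmpy, and bnlearn. My dataset contains more than 200,000 images, on which I ran some feature extraction algorithms to obtain a feature vector of size 1026.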
pgmpy
from pgmpy.models import BayesianModel
from pgmpy.estimators import HillClimbSearch, BicScore, K2Score
est = HillClimbSearch(feature_df, scoring_method=BicScore(feature_df[:20]))
best_model = est.estimate()
edges = best_model.edges()
model = BayesianModel(edges)
pomegranate
from pomegranate import *
model = BayesianNetwork.from_samples(feature_df[:20], algorithm='exact')
bnlearn
library(bnlearn)
df <- read.csv('conv_encoded_images.csv')
df$Age = as.numeric(df$Age)
res <- hc(df)
model <- bn.fit(res,data = df)
The program written with bnlearn in R finishes in a couple of minutes, while the pgmpy version runs for hours and pomegranate freezes my system after a few minutes. You can see from my code that I'm giving only the first 20 rows for training in both the pgmpy and pomegranate programs, while bnlearn takes the whole dataframe. Since I am doing all my image preprocessing and feature extraction in Python, it is difficult for me to switch between R and Python for training.
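My data contains continuous values ranging from 0 to 1. I have also tried discretizing the data to 0 and 1, but that did not solve the problem.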
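The discretization step can be sketched roughly as follows. This is only a minimal illustration: the 0.5 cutoff, the `binarize` helper, and the small random `feature_df` stand-in are assumptions made for the example, since the actual threshold and data are not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real feature table:
# 20 rows, 5 columns, continuous values in [0, 1).
rng = np.random.default_rng(0)
feature_df = pd.DataFrame(
    rng.random((20, 5)),
    columns=[f"f{i}" for i in range(5)],
)


def binarize(df, threshold=0.5):
    """Map each value to 1 if it is >= threshold, else 0.

    The default threshold of 0.5 is an assumption for illustration.
    """
    return (df >= threshold).astype(int)


# Same shape as feature_df, but every entry is now 0 or 1.
binary_df = binarize(feature_df)
```

The resulting `binary_df` has the same rows and columns as the input, so it can be passed to the structure-learning calls in place of the continuous table.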
Is there any way to speed up training with these Python packages, or am I doing something wrong in my code?
Thanks in advance for any help.