5.3 自动特征生成与选择
下面将使用tsfresh包演示如何进行自动特征生成和特征选择
下一步需要注意,由于国内网络的限制,直接运行时会导致连接失败,此时有两个办法 1)在该地址 https://github.com/MaxBenChrist/robot-failure-dataset 手动下载lp1.data 2)在网站https://www.ipaddress.com 输入https://raw.githubusercontent.com 的真实ip,然后在C:\Windows\System32\drivers\etc下的hosts文件中添加类似这样的几行185.199.108.133 raw.githubusercontent.com
T_x__variance_larger_than_standard_deviation | T_x__has_duplicate_max | T_x__has_duplicate_min | T_x__has_duplicate | T_x__sum_values | T_x__abs_energy | T_x__mean_abs_change | T_x__mean_change | T_x__mean_second_derivative_central | T_x__median | ... | F_z__permutation_entropy__dimension_5__tau_1 | F_z__permutation_entropy__dimension_6__tau_1 | F_z__permutation_entropy__dimension_7__tau_1 | F_z__query_similarity_count__query_None__threshold_0.0 | F_z__matrix_profile__feature_"min"__threshold_0.98 | F_z__matrix_profile__feature_"max"__threshold_0.98 | F_z__matrix_profile__feature_"mean"__threshold_0.98 | F_z__matrix_profile__feature_"median"__threshold_0.98 | F_z__matrix_profile__feature_"25"__threshold_0.98 | F_z__matrix_profile__feature_"75"__threshold_0.98 | |
1 | 0.0 | 1.0 | 1.0 | 1.0 | -43.0 | 125.0 | 0.214286 | 0.071429 | 0.038462 | -3.0 | ... | 1.972247 | 2.163956 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1.0 | 1.0 | 1.0 | 1.0 | -53.0 | 363.0 | 3.785714 | -0.071429 | 0.153846 | -3.0 | ... | 2.397895 | 2.302585 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1.0 | 0.0 | 1.0 | 1.0 | -60.0 | 344.0 | 3.214286 | 0.071429 | -0.076923 | -5.0 | ... | 2.397895 | 2.302585 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1.0 | 1.0 | 0.0 | 1.0 | -93.0 | 763.0 | 3.714286 | -0.428571 | -0.192308 | -6.0 | ... | 2.271869 | 2.302585 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 | 1.0 | 0.0 | 0.0 | 1.0 | -105.0 | 849.0 | 4.071429 | -0.357143 | 0.000000 | -8.0 | ... | 2.271869 | 2.302585 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
84 | 1.0 | 0.0 | 0.0 | 1.0 | 5083.0 | 1825597.0 | 18.857143 | 15.285714 | -0.538462 | 394.0 | ... | 1.366711 | 1.609438 | 1.831020 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
85 | 1.0 | 0.0 | 0.0 | 1.0 | -511.0 | 18023.0 | 2.785714 | -1.214286 | 0.192308 | -33.0 | ... | 1.972247 | 2.163956 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
86 | 1.0 | 0.0 | 0.0 | 1.0 | -987.0 | 67981.0 | 3.928571 | -3.500000 | -0.153846 | -65.0 | ... | 0.600166 | 0.639032 | 0.683739 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
87 | 1.0 | 0.0 | 0.0 | 1.0 | -1921.0 | 247081.0 | 6.642857 | -0.357143 | 0.461538 | -126.0 | ... | 1.366711 | 1.609438 | 1.831020 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
88 | 1.0 | 1.0 | 0.0 | 1.0 | -304.0 | 6408.0 | 2.428571 | -0.714286 | 0.230769 | -21.0 | ... | 2.397895 | 2.302585 | 2.197225 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
F_x__length | F_x__large_standard_deviation__r_0.05 | F_x__large_standard_deviation__r_0.1 | F_y__length | F_y__large_standard_deviation__r_0.05 | F_y__large_standard_deviation__r_0.1 | F_z__length | F_z__large_standard_deviation__r_0.05 | F_z__large_standard_deviation__r_0.1 | T_x__length | T_x__large_standard_deviation__r_0.05 | T_x__large_standard_deviation__r_0.1 | T_y__length | T_y__large_standard_deviation__r_0.05 | T_y__large_standard_deviation__r_0.1 | T_z__length | T_z__large_standard_deviation__r_0.05 | T_z__large_standard_deviation__r_0.1 | |
1 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 0.0 | 0.0 |
2 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
3 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
4 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
5 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
84 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
85 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
86 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
87 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
88 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 |
官方文档的这个流程展示了用fresh算法进行特征选择的思路,简单来说,它是通过比较不同时间序列类别下特征的显著性差异来确定是否要挑选出这个特征。
除此之外,还有其他用于特征选择的方法,如recursive feature elimination (RFE),不过tsfresh包并没有内置这种方法,可以结合sklearn中的RFE方法自行组合使用。
Last updated