Module bytehub.core
Classes
class CoreFeatureStore (connection_string='sqlite:///bytehub.db', connect_args={})
-
Core Feature Store
Connects directly to a SQLAlchemy-compatible database.
When specifying features for create_feature, update_feature, etc., use either:
- namespace and name as separate arguments; or
- name alone, in the format "my-namespace/my-feature".
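The two calling styles can be pictured with a small standard-library sketch. The helper `split_feature_name` is hypothetical and is not part of bytehub; it only illustrates how a combined "my-namespace/my-feature" string and separate arguments could resolve to the same pair. How a conflict between the two is handled is an assumption made for this example.

```python
from typing import Optional, Tuple


def split_feature_name(name: str, namespace: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (namespace, name) from either calling style."""
    if "/" in name:
        # Combined style: text before the first slash is treated as the namespace
        prefix, _, feature = name.partition("/")
        if namespace is not None and namespace != prefix:
            # Assumption for this sketch: conflicting namespaces are an error
            raise ValueError("namespace given twice with different values")
        return prefix, feature
    # Separate-argument style: namespace may be None if the caller omitted it
    return namespace, name


# Both styles identify the same feature
print(split_feature_name("my-feature", namespace="my-namespace"))
print(split_feature_name("my-namespace/my-feature"))
```

Both calls print the same tuple, which is the point of the convention: downstream code only ever sees one canonical (namespace, name) pair.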
Args
connection_string
:str
- SQLAlchemy connection string for database containing feature store metadata - defaults to local sqlite file.
connect_args
:dict
, optional- dictionary of connection arguments to pass to SQLAlchemy.
Ancestors
- bytehub._base.BaseFeatureStore
- abc.ABC
Methods
def clean_namespace(self, name)
-
Removes any stored data in the namespace that is not associated with a feature. Run this to free up disk space after deleting features.
Args
name
:str
- namespace to clean
def clone_feature(self, name, namespace=None, **kwargs)
-
Create a new feature by cloning an existing one.
Args
name
:str
- name of the feature.
namespace
:str
, optional- namespace which should hold this feature.
from_name
:str
- the name of the existing feature to copy from.
from_namespace
:str
, optional- namespace of the existing feature.
def create_feature(self, name, namespace=None, **kwargs)
-
Create a new feature in the feature store.
Args
name
:str
- name of the feature
namespace
:str
, optional- namespace which should hold this feature.
description
:str
, optional- description for this feature.
partition
:str
, optional- partitioning of stored timeseries (default:
"date"
).
serialized
:bool
, optional- if
True
, converts values to JSON strings before saving, which can help in situations where the format/schema of the data changes over time.
transform
:str
, optional- pickled function code for feature transforms.
meta
:dict
, optional- key/value pairs of metadata.
def create_namespace(self, name, **kwargs)
-
Create a new namespace in the feature store.
Args
name
:str
- name of the namespace.
description
:str
, optional- description for this namespace.
url
:str
- url of data store.
storage_options
:dict
, optional- storage options, e.g. access credentials.
backend
:str
, optional- storage backend, see bytehub._storage.available_backends, defaults to
"pandas"
.
meta
:dict
, optional- key/value pairs of metadata.
def delete_feature(self, name, namespace=None, delete_data=False)
-
Delete a feature from the feature store.
Args
name
:str
- name of feature to delete.
namespace
:str
, optional- namespace, if not included in feature name.
delete_data
:bool
, optional- if set to
True
will delete underlying stored data for this feature, otherwise default behaviour is to delete the feature store metadata but leave the stored timeseries values intact.
def delete_namespace(self, name)
-
Delete a namespace from the feature store.
Args
name
:str
- namespace to be deleted.
def last(self, features)
-
Fetch the last value of one or more features.
Args
features
:Union[str, list, pd.DataFrame]
- feature or features to fetch.
Returns
dict
- dictionary of name, last value pairs.
def list_features(self, **kwargs)
-
List features in the feature store.
Search by namespace, name and/or regex query.
Args
name
:str
, optional- name of feature to filter by.
namespace
:str
, optional- namespace to filter by.
regex
:str
, optional- regex filter on name.
friendly
:bool
, optional- simplify output for user.
Returns
pd.DataFrame
- DataFrame of features and metadata.
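As a rough illustration of the regex filter, the sketch below applies a pattern to the name part of some made-up "namespace/name" strings. The function `filter_names` is hypothetical, and using `re.search` (match anywhere in the name) is an assumption; this page does not say whether the library anchors the pattern.

```python
import re
from typing import List

# Made-up feature identifiers in "namespace/name" form
FEATURES = ["weather/temperature", "weather/humidity", "sales/revenue"]


def filter_names(features: List[str], regex: str) -> List[str]:
    """Hypothetical helper: keep features whose name part matches the regex."""
    pattern = re.compile(regex)
    # Only the part after the first slash is tested, not the namespace
    return [f for f in features if pattern.search(f.split("/", 1)[-1])]


print(filter_names(FEATURES, r"^temp"))
print(filter_names(FEATURES, r"e$"))
```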
def list_namespaces(self, **kwargs)
-
List namespaces in the feature store.
Search by name or regex query.
Args
name
:str
, optional- name of namespace to filter by.
namespace
:str
, optional- same as name.
regex
:str
, optional- regex filter on name.
Returns
pd.DataFrame
- DataFrame of namespaces and metadata.
def load_dataframe(self, features, from_date=None, to_date=None, freq=None, time_travel=None)
-
Load a DataFrame of feature values from the feature store.
Args
features
:Union[str, list, pd.DataFrame]
- name of feature to load, or list/DataFrame of feature namespaces/names.
from_date
:datetime
, optional- start date to load timeseries from; if omitted, loads from the earliest stored value.
to_date
:datetime
, optional- end date to load timeseries to; if omitted, loads up to the latest stored value.
freq
:str
, optional- frequency interval at which feature values should be sampled.
time_travel
:str
, optional- timedelta string, indicating that time-travel should be applied to the returned timeseries values, useful in forecasting applications.
Returns
Union[pd.DataFrame, dask.DataFrame]
- depending on which backend was specified in the feature store.
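The time-travel idea can be sketched without the library. The example below is one plausible reading, not confirmed by this page: each stored value has a `time` and a `created_time`, and a time-travel offset keeps, for each `time`, the most recently created value whose `created_time` is no later than `time + offset`. The rows, the function `apply_time_travel`, and the use of a `timedelta` object in place of a timedelta string are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Row = Tuple[datetime, datetime, float]  # (time, created_time, value)

# Made-up rows: three versions of the value for 12:00 on the same day
ROWS: List[Row] = [
    (datetime(2021, 1, 1, 12), datetime(2021, 1, 1, 9), 10.0),       # forecast made 3 h ahead
    (datetime(2021, 1, 1, 12), datetime(2021, 1, 1, 11, 30), 11.0),  # forecast made 30 min ahead
    (datetime(2021, 1, 1, 12), datetime(2021, 1, 1, 12, 5), 12.0),   # measurement recorded afterwards
]


def apply_time_travel(rows: List[Row], offset: timedelta) -> Dict[datetime, float]:
    """Keep, per timestamp, the latest value with created_time <= time + offset."""
    latest: Dict[datetime, float] = {}
    # Process in order of creation so later versions overwrite earlier ones
    for time, created, value in sorted(rows, key=lambda r: r[1]):
        if created <= time + offset:
            latest[time] = value
    return latest


# One hour before each timestamp, only the 09:00 forecast was known
print(apply_time_travel(ROWS, timedelta(hours=-1)))
# With a positive offset, the later measurement is visible too
print(apply_time_travel(ROWS, timedelta(hours=1)))
```

Under this reading, a negative offset reproduces what a forecasting model could have seen ahead of time, which is why the option is useful for backtesting.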
def save_dataframe(self, df, name=None, namespace=None)
-
Save a DataFrame of feature values to the feature store.
Args
df
:pd.DataFrame
- DataFrame of feature values.
Must have a time column or DatetimeIndex of time values. Optionally include a created_time column (defaults to utcnow() if omitted).
For a single feature: a value column, or a column header of the form namespace/name.
For multiple features, name the columns using namespace/name.
name
:str
, optional- name of feature, if not included in DataFrame column name.
namespace
:str
, optional- namespace, if not included in DataFrame column name.
def transform(self, name, namespace=None, from_features=[])
-
Decorator for creating/updating virtual (transformed) features. Use this on a function that accepts a dataframe input and returns an output dataframe of transformed values.
Args
name
:str
- feature to update.
namespace
:str
, optional- namespace, if not included in feature name.
from_features
:list
- list of features that this transform takes as input.
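The decorator pattern described here can be sketched in plain Python. This is not bytehub's implementation: `REGISTRY` is a hypothetical in-memory stand-in for the feature store metadata, the feature names are made up, and plain lists stand in for the dataframes that the real decorator expects.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: feature name -> (input feature names, transform function)
REGISTRY: Dict[str, Tuple[List[str], Callable]] = {}


def transform(name: str, from_features: List[str]):
    """Decorator factory: record the decorated function as the transform for `name`."""
    def decorator(func: Callable) -> Callable:
        REGISTRY[name] = (list(from_features), func)
        return func  # the function itself stays usable as normal
    return decorator


@transform("weather/temperature-f", from_features=["weather/temperature-c"])
def to_fahrenheit(values: List[float]) -> List[float]:
    # Lists stand in for dataframes in this sketch
    return [v * 9 / 5 + 32 for v in values]


inputs, func = REGISTRY["weather/temperature-f"]
print(inputs, func([0.0, 100.0]))
```

The design point is that a virtual feature stores no values of its own: only the function and the list of input features are recorded, and the output is computed when the feature is loaded.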
def update_feature(self, name, namespace=None, **kwargs)
-
Update a feature in the feature store.
Args
name
:str
- feature to update.
namespace
:str
, optional- namespace, if not included in feature name.
description
:str
, optional- updated description.
transform
:str
, optional- pickled function code for feature transforms.
meta
:dict
, optional- updated key/value pairs of metadata.
To remove metadata, update using
{"key_to_remove": None}
.
def update_namespace(self, name, **kwargs)
-
Update a namespace in the feature store.
Args
name
:str
- namespace to update.
description
:str
, optional- updated description.
storage_options
:dict
, optional- updated storage options.
meta
:dict
, optional- updated key/value pairs of metadata.
To remove metadata, update using
{"key_to_remove": None}
.
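The metadata convention used by update_feature and update_namespace can be sketched as a dictionary merge. The helper `merge_meta` is hypothetical; the removal-by-None rule comes from the text above, while keeping keys that the update does not mention is an assumption of this sketch.

```python
from typing import Any, Dict


def merge_meta(current: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: apply an update where a value of None removes the key."""
    merged = dict(current)  # copy so the stored metadata is not mutated in place
    for key, value in update.items():
        if value is None:
            merged.pop(key, None)  # removing a missing key is a no-op
        else:
            merged[key] = value
    return merged


# Made-up metadata for illustration
stored = {"unit": "celsius", "source": "sensor-a"}
print(merge_meta(stored, {"source": None, "owner": "ops"}))
```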