This section contains detailed information about configuration, for which the relevant API documentation can be found in kedro.config.ConfigLoader.
We recommend that you keep all configuration files in the conf directory of a Kedro project. However, if you prefer, you can point Kedro to any other directory and change the configuration paths by setting the CONF_SOURCE variable in src/<python_package>/settings.py as follows:
CONF_SOURCE = "new_conf"
Local and base configuration environments¶
Kedro-specific configuration (e.g., DataCatalog configuration for IO) is loaded using the ConfigLoader class:
from kedro.config import ConfigLoader
from kedro.framework.project import settings
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path, env="local")
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
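As a follow-up sketch (not part of the snippet above), the merged configuration can then be turned into a DataCatalog; the dataset names and any credentials depend entirely on what your catalog files define:
from kedro.io import DataCatalog

# Instantiate a catalog from the merged catalog configuration loaded above.
# If the datasets reference credentials, those would normally be loaded the
# same way (e.g. from "credentials*" patterns) and passed via `credentials`.
catalog = DataCatalog.from_config(conf_catalog)
print(catalog.list())  # names of all datasets defined in the catalog files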
This recursively scans for configuration files, first in the conf/base/ directory (base being the default environment) and then in conf/local/ (local being the designated overriding environment), according to the following rules:
Either of the following is true:
- the filename starts with catalog
- the file is located in a sub-directory whose name is prefixed with catalog
And the file extension is one of the following: yaml, yml, json, ini, pickle, xml or properties.
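As a rough illustration of these rules (all file names below are hypothetical):
# Paths under conf/base/ or conf/local/ that the default patterns
# "catalog*" and "catalog*/**" would pick up:
#   catalog.yml            - filename starts with "catalog"
#   catalog_pandas.yaml    - filename starts with "catalog"
#   catalog/nodes.yml      - parent directory name starts with "catalog"
# Paths that would be ignored:
#   parameters.yml         - does not match either naming rule
#   catalog.txt            - extension is not an accepted configuration type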
Configuration information from files stored in base or local that match these rules is merged at runtime and returned as a config dictionary:
If any two configuration files located inside the same environment path (conf/base/ or conf/local/ in this example) contain the same top-level key, load_config will raise a ValueError indicating that the duplicates are not allowed.
If two configuration files have duplicate top-level keys but are in different environment paths (one in conf/base/, another in conf/local/, for example) then the last loaded path (conf/local/ in this case) takes precedence and overrides that key value. ConfigLoader.get will not raise any errors - however, a DEBUG level log message will be emitted with information on the overridden keys.
Any top-level keys that start with _ are considered hidden (or reserved) and are ignored after the config is loaded. Those keys will neither trigger a key duplication error nor appear in the resulting configuration dictionary. However, you can still use such keys, for example, as YAML anchors and aliases.
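Here is a small sketch of how a hidden key might be used as a YAML anchor, and what that implies for the loaded dictionary (the file contents below are hypothetical):
# conf/base/catalog.yml (hypothetical contents):
#
#   _csv_defaults: &csv            # hidden key, used only as an anchor
#     type: pandas.CSVDataSet
#     load_args:
#       sep: ","
#
#   companies:
#     <<: *csv
#     filepath: data/01_raw/companies.csv
#
# After loading, "_csv_defaults" neither triggers a duplicate-key error nor
# appears in the result; only the expanded "companies" entry remains.
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
assert not any(key.startswith("_") for key in conf_catalog)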
Additional configuration environments¶
In addition to the two built-in local and base configuration environments, you can create your own. Your project loads conf/base/ as the bottom-level configuration environment but allows you to overwrite it with any other environments that you create, such as conf/server/ or conf/test/. To use additional configuration environments, run the following command:
kedro run --env=test
If no env option is specified, this will default to using the local environment to overwrite conf/base.
If, for some reason, your project does not have any other environments apart from base, i.e. no local environment to default to, you must customise KedroContext to take env="base" in the constructor and then specify your custom KedroContext subclass in src/<python_package>/settings.py under the CONTEXT_CLASS key.
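A minimal sketch of such a customisation is shown below; the exact constructor signature of KedroContext depends on your Kedro version, so treat this as a starting point rather than a drop-in implementation:
from kedro.framework.context import KedroContext

class ProjectContext(KedroContext):
    """Context that falls back to the "base" environment when none is given."""

    def __init__(self, *args, env=None, **kwargs):
        super().__init__(*args, env=env or "base", **kwargs)

# Then, in src/<python_package>/settings.py:
# CONTEXT_CLASS = ProjectContext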
If you set the KEDRO_ENV environment variable to the name of your environment, Kedro will load that environment for your kedro run, kedro ipython, kedro jupyter notebook and kedro jupyter lab sessions:
export KEDRO_ENV=test
If you both specify the KEDRO_ENV environment variable and provide the --env argument to a CLI command, the CLI argument takes precedence.
Template configuration¶
Kedro also provides an extension of ConfigLoader, the TemplatedConfigLoader class, which allows you to template values in configuration files. To apply templating in your project, set the CONFIG_LOADER_CLASS constant in your src/<python_package>/settings.py:
from kedro.config import TemplatedConfigLoader # new import
CONFIG_LOADER_CLASS = TemplatedConfigLoader
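Depending on your Kedro version, you may also need to tell the loader which files hold the template values; one common way (a sketch, check the documentation for your version) is to set CONFIG_LOADER_ARGS in the same settings.py:
# src/<python_package>/settings.py (continued)
# Point TemplatedConfigLoader at the files providing the template values.
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}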
Let’s assume the project contains a conf/base/globals.yml file with the following contents:
bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"
datasets:
csv: "pandas.CSVDataSet"
spark: "spark.SparkDataSet"
folders:
raw: "01_raw"
int: "02_intermediate"
pri: "03_primary"
fea: "04_feature"
The contents of the dictionary resulting from