API Reference
Main configuration classes
EstatDltConfig
Bases: BaseModel
Main configuration class for e-Stat API to DLT integration.
Combines source and destination configurations with additional processing options for data extraction and loading.
Attributes:
| Name | Type | Description |
|---|---|---|
| `source` | `SourceConfig` | e-Stat API source configuration. |
| `destination` | `DestinationConfig` | DLT destination configuration. |
| `batch_size` | `Optional[int]` | Number of records per batch. |
| `max_retries` | `int` | Maximum API retry attempts. |
| `timeout` | `Optional[int]` | API request timeout in seconds. |
Source code in src/estat_api_dlt_helper/config/models.py
SourceConfig
Bases: BaseModel
Configuration for e-Stat API data source.
Defines parameters for fetching statistical data from the e-Stat API, including authentication, selection of the statistical tables to fetch, and pagination options. For details on each parameter, see the e-Stat API specification.
Attributes:
| Name | Type | Description |
|---|---|---|
| `app_id` | `str` | e-Stat API application ID for authentication. |
| `statsDataId` | `Union[str, List[str]]` | Statistical table ID(s) to fetch. |
| `lang` | `Literal['J', 'E']` | Language for API response (J: Japanese, E: English). |
| `metaGetFlg` | `Literal['Y', 'N']` | Whether to fetch metadata. |
| `cntGetFlg` | `Literal['Y', 'N']` | Whether to fetch only record count. |
Source code in src/estat_api_dlt_helper/config/models.py
validate_stats_data_id(v)
classmethod
Ensure statsDataId is valid.
Source code in src/estat_api_dlt_helper/config/models.py
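The validator's exact rules are not shown on this page. As a rough, standard-library-only sketch of what validating a `Union[str, List[str]]` field could involve, the hypothetical helper below (`normalize_stats_data_id` is not part of the library) accepts a single ID or a list of IDs and rejects empty values:

```python
from typing import List, Union


def normalize_stats_data_id(value: Union[str, List[str]]) -> List[str]:
    """Return statsDataId as a non-empty list of non-empty strings.

    Hypothetical helper for illustration; the library's own
    validate_stats_data_id may apply different rules.
    """
    # Accept either a single ID or a list of IDs.
    ids = [value] if isinstance(value, str) else list(value)
    if not ids:
        raise ValueError("statsDataId must not be empty")
    for item in ids:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid statsDataId: {item!r}")
    return ids
```

Normalizing to a list up front lets downstream code iterate over IDs without checking the input type again.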
DestinationConfig
Bases: BaseModel
Configuration for DLT data destination.
Defines parameters for loading data to various destinations using DLT, including the destination type (for example, a data warehouse), dataset/table names, and write strategies.
Attributes:
| Name | Type | Description |
|---|---|---|
| `destination` | `Union[str, Any]` | DLT destination type or configuration object. |
| `dataset_name` | `str` | Target dataset/schema name. |
| `table_name` | `str` | Target table name. |
| `write_disposition` | `Literal['append', 'replace', 'merge']` | How to write data (append/replace/merge). |
| `primary_key` | `Optional[Union[str, List[str]]]` | Primary key columns for merge operations. |
Source code in src/estat_api_dlt_helper/config/models.py
validate_primary_key(v, info)
classmethod
Validate primary key is provided for merge operations.
Source code in src/estat_api_dlt_helper/config/models.py
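validate_primary_key is described as requiring a primary key for merge operations. The following is a minimal sketch of that rule using a plain dataclass. `DestinationSketch` is a hypothetical stand-in, not the library's pydantic model, and its default `write_disposition` is an assumption made only for this example:

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DestinationSketch:
    """Hypothetical stand-in for DestinationConfig (not the library class)."""

    write_disposition: str = "append"  # assumed default, for illustration only
    primary_key: Optional[Union[str, List[str]]] = None

    def __post_init__(self) -> None:
        if self.write_disposition not in ("append", "replace", "merge"):
            raise ValueError(f"unknown write_disposition: {self.write_disposition!r}")
        # A merge needs key columns to match incoming rows against existing ones.
        if self.write_disposition == "merge" and not self.primary_key:
            raise ValueError("primary_key is required when write_disposition is 'merge'")
```

With this rule, `DestinationSketch(write_disposition="merge")` fails fast at construction time instead of failing later during the load.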
API client
EstatApiClient
Client for accessing e-Stat API.
Provides methods to fetch statistical data from e-Stat, the Japanese government statistics portal, through its API. Handles API authentication, request formatting, and response parsing. Uses dlt's requests Client for automatic retry, rate limiting, and connection pooling.
Attributes:
| Name | Description |
|---|---|
| `app_id` | e-Stat API application ID for authentication. |
| `base_url` | Base URL for API endpoints. |
| `timeout` | Request timeout in seconds. |
| `client` | HTTP client with retry and connection pooling. |
Source code in src/estat_api_dlt_helper/api/client.py
__init__(app_id, base_url=None, timeout=60)
Initialize e-Stat API client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `app_id` | `str` | e-Stat API application ID | required |
| `base_url` | `Optional[str]` | Base URL for API (defaults to official endpoint) | `None` |
| `timeout` | `int` | Request timeout in seconds | `60` |
Source code in src/estat_api_dlt_helper/api/client.py
close()
Close the underlying HTTP session.
Source code in src/estat_api_dlt_helper/api/client.py
get_stats_data(stats_data_id, start_position=1, limit=100000, meta_get_flg='Y', cnt_get_flg='N', explanation_get_flg='Y', annotation_get_flg='Y', replace_sp_chars='0', lang='J', **additional_params)
Get statistical data from e-Stat API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stats_data_id` | `str` | Statistical data ID | required |
| `start_position` | `int` | Start position for data retrieval (1-based) | `1` |
| `limit` | `int` | Maximum number of records to retrieve | `100000` |
| `meta_get_flg` | `str` | Whether to get metadata (Y/N) | `'Y'` |
| `cnt_get_flg` | `str` | Whether to get count only (Y/N) | `'N'` |
| `explanation_get_flg` | `str` | Whether to get explanations (Y/N) | `'Y'` |
| `annotation_get_flg` | `str` | Whether to get annotations (Y/N) | `'Y'` |
| `replace_sp_chars` | `str` | Replace special characters (0: No, 1: Yes, 2: Remove) | `'0'` |
| `lang` | `str` | Language (J: Japanese, E: English) | `'J'` |
| `**additional_params` | `Any` | Additional query parameters | `{}` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | API response as dictionary |
Source code in src/estat_api_dlt_helper/api/client.py
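The snake_case arguments of get_stats_data correspond to camelCase query parameters in the e-Stat API (for example `metaGetFlg` and `cntGetFlg`, which also appear on SourceConfig). How the client assembles the request is not shown on this page, so the sketch below is only an assumption about that mapping; `build_stats_data_params` is a hypothetical helper, not a library function:

```python
from typing import Any, Dict


def build_stats_data_params(
    app_id: str,
    stats_data_id: str,
    start_position: int = 1,
    limit: int = 100000,
    meta_get_flg: str = "Y",
    cnt_get_flg: str = "N",
    explanation_get_flg: str = "Y",
    annotation_get_flg: str = "Y",
    replace_sp_chars: str = "0",
    lang: str = "J",
    **additional_params: Any,
) -> Dict[str, Any]:
    """Map snake_case arguments to e-Stat style camelCase query parameters.

    Hypothetical helper; the real client may build its request differently.
    """
    params: Dict[str, Any] = {
        "appId": app_id,
        "statsDataId": stats_data_id,
        "startPosition": start_position,
        "limit": limit,
        "metaGetFlg": meta_get_flg,
        "cntGetFlg": cnt_get_flg,
        "explanationGetFlg": explanation_get_flg,
        "annotationGetFlg": annotation_get_flg,
        "replaceSpChars": replace_sp_chars,
        "lang": lang,
    }
    # Extra filters such as cdArea or cdTime are passed through unchanged.
    params.update(additional_params)
    return params
```

For example, `build_stats_data_params("APP_ID", "0000020201", cdArea="13000")` yields a dictionary that includes both the defaults and the extra `cdArea` filter.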
get_stats_data_generator(stats_data_id, limit_per_request=100000, **kwargs)
Get statistical data as a generator for pagination.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stats_data_id` | `str` | Statistical data ID | required |
| `limit_per_request` | `int` | Number of records per request | `100000` |
| `**kwargs` | `Any` | Additional parameters for get_stats_data | `{}` |
Yields:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Response data for each page |
Source code in src/estat_api_dlt_helper/api/client.py
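The paging idea behind get_stats_data_generator can be sketched without the network. The example below assumes a hypothetical `fetch_page(start_position, limit)` callable that returns a list of records, and stops when a page comes back short or empty. The real method yields full response dictionaries and may decide when to stop differently:

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


def paginate(
    fetch_page: Callable[[int, int], List[Record]],
    limit_per_request: int = 100000,
) -> Iterator[List[Record]]:
    """Yield pages of records, advancing a 1-based start position."""
    start_position = 1  # positions are 1-based, as in get_stats_data
    while True:
        page = fetch_page(start_position, limit_per_request)
        if not page:
            break
        yield page
        # A short page means there is nothing left to fetch.
        if len(page) < limit_per_request:
            break
        start_position += len(page)


# Fake data source standing in for the API (7 records in total).
RECORDS: List[Record] = [{"value": i} for i in range(1, 8)]


def fake_fetch(start_position: int, limit: int) -> List[Record]:
    begin = start_position - 1  # convert the 1-based position to a list index
    return RECORDS[begin : begin + limit]


pages = list(paginate(fake_fetch, limit_per_request=3))
print([len(page) for page in pages])
```

Using a generator keeps only one page in memory at a time, which matters when a statistical table holds far more rows than a single request returns.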
get_stats_list(search_word=None, survey_years=None, stats_code=None, **kwargs)
Get list of available statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `search_word` | `Optional[str]` | Search keyword | `None` |
| `survey_years` | `Optional[str]` | Survey years (YYYY or YYYYMM-YYYYMM) | `None` |
| `stats_code` | `Optional[str]` | Statistics code | `None` |
| `**kwargs` | `Any` | Additional query parameters | `{}` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | API response as dictionary |
Source code in src/estat_api_dlt_helper/api/client.py
Data parsing
parse_response
Parse e-Stat API response data and convert to Arrow table.
This is the main entry point for parsing e-Stat API responses. Takes the JSON response and returns a structured Arrow table with data values and associated metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Dict[str, Any]` | The complete JSON response from e-Stat API | required |
Returns:
| Type | Description |
|---|---|
| `Table` | pa.Table: Arrow table containing the parsed data with metadata |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If required data sections are missing |
| `KeyError` | If expected keys are not found in the response |
Source code in src/estat_api_dlt_helper/parser/response_parser.py
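To give a feel for the kind of work parse_response does, here is a simplified sketch that pulls the data rows out of a response. It assumes the usual e-Stat JSON layout (`GET_STATS_DATA` > `STATISTICAL_DATA` > `DATA_INF` > `VALUE`, with `@`-prefixed dimension codes and the value under `$`). It returns plain dictionaries rather than an Arrow table, skips metadata joins entirely, and `extract_values` is a hypothetical name:

```python
from typing import Any, Dict, List


def extract_values(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the VALUE entries of an e-Stat style response into rows.

    Hypothetical, simplified stand-in for parse_response.
    """
    try:
        data_inf = data["GET_STATS_DATA"]["STATISTICAL_DATA"]["DATA_INF"]
    except KeyError as exc:
        raise ValueError(f"required data section is missing: {exc}") from exc

    values = data_inf.get("VALUE")
    if values is None:
        raise ValueError("required data section is missing: 'VALUE'")

    rows: List[Dict[str, Any]] = []
    for entry in values:
        # "@time" -> "time", "@area" -> "area", and so on.
        row = {key.lstrip("@"): val for key, val in entry.items() if key != "$"}
        row["value"] = entry.get("$")
        rows.append(row)
    return rows


sample = {
    "GET_STATS_DATA": {
        "STATISTICAL_DATA": {
            "DATA_INF": {
                "VALUE": [
                    {"@area": "13000", "@time": "2020000000", "@unit": "人", "$": "100"}
                ]
            }
        }
    }
}
rows = extract_values(sample)
```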
Data loader functions
estat_table
DLT resource for a single e-Stat statistical table.
A declarative API for treating a single e-Stat API statistical table as a DLT resource. Specifying stats_data_id is enough to create the resource; write_disposition, primary_key, incremental, and other settings can be defined in the same call.
estat_table(stats_data_id, app_id=dlt.secrets.value, table_name=None, write_disposition='replace', primary_key=None, incremental=None, limit=_UNSET, maximum_offset=_UNSET, timeout=_UNSET, **api_params)
Create a DLT resource for a single e-Stat statistical table.
Use this function directly when you need fine-grained control over write_disposition and primary_key per resource.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stats_data_id` | `str` | Statistical table ID to fetch. | required |
| `app_id` | `str` | e-Stat API application ID. Resolved automatically from secrets.toml or environment variables when used inside a `@dlt.source` or `@dlt.resource` context. | `dlt.secrets.value` |
| `table_name` | `Optional[str]` | Resource/table name. Defaults to `"estat_{stats_data_id}"`. | `None` |
| `write_disposition` | `str` | How to write data to destination. | `'replace'` |
| `primary_key` | `Optional[Union[str, List[str]]]` | Primary key column(s) for merge disposition. | `None` |
| `incremental` | `Optional[incremental[str]]` | Optional incremental loading configuration. Use `dlt.sources.incremental("time", initial_value="0000000000")` to enable incremental loading based on the time code column. When set, the `cdTimeFrom` API parameter is automatically configured using the last loaded time code value. Best used with `write_disposition="merge"` or `"append"`. Note that only new time points are detected; revisions to existing data are not captured. | `None` |
| `limit` | `int` | Maximum records per API request (pagination size). | `_UNSET` |
| `maximum_offset` | `Optional[int]` | Maximum total records to fetch. None for unlimited. | `_UNSET` |
| `timeout` | `int` | API request timeout in seconds. | `_UNSET` |
| `**api_params` | `Any` | Additional e-Stat API parameters (e.g., lang, cdTab, cdArea, cdTime, cdTimeFrom, cdTimeTo, cat01, etc.). | `{}` |
Returns:
| Type | Description |
|---|---|
| `DltResource` | DLT resource yielding PyArrow tables. |
Example

    import dlt
    from estat_api_dlt_helper import estat_table

    resource = estat_table(
        stats_data_id="0000020201",
        write_disposition="merge",
        primary_key=["time_code", "area_code"],
    )

    pipeline = dlt.pipeline(
        pipeline_name="estat",
        destination="duckdb",
        dataset_name="estat_data",
    )
    pipeline.run(resource)
Source code in src/estat_api_dlt_helper/loader/estat_table.py
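The incremental option is described as setting `cdTimeFrom` from the last loaded time code. The bookkeeping can be sketched as follows. Both helpers are hypothetical, the choice to send no filter on the first run is an assumption, and comparing codes with `max` assumes time codes are fixed-width digit strings such as `"2020000000"`:

```python
from typing import Any, Dict, List, Optional

INITIAL_VALUE = "0000000000"  # same initial value as in the parameter description


def build_incremental_params(last_time_code: Optional[str]) -> Dict[str, str]:
    """Return the extra API parameter for an incremental run (sketch)."""
    cursor = last_time_code or INITIAL_VALUE
    if cursor == INITIAL_VALUE:
        # Assumption: a first run fetches everything, so no filter is sent.
        return {}
    return {"cdTimeFrom": cursor}


def advance_cursor(rows: List[Dict[str, Any]], last_time_code: Optional[str]) -> Optional[str]:
    """Return the newest time code seen so far."""
    codes = [row["time"] for row in rows]
    if last_time_code:
        codes.append(last_time_code)
    # Fixed-width digit strings sort correctly as plain strings.
    return max(codes) if codes else None
```

Because the cursor only moves forward, a later correction to an already loaded time point is not picked up, which matches the caveat in the parameter description.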
estat_source
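DLT source for e-Stat API data.
A declarative API for handling multiple e-Stat API statistical tables together as one DLT source. stats_data_ids accepts a single ID, a list of IDs, or a dictionary of the form {resource_name: stats_data_id}. Alternatively, passing a list of estat_table() resources as tables allows per-resource settings.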
estat_source(stats_data_ids=None, tables=None, app_id=dlt.secrets.value, write_disposition='replace', primary_key=None, incremental=None, limit=100000, maximum_offset=None, timeout=60, **api_params)
Create a DLT source for e-Stat API statistical data.
Supports two modes:
- stats_data_ids: Each ID becomes a separate DLT resource via estat_table, sharing write_disposition/primary_key settings.
- tables: Pass pre-configured estat_table resources directly for per-resource control.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stats_data_ids` | `Union[str, List[str], Dict[str, str], None]` | Statistical table ID(s) to fetch. Accepts: `str`, a single ID (resource name: `"estat_{id}"`); `List[str]`, multiple IDs (resource names: `"estat_{id}"`); `Dict[str, str]`, `{resource_name: stats_data_id}` for custom names. | `None` |
| `tables` | `Optional[List[DltResource]]` | Pre-configured DltResource list (from estat_table). When provided, write_disposition/primary_key are ignored as each resource carries its own settings. app_id/limit/maximum_offset/timeout are propagated to each table via `bind()`. | `None` |
| `app_id` | `str` | e-Stat API application ID. Resolved automatically from secrets.toml (`[sources.estat] app_id`) or environment variable (`SOURCES__ESTAT__APP_ID`) if not provided. | `dlt.secrets.value` |
| `write_disposition` | `str` | How to write data to destination. Applied to all resources when using stats_data_ids. | `'replace'` |
| `primary_key` | `Optional[Union[str, List[str]]]` | Primary key column(s) for merge disposition. Applied to all resources when using stats_data_ids. | `None` |
| `incremental` | `Optional[incremental[str]]` | Optional incremental loading configuration. Applied to all resources when using stats_data_ids mode. Ignored when using tables mode (each table carries its own config). | `None` |
| `limit` | `int` | Maximum records per API request (pagination size). | `100000` |
| `maximum_offset` | `Optional[int]` | Maximum total records to fetch. None for unlimited. | `None` |
| `timeout` | `int` | API request timeout in seconds. | `60` |
| `**api_params` | `Any` | Additional e-Stat API parameters passed directly to the API request (e.g., lang, lvTab, cdTab, cdTime, cdArea, cdTimeFrom, cdTimeTo, metaGetFlg, cntGetFlg, replaceSpChars, cat01, etc.). | `{}` |
Returns:
| Type | Description |
|---|---|
| `Iterable[DltResource]` | DltSource with one resource per stats_data_id (the `@dlt.source` decorator wraps the generator into a DltSource). |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If both stats_data_ids and tables are provided, if neither is provided, if tables is an empty list, or if tables is used with write_disposition/primary_key/incremental/api_params arguments. |
Example

    import dlt
    from estat_api_dlt_helper import estat_source, estat_table

    # Simple: multiple tables with shared settings
    source = estat_source(
        stats_data_ids=["0000020201", "0004028584"],
        write_disposition="merge",
        primary_key=["time_code", "area_code"],
    )

    # Advanced: per-resource settings
    source = estat_source(
        tables=[
            estat_table(stats_data_id="0000020201", table_name="pop",
                        write_disposition="merge", primary_key=["time_code"]),
            estat_table(stats_data_id="0004028584", table_name="gdp",
                        write_disposition="replace"),
        ],
    )

    pipeline = dlt.pipeline(
        pipeline_name="estat",
        destination="duckdb",
        dataset_name="estat_data",
    )
    pipeline.run(source)
Source code in src/estat_api_dlt_helper/loader/estat_source.py
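The accepted shapes of stats_data_ids and the mode check can be illustrated with a small standard-library sketch. `check_mode` and `resolve_resource_names` are hypothetical helpers that follow the parameter and Raises descriptions on this page; they are not the library's internals:

```python
from typing import Any, Dict, List, Optional, Union

StatsIds = Union[str, List[str], Dict[str, str]]


def check_mode(stats_data_ids: Optional[StatsIds], tables: Optional[List[Any]]) -> str:
    """Return which mode applies, or raise ValueError for invalid combinations."""
    if stats_data_ids is not None and tables is not None:
        raise ValueError("provide either stats_data_ids or tables, not both")
    if stats_data_ids is None and tables is None:
        raise ValueError("one of stats_data_ids or tables is required")
    if tables is not None and len(tables) == 0:
        raise ValueError("tables must not be an empty list")
    return "tables" if tables is not None else "stats_data_ids"


def resolve_resource_names(stats_data_ids: StatsIds) -> Dict[str, str]:
    """Map resource names to statistical table IDs."""
    if isinstance(stats_data_ids, dict):
        # Custom names are used as given: {resource_name: stats_data_id}.
        return dict(stats_data_ids)
    if isinstance(stats_data_ids, str):
        stats_data_ids = [stats_data_ids]
    # Default naming convention: "estat_{id}".
    return {f"estat_{sid}": sid for sid in stats_data_ids}
```

For instance, `resolve_resource_names({"pop": "0000020201"})` keeps the custom name `pop`, while a bare string or list falls back to the `estat_{id}` pattern.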
load_estat_data
Load e-Stat API data to the specified destination using DLT.
This is a convenience function that creates and runs a DLT pipeline with the provided configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `EstatDltConfig` | Configuration for e-Stat API source and DLT destination | required |
| `credentials` | `Optional[Dict[str, Any]]` | Optional credentials to override destination credentials | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `pipeline.run()` | `{}` |
Returns:
| Type | Description |
|---|---|
| `Any` | LoadInfo object containing information about the load operation |
Example

    from estat_api_dlt_helper import EstatDltConfig, load_estat_data

    config = {
        "source": {
            "app_id": "YOUR_API_KEY",
            "statsDataId": "0000020211",
            "limit": 10
        },
        "destination": {
            "destination": "duckdb",
            "dataset_name": "demo",
            "table_name": "demo",
            "write_disposition": "merge",
            "primary_key": ["time", "area", "cat01"]
        }
    }
    config = EstatDltConfig(**config)
    info = load_estat_data(config)
    print(info)
Source code in src/estat_api_dlt_helper/loader/load_manager.py
create_estat_resource
Create a DLT resource for e-Stat API data.
This function creates a customizable DLT resource that fetches data from the e-Stat API based on the provided configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `EstatDltConfig` | Configuration for e-Stat API source and destination | required |
| `name` | `Optional[str]` | Resource name (defaults to table_name from config) | `None` |
| `primary_key` | `Optional[Any]` | Primary key columns (overrides config if provided) | `None` |
| `write_disposition` | `Optional[str]` | Write disposition (overrides config if provided) | `None` |
| `columns` | `Optional[Any]` | Column definitions for the resource | `None` |
| `table_format` | `Optional[str]` | Table format for certain destinations | `None` |
| `file_format` | `Optional[str]` | File format for filesystem destinations | `None` |
| `schema_contract` | `Optional[Any]` | Schema contract settings | `None` |
| `table_name` | `Optional[Callable[[Any], str]]` | Callable to generate dynamic table names | `None` |
| `max_table_nesting` | `Optional[int]` | Maximum nesting level for nested data | `None` |
| `selected` | `Optional[bool]` | Whether this resource is selected for loading | `None` |
| `merge_key` | `Optional[Any]` | Merge key for merge operations | `None` |
| `parallelized` | `Optional[bool]` | Whether to parallelize this resource | `None` |
| `**resource_kwargs` | `Any` | Additional keyword arguments for `dlt.resource` | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `DltResource` | `DltResource` | Configured DLT resource for e-Stat API data |
Example

    from estat_api_dlt_helper import EstatDltConfig, create_estat_resource

    config = EstatDltConfig(...)
    resource = create_estat_resource(config)

    # Customize the resource
    resource = create_estat_resource(
        config,
        name="custom_stats",
        columns={"time": {"data_type": "timestamp"}},
        selected=True
    )
Source code in src/estat_api_dlt_helper/loader/dlt_resource.py
create_estat_pipeline
Create a DLT pipeline for e-Stat API data loading.
This function creates a customizable DLT pipeline configured for the specified destination based on the provided configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `EstatDltConfig` | Configuration for e-Stat API source and destination | required |
| `pipeline_name` | `Optional[str]` | Name of the pipeline (overrides config if provided) | `None` |
| `pipelines_dir` | `Optional[str]` | Directory to store pipeline state | `None` |
| `dataset_name` | `Optional[str]` | Dataset name in destination (overrides config if provided) | `None` |
| `import_schema_path` | `Optional[str]` | Path to import schema from | `None` |
| `export_schema_path` | `Optional[str]` | Path to export schema to | `None` |
| `dev_mode` | `Optional[bool]` | Development mode (overrides config if provided) | `None` |
| `refresh` | `Optional[str]` | Schema refresh mode | `None` |
| `progress` | `Optional[str]` | Progress reporting configuration | `None` |
| `destination` | `Optional[Any]` | DLT destination (constructed from config if not provided) | `None` |
| `staging` | `Optional[Any]` | Staging destination for certain loaders | `None` |
| `**pipeline_kwargs` | `Any` | Additional keyword arguments for `dlt.pipeline` | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `Pipeline` | `Pipeline` | Configured DLT pipeline |
Example

    from estat_api_dlt_helper import EstatDltConfig, create_estat_pipeline

    config = EstatDltConfig(...)
    pipeline = create_estat_pipeline(config)

    # Customize the pipeline
    pipeline = create_estat_pipeline(
        config,
        pipeline_name="custom_estat_pipeline",
        dev_mode=True,
        progress="log"
    )
Source code in src/estat_api_dlt_helper/loader/dlt_pipeline.py
create_estat_source
Create a DLT source for multiple e-Stat API tables.
This function creates a DLT source containing one resource per config, using create_estat_resource() internally. Pass a list of EstatDltConfig objects to load several statistical tables in one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `configs` | `List[EstatDltConfig]` | List of EstatDltConfig, one per resource. | required |
| `**source_kwargs` | `Any` | Additional keyword arguments for `dlt.source` | `{}` |
Returns:
| Type | Description |
|---|---|
| `DltSource` | DLT source with one resource per config. |
Example

    from estat_api_dlt_helper import EstatDltConfig, create_estat_source

    configs = [
        EstatDltConfig(
            source={"app_id": "YOUR_KEY", "statsDataId": "0000020201"},
            destination={"destination": "duckdb", "dataset_name": "estat", "table_name": "pop"},
        ),
        EstatDltConfig(
            source={"app_id": "YOUR_KEY", "statsDataId": "0004028584"},
            destination={"destination": "duckdb", "dataset_name": "estat", "table_name": "gdp"},
        ),
    ]
    source = create_estat_source(configs)
Source code in src/estat_api_dlt_helper/loader/dlt_source.py
create_unified_estat_resource
Create a DLT resource for e-Stat API data using unified schema.
This function creates a DLT resource that handles multiple e-Stat datasets with varying metadata structures by using a unified schema approach. This prevents PyArrow "Schema at index X was different" errors when loading multiple statsDataIds.
The unified schema includes all possible fields from all datasets, with missing fields automatically set to None. Column order from the original data is preserved.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `EstatDltConfig` | Configuration for e-Stat API source and destination | required |
| `name` | `Optional[str]` | Resource name (defaults to table_name from config) | `None` |
| `primary_key` | `Optional[Any]` | Primary key columns (overrides config if provided) | `None` |
| `write_disposition` | `Optional[str]` | Write disposition (overrides config if provided) | `None` |
| `columns` | `Optional[Any]` | Column definitions for the resource | `None` |
| `table_format` | `Optional[str]` | Table format for certain destinations | `None` |
| `file_format` | `Optional[str]` | File format for filesystem destinations | `None` |
| `schema_contract` | `Optional[Any]` | Schema contract settings (defaults to evolve mode) | `None` |
| `table_name` | `Optional[Callable[[Any], str]]` | Callable to generate dynamic table names | `None` |
| `max_table_nesting` | `Optional[int]` | Maximum nesting level for nested data | `None` |
| `selected` | `Optional[bool]` | Whether this resource is selected for loading | `None` |
| `merge_key` | `Optional[Any]` | Merge key for merge operations | `None` |
| `parallelized` | `Optional[bool]` | Whether to parallelize this resource | `None` |
| `**resource_kwargs` | `Any` | Additional keyword arguments for `dlt.resource` | `{}` |
Returns:
| Type | Description |
|---|---|
| `Any` | dlt.Resource: Configured DLT resource with unified schema support |
Example

    import dlt
    from estat_api_dlt_helper import EstatDltConfig
    from estat_api_dlt_helper.loader.unified_schema_resource import create_unified_estat_resource

    # Configuration with multiple statsDataIds that have different schemas
    config = EstatDltConfig(
        source={
            "statsDataId": ["0004028473", "0004028474", "0004028475"],
            "app_id": "YOUR_APP_ID"
        },
        destination={
            "table_name": "unified_stats",
            "dataset_name": "estat_data"
        }
    )

    # Create resource with unified schema
    resource = create_unified_estat_resource(config)

    # Run in pipeline without schema errors
    pipeline = dlt.pipeline(
        pipeline_name="estat_unified",
        destination="duckdb",
        dataset_name="estat_data"
    )
    pipeline.run(resource)
Note
This resource is recommended when loading multiple statsDataIds that may have different metadata structures (e.g., some have parent_code in time_metadata while others don't).
Source code in src/estat_api_dlt_helper/loader/unified_schema_resource.py
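The unified schema idea (take the union of all fields, fill gaps with None, keep the original column order) can be shown with plain dictionaries. This is only a conceptual sketch with a hypothetical `unify_rows` helper; the library itself works with PyArrow tables and may build the schema differently:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def unify_rows(batches: List[List[Row]]) -> List[Row]:
    """Give every row the same columns, filling missing ones with None."""
    # Collect the union of column names in first-seen order.
    columns: List[str] = []
    for batch in batches:
        for row in batch:
            for key in row:
                if key not in columns:
                    columns.append(key)
    # Rebuild each row against the full column list.
    return [{col: row.get(col) for col in columns} for batch in batches for row in batch]


# Two datasets whose rows differ: only the second has parent_code.
batches = [
    [{"time": "2020000000", "value": 1}],
    [{"time": "2021000000", "parent_code": "A", "value": 2}],
]
unified = unify_rows(batches)
```

After unification every row exposes the same keys, so concatenating the batches no longer trips over a schema mismatch.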