Loading a JSON file into BigQuery using the Google BigQuery Client API

https://stackoverflow.com//questions/25048787

21-12-2019

Question

Is there a way to load a JSON file from the local file system into BigQuery using the Google BigQuery Client API?

All the options I have found are:

1 - Stream the records one by one.

2 - Load the JSON data from GCS.

3 - Load the JSON using a raw POST request (i.e. not through the Google Client API).

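Answer

I'm assuming from the python tag that you want to do this from Python. There is a load example that loads data from a local file (it uses CSV, but it is easy to adapt to JSON; there is another JSON example in the same directory).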
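One detail to watch when adapting the example to JSON: a load job with sourceFormat NEWLINE_DELIMITED_JSON expects one JSON object per line, not a single top-level array. The following is a minimal sketch, assuming the local file holds one JSON array of objects; the helper name `json_array_to_ndjson` and the file paths are hypothetical and not part of the original answer.

```python
import json


def json_array_to_ndjson(src_path, dst_path):
    """Rewrite a file holding one JSON array as newline-delimited JSON.

    Hypothetical helper for illustration: returns the number of records written.
    """
    with open(src_path, encoding='utf-8') as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError('expected a top-level JSON array')

    with open(dst_path, 'w', encoding='utf-8') as dst:
        for record in records:
            # json.dumps without indent never emits a raw newline,
            # so each record stays on exactly one line.
            dst.write(json.dumps(record) + '\n')
    return len(records)
```

The resulting file can then be passed to the upload step in place of 'foo.json'. If the data is already one object per line, this conversion is unnecessary.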

The basic flow is:

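import time

from googleapiclient.http import MediaFileUpload

# PROJECT_ID, DATASET_ID and TABLE_ID are placeholders for your own values.
# Load configuration with the destination specified.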
load_config = {
  'destinationTable': {
    'projectId': PROJECT_ID,
    'datasetId': DATASET_ID,
    'tableId': TABLE_ID
  }
}

load_config['schema'] = {
  'fields': [
    {'name':'string_f', 'type':'STRING'},
    {'name':'boolean_f', 'type':'BOOLEAN'},
    {'name':'integer_f', 'type':'INTEGER'},
    {'name':'float_f', 'type':'FLOAT'},
    {'name':'timestamp_f', 'type':'TIMESTAMP'}
  ]
}
load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'

# This tells it to perform a resumable upload of a local file
# called 'foo.json' 
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)

# `jobs` is assumed to be the jobs() collection of an authorized BigQuery v2
# service object, e.g. jobs = service.jobs().
start = time.time()
job_id = 'job_%d' % start
# Create the job.
result = jobs.insert(
  projectId=PROJECT_ID,
  body={
    'jobReference': {
      'jobId': job_id
    },
    'configuration': {
      'load': load_config
    }
  },
  media_body=upload).execute()

# Then you'd also want to wait for the result and check the status (check out
# the example at the link for more info).
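
The waiting step mentioned in the last comment could look roughly like the sketch below. It assumes `jobs` is the same jobs() collection used for the insert call, and it polls jobs.get until the job's status.state is 'DONE'. The function name `wait_for_job` and its `poll_seconds` and `sleep` parameters are hypothetical, introduced here only for illustration.

```python
import time


def wait_for_job(jobs, project_id, job_id, poll_seconds=1.0, sleep=time.sleep):
    """Poll a BigQuery job until it finishes and return the final job resource.

    Hypothetical helper: `jobs` is assumed to be service.jobs() from an
    authorized BigQuery v2 service object.
    """
    while True:
        job = jobs.get(projectId=project_id, jobId=job_id).execute()
        status = job.get('status', {})
        if status.get('state') == 'DONE':
            # A finished job can still have failed; errorResult is set then.
            if 'errorResult' in status:
                raise RuntimeError('Load job failed: %s' % status['errorResult'])
            return job
        # Still PENDING or RUNNING: wait before asking again.
        sleep(poll_seconds)
```

Usage would be along the lines of `wait_for_job(jobs, PROJECT_ID, job_id)` right after the insert call. The `sleep` parameter exists only so the delay can be replaced in tests; a production version would probably also want an overall timeout.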

License: CC-BY-SA with attribution
