Convert SQLite database to Postgres database
Install sqlite cli
Sometimes you need to work with SQLite databases. For example, I was given this Pitchfork database in the form of a database.sqlite file. The first thing I needed was the SQLite CLI, which can be downloaded from https://sqlite.org/2020/sqlite-tools-osx-x86-3310100.zip
The archive contains a bundle of command-line tools for managing SQLite database files: the command-line shell program (sqlite3), the sqldiff program, and the sqlite3_analyzer program.
Unzip the archive, change into the extracted directory (I placed database.sqlite there as well), then start the shell and open the database:
cd sqlite-tools-osx-x86-3310100/
(base) shravan-sqlite-tools-osx-x86-3310100$ ls -rtl
total 5080
-rwxr-xr-x@ 1 shravan staff 700096 Jan 28 03:28 sqldiff
-rwxr-xr-x@ 1 shravan staff 728116 Jan 28 03:28 sqlite3_analyzer
-rwxr-xr-x@ 1 shravan staff 1168852 Jan 28 03:28 sqlite3
-rw-r--r--@ 1 shravan staff 83585024 Sep 20 05:24 database.sqlite
(base) shravan-sqlite-tools-osx-x86-3310100$ ./sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open database.sqlite
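The same file can also be opened from a script. The following is a minimal sketch using Python's standard sqlite3 module; the helper name open_readonly is hypothetical and not part of the original walkthrough. It opens the file read-only, so a mistyped path raises an error instead of silently creating an empty database.

```python
import sqlite3
from pathlib import Path


def open_readonly(path):
    """Open an existing SQLite file read-only (hypothetical helper)."""
    # Build a file: URI so that mode=ro can be passed to SQLite.
    # With mode=ro, a missing file is an error rather than a new empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

For example, conn = open_readonly("database.sqlite") returns a connection on which SELECT statements work and writes fail.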
Data
We will be working with the Pitchfork database, made available to us in the form of a .sqlite file. The screenshot below shows the number of rows and columns to expect in each of these tables:
[Screenshot: number of rows and columns in each table]
View tables and schema
sqlite> .tables
artists content genres labels reviews years
sqlite> .schema
CREATE TABLE reviews (
reviewid INTEGER,
title TEXT,
artist TEXT,
url TEXT,
score REAL,
best_new_music INTEGER,
author TEXT,
author_type TEXT,
pub_date TEXT,
pub_weekday INTEGER,
pub_day INTEGER,
pub_month INTEGER,
pub_year INTEGER);
CREATE TABLE artists (
reviewid INTEGER, artist TEXT);
CREATE TABLE genres (
reviewid INTEGER, genre TEXT);
CREATE TABLE labels (
reviewid INTEGER, label TEXT);
CREATE TABLE years (
reviewid INTEGER, year INTEGER);
CREATE TABLE content (
reviewid INTEGER, content TEXT);
sqlite>
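The table list and column types shown above can also be read programmatically. This is a rough sketch with Python's standard sqlite3 module; the function names list_tables and table_columns are made up for illustration. It queries the sqlite_master catalog table and the table_info pragma, which is roughly the information that .tables and .schema display.

```python
import sqlite3


def list_tables(conn):
    """Return the names of user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


def table_columns(conn, table):
    """Return (column_name, declared_type) pairs for one table."""
    # The table name is interpolated into the SQL text, so it is quoted as an
    # identifier; pass only names that came from list_tables().
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({quoted})")]
```

Calling list_tables(sqlite3.connect("database.sqlite")) on the Pitchfork file should list the same six tables that .tables printed.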
Query one of these tables
sqlite> SELECT
...> reviewid,
...> title,
...> artist,
...> author
...> FROM reviews
...> LIMIT 10;
22703|mezzanine|massive attack|nate patrin
22721|prelapsarian|krallice|zoe camp
22659|all of them naturals|uranium club|david glickman
22661|first songs|kleenex, liliput|jenn pelly
22725|new start|taso|kevin lozano
22722|insecure (music from the hbo original series)|various artists|vanessa okoth-obbo
22704|stillness in wonderland|little simz|katherine st. asaph
22694|tehillim|yotam avni|andy beta
22714|reflection|brian eno|andy beta
22724|filthy america its beautiful|the lox|ian cohen
sqlite>
By default the output mode is list, which separates columns with a | character, as in the output above.
Export the tables as individual csv files
To export an SQLite table (or part of a table) as CSV, simply set the “mode” to “csv” and then run a query to extract the desired rows of the table.
sqlite> .headers on
sqlite> .mode csv
sqlite> .once 'reviews.csv'
sqlite> SELECT * FROM reviews;
sqlite>
The .headers on line causes column labels to be printed as the first row of output, so the first row of the resulting CSV file contains the column labels. If column labels are not desired, set “.headers off” instead. (The “.headers off” setting is the default and can be omitted if the headers have not been previously turned on.)
The line .once FILENAME causes the output of the next query to go into the named file instead of being printed on the console. In the example above, that line causes the CSV content to be written into a file named reviews.csv.
Finally, after running the SELECT statement, you can see that reviews.csv has been created. Repeat the same steps for the other tables (.once applies only to the next query, so issue it again before each SELECT):
(base) shravan-sqlite-tools-osx-x86-3310100$ ls -rtl
total 243728
-rw-r--r--@ 1 shravan staff 83585024 Sep 20 05:24 database.sqlite
-rwxr-xr-x@ 1 shravan staff 700096 Jan 28 03:28 sqldiff
-rwxr-xr-x@ 1 shravan staff 728116 Jan 28 03:28 sqlite3_analyzer
-rwxr-xr-x@ 1 shravan staff 1168852 Jan 28 03:28 sqlite3
-rw-r--r--@ 1 shravan staff 34891456 Feb 25 15:13 database.sqlite.zip
-rw-r--r-- 1 shravan staff 2924489 Feb 25 16:09 reviews.csv
-rw-r--r-- 1 shravan staff 392675 Feb 25 16:22 artists.csv
-rw-r--r-- 1 shravan staff 300689 Feb 25 16:22 genres.csv
-rw-r--r-- 1 shravan staff 358748 Feb 25 16:22 labels.csv
-rw-r--r-- 1 shravan staff 220192 Feb 25 16:23 years.csv
-rw-r--r-- 1 shravan staff 75497159 Feb 25 16:24 content.csv
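Repeating the export by hand for six tables is manageable, but the loop can also be scripted. The sketch below is one possible approach using Python's standard sqlite3 and csv modules, not the method used above; the function name export_tables_to_csv and its arguments are hypothetical. It writes one CSV file per table with a header row, and NULL values come out as empty fields.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write <table>.csv for every user table; return {table: data_row_count}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    conn = sqlite3.connect(db_path)
    try:
        tables = [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            header = [col[0] for col in cursor.description]
            # newline="" lets the csv module control line endings itself.
            with open(out_dir / f"{table}.csv", "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)   # same effect as ".headers on"
                n = 0
                for row in cursor:
                    writer.writerow(row)  # None (NULL) is written as an empty field
                    n += 1
            counts[table] = n
    finally:
        conn.close()
    return counts
```

A call such as export_tables_to_csv("database.sqlite", "csv_out") would produce files like csv_out/reviews.csv; the directory name csv_out is only an example.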
Import the csv files into Postgres
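Create a database named pitchfork using this command:
createdb -h localhost -p 5432 -U shravan pitchfork
Open the query tool and run the SQL script provided in the appendix. Note that the script loads content.csv into a table named contents (the SQLite table is named content) and leaves the years table commented out. Once the script has run, check that the row counts match in SQLite and Postgres: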
sqlite> SELECT COUNT(*) FROM reviews;
COUNT(*)
18393
sqlite>
(etl) shravan$ psql pitchfork
psql (11.5)
Type "help" for help.
pitchfork=# SELECT COUNT(*) FROM reviews;
count
-------
18393
(1 row)
pitchfork=#
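A similar check can be made on the intermediate CSV files before loading them. The sketch below uses only Python's standard library and compares a SQLite table's row count with the number of records in its CSV export; it does not talk to Postgres, and the function names are hypothetical. Counting with a CSV parser rather than counting lines matters for content.csv, where a quoted review text may span several lines.

```python
import csv
import sqlite3


def sqlite_row_count(conn, table):
    """Return SELECT COUNT(*) for one table."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


def csv_row_count(csv_path, has_header=True):
    """Count CSV records, treating a quoted multi-line field as one record."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    # Subtract the header row written by ".headers on", if present.
    return n - 1 if has_header and n > 0 else n
```

If sqlite_row_count(conn, "reviews") and csv_row_count("reviews.csv") disagree, the export is worth re-running before the import. If a field is longer than the csv module's default limit, csv.field_size_limit can raise it.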
Appendix
set time zone 'UTC';
DROP TABLE IF EXISTS reviews;
DROP TABLE IF EXISTS artists;
DROP TABLE IF EXISTS genres;
DROP TABLE IF EXISTS labels;
/*DROP TABLE IF EXISTS years;*/
DROP TABLE IF EXISTS contents;
CREATE TABLE IF NOT EXISTS reviews (
reviewid INTEGER,
title TEXT,
artist TEXT,
url TEXT,
score NUMERIC,
best_new_music INTEGER,
author TEXT,
author_type TEXT,
pub_date timestamp,
pub_weekday INTEGER,
pub_day INTEGER,
pub_month INTEGER,
pub_year INTEGER
);
COPY reviews
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/reviews.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
CREATE TABLE IF NOT EXISTS artists (
reviewid INTEGER,
artist TEXT
);
COPY artists
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/artists.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
CREATE TABLE IF NOT EXISTS genres (
reviewid INTEGER,
genre TEXT
);
COPY genres
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/genres.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
CREATE TABLE IF NOT EXISTS labels (
reviewid INTEGER,
label TEXT
);
COPY labels
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/labels.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
/*
CREATE TABLE years (
reviewid INTEGER,
year INTEGER
);
COPY years
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/years.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
*/
CREATE TABLE contents (
reviewid INTEGER,
contents TEXT
);
COPY contents
FROM '/Users/shravan/Downloads/sqlite-tools-osx-x86-3310100/content.csv' (DELIMITER ',', FORMAT CSV, HEADER, NULL 'NA');
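The COPY statements above hard-code one machine's download directory. As a hedged sketch, the hypothetical Python helper below builds a statement of the same form for any table and CSV path, so the script can be regenerated for a different location. It assumes POSIX-style paths as used in this post and a plain, unquoted table name.

```python
from pathlib import PurePosixPath


def copy_statement(table, csv_path, null="NA"):
    """Build a COPY ... FROM statement in the same form as the script above."""
    path = PurePosixPath(csv_path)
    # COPY ... FROM 'file' is read by the Postgres server process, so a
    # relative path would be resolved on the server, not where psql runs.
    if not path.is_absolute():
        raise ValueError(f"COPY needs an absolute path, got {csv_path!r}")
    # Double any single quotes so the values are valid SQL string literals.
    path_literal = str(path).replace("'", "''")
    null_literal = null.replace("'", "''")
    return (
        f"COPY {table}\n"
        f"FROM '{path_literal}' "
        f"(DELIMITER ',', FORMAT CSV, HEADER, NULL '{null_literal}');"
    )
```

For instance, print(copy_statement("genres", "/tmp/export/genres.csv")) emits a statement that can be pasted into the query tool; the /tmp/export directory is only an illustration.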
Tip: use Ctrl + L to clear the screen.
The full list of dot-commands is available via .help:
sqlite> .help
.archive ... Manage SQL archives
.auth ON|OFF Show authorizer callbacks
.backup ?DB? FILE Backup DB (default "main") to FILE
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.clone NEWDB Clone data into NEWDB from the existing database
.databases List names and files of attached databases
.dbconfig ?op? ?val? List or change sqlite3_db_config() options
.dbinfo ?DB? Show status information about the database
.dump ?TABLE? ... Render all database content as SQL
.echo on|off Turn command echo on or off
.eqp on|off|full|... Enable or disable automatic EXPLAIN QUERY PLAN
.excel Display the output of next command in spreadsheet
.exit ?CODE? Exit this program with return-code CODE
.expert EXPERIMENTAL. Suggest indexes for queries
.explain ?on|off|auto? Change the EXPLAIN formatting mode. Default: auto
.filectrl CMD ... Run various sqlite3_file_control() operations
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help ?-all? ?PATTERN? Show help text for PATTERN
.import FILE TABLE Import data from FILE into TABLE
.imposter INDEX TABLE Create imposter table TABLE on index INDEX
.indexes ?TABLE? Show names of indexes
.limit ?LIMIT? ?VAL? Display or change the value of an SQLITE_LIMIT
.lint OPTIONS Report potential schema issues.
.load FILE ?ENTRY? Load an extension library
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.mode MODE ?TABLE? Set output mode
.nullvalue STRING Use STRING in place of NULL values
.once (-e|-x|FILE) Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
.output ?FILE? Send output to FILE or stdout if FILE is omitted
.parameter CMD ... Manage SQL parameter bindings
.print STRING... Print literal STRING
.progress N Invoke progress handler after every N opcodes
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILE Read input from FILE
.recover Recover as much data as possible from corrupt db.
.restore ?DB? FILE Restore content of DB (default "main") from FILE
.save FILE Write in-memory database into FILE
.scanstats on|off Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN? Show the CREATE statements matching PATTERN
.selftest ?OPTIONS? Run tests defined in the SELFTEST table
.separator COL ?ROW? Change the column and row separators
.sha3sum ... Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.stats ?on|off? Show stats or turn stats on or off
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables matching LIKE pattern TABLE
.testcase NAME Begin redirecting output to 'testcase-out.txt'
.testctrl CMD ... Run various sqlite3_test_control() operations
.timeout MS Try opening locked tables for MS milliseconds
.timer on|off Turn SQL timer on or off
.trace ?OPTIONS? Output each SQL statement as it is run
.vfsinfo ?AUX? Information about the top-level VFS
.vfslist List all available VFSes
.vfsname ?AUX? Print the name of the VFS stack
.width NUM1 NUM2 ... Set column widths for "column" mode
sqlite>