Introduction
In modern web applications, enabling users to interact with data using natural language can significantly enhance user experience. In this blog post, we'll explore how to build a backend application using Express.js and DuckDB that allows users to:
Upload CSV files.
Query uploaded data using natural language.
Receive results in a user-friendly format.
We'll walk through setting up the backend, handling file uploads, importing data into DuckDB, converting natural language queries into SQL, and executing queries.
What is DuckDB?
DuckDB is an in-process SQL OLAP database management system. It is designed to support analytical query workloads, also known as Online Analytical Processing (OLAP). DuckDB offers a fast and efficient way to process and analyze data without the need for a separate database server.
Key Features:
Lightweight and Fast: DuckDB runs in-process, eliminating the overhead of client-server communication.
SQL Support: It provides full SQL support, making it familiar to those used to relational databases.
Easy Integration: DuckDB can be embedded into applications written in various programming languages, including Python, R, and Node.js.
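To get a feel for how lightweight this is, here is a minimal sketch of DuckDB embedded in a Node.js script (assuming the duckdb npm package is installed):
import DuckDB from 'duckdb';
// The entire database lives inside the Node.js process; no server to run
const db = new DuckDB.Database(':memory:');
const connection = db.connect();
// Run an analytical query directly
connection.all('SELECT 42 AS answer', (err, rows) => {
  if (err) throw err;
  console.log(rows); // [ { answer: 42 } ]
});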
Project Overview
We'll build an Express.js backend application that:
Handles CSV File Uploads:
Uses the multer middleware for handling file uploads.
Stores uploaded files on the server.
Imports CSV Data into DuckDB:
Creates DuckDB tables from uploaded CSV files.
Dynamically manages tables based on file names.
Converts Natural Language Queries to SQL:
Uses a language model (e.g., Google PaLM API) to translate natural language queries into SQL.
Ensures the generated SQL queries are compatible with DuckDB.
Executes SQL Queries:
Runs the SQL queries against the DuckDB database.
Returns query results to the client.
Setting Up the Express.js Backend
Prerequisites
Node.js and npm installed
Basic knowledge of Express.js
Initialize the Project
Create a new directory for your project and initialize it:
mkdir duckdb-express
cd duckdb-express
npm init -y
Install Dependencies
Install the required packages:
npm install express multer duckdb dotenv cors axios
npm install -D typescript @types/express @types/node
express: Web framework for Node.js.
multer: Middleware for handling multipart/form-data (file uploads).
duckdb: DuckDB Node.js bindings.
dotenv: Loads environment variables from a .env file.
cors: Enables Cross-Origin Resource Sharing.
axios: Promise-based HTTP client for making API requests.
typescript and @types/*: TypeScript and type definitions.
Set Up TypeScript Configuration
Initialize TypeScript in your project:
npx tsc --init
This will create a tsconfig.json file.
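The generated file contains many commented-out options. A minimal configuration along these lines is enough for this project (these exact settings are a suggestion, not a requirement):
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "rootDir": "src",
    "outDir": "dist",
    "esModuleInterop": true,
    "strict": true
  },
  "include": ["src"]
}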
Create the Folder Structure
mkdir src
touch src/server.ts
touch src/utils.ts
mkdir src/uploads
Uploading and Importing CSV Files into DuckDB
Setting Up Multer for File Uploads
Import Required Modules in server.ts:
import express from 'express';
import multer from 'multer';
import path from 'path';
import dotenv from 'dotenv';
import cors from 'cors';
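The route handlers below assume the Express app itself has already been created. A minimal bootstrap might look like this (the port number is arbitrary):
const app = express();
app.use(cors());
app.use(express.json()); // needed to parse the JSON body of the /query route below

const PORT = 3000; // any free port works
app.listen(PORT, () => console.log(`Server listening on port ${PORT}`));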
Configure Multer Storage:
const storageConfig = multer.diskStorage({
destination: path.join(__dirname, 'uploads'),
filename: (req, file, callback) => {
const sanitizedFileName = file.originalname.replace('.csv', '').replace(/[^a-zA-Z0-9_]/g, '') + '.csv';
callback(null, sanitizedFileName);
},
});
const upload = multer({ storage: storageConfig });
Create the /upload Route:
app.post('/upload', upload.single('file'), async (req, res) => {
if (!req.file) {
return res.status(400).json({ error: 'No file uploaded' });
}
const tableName = req.file.originalname.replace('.csv', '').replace(/[^a-zA-Z0-9_]/g, '');
const filePath = path.join(__dirname, `./uploads/${tableName}.csv`);
// Import CSV into DuckDB (we'll cover this next)
});
Importing CSV Data into DuckDB
Initialize DuckDB Connection:
import DuckDB from 'duckdb';
const db = new DuckDB.Database(':memory:');
const connection = db.connect();
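An in-memory database is discarded when the process exits. If you want uploaded tables to survive restarts, DuckDB also accepts a file path instead of ':memory:', for example:
// Persists tables to a single file on disk (file name is arbitrary)
const db = new DuckDB.Database(path.join(__dirname, 'data.duckdb'));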
Function to Execute Queries:
const executeQuery = (sqlQuery: string) => {
return new Promise((resolve, reject) => {
connection.all(sqlQuery, (err, res) => {
if (err) {
console.warn('Query error:', err);
return reject(err);
} else {
// Convert BigInt values to strings if necessary
const sanitizedResult = res.map((row: any) =>
Object.fromEntries(
Object.entries(row).map(([key, value]) => [
key,
typeof value === 'bigint' ? value.toString() : value,
])
)
);
resolve(sanitizedResult);
}
});
});
};
Import CSV into DuckDB in the /upload Route:
try {
// Check if table already exists
const tableExists: any = await executeQuery(
`SELECT * FROM information_schema.tables WHERE table_name = '${tableName}'`
);
if (tableExists.length > 0) {
return res.status(200).json({ message: 'File already uploaded' });
}
// Create table from CSV
await executeQuery(
`CREATE TABLE ${tableName} AS SELECT * FROM read_csv_auto('${filePath}')`
);
res.status(200).json({
message: 'File uploaded and table created successfully',
tableName,
});
} catch (err) {
console.error('Error loading CSV into DuckDB', err);
res.status(500).json({ error: 'Failed to load CSV into database' });
}
Converting Natural Language Queries to SQL
To allow users to query the data using natural language, we'll use a language model to convert natural language into SQL queries.
Note: For this example, we'll assume you have access to a language model API that can perform this conversion, such as the Google PaLM API or OpenAI's GPT-3.
Set Up Environment Variables
Create a .env file in your root directory:
touch .env
Add your API key:
API_KEY=your_api_key_here
Load environment variables in server.ts:
dotenv.config();
const API_KEY = process.env.API_KEY;
Create Utility Function for Query Conversion
In utils.ts:
import axios from 'axios';
export const getSQLQuery = async (naturalLanguageQuery: string, prompt: string) => {
const API_KEY = process.env.API_KEY;
const url = `https://api.yourlanguageapi.com/generate`;
const data = {
prompt: prompt + naturalLanguageQuery,
max_tokens: 100,
// ... other API-specific parameters
};
try {
const response = await axios.post(url, data, {
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${API_KEY}`,
},
});
// Extract SQL query from response
const sqlQuery = response.data.sql;
return sqlQuery;
} catch (error) {
console.error('Error making API request:', error);
throw error;
}
};
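The response shape above (response.data.sql) is a placeholder; each provider returns its output differently. Because the prompt we build later asks the model to reply with a bare JSON object of the form {"sql": "..."}, a small parsing helper can make the extraction more robust. This is a sketch that assumes the model's text output may occasionally wrap the JSON in stray characters:
export const extractSQL = (modelOutput: string): string => {
  // Grab the first {...} block in case the model adds surrounding text
  const match = modelOutput.match(/\{[\s\S]*\}/);
  if (!match) {
    throw new Error('No JSON object found in model output');
  }
  const parsed = JSON.parse(match[0]);
  if (typeof parsed.sql !== 'string') {
    throw new Error('Model output did not contain a "sql" field');
  }
  return parsed.sql;
};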
Create the /query Route
In server.ts:
import { getSQLQuery } from './utils';
app.post('/query', async (req, res) => {
const { naturalLanguageQuery, tableName } = req.body;
if (!tableName || !naturalLanguageQuery) {
return res.status(400).json({ error: 'Invalid request parameters' });
}
// Ensure table exists
// ... (Check and possibly create the table as in the /upload route)
// Get column information
const columns: any = await executeQuery(`
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = '${tableName}'
`);
const allColumns = columns.map(
(item: any) => item.column_name + ', ' + item.data_type
);
if (!allColumns || allColumns.length === 0) {
return res.status(400).send({ error: 'No columns available in the table.' });
}
// Prepare prompt for the language model
const prompt = `You are an assistant that converts natural language queries into SQL queries for DuckDB.
The table name is "${tableName}", and the columns are: ${allColumns}.
Your response must be only a JSON object: {"sql": "SQL query here"}.
Do not include any code fences, language labels, explanations, or additional text—just the JSON object with the SQL query.`;
try {
const sqlQuery = await getSQLQuery(naturalLanguageQuery, prompt);
// Execute SQL query on DuckDB
const result: any = await executeQuery(sqlQuery);
res.status(200).json({ query: sqlQuery, result, length: result.length });
} catch (err) {
console.error('Error processing query', err);
res.status(500).json({ error: 'Failed to process query' });
}
});
Executing Queries and Returning Results
The executeQuery function handles the execution of SQL queries and returns the results or errors accordingly.
Handling BigInt Conversion: DuckDB may return BigInt types, which can cause issues when serializing to JSON, so the function converts BigInt values to strings (see the short example after this list).
Error Handling: The code catches and logs any errors during query execution, returning a 500 status code to the client if necessary.
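For context, JSON.stringify throws as soon as it encounters a BigInt, which is why the conversion is necessary:
JSON.stringify({ count: 42n });
// TypeError: Do not know how to serialize a BigInt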
Building the Query Interface
The query interface allows users to input natural language queries and receive results.
Fetching Available Files
You can create a route that lists all uploaded files:
In server.ts:
import fs from 'fs';
app.get('/files', (req, res) => {
const files = fs.readdirSync(path.join(__dirname, './uploads'));
res.status(200).json({ files });
});
Frontend Integration
While this blog focuses on the backend, integrating the backend with a frontend involves:
Fetching the list of available files from /files.
Allowing the user to select a file to query.
Capturing the natural language query (text input or voice input).
Sending the query and selected file to the /query endpoint (a minimal sketch of these calls follows this list).
Displaying the results in a user-friendly format (e.g., a table).
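A minimal fetch-based sketch of those calls (the URLs assume the backend runs locally on port 3000, matching the bootstrap earlier, and the example query text is just an illustration):
const runQuery = async () => {
  // List the uploaded files
  const filesResponse = await fetch('http://localhost:3000/files');
  const { files } = await filesResponse.json();

  // Send a natural language query against the first file's table
  const queryResponse = await fetch('http://localhost:3000/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      tableName: files[0].replace('.csv', ''),
      naturalLanguageQuery: 'Show the top 5 rows', // hypothetical example query
    }),
  });
  const { query, result } = await queryResponse.json();
  console.log(query, result);
};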
Voice Input (Optional):
Integrate the Web Speech API in the frontend to convert voice input into text.
Update the query input field with the transcribed text.
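A rough sketch of the voice input piece, using the browser's Web Speech API (support varies by browser; Chrome exposes it as webkitSpeechRecognition, and the input element id here is just an example):
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();
recognition.lang = 'en-US';
recognition.onresult = (event: any) => {
  const transcript = event.results[0][0].transcript;
  // 'query-input' is a hypothetical id for the query text field
  (document.getElementById('query-input') as HTMLInputElement).value = transcript;
};
recognition.start();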
Conclusion
By leveraging Express.js and DuckDB, we've built a backend application that allows users to upload CSV files and query the data using natural language. This setup can be a powerful tool for data analysis and exploration, providing users with an intuitive interface to interact with data.
Key Takeaways:
Express.js provides a robust framework for building server-side applications.
DuckDB offers an efficient in-process database solution for analytical queries.
Natural Language Processing (NLP) can enhance user interaction by allowing natural language queries.
API Integration with language models can translate natural language into executable SQL queries.
Note: Always ensure that API keys and sensitive information are securely managed and not exposed in your code repositories or client-side code.