Plugin: AI Chatbot

Chatbot PoC for SQL Access via LangChain

Maintainer: Diego Ruiz - BX Service GmbH
Status: Beta - Proof of concept
License: GPLv2
Sources: GitHub

Description

This repository is a proof of concept (PoC) for a chatbot system that can understand natural language questions and translate them into SQL queries against a PostgreSQL database using LangChain, Ollama, and LangGraph.

What it Does

Accepts user questions via natural language.
Uses a local LLM model via Ollama (e.g. llama3) to interpret the intent.
Translates questions into PostgreSQL SQL queries with help from LangChain tools.
Injects tenant filtering (e.g. AD_Client_ID = ...) to avoid cross tenant data access.
Streams back answers to the user.
Uses sqlglot to safely parse and rewrite SQL, ensuring valid syntax and injected conditions.

Technologies Used

Python 3.12
LangChain
LangGraph
Ollama (for running local LLMs like llama3)
SQLGlot (safe SQL parsing & rewriting)

Disclaimer

This is a Proof of Concept. Do NOT use it in production. This project is meant to explore feasibility and limitations, not to provide a secure, production-ready system.

Configuration

System Setup in iDempiere

Log in as System in iDempiere (not as a tenant).
Navigate to the “SQL AI Configurator” window.
Open the existing record and fill in the following fields

Required Fields:

Field	Description
Python Path	Full path to python3. Recommended: Use lc-env inside Conda or virtualenv for consistent dependencies.
Script Path	Full path to the pocSQLLangChain.py script from this repository.
Config JSON	JSON string with PostgreSQL connection info and tables to include (see example below).

Example Config JSON: {

 "host": "localhost",
 "port": 5432,
 "database": "idempiere",
 "user": "adempiere",
 "password": "adempiere",
 "include_tables": ["v_bpartner_info", "v_order_info"]

}

Recommendation:

Do NOT use core iDempiere tables like c_order, m_product, or ad_user directly — they are too large and complex.

Instead, create custom views that expose only the necessary columns and rows.

Example View: CREATE OR REPLACE VIEW v_bpartner_info AS select

   value,
   name,
   iscustomer,
   isvendor,
   taxid,
   totalopenbalance,
   c_bpartner_id,
   ad_client_id

FROM c_bpartner;

Security Best Practice

Create a dedicated PostgreSQL user for this integration:

Only grant read-only access to the necessary views.
Avoid giving write, update, or delete permissions.
Helps reduce the risk if the model generates unexpected queries.

Which Script to Use?

Recommended: Use pocSQLLangChain.py

❌ Avoid using the Agent-based version for now

The agent-based pipeline tends to hallucinate more, lacks strict validation, and is harder to control. pocSQLLangChain.py includes:

SQL validation (SELECT-only)
Safe WHERE clause injection
SQL parsing with sqlglot

How SQL Injection is Prevented

Only allows queries starting with SELECT
Rejects any containing dangerous keywords like INSERT, DROP, UPDATE
Strips trailing semicolons
Uses sqlglot to parse and inject AD_Client_ID = ... safely

Using the Chatbot

Log in to iDempiere as a tenant
Open the Chatbot Form
Ask a question like: “List my top customers”

The system will:

Translate your question to SQL
Inject AD_Client_ID = ... for multi-tenant safety
Log the executed query

Logging

For each user question, the system logs a warning message with the final SQL query.
Review these logs to ensure the correctness of the generated queries.

A dedicated iDempiere process has been created to trigger the Python chatbot script via REST call — suitable for integrations with external tools or frontends.

Limitation and Risks

LLM Hallucination: The LLM may generate incorrect or non-existent SQL columns or logic and throw random answers.
SQL Injection: Only partially mitigated — raw SQL generation still occurs.
No Authentication: There's no user management or permission control (Do this with a specific potsgresql users with limited access).
Performance: Not optimized for real-time or large-scale workloads.
Error Handling: Limited feedback when queries fail (e.g. ambiguity, syntax, or missing fields).