Plugin: AI Chatbot

From iDempiere en

Chatbot PoC for SQL Access via LangChain

BX Chatbot.gif

Description

This repository is a proof of concept (PoC) for a chatbot system that can understand natural language questions and translate them into SQL queries against a PostgreSQL database using LangChain, Ollama, and LangGraph.

What it Does

  • Accepts user questions via natural language.
  • Uses a local LLM model via Ollama (e.g. llama3) to interpret the intent.
  • Translates questions into PostgreSQL SQL queries with help from LangChain tools.
  • Injects tenant filtering (e.g. AD_Client_ID = ...) to avoid cross tenant data access.
  • Streams back answers to the user.
  • Uses sqlglot to safely parse and rewrite SQL, ensuring valid syntax and injected conditions.

Technologies Used

  • Python 3.12
  • LangChain
  • LangGraph
  • Ollama (for running local LLMs like llama3)
  • SQLGlot (safe SQL parsing & rewriting)

Disclaimer

This is a Proof of Concept. Do NOT use it in production. This project is meant to explore feasibility and limitations, not to provide a secure, production-ready system.

Configuration

System Setup in iDempiere

  1. Log in as System in iDempiere (not as a tenant).
  2. Navigate to the “SQL AI Configurator” window.
  3. Open the existing record and fill in the following fields

Required Fields:

Field Description
Python Path Full path to python3.

Recommended: Use lc-env inside Conda or virtualenv for consistent dependencies.

Script Path Full path to the pocSQLLangChain.py script from this repository.
Config JSON JSON string with PostgreSQL connection info and tables to include (see example below).

Example Config JSON: {

 "host": "localhost",
 "port": 5432,
 "database": "idempiere",
 "user": "adempiere",
 "password": "adempiere",
 "include_tables": ["v_bpartner_info", "v_order_info"]

}

Recommendation:

  • Do NOT use core iDempiere tables like c_order, m_product, or ad_user directly — they are too large and complex.

Instead, create custom views that expose only the necessary columns and rows.

Example View: CREATE OR REPLACE VIEW v_bpartner_info AS select

   value,
   name,
   iscustomer,
   isvendor,
   taxid,
   totalopenbalance,
   c_bpartner_id,
   ad_client_id

FROM c_bpartner;

Security Best Practice

Create a dedicated PostgreSQL user for this integration:

  • Only grant read-only access to the necessary views.
  • Avoid giving write, update, or delete permissions.
  • Helps reduce the risk if the model generates unexpected queries.

Which Script to Use?

Recommended: Use pocSQLLangChain.py

❌ Avoid using the Agent-based version for now

The agent-based pipeline tends to hallucinate more, lacks strict validation, and is harder to control. pocSQLLangChain.py includes:

  • SQL validation (SELECT-only)
  • Safe WHERE clause injection
  • SQL parsing with sqlglot

How SQL Injection is Prevented

  • Only allows queries starting with SELECT
  • Rejects any containing dangerous keywords like INSERT, DROP, UPDATE
  • Strips trailing semicolons
  • Uses sqlglot to parse and inject AD_Client_ID = ... safely

Using the Chatbot

  1. Log in to iDempiere as a tenant
  2. Open the Chatbot Form
  3. Ask a question like: “List my top customers”

The system will:

  • Translate your question to SQL
  • Inject AD_Client_ID = ... for multi-tenant safety
  • Log the executed query

Logging

  • For each user question, the system logs a warning message with the final SQL query.
  • Review these logs to ensure the correctness of the generated queries.

A dedicated iDempiere process has been created to trigger the Python chatbot script via REST call — suitable for integrations with external tools or frontends.

Limitation and Risks

  • LLM Hallucination: The LLM may generate incorrect or non-existent SQL columns or logic and throw random answers.
  • SQL Injection: Only partially mitigated — raw SQL generation still occurs.
  • No Authentication: There's no user management or permission control (Do this with a specific potsgresql users with limited access).
  • Performance: Not optimized for real-time or large-scale workloads.
  • Error Handling: Limited feedback when queries fail (e.g. ambiguity, syntax, or missing fields).
Cookies help us deliver our services. By using our services, you agree to our use of cookies.