Typedef project Fenic: A ‘dataframe’ for LLMs

Typedef provides purpose-built AI data infrastructure services for cloud workloads that run LLM-powered pipelines, process unstructured data, manage inference complexity and execute batch AI workloads in production. In other words, it handles the AI infrastructure complexity. The company says it is now going a step further by turning AI prototypes into scalable, production-ready workloads. This development manifests itself in a new release of the company’s open source project Fenic, a PySpark-inspired DataFrame framework for building AI and agentic applications.

What is a PySpark DataFrame?

To define these terms: PySpark is the open source Python application programming interface (API) for Apache Spark, the distributed data processing engine.

According to online technology learning company Coursera, “This popular data science framework allows [developers] to perform big data analytics and speedy data processing for data sets of all sizes. It combines the performance of Apache Spark and its speed in working with large data sets and machine learning algorithms with the ease of using Python to make data processing and analysis more accessible.”

Further, a dataframe is best defined as a “tabular data structure” consisting of rows and columns (not unlike a database table or a spreadsheet) that can be thought of as a “dictionary of lists”, meaning each list of values has its own identifier or key, such as “day of the week” or “cost” and so on. It is essentially a two-dimensional table of whatever size a given data job needs.

A PySpark DataFrame, therefore, is a tabular data structure exposed through the PySpark API, giving Python developers a way to work with data that is distributed and processed by Spark.
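To make the “dictionary of lists” idea concrete, the short sketch below builds a small PySpark DataFrame from exactly that structure; the column names and values are illustrative only.

# A minimal, self-contained sketch (illustrative column names only) showing a
# "dictionary of lists" turned into a PySpark DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-example").getOrCreate()

data = {
    "day_of_week": ["Mon", "Tue", "Wed"],
    "cost": [12.50, 9.75, 14.20],
}

# createDataFrame expects rows, so zip the per-column lists together and pass
# the dictionary keys as the column names.
rows = list(zip(*data.values()))
df = spark.createDataFrame(rows, schema=list(data.keys()))
df.show()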

What is project Fenic?

Fenic enables AI and data engineering teams to transform unstructured and structured data “into insights” (as the marketing people are so fond of saying) using “familiar” dataframe operations enhanced with semantic intelligence. It features support for markdown, transcripts and semantic operators, coupled with efficient batch inference across any model provider.
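As an illustration of what “dataframe operations enhanced with semantic intelligence” might look like in practice, the hypothetical sketch below follows the PySpark-inspired style described above. The module name, session setup and semantic.map column function are assumptions made for illustration, not a statement of Fenic’s documented API.

# Hypothetical sketch only: the fenic module, Session setup and semantic.map
# column function below are assumptions for illustration, not documented API.
import fenic as fc

session = fc.Session.get_or_create()

# Read call transcripts as an ordinary dataframe...
df = session.read.csv("transcripts.csv")

# ...then apply a semantic operator as if it were a regular column expression,
# letting the engine batch the underlying LLM inference calls.
summaries = df.select(
    fc.col("call_id"),
    fc.semantic.map(
        "Summarise this call transcript in one sentence: {{ transcript }}",
        transcript=fc.col("transcript"),
    ).alias("summary"),
)
summaries.show()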

New features found in Fenic version 0.3.0 include: Rust-powered Jinja templating as a column function for dynamic, data-aware prompts (loops, conditionals, arrays); built-in “fuzzy string matching” with three comparison modes and six algorithms for blocking, deduplication and record linkage before AI engineers and associated developers spend tokens; and new functions and models (e.g. Cohere and Gemini embeddings and summarisation), plus meaningful performance and developer experience (DX) improvements.
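To illustrate the kind of “dedupe before spending tokens” step described above, here is a small example using the open source rapidfuzz library as a stand-in; Fenic’s own comparison modes and algorithms live inside its dataframe API, so this is a conceptual sketch rather than the project’s implementation.

# Conceptual stand-in using rapidfuzz (not Fenic's own API): drop near-duplicate
# records so only distinct ones are sent on to a token-consuming LLM step.
from rapidfuzz import fuzz, utils

records = [
    "Acme Corp, 1 Main St, Springfield",
    "Acme Corp, 1 Main Street, Springfield",
    "Globex LLC, 42 Elm Ave, Shelbyville",
]

deduped = []
for record in records:
    # token_sort_ratio ignores word order and returns a 0-100 similarity score;
    # default_process lowercases and strips punctuation before comparison.
    if all(
        fuzz.token_sort_ratio(record, kept, processor=utils.default_process) < 90
        for kept in deduped
    ):
        deduped.append(record)

print(deduped)  # the two Acme variants collapse to one record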

“Typedef is fully committed to accelerating innovation and time-to-value by building in the open,” said Kostas Pardalis, co-founder of Typedef and Fenic Steward. “The latest release of Fenic features changes that will result in a lot less glue code, fewer brittle prompts, and cheaper, more reliable pipelines – helping ship AI workflows to production faster. Unlike traditional data tools retrofitted for Large Language Models (LLMs), the Fenic query engine is built to work with AI models and unstructured data like emails and call transcripts.”

Semantic feature engineering

Other popular use-cases built with Fenic include: semantic feature engineering for recommender system models; high-precision named entity recognition and deduplication; automated moderation of user-generated content; and transaction enrichment and classification for FinTech firms.

While the open source Fenic project provides a foundation to build AI pipelines with semantic intelligence baked in, Typedef is said to “supercharge” it with commercial features that make it more scalable across an enterprise.

As a startup that launched in June 2025, Typedef.AI offers support for more complex, mixed AI workflows; collaboration via a web-based user interface; and reporting and analytics. Moreover, Typedef allows for rapid, iterative prompt and pipeline experimentation to quickly identify production-ready workloads that will demonstrate value.

Magical analyst house Gartner thinks that more than 40 percent of agentic AI projects will be cancelled or fail by the end of 2027 due to escalating costs and unclear business value. Pilot paralysis is a well-documented epidemic affecting the bulk of enterprise AI projects, with some research estimating the failure-to-scale rate to be as high as 87 percent. Typedef says it is looking to right this wrong by structuring unstructured data, operationalising inference and unlocking semantic insight at scale.