Typedef project Fenic: A ‘dataframe’ for LLMs

Typedef provides purpose-built AI data infrastructure services for cloud workloads that run LLM-powered pipelines, process unstructured data, manage inference complexity and execute batch AI workloads in production. In other words, it handles the AI infrastructure complexity. The company says it is now going a step further by turning AI prototypes into scalable, production-ready workloads. This development manifests itself in a new release of the company’s open source project Fenic, a PySpark-inspired DataFrame framework for building AI and agentic applications.

What is a PySpark DataFrame?

To define these terms: PySpark is the open source Python application programming interface (API) for Apache Spark, the distributed data processing engine.

According to online technology learning company Coursera, “This popular data science framework allows [developers] to perform big data analytics and speedy data processing for data sets of all sizes. It combines the performance of Apache Spark and its speed in working with large data sets and machine learning algorithms with the ease of using Python to make data processing and analysis more accessible.”

Further, a dataframe is best defined as a “tabular data structure” consisting of rows and columns (not unlike a database table or a spreadsheet) that can be thought of as a “dictionary of lists”, meaning each list of values has its own identifier or key, such as “day of the week” or “cost” and so on. It is essentially a two-dimensional table of whatever size a given data job needs.

A PySpark DataFrame, therefore, is a tabular data structure exposed through the PySpark API, giving Python developers a way to work with data that is distributed and processed by Spark.
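To make the “dictionary of lists” idea concrete, the short sketch below builds a small PySpark DataFrame from exactly that structure; the column names and values are illustrative only.

# A minimal, self-contained sketch (illustrative column names only) showing a
# "dictionary of lists" turned into a PySpark DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-example").getOrCreate()

data = {
    "day_of_week": ["Mon", "Tue", "Wed"],
    "cost": [12.50, 9.75, 14.20],
}

# createDataFrame expects rows, so zip the per-column lists together and pass
# the dictionary keys as the column names.
rows = list(zip(*data.values()))
df = spark.createDataFrame(rows, schema=list(data.keys()))
df.show()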

What is project Fenic?

Fenic enables AI and data engineering teams to transform unstructured and structured data “into insights” (as the marketing people are so fond of saying) using “familiar” dataframe operations enhanced with semantic intelligence. It features support for markdown, transcripts and semantic operators, coupled with efficient batch inference across any model provider.
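As an illustration of what “dataframe operations enhanced with semantic intelligence” might look like in practice, the hypothetical sketch below follows the PySpark-inspired style described above. The module name, session setup and semantic.map column function are assumptions made for illustration, not a statement of Fenic’s documented API.

# Hypothetical sketch only: the fenic module, Session setup and semantic.map
# column function below are assumptions for illustration, not documented API.
import fenic as fc

session = fc.Session.get_or_create()

# Read call transcripts as an ordinary dataframe...
df = session.read.csv("transcripts.csv")

# ...then apply a semantic operator as if it were a regular column expression,
# letting the engine batch the underlying LLM inference calls.
summaries = df.select(
    fc.col("call_id"),
    fc.semantic.map(
        "Summarise this call transcript in one sentence: {{ transcript }}",
        transcript=fc.col("transcript"),
    ).alias("summary"),
)
summaries.show()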

New features found in Fenic version 0.3.0 include: Rust-powered Jinja templating as a column function for dynamic, data-aware prompts (loops, conditionals, arrays); built-in “fuzzy string matching” with three comparison modes and six algorithms for blocking, deduplication and record linkage before AI engineers and associated developers spend tokens; and new functions and models (e.g. Cohere and Gemini embeddings and summarisation), plus meaningful performance and developer experience (DX) improvements.
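To illustrate the kind of “dedupe before spending tokens” step described above, here is a small example using the open source rapidfuzz library as a stand-in; Fenic’s own comparison modes and algorithms live inside its dataframe API, so this is a conceptual sketch rather than the project’s implementation.

# Conceptual stand-in using rapidfuzz (not Fenic's own API): drop near-duplicate
# records so only distinct ones are sent on to a token-consuming LLM step.
from rapidfuzz import fuzz, utils

records = [
    "Acme Corp, 1 Main St, Springfield",
    "Acme Corp, 1 Main Street, Springfield",
    "Globex LLC, 42 Elm Ave, Shelbyville",
]

deduped = []
for record in records:
    # token_sort_ratio ignores word order and returns a 0-100 similarity score;
    # default_process lowercases and strips punctuation before comparison.
    if all(
        fuzz.token_sort_ratio(record, kept, processor=utils.default_process) < 90
        for kept in deduped
    ):
        deduped.append(record)

print(deduped)  # the two Acme variants collapse to one record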

“Typedef is fully committed to accelerating innovation and time-to-value by building in the open,” said Kostas Pardalis, co-founder of Typedef and Fenic Steward. “The latest release of Fenic features changes that will result in a lot less glue code, fewer brittle prompts, and cheaper, more reliable pipelines – helping ship AI workflows to production faster. Unlike traditional data tools retrofitted for Large Language Models (LLMs), the Fenic query engine is built to work with AI models and unstructured data like emails and call transcripts.”

Semantic feature engineering

Other popular use-cases built with Fenic include: semantic feature engineering for recommender system models; high-precision named entity recognition and deduplication; automated moderation of user-generated content; and transaction enrichment and classification for FinTech firms.

While the open source Fenic project provides a foundation to build AI pipelines with semantic intelligence baked in, Typedef is said to “supercharge” it with commercial features that make it more scalable across an enterprise.

As a startup that launched in June 2025, Typedef.AI offers support for more complex, mixed AI workflows; collaboration via a web-based user interface; and reporting and analytics. Moreover, Typedef allows for rapid, iterative prompt and pipeline experimentation to quickly identify production-ready workloads that will demonstrate value.

Magical analyst house Gartner thinks that more than 40 percent of agentic AI projects will be cancelled or fail by the end of 2027 due to escalating costs and unclear business value. Pilot paralysis is a well-documented epidemic affecting the bulk of enterprise AI projects, with some research estimating the failure-to-scale rate to be as high as 87 percent. Typedef says it is looking to right this wrong by structuring unstructured data, operationalising inference and unlocking semantic insight at scale.