Jason Chen

July 13, 2023


When we first met Brandon and Andriy last year, Nomic was still just a prototype and a grand vision.

At the time, the growth of AI-powered tooling was well under way, as were early signals of a coming inflection point in the popularity, usage, and utility of LLMs. However, while it was increasingly clear that LLMs would be crucial to the next generation of software, we felt the broader AI landscape still lacked several foundational pieces of the puzzle.

In particular, as technical advancements made LLMs more sophisticated and complex, there was a growing need to understand the underlying data to maximize their potential and ensure responsible use. This translated into a core pain point for companies building with LLMs: the lack of tooling to visualize, edit, interact with, and clean the massive datasets used to train these models, as well as those produced by them. Without such tools, these models can be prone to hallucinations that sometimes lead to disastrous consequences in specific use cases and industries.

Nomic set out to solve this problem and, in the past year, has started to do just that.

With Atlas, the team has built an enterprise-grade product that allows companies and users to explore, label, search, and share massive datasets. It is an exceptionally scalable embedding explorer that has enabled several of the most comprehensive (and interesting) demonstration projects in large-dataset visualization and interaction (e.g., 5.4M of the most popular tweets, an explorable map of Stable Diffusion, and a map of English-language Wikipedia).

In addition, even as generative AI applications and products have grown more popular this past year, the foundational LLMs themselves remain concentrated in the hands of just a few closed-source companies. This has called into question the true “openness” of AI models.

In response, Nomic has launched its flagship model, GPT4All. Initially built with data curated using Atlas, GPT4All is an open-source compressed language model that runs locally on any device, with output that is in line with GPT-3.5 at a fraction of the compute cost. It has generated enormous open-source interest, with 48K+ GitHub stars as of this writing, and helped build a powerful community that trains and deploys customized large language models that run locally on consumer-grade CPUs. GPT4All has since evolved into an ecosystem of open-source models that enables anyone, regardless of compute constraints, to access the benefits of powerful generative AI.

More importantly, GPT4All aligns strongly with Nomic’s vision, and our shared belief, that AI access will be increasingly democratized. As Nomic partners with outspoken open-source companies like Hugging Face, MongoDB, Replit, and others, we’re excited to see them build a more equitable AI ecosystem.

Ultimately, Contrary is a founder-focused firm. We were blown away by Brandon and Andriy’s founder-market fit: the two had previously worked on this problem together at Rad AI. Their credentials and accomplishments speak for themselves, and represent a diverse blend of academic, technical, and commercial capabilities. In getting to know them this past year, though, we have been most impressed by their humility, their drive, and their enduring commitment to the impressive community they’ve built.

We could not be more proud to have led Nomic’s seed round last year, and are honored to continue supporting them in their Series A alongside our good friends at Coatue, betaworks, MongoDB, Factorial Capital, and a deep collection of exceptional industry angels (including Naval Ravikant, Amjad Masad, Clem Delangue, Lukas Biewald, and many more).

