The Data Model Behind The Common News

May 12, 2026

One of the easiest ways to build a weak civic product is to treat government information like unstructured content.

At first glance, that seems reasonable. Local government data often arrives as PDFs, agendas, notices, minutes, and meeting pages. It looks like content, so the default instinct is to ingest it, summarize it, and publish a page.

But that approach breaks down quickly.

If you want a product like The Common News to scale, the system cannot just store text. It has to store relationships. That is what makes the difference between a feed of isolated summaries and a product that understands recurring issues, governing bodies, places, and local concepts over time.

That is why I think the data model matters so much. It is not just an implementation detail. It determines what the product can later explain, connect, and rank for in search.

The Problem With Treating Civic Information as Flat Content

The raw material of local government is messy:

  • agenda packets
  • ordinance drafts
  • committee notices
  • meeting minutes
  • attachments
  • public hearing materials

If all of that gets flattened into a single article-like object, you lose the structure that makes the information useful.

You can publish a summary that way, but you cannot reliably answer questions like:

  • Which committee keeps discussing this issue?
  • Which project has appeared across multiple meetings?
  • Which neighborhood or ward is affected?
  • What other summaries are connected to this same topic?
  • Is this page about one specific event or an ongoing issue?

Those are product questions, but they are also data-model questions.

The Core Idea

The core idea behind the model is simple:

Government information should be represented as connected civic objects, not just text blobs.

In practice, that means separating the system into a few foundational layers:

  • source records
  • civic events
  • summaries
  • canonical entities
  • relationships between them

That structure gives the product a stable way to represent both the specific thing that happened and the broader civic concepts it belongs to.
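To make the layers concrete, here is a minimal sketch of how they could be separated. Every class and field name below is hypothetical and chosen for illustration; it is not the actual schema behind The Common News, and the "references" and "derived_from" predicates are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceRecord:
    """Raw input exactly as it arrived (agenda packet, minutes, notice)."""
    id: str
    municipality_id: str
    kind: str
    url: str


@dataclass(frozen=True)
class CivicEvent:
    """The specific thing that happened, e.g. one meeting."""
    id: str
    municipality_id: str
    title: str
    source_record_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Summary:
    """The published product surface built from an event."""
    id: str
    event_id: str
    headline: str


@dataclass(frozen=True)
class Entity:
    """A canonical civic concept such as a committee or project."""
    id: str
    municipality_id: str
    kind: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A typed edge between any two objects above."""
    subject_id: str
    predicate: str
    object_id: str


def entities_linked_to(summary_id: str, relationships: List[Relationship]) -> List[str]:
    """Return the IDs of entities a summary references."""
    return [
        r.object_id
        for r in relationships
        if r.subject_id == summary_id and r.predicate == "references"
    ]


# Illustrative data only.
record = SourceRecord("src-1", "muni-1", "agenda_packet", "https://example.com/agenda.pdf")
event = CivicEvent("evt-1", "muni-1", "Planning Commission meeting", ("src-1",))
summary = Summary("sum-1", "evt-1", "Commission reviews bridge repair plan")
project = Entity("ent-1", "muni-1", "project", "Main Street Bridge Repair")
relationships = [
    Relationship("sum-1", "derived_from", "evt-1"),
    Relationship("sum-1", "references", "ent-1"),
]
print(entities_linked_to("sum-1", relationships))
```

The point of the sketch is that the summary never contains the project. It only points at it, which is what lets the same project be pointed at again next month.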

1. Municipalities Are the Top-Level Container

Everything starts with the municipality.

This seems obvious, but it matters because local government products are not generic news products. Every record lives inside a civic context with its own governing bodies, neighborhoods, naming patterns, and workflows.

The municipality layer helps scope things like:

  • meetings
  • committees
  • projects
  • neighborhoods
  • wards or districts
  • agencies or departments

Without that boundary, the system becomes noisy very fast. With it, you get a clean organizational model for how local concepts belong together.
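One way to picture that boundary is a registry that always includes the municipality in an entity's identity. This is a sketch under my own assumptions: the `EntityRegistry` class, the key shape, and the ID format are all hypothetical.

```python
from typing import Dict, Tuple

# (municipality_id, kind, normalized name)
EntityKey = Tuple[str, str, str]


class EntityRegistry:
    """Issues entity IDs that are always scoped to a municipality."""

    def __init__(self) -> None:
        self._ids: Dict[EntityKey, str] = {}

    def get_or_create(self, municipality_id: str, kind: str, name: str) -> str:
        key = (municipality_id, kind, name.strip().lower())
        if key not in self._ids:
            # The counter suffix is arbitrary; only uniqueness matters here.
            self._ids[key] = f"{municipality_id}:{kind}:{len(self._ids) + 1}"
        return self._ids[key]


registry = EntityRegistry()
a = registry.get_or_create("muni-a", "committee", "Finance Committee")
b = registry.get_or_create("muni-b", "committee", "Finance Committee")
print(a != b)  # same name, different municipalities, different entities
```

Two cities can each have a "Finance Committee" without the system ever confusing them, because the name alone is never the key.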

2. Meetings and Agendas Are Not the Same Thing

One relationship I think is easy to model badly is the one between a meeting and the content attached to it.

A meeting is an event. An agenda is a structured input to that event. Attachments, notices, items, and follow-up documents may all exist around it.

That distinction matters because users often care about both:

  • the event itself
  • the specific agenda items within it

So the model needs to preserve that hierarchy:

  • municipality
  • governing body
  • meeting
  • agenda or item-level content

That structure makes it possible to describe what happened at the right level of abstraction instead of collapsing everything into one vague page.
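As a rough sketch, the hierarchy could be represented so that the same meeting can be described either as a whole event or down to a single agenda item. The class names, the `describe` helper, and the sample municipality are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AgendaItem:
    number: str
    title: str


@dataclass
class Meeting:
    municipality: str
    governing_body: str
    held_on: date
    items: List[AgendaItem] = field(default_factory=list)


def describe(meeting: Meeting, item_number: Optional[str] = None) -> str:
    """Describe the meeting itself, or one agenda item within it."""
    base = f"{meeting.municipality} / {meeting.governing_body} / {meeting.held_on.isoformat()}"
    if item_number is None:
        return base
    for item in meeting.items:
        if item.number == item_number:
            return f"{base} / item {item.number}: {item.title}"
    raise KeyError(f"no agenda item {item_number!r} in this meeting")


meeting = Meeting(
    municipality="Exampleville",
    governing_body="Planning Commission",
    held_on=date(2026, 4, 14),
    items=[AgendaItem("4a", "Main Street rezoning request")],
)
print(describe(meeting))
print(describe(meeting, "4a"))
```

Keeping the item as its own object means a page can be about the rezoning request specifically, without losing the fact that it came up at a particular meeting of a particular body.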

3. Summaries Are Product Surfaces, Not Raw Source Records

I do not think a summary should be treated as the same object as the source material.

The source material is the input. The summary is a product surface built from that input.

That distinction matters because summaries need their own fields and behavior:

  • headline
  • canonical slug
  • description
  • body content
  • SEO metadata
  • publish state
  • municipality context
  • linked entities

Once the summary is modeled as its own first-class object, the product can do much more with it. It can publish cleaner pages, connect them to issue hubs, and surface the same meeting from several angles, such as a committee page, a project hub, or a topical archive.
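Here is a minimal sketch of a summary as its own object, carrying the fields listed above and pointing back at its inputs rather than containing them. The field names are hypothetical, and the rule that a summary needs at least one source record before it can be published is my assumption, not a documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PublishState(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"


@dataclass
class Summary:
    headline: str
    slug: str
    description: str
    body: str
    municipality_id: str
    source_record_ids: List[str]                         # inputs, kept by reference
    entity_ids: List[str] = field(default_factory=list)  # linked canonical entities
    seo_title: str = ""
    state: PublishState = PublishState.DRAFT

    def publish(self) -> None:
        """Move to PUBLISHED, but only if the summary can cite its sources."""
        if not self.source_record_ids:
            raise ValueError("a summary must reference at least one source record")
        self.state = PublishState.PUBLISHED


summary = Summary(
    headline="Commission reviews bridge repair plan",
    slug="commission-reviews-bridge-repair-plan",
    description="The Planning Commission discussed the repair timeline.",
    body="...",
    municipality_id="muni-1",
    source_record_ids=["src-1"],
    entity_ids=["ent-1"],
)
summary.publish()
print(summary.state.value)
```

Because publish state lives on the summary and not on the source record, the same agenda packet can sit behind a draft, a published page, or no page at all.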

4. Canonical Entities Are What Make the Product Compound

This is probably the most important layer.

The Common News gets more valuable when the system can recognize that the same committee, project, neighborhood, or civic issue keeps reappearing over time. That only works if those concepts are modeled canonically instead of being re-derived from raw text every time.

Examples of canonical entities include:

  • committees
  • projects
  • neighborhoods
  • wards or districts
  • streets or locations
  • agencies
  • policy categories

Once those entities exist, summaries can link into them, and those entities can accumulate relevance across multiple meetings and multiple months.

That is what allows the product to move from "here is one summary" to "here is the ongoing story behind this issue."
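A small sketch of what "modeled canonically" can mean in practice: different spellings of a committee resolve to one stable ID through a curated alias table, so mentions accumulate on a single entity. The alias entries, the ID format, and the normalization rule are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Hypothetical curated aliases, all pointing at one canonical entity.
ALIASES = {
    normalize("Planning & Zoning Committee"): "committee:planning-zoning",
    normalize("P&Z Committee"): "committee:planning-zoning",
    normalize("Planning and Zoning Cmte."): "committee:planning-zoning",
}


def resolve(mention: str) -> Optional[str]:
    """Return the canonical entity ID for a mention, or None if unknown."""
    return ALIASES.get(normalize(mention))


def mention_counts(mentions: Iterable[str]) -> Counter:
    """Count how often each canonical entity appears across mentions."""
    resolved = (resolve(m) for m in mentions)
    return Counter(entity_id for entity_id in resolved if entity_id is not None)


counts = mention_counts(["P&Z committee", "Planning & Zoning Committee", "Parks Board"])
print(counts)
```

Note that the unknown "Parks Board" is dropped rather than turned into a new entity on the spot. Deciding when an unrecognized name deserves its own canonical record is a separate, more careful step.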

5. Relationships Matter More Than Any Single Table

I think people often talk about data models as if they are mostly about storing rows cleanly. For this kind of product, the more important question is how things connect.

The high-value relationships are things like:

  • a summary belongs to a municipality
  • a summary is derived from a meeting
  • a meeting belongs to a governing body
  • a summary references multiple entities
  • an entity appears across many summaries
  • a committee has many meetings
  • a project may span multiple committees and summaries

Those relationships are what make downstream features possible:

  • related coverage
  • project hub pages
  • committee pages
  • cross-summary navigation
  • topical archives
  • stronger internal linking
  • better local SEO

In other words, the model does not just store the product. It enables the product.
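To illustrate how a feature like related coverage falls out of the relationships, here is a sketch that ranks other summaries by how many entities they share with a given one. The function names and the (summary_id, entity_id) link format are hypothetical, and counting shared entities is only one simple way to score relatedness.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]  # (summary_id, entity_id)


def build_entity_index(links: Iterable[Link]) -> Dict[str, Set[str]]:
    """Map each entity to the set of summaries that reference it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for summary_id, entity_id in links:
        index[entity_id].add(summary_id)
    return index


def related_summaries(summary_id: str, links: Iterable[Link]) -> List[str]:
    """Other summaries sharing entities, most shared first, ties broken by ID."""
    index = build_entity_index(links)
    shared: Dict[str, int] = defaultdict(int)
    for summaries in index.values():
        if summary_id in summaries:
            for other in summaries:
                if other != summary_id:
                    shared[other] += 1
    return sorted(shared, key=lambda s: (-shared[s], s))


links = [
    ("s1", "project:main-st-bridge"),
    ("s2", "project:main-st-bridge"),
    ("s1", "committee:planning-zoning"),
    ("s2", "committee:planning-zoning"),
    ("s3", "committee:planning-zoning"),
    ("s4", "ward:3"),
]
print(related_summaries("s1", links))
```

Nothing here reads the summary text. The related list comes entirely from the links, which is why the quality of those links matters so much.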

6. The Model Has to Support Messy Inputs Without Becoming Messy Itself

A big part of civic product design is dealing with ugly input quality.

Government sources are inconsistent. Names vary. Projects get described differently across documents. Some municipalities publish cleaner structures than others. Sometimes the relevant detail is stated explicitly; sometimes it is buried in prose.

A good data model has to absorb that mess without letting it pollute the product layer.

That usually means:

  • keeping source ingestion separate from canonical entities
  • allowing uncertain or provisional associations upstream
  • promoting only higher-confidence entity links into the published surface
  • preserving source references even when normalization is imperfect

This is one reason I care so much about canonical committee and project data. If the system generates too many weak entities from free-form text, the product becomes noisy, and the SEO surface becomes worse instead of better.
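The promotion step could look something like the sketch below: every candidate link keeps its source reference, but only links at or above a confidence threshold reach the published surface. The `CandidateLink` shape and the 0.8 threshold are assumptions made for illustration, not tuned or documented values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CandidateLink:
    summary_id: str
    entity_id: str
    confidence: float      # 0.0 to 1.0, produced upstream during ingestion
    source_record_id: str  # preserved even when the link stays provisional


PUBLISH_THRESHOLD = 0.8  # hypothetical cutoff


def promote(
    candidates: Iterable[CandidateLink],
    threshold: float = PUBLISH_THRESHOLD,
) -> Tuple[List[CandidateLink], List[CandidateLink]]:
    """Split candidates into (published, provisional) without discarding any."""
    published: List[CandidateLink] = []
    provisional: List[CandidateLink] = []
    for link in candidates:
        (published if link.confidence >= threshold else provisional).append(link)
    return published, provisional


candidates = [
    CandidateLink("s1", "committee:planning-zoning", 0.95, "src-1"),
    CandidateLink("s1", "project:main-st-bridge", 0.55, "src-1"),
]
published, provisional = promote(candidates)
print([link.entity_id for link in published])
print([link.entity_id for link in provisional])
```

The provisional links are kept rather than deleted, so they can be reviewed or promoted later if more evidence shows up in another document.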

7. The Data Model Also Shapes SEO

I do not think the data model and SEO strategy are separate conversations.

If the model is weak, the site can only publish generic pages. If the model is strong, the site can publish pages that are tied to real civic concepts with stable, reusable meaning.

That affects:

  • slugs
  • canonical URLs
  • summary metadata
  • entity pages
  • internal linking
  • structured data opportunities
  • long-tail search eligibility

This is why I think the data model is one of the hidden levers behind discoverability. My expectation is that search engines are more likely to understand pages that reflect stable entities and clear relationships than pages of loosely summarized text.
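As a sketch of how entities can feed slugs, canonical URLs, and structured data, the example below derives a stable entity URL and a small JSON-LD object that points a summary at the entities it is about. The base URL, the URL pattern, and the helper names are hypothetical. `NewsArticle`, `Thing`, `headline`, `url`, and `about` are existing schema.org terms, though whether this particular markup helps a given page is something I would want to verify rather than assume.

```python
import json
import re
import unicodedata
from typing import Dict, List, Tuple

BASE_URL = "https://example.com"  # placeholder domain


def slugify(text: str) -> str:
    """ASCII-fold, lowercase, and join alphanumeric runs with hyphens."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")


def entity_url(municipality: str, kind: str, name: str) -> str:
    """Stable URL derived from the canonical entity, not from any one summary."""
    return f"{BASE_URL}/{slugify(municipality)}/{kind}/{slugify(name)}"


def summary_json_ld(headline: str, url: str, about: List[Tuple[str, str]]) -> Dict:
    """Build JSON-LD linking a summary to its entities as (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "about": [{"@type": "Thing", "name": n, "url": u} for n, u in about],
    }


project_url = entity_url("Exampleville", "project", "Main Street Bridge Repair")
doc = summary_json_ld(
    "Commission reviews bridge repair plan",
    f"{BASE_URL}/exampleville/summaries/commission-reviews-bridge-repair-plan",
    [("Main Street Bridge Repair", project_url)],
)
print(project_url)
print(json.dumps(doc, indent=2))
```

Because the URL comes from the entity and not from a headline, it stays the same no matter how many summaries end up linking to it.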

8. Why This Matters for the Product

The real value of the model is not elegance for its own sake. It is that it lets the product answer better questions.

With a strong civic model, the product can eventually help users understand:

  • what happened
  • who acted
  • what issue it belongs to
  • what place it affects
  • what else is connected to it
  • what they should read next

That is a much better experience than asking users to manually reconstruct the local issue graph from disconnected summaries.

Closing

For me, the data model behind The Common News is really a bet on structure.

The source material may start as messy local government content, but the product becomes meaningfully better when that material is modeled as municipalities, meetings, summaries, and canonical civic entities connected through stable relationships.

That is what makes the system more useful operationally, more navigable for users, and more durable as a civic publishing platform over time.