Quantifying your reliance on Open Source software

With dependency-management-data (DMD)

Jamie Tanna (https://www.jvt.me)

Why is it important?

As I wrote in the post Analysing our dependency trees to determine where we should send Open Source contributions for Hacktoberfest:

In recent years, it has become unavoidable to build software on top of Open Source. This is absolutely a great thing, and allows developers to focus on as few areas of domain specialisation as possible, as well as allowing a much wider range of users to pick up on defects and bring new features to our tools.

However, with events such as the Log4Shell security vulnerability, or maintainers removing their libraries from package and source repositories (sometimes in political protest), it's understandable that businesses are somewhat hesitant about the sustainability of projects.

💖

Do you fully appreciate the depth of your dependency on the software supply chain?

Being able to understand how your business uses Open Source is really important for a few other key reasons:

  • Where can I contribute? (as Deliveroo did at Hacktoberfest)
  • Usage of unwanted libraries (e.g. copyleft-licensed ones)
  • Understanding usage of libraries and frameworks
  • Understanding the spread of libraries and their versions
  • Discovering end-of-life or vulnerable software
  • Discovering which of the libraries you use are deprecated
  • A high-level view of how many major/minor/patch versions behind you are

How can we do it?

💰🤑💸

GitHub, GitLab, Snyk and Mend logos

Let's use Open Source!

Open Source Initiative logo

/usr/bin/whoami

Timeline of events

  • 2023-07: This talk!
  • 2023-02: Created the dependency-management-data project
  • 2022-08: First iteration with Dependabot
  • 2019: "Formally" considering it
  • 2017: Hacking around

What is dependency-management-data?

Dependency Management Data (DMD) - dmd.tanna.dev

What's in the project?

  • The command-line tool dmd
  • The web application dmd-web
  • The outputted SQLite database
  • (Your SQLite browser of choice)

dmd

  • Build the SQLite database
  • Enrich it with more data ("advisories")
  • Provide common queries ("reports")
  • Ingest different sources of dependency data ("datasources")

dmd-web

  • Centrally deployed and accessible
  • View reports in the browser
  • Datasette's excellent SQLite UI
    • Share URLs with your colleagues!

SQLite database

  • Convenient to distribute and share
  • Great for local-only use, or for building applications on top of
  • No lock-in: all state is synced to the DB

Demo

  • elastic/kibana
  • What dependencies does it have? (example)
  • How many dependencies across package managers? (example)
  • Does it have any advisories? (example)

How did it come to be?

Idea for Open Source/Startup: monetising the supply chain

Analysing our dependency trees to determine where we should send Open Source contributions for Hacktoberfest

  • Using the Dependabot APIs
  • Good starting point
  • Lack of data for some ecosystems
  • Hard to parse the "current version"

Mend Renovate logo

EndOfLife.date logo

commit 73a99614a2af6fa9f66508bab8541ed65e18ed66
Author: Jamie Tanna <>
Date:   Thu Feb 2 09:23:43 2023 +0000

    Initialise project

 LICENSE.md        | 13 +++++++++++++
 README.md         |  7 +++++++
 public/index.html | 90 ++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+)

How does it work?

# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# optionally, generate advisories
dmd db generate advisories --db dmd.db
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'
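Because the output is an ordinary SQLite database, that last step doesn't need the sqlite3 CLI; any SQLite client will do. Here is a minimal sketch using Python's standard-library sqlite3 module. To stay self-contained it builds an in-memory stand-in for dmd.db with a hypothetical, heavily simplified renovate table (the real schema has more columns), then counts dependencies per package manager, as in the demo.

```python
import sqlite3


def dependencies_per_package_manager(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count dependency rows per package manager, most common first."""
    return conn.execute(
        """
        select package_manager, count(*) as n
        from renovate
        group by package_manager
        order by n desc, package_manager
        """
    ).fetchall()


# In practice: conn = sqlite3.connect("dmd.db")
# Here: an in-memory stand-in with an assumed, simplified subset of the columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table renovate (platform text, organisation text, repo text,"
    " package_name text, version text, package_manager text)"
)
conn.executemany(
    "insert into renovate values ('github', 'example-org', 'example-repo', ?, ?, ?)",
    [
        ("lodash", "4.17.21", "npm"),
        ("react", "18.2.0", "npm"),
        ("node", "18.16.0", "dockerfile"),
    ],
)
print(dependencies_per_package_manager(conn))  # [('npm', 2), ('dockerfile', 1)]
```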

Datasources

Currently:

  • renovate-graph
  • dependabot-graph
  • endoflife-checker
  • More welcome!
  • Converts to an underlying data model (in SQLite)
  • Uses that for internal querying + enrichment
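One way to picture the conversion to an underlying data model: each datasource gets a small adapter that maps its own record format onto one shared shape, so querying and enrichment only ever deal with that shape. This is an illustrative Python sketch only. The Dependency model, both input record shapes and the ecosystem-name mapping are hypothetical, and are not the real renovate-graph or dependabot-graph formats or DMD's schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """Hypothetical common model that every datasource is converted into."""
    repo: str
    package_name: str
    version: str
    package_manager: str


# Hypothetical: sources may name the same ecosystem differently,
# so adapters normalise to a single vocabulary.
ECOSYSTEM_TO_PACKAGE_MANAGER = {"GO": "gomod", "NPM": "npm"}


def from_renovate_like(repo: str, record: dict) -> Dependency:
    # Assumed input: {"depName": str, "currentValue": str, "manager": str}
    return Dependency(repo, record["depName"], record["currentValue"], record["manager"])


def from_dependabot_like(repo: str, record: dict) -> Dependency:
    # Assumed input: {"package": {"name": str, "ecosystem": str}, "version": str}
    ecosystem = record["package"]["ecosystem"]
    return Dependency(
        repo,
        record["package"]["name"],
        record["version"],
        ECOSYSTEM_TO_PACKAGE_MANAGER.get(ecosystem, ecosystem.lower()),
    )


deps = [
    from_renovate_like(
        "org/a",
        {"depName": "github.com/gin-gonic/gin", "currentValue": "v1.9.1", "manager": "gomod"},
    ),
    from_dependabot_like(
        "org/b",
        {"package": {"name": "github.com/gin-gonic/gin", "ecosystem": "GO"}, "version": "v1.9.0"},
    ),
]
# Once normalised, a single aggregation works across both sources.
print(Counter(d.package_manager for d in deps))  # Counter({'gomod': 2})
```

The mapping table is the interesting design point: without it, the same Go module would be counted under two different package manager names.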

Reports

Pre-baked, Open Source'd queries:

$ dmd report --help

  advisories                 Report advisories that are available for packages or dependencies in use
  golangCILint               Query usages of golangci-lint, tracked as a source-based dependency
  mostPopularDockerImages    Query the most popular Docker namespaces and images in use
  mostPopularPackageManagers Query the most popular package managers in use

Advisories

Right now we can ask:

  • what Terraform modules and versions are being used across the org?
  • which teams are using the Gin web framework?

What if we could ask:

  • which software am I running that needs an upgrade soon?
  • how much time should my team(s) be planning in the next quarter to upgrade their AWS infrastructure?

Dependency advisory data sources:

AWS infrastructure advisory data sources:

🤫

Custom advisories 🦸

Community-provided advisories via -contrib:

INSERT INTO advisories (
  package_pattern,
  package_manager,
  version,
  version_match_strategy,
  advisory_type,
  description
) VALUES (
  'github.com/golang/mock',
  'gomod',
  NULL,
  NULL,
  'UNMAINTAINED',
  'golang/mock is no longer maintained, and active development has moved to github.com/uber/mock'
);
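Advisories only become useful once they're matched against the dependencies in use. The exact rules DMD applies (for instance, how package_pattern wildcards and version_match_strategy are interpreted) aren't covered here, so this is a minimal Python sketch under stated assumptions: a "*" in package_pattern is treated as a wildcard via SQLite's GLOB operator, and a NULL version is taken to mean "any version". The tables are simplified, hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical stand-ins for DMD's tables.
    create table renovate (repo text, package_name text, version text, package_manager text);
    create table advisories (package_pattern text, package_manager text, version text,
                             advisory_type text, description text);

    insert into renovate values
      ('org/api',    'github.com/golang/mock', 'v1.6.0', 'gomod'),
      ('org/worker', 'github.com/uber/mock',   'v0.2.0', 'gomod');

    insert into advisories values
      ('github.com/golang/mock', 'gomod', NULL, 'UNMAINTAINED',
       'golang/mock is no longer maintained');
    """
)

# Assumption: '*' in package_pattern is a glob wildcard, and a NULL
# advisory version matches every version of the package.
MATCH_ADVISORIES = """
select r.repo, r.package_name, r.version, a.advisory_type
from renovate r
join advisories a
  on  r.package_manager = a.package_manager
  and r.package_name glob a.package_pattern
  and (a.version is null or a.version = r.version)
order by r.repo
"""

results = conn.execute(MATCH_ADVISORIES).fetchall()
print(results)  # [('org/api', 'github.com/golang/mock', 'v1.6.0', 'UNMAINTAINED')]
```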

Ownership

  • Who does this production service using end-of-life software belong to?
  • dmd owners + owners table to the rescue!

select
  distinct
  renovate.platform,
  renovate.organisation,
  renovate.repo,
  owner
from
  renovate
  left join owners
  on  renovate.platform = owners.platform
  and renovate.organisation = owners.organisation
  and renovate.repo = owners.repo
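A natural follow-up: because the query uses a left join, repositories with no entry in owners still appear, with a NULL owner, so unowned repositories are easy to find. A minimal Python sketch, using simplified, hypothetical stand-in tables and made-up repository names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical stand-ins for DMD's renovate and owners tables.
    create table renovate (platform text, organisation text, repo text, package_name text);
    create table owners (platform text, organisation text, repo text, owner text);

    insert into renovate values
      ('github', 'example-org', 'payments',    'github.com/gin-gonic/gin'),
      ('github', 'example-org', 'payments',    'github.com/gorilla/mux'),
      ('github', 'example-org', 'legacy-cron', 'github.com/golang/mock');

    insert into owners values ('github', 'example-org', 'payments', 'team-payments');
    """
)

# Repositories with dependency data but no recorded owner.
UNOWNED = """
select distinct renovate.platform, renovate.organisation, renovate.repo
from renovate
left join owners
  on  renovate.platform = owners.platform
  and renovate.organisation = owners.organisation
  and renovate.repo = owners.repo
where owners.owner is null
"""

unowned = conn.execute(UNOWNED).fetchall()
print(unowned)  # [('github', 'example-org', 'legacy-cron')]
```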

Example project

Contrib project

Case Studies

Which other services may be affected by this production bug?

E.g. the aws-lambda-go package (example)

select
  distinct renovate.platform,
  renovate.organisation,
  renovate.repo,
  version,
  owner
from
  renovate
  left join owners on renovate.platform = owners.platform
  and renovate.organisation = owners.organisation
  and renovate.repo = owners.repo
where
  package_name = 'github.com/aws/aws-lambda-go'
order by
  version desc
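One caveat with "order by version desc": SQLite compares the version column as text, so v1.9.0 sorts above v1.41.0 even though it is older. If the ordering matters, one option is to sort the rows in application code with a version-aware key. A minimal Python sketch that only handles plain vMAJOR.MINOR.PATCH strings (pre-release and build suffixes are deliberately out of scope); the helper name version_key is hypothetical.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def version_key(version: str) -> tuple[int, ...]:
    """Parse 'v1.41.0' or '1.41.0' into (1, 41, 0) for numeric comparison."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in match.groups())


versions = ["v1.9.0", "v1.41.0", "v1.13.3"]
print(sorted(versions, reverse=True))                   # text order: ['v1.9.0', 'v1.41.0', 'v1.13.3']
print(sorted(versions, key=version_key, reverse=True))  # ['v1.41.0', 'v1.13.3', 'v1.9.0']
```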

The Gorilla Toolkit archiving

$ dmd report gorillaToolkit --db dmd.db
Renovate
Direct dependencies
+------------------------------+---+
| PACKAGE                      | # |
+------------------------------+---+
| github.com/gorilla/mux       | 4 |
| github.com/gorilla/websocket | 1 |
| github.com/gorilla/handlers  | 1 |
+------------------------------+---+
Indirect dependencies
+---------------------------------+---+
| PACKAGE                         | # |
+---------------------------------+---+
| github.com/gorilla/websocket    | 6 |
| github.com/gorilla/securecookie | 2 |
| github.com/gorilla/sessions     | 1 |
| github.com/gorilla/schema       | 1 |
| github.com/gorilla/mux          | 1 |
| github.com/gorilla/context      | 1 |
+---------------------------------+---+
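The report's own SQL isn't shown here, but a similar breakdown can be approximated by hand. Filter on the github.com/gorilla/ prefix, then split on whether dep_types (a JSON array) contains "indirect". This Python sketch is a hypothetical approximation, not the report's implementation. It uses simplified stand-in data and relies on SQLite's json_each, which modern SQLite builds include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified stand-in: dep_types holds a JSON array.
    create table renovate (repo text, package_name text, package_manager text, dep_types text);
    insert into renovate values
      ('org/a', 'github.com/gorilla/mux',       'gomod', '["require"]'),
      ('org/b', 'github.com/gorilla/mux',       'gomod', '["require"]'),
      ('org/b', 'github.com/gorilla/websocket', 'gomod', '["indirect"]'),
      ('org/c', 'github.com/gin-gonic/gin',     'gomod', '["require"]');
    """
)

GORILLA_USAGE = """
select
  case
    when exists (select 1 from json_each(renovate.dep_types) where value = 'indirect')
    then 'indirect' else 'direct'
  end as kind,
  package_name,
  count(*) as n
from renovate
where package_manager = 'gomod'
  and package_name like 'github.com/gorilla/%'
group by kind, package_name
order by kind, n desc, package_name
"""

rows = conn.execute(GORILLA_USAGE).fetchall()
for row in rows:
    print(row)
```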

Docker free tier sunset

Working out which Docker namespaces and images you most depend on

$ dmd report mostPopularDockerImages --db dmd.db
Renovate
+---------------------------------------+-----+
| NAMESPACE                             |   # |
+---------------------------------------+-----+
| _                                     | 346 |
| dockersamples                         |  12 |
| registry1.dsop.io/ironbank/redhat/ubi |  11 |
| docker.elastic.co/elasticsearch       |   6 |
| gcr.io/distroless                     |   6 |
| cimg                                  |   5 |
| docker.elastic.co/kibana              |   4 |
| amazon                                |   4 |
| gcr.io/kaniko-project                 |   3 |
| goreleaser                            |   3 |
| quay.io/something                     |   3 |
| circleci                              |   3 |
+---------------------------------------+-----+
+--------------------------------------------+----+
| IMAGE                                      |  # |
+--------------------------------------------+----+
| golang                                     | 44 |
| alpine                                     | 36 |
| node                                       | 33 |
| docker                                     | 25 |
| nginx                                      | 21 |
| ubuntu                                     | 19 |
| python                                     | 16 |
| ruby                                       | 15 |
| redis                                      | 14 |
| busybox                                    | 11 |
| registry1.dsop.io/ironbank/redhat/ubi/ubi8 | 11 |
| openjdk                                    | 11 |
+--------------------------------------------+----+
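The "_" namespace at the top of the first table appears to represent images with no namespace, such as Docker Hub official images like golang or alpine. How DMD derives the split isn't shown, so here is a hypothetical Python sketch of one way to do it. Drop any digest, strip the tag from the final path segment only (so a registry port like localhost:5000 isn't mistaken for a tag), then treat everything before the last "/" as the namespace, falling back to "_".

```python
def split_image(reference: str) -> tuple[str, str]:
    """Split an image reference into (namespace, image), without tag or digest.

    Hypothetical helper: '_' stands in for images with no namespace,
    such as Docker Hub official images like 'golang'.
    """
    name = reference.split("@", 1)[0]  # drop any '@sha256:...' digest
    namespace, slash, last = name.rpartition("/")
    last = last.split(":", 1)[0]  # drop a ':tag' from the final segment only
    image = f"{namespace}/{last}" if slash else last
    return (namespace if slash else "_", image)


for ref in ["golang:1.20", "gcr.io/distroless/static:nonroot", "localhost:5000/app@sha256:abc"]:
    print(split_image(ref))
```

Note that this sketch doesn't normalise explicit docker.io/library/ prefixes, so docker.io/library/golang and golang would be counted separately.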

What are the most used indirect/transitive dependencies?

For Go Modules (example query):

select
  distinct package_name,
  count(*)
from
  renovate,
  json_each(dep_types) as dep_type
where
  package_manager = 'gomod'
  and dep_type.value = 'indirect'
group by
  package_name
order by
  count(*) DESC;
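To try the query without a full import, it can be run against a tiny in-memory table from Python. This sketch uses hypothetical stand-in rows and keeps only the columns the query touches. json_each expands the dep_types JSON array into one row per entry, which is what lets the where clause test for "indirect". The distinct in the original is redundant alongside "group by package_name", so it's dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in with only the columns the query needs.
    create table renovate (repo text, package_name text, package_manager text, dep_types text);
    insert into renovate values
      ('org/a', 'golang.org/x/sys',       'gomod', '["indirect"]'),
      ('org/b', 'golang.org/x/sys',       'gomod', '["indirect"]'),
      ('org/b', 'github.com/google/uuid', 'gomod', '["indirect"]'),
      ('org/c', 'github.com/google/uuid', 'gomod', '["require"]'),
      ('org/c', 'lodash',                 'npm',   '["dependencies"]');
    """
)

MOST_USED_INDIRECT = """
select package_name, count(*) as n
from renovate, json_each(renovate.dep_types) as dep_type
where package_manager = 'gomod'
  and dep_type.value = 'indirect'
group by package_name
order by n desc, package_name
"""

print(conn.execute(MOST_USED_INDIRECT).fetchall())  # [('golang.org/x/sys', 2), ('github.com/google/uuid', 1)]
```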

xkcd comic showing a tower of various layers of boulders and stones, labelled "all modern digital infrastructure", which looks a little precarious. Towards the bottom there is a slim load-bearing stone which is labelled "a project some random person in Nebraska has been thanklessly maintaining since 2003"

How far behind on updates am I?

For instance, to get a view of how many updates are pending, per package manager (example):

select
  package_manager,
  count(*)
from
  renovate_updates
group by
  package_manager
order by
  count(*) desc

Alternatively, how many updates of each type (e.g. major bumps) are pending per package manager (example):

select
  package_manager,
  update_type,
  count(*)
from
  renovate_updates
group by
  package_manager,
  update_type
order by
  count(*) desc
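If one row per package manager is easier to read than one row per (package manager, update type) pair, the same data can be pivoted with conditional counts. A minimal Python sketch, assuming update_type holds values such as major, minor and patch (other values would need their own columns); the rows are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified stand-in for renovate_updates.
    create table renovate_updates (repo text, package_name text, package_manager text, update_type text);
    insert into renovate_updates values
      ('org/a', 'react',   'npm',        'major'),
      ('org/a', 'lodash',  'npm',        'patch'),
      ('org/b', 'express', 'npm',        'minor'),
      ('org/b', 'node',    'dockerfile', 'major');
    """
)

# In SQLite a comparison evaluates to 1 or 0, so sum(...) counts matching rows.
UPDATES_BY_TYPE = """
select
  package_manager,
  sum(update_type = 'major') as major,
  sum(update_type = 'minor') as minor,
  sum(update_type = 'patch') as patch,
  count(*) as total
from renovate_updates
group by package_manager
order by total desc, package_manager
"""

for row in conn.execute(UPDATES_BY_TYPE):
    print(row)
```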

Custom advisories

"we don't like Spring Boot"

select
  *
from
  renovate
where
  datasource = 'maven'
  and package_name like 'org.springframework%'

Alternatively:

insert into
  advisories (
    package_pattern,
    package_manager,
    version,
    version_match_strategy,
    advisory_type,
    description
  )
VALUES
  (
    'org.springframework*',
    'gradle',
    NULL,
    'ANY',
    'OTHER',
    'Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...'
  );
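The two approaches aren't quite equivalent. The query filters on datasource = 'maven', which in Renovate covers Gradle as well as Maven projects, since both resolve packages from Maven repositories. The advisory, however, only targets package_manager = 'gradle'. Assuming package_manager holds Renovate's manager name, Maven (pom.xml) projects would not be flagged. This Python sketch compares which repositories each choice would catch. It treats "*" as a glob wildcard (an assumption about DMD's matching) and uses hypothetical stand-in rows; flagged_repos is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in rows: one Gradle project and one Maven project.
    create table renovate (repo text, package_name text, package_manager text, datasource text);
    insert into renovate values
      ('org/gradle-app', 'org.springframework.boot:spring-boot-starter-web', 'gradle', 'maven'),
      ('org/maven-app',  'org.springframework.boot:spring-boot-starter-web', 'maven',  'maven');
    """
)


def flagged_repos(package_managers: list[str]) -> list[str]:
    """Repos an advisory for 'org.springframework*' would flag, per package manager list."""
    placeholders = ", ".join("?" for _ in package_managers)
    sql = (
        "select repo from renovate"
        f" where package_manager in ({placeholders})"
        " and package_name glob 'org.springframework*'"  # assumed glob semantics
        " order by repo"
    )
    return [repo for (repo,) in conn.execute(sql, package_managers)]


print(flagged_repos(["gradle"]))           # ['org/gradle-app']
print(flagged_repos(["gradle", "maven"]))  # ['org/gradle-app', 'org/maven-app']
```

Under these assumptions, a second advisory row with package_manager set to 'maven' would close the gap.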

Resources

Getting started

# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo another-org/repo
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN your-org/repo another-org/nested/repo

# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'

Questions?
