Jamie Tanna (https://www.jvt.me)
Senior Software Engineer @ Elastic
/usr/bin/whoami
As I wrote in the post Analysing our dependency trees to determine where we should send Open Source contributions for Hacktoberfest
But it's not always ☀🌈
💖
Do you fully appreciate the depth of your dependency on the software supply chain?
Being able to understand how your business uses Open Source is really important for a few other key reasons:
Being able to understand how your business uses Open Source internal software is really important for a few other key reasons:
Other insights into:
How can we do it?
💰🤑💸
Let's use Open Source!
Dependency Management Data (DMD) - dmd.tanna.dev
What's in the project?
dmd
dmd-web
, and the GraphQL-only web application dmd-graph
SQLite database
dmd
- all state synced to the DB
dmd
dmd-web
dmd-graph
commit 73a99614a2af6fa9f66508bab8541ed65e18ed66
Author: Jamie Tanna <>
Date: Thu Feb 2 09:23:43 2023 +0000
Initialise project
LICENSE.md | 13 +++++++++++++
README.md | 7 +++++++
public/index.html | 90 ++++++++++++++++++++++++++++++++++++
3 files changed, 110 insertions(+)
# produce some data that DMD can import, i.e.
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# optionally, generate advisories
dmd db generate advisories --db dmd.db
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'
Currently:
renovate-graph
: Use Renovate as the source
dependabot-graph
: Use GitHub's Dependabot API
endoflife-checker
: Pull in i.e. AWS infrastructure
When importing:
Once ingested, write SQL to your heart's content 🤓 i.e.
Pre-baked, Open Source'd queries:
$ dmd report --help
advisories Report advisories that are available for packages or dependencies in use
dependenton Report usage of a given dependency
golangCILint Query usages of golangci-lint, tracked as a source-based dependency
infrastructure-advisories Report infrastructure advisories
licenses Report license information for package dependencies
mostPopularDockerImages Query the most popular Docker registries, namespaces and images in use
mostPopularPackageManagers Query the most popular package managers in use
policy-violations Report policy violations that are found for packages or dependencies
Right now we can write SQL queries to ask:
what Terraform modules and versions are being used across the org?
which teams are using the Gin web framework?
But what if we could ask:
are any of our projects relying on libraries that no longer are recommended by our language guilds?
how much time should my team(s) be planning in the next quarter to upgrade their AWS infrastructure?
Dependency advisory data sources:
AWS infrastructure advisory data sources:
🤫
Custom advisories 🦸
Community provided advisories via -contrib:
INSERT INTO custom_advisories (
package_pattern,
package_manager,
version,
version_match_strategy,
advisory_type,
description
) VALUES (
'github.com/golang/mock',
'gomod',
NULL,
NULL,
'UNMAINTAINED',
'golang/mock is no longer maintained, and active development been moved to github.com/uber/mock'
);
package policy
import future.keywords.contains
import future.keywords.if
default advisory_type := "UNMAINTAINED"
deny contains "Use the new GitLab.com server" if
startswith(input.dependency.package_name, "gitlab.example.com/")
// Versions of Gin >= 1.9
deny contains msg if {
input.dependency.package_manager in {"gomod", "golang"}
input.dependency.package_name = "github.com/gin-gonic/gin"
versions[0] in {"v1", "1"}
to_number(versions[1]) >= 9
msg := sprintf("%s. Versions of Gin since v1.9.0 have shipped " +
"ByteDance/sonic as an optional dependency, but it still " +
"appears as a dependency, and could be in use - more " +
"details in " +
"https://github.com/gin-gonic/gin/issues/3653", [prefix])
}
Who does this production service using end-of-life software belong to?
dmd owners
+ owners
table to the rescue!
select
distinct
sboms.platform,
sboms.organisation,
sboms.repo,
owner
from
sboms
left join owners
on sboms.platform = owners.platform
and sboms.organisation = owners.organisation
and sboms.repo = owners.repo
-- where ...
How many customer-facing services are running an outdated version of this internal library?
And in the repository_metadata
table:
repo | additional_metadata |
---|---|
api-service | {"customer_facing": "true"} |
examples | |
business-service | {"customer_facing": "false"} |
select
distinct
sboms.platform,
sboms.organisation,
sboms.repo,
(case json_extract(repository_metadata.additional_metadata,
'$.customer_facing') when "true" then true
else false
end) as is_customer_facing
from
sboms
left join repository_metadata
on sboms.platform = repository_metadata.platform
and sboms.organisation = repository_metadata.organisation
and sboms.repo = repository_metadata.repo
-- where ...
"What package advisories do I have?"
organisation | repo | package_name | current_version | dep_types | advisory_type | description |
---|---|---|---|---|---|---|
alphagov | pay-selfservice | node | 18.17.1 | ["engines"] | DEPRECATED | nodejs 18 has been unsupported (usually only receiving critical security fixes) for 246 days |
elastic | beats | github.com/golang/mock | v1.6.0 | ["require"] | UNMAINTAINED | golang/mock is no longer maintained, and active development been moved to github.com/uber/mock |
monzo | response | python | 3.7 | [] | UNMAINTAINED | python 3.7 has been End-of-Life for 359 days |
via https://dependency-management-data-example.fly.dev/report/advisories
# there may be some folks using YAML anchors
sidecars: &sidecars
# or there could also be comments in here!
kafka: 1.2.3
app:
image: "internal.docker.registry/service-name"
*sidecars
In the renovate
table:
repo | package_name | current_version | package_file_path |
---|---|---|---|
good-service | internal-docker.tld/kafka | 0.3.0 | .hopper.yml |
affected-service | internal-docker.tld/kafka | 0.2.1 | .hopper.yml |
also-affected-service | internal-docker.tld/kafka | 0.1.0 | .hopper.yml |
Parsed via https://docs.renovatebot.com/modules/manager/regex/
And the repository_metadata
table:
repo | additional_metadata |
---|---|
good-service | {"tier": "tier_1"} |
affected-service | {"tier": "tier_1"} |
also-affected-service | {"tier": "tier_2"} |
Slightly garish query:
select
renovate.organisation,
renovate.repo,
current_version,
owner,
json_extract(additional_metadata, '$.tier') as tier
from
renovate
left join owners on
renovate.platform = owners.platform
and renovate.organisation = owners.organisation
and renovate.repo = owners.repo
left join repository_metadata on
renovate.platform = repository_metadata.platform
and renovate.organisation = repository_metadata.organisation
and renovate.repo = repository_metadata.repo
where
-- NOTE: that this is performed with a lexicographical match, which is NOT
-- likely to be what you are expecting to perform version constraint matching
-- but this is a good start for these use cases
renovate.current_version < '0.3'
order by
tier ASC
Result:
organisation | repo | current_version | owner | tier |
---|---|---|---|---|
deliveroo | affected-service | 0.2.1 | Grocery | tier_1 |
deliveroo | also-affected-service | 0.1.0 | tier_2 |
We could use the dependenton
query:
# for Gradle projects
$ dmd report dependenton --db dmd.db --package-manager gradle
--package-name org.apache.logging.log4j:log4j-core
+-------------------+---------+-------------------+----------------------------+-------------+
| REPO | VERSION | DEPENDENCY TYPES | FILEPATH | OWNER |
+-------------------+---------+-------------------+----------------------------+-------------+
| logstash | 2.17.1 | ["dependencies"] | logstash-core/build.gradle | Elastic |
| logstash | 2.17.1 | ["dependencies"] | logstash-core/build.gradle | Elastic |
| fake-private-repo | 2.13.0 | ["dependencies"] | blank-java/build.gradle | Jamie Tanna |
| fake-private-repo | 2.13.0 | ["dependencies"] | blank-java/build.gradle | Jamie Tanna |
+-------------------+---------+-------------------+----------------------------+-------------+
Or start with SQL:
select
platform,
organisation,
repo,
current_version
from
renovate
where
package_name = 'org.apache.logging.log4j:log4j-core'
With the versions affected:
select
platform,
organisation,
repo,
current_version
from
renovate
where
package_name = 'org.apache.logging.log4j:log4j-core'
and current_version in (
'2.0-beta9', '2.0-rc1',
'2.0-rc2', '2.0.1',
'2.0.2', '2.0',
-- ....
-- ....
-- ....
'2.13.0', '2.13.1',
'2.13.2', '2.13.3',
'2.14.0', '2.14.1',
)
Or with a Policy:
default advisory_type := "SECURITY"
versions := split(input.dependency.version, ".")
major := to_number(versions[0])
minor := to_number(versions[1])
patch := to_number(versions[2])
is_log4j2 if {
input.dependency.package_manager in {"gradle", "maven"}
input.dependency.package_name =
"org.apache.logging.log4j:log4j-core"
major == 2
}
// ...
Or with Policy (continued):
// ...
// CVE-2021-44228 aka Log4shell affects versions 2.0-beta9 to
// 2.14.1
is_vulnerable_version if input.dependency.version in {
"2.0-beta9", "2.0-rc1", "2.0-rc2"}
// CVE-2021-44228 aka Log4shell affects versions 2.0-beta9 to
// 2.14.1
is_vulnerable_version if {
minor > 0
minor <= 14
}
deny contains msg if {
is_log4j2
is_vulnerable_version
msg := "Dependency is vulnerable to Log4Shell CVE " +
"(CVE-2021-44228)"
}
As of dmd
v0.26.0, there was an inbuilt query that produced the following output:
$ dmd report gorillaToolkit --db dmd.db
Renovate
Direct dependencies
+------------------------------+---+
| PACKAGE | # |
+------------------------------+---+
| github.com/gorilla/mux | 4 |
| github.com/gorilla/websocket | 1 |
| github.com/gorilla/handlers | 1 |
+------------------------------+---+
Indirect dependencies
+---------------------------------+---+
| PACKAGE | # |
+---------------------------------+---+
| github.com/gorilla/websocket | 6 |
| github.com/gorilla/securecookie | 2 |
| github.com/gorilla/sessions | 1 |
| github.com/gorilla/schema | 1 |
| github.com/gorilla/mux | 1 |
| github.com/gorilla/context | 1 |
+---------------------------------+---+
Alternatively:
select
platform,
organisation,
repo,
arr.value as dep_type
from
renovate,
json_each(dep_types) as arr
where
package_name like 'github.com/gorilla/%'
group by
package_name
order by
dep_type desc
Results in:
platform | organisation | repo | dep_type |
---|---|---|---|
github | dagger | dagger | require |
github | deepmap | oapi-codegen | require |
github | elastic | beats | require |
github | tailscale | tailscale | require |
github | tailscale | tailscale | indirect |
$ dmd report mostPopularDockerImages --db dmd.db
+----------------------------------+-----+
| NAMESPACE | # |
+----------------------------------+-----+
| library | 499 |
| ghcr.io/gravitational | 18 |
| dockersamples | 12 |
| gcr.io/distroless | 11 |
| public.ecr.aws/gravitational | 10 |
| registry1.dsop.io/redhat/ubi | 10 |
| docker.mirror.hashicorp.services | 10 |
| wiremock | 8 |
| docker | 8 |
| docker.elastic.co/elasticsearch | 7 |
| cimg | 6 |
| hashicorpdev | 6 |
+----------------------------------+-----+
# produce some data that DMD can import, i.e.
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN \
your-org/repo another-org/repo
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest \
--token $GITLAB_TOKEN your-org/repo another-org/nested/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'