A More Trustworthy Web?

Sandro Hawke, W3C Fellow
sandro@w3.org

For W3C AC Meeting
May 2018, Berlin

Credible Web CG

Timeline

Formed fall 2017 by Credibility Coalition folks
Initial meetings at TPAC
Started regular meetings last month
Nearing consensus on WG-like charter
F2F meeting planned for July

Framing the Problem

Can we shift the Web to empower end users
in their ongoing efforts
to decide which web content is trustworthy
and to avoid being misled or deceived?

Incremental approach

Establish common vocabularies (schemas)
- for sharing data on the web
- ... relevant to credibility assessment
- schema.org is accepted in news industry
- ... and some major platforms

What might folks share
that will help others know
what to trust?

Project Area 1: Inspection

Content and content providers may have observable features that indicate credibility. Can the CG identify these features and allow them to be annotated?

example: "emotionally charged tone"

Annotation by: friends, volunteers, paid sources, AI

Usable by search, feed algorithms, UI

Risks:

impact of false positives/negatives
gameability arms race, like SEO

Early draft (from CredCo) at https://credweb.org/cciv

Project Area 2: Corroboration

Identify claims in content; check them against evidence

See: Int'l Fact-Checking Network, ClaimReview, RelatedFactChecks

Input-Side (Helping Fact-Checkers)
- Collaborate on identifying checkable claims
- Share urgency/demand data
- Add context (time, geo) to claims
Output-Side (Claim Review++)
- Expose provenance
- Cross-link to increase trust
- Convey more nuance/detail

Project Area 3: Reputation

Help people maintain and use their trust networks

Better human/machine collaboration around trust
Allow end-users to use their preferred trust networks
Help users track quality of sources
Bootstrap from: contacts? followers? co-authoring?

Risks:

asymmetry: humans see negative info as more salient
coerced statements (eg need for secret ballot)

Vocab example: { <example.com> :domainCredibility 0.80 }

Project Area 4: Transparency

Help folks self-report data impacting credibility (in some context)

Examples: disclosing business model, investors, jurisdiction

Publishers use structured markup in labeling themselves and their content, intended to highlight what makes them credible
See https://theTrustProject.org/

Risks:

by itself, can allow malicious folks to appear extra trustworthy
burden for small sources

Getting started...

Researchers (DFKI, MIT, Indiana, ...)
Small vendors (Hypothes.is, Meedan, FactsMission, ...)
News Media (AP/IPTC, BBC, ...)
Search/NewsFeed Platforms (Google, Facebook, Bing, ...)
Trust-related businesses (AirBnB, NIC.br, ...)
Liaison (CredCo, Tech & Check, IEEE P7011, ...)
Broader community

San Francisco meeting, end of July

More at https://credweb.org

These slides at http://hawke.org/talk-ac-2018

Bonus: What We're Not Doing

Out of scope 1

We won't be identifying "legitimate" content providers

We're not going to:

decide who gets their content seen
set standards for making that decision
develop tech for whitelists/blacklists beyond end-user control

because:

Political polarization would make consensus unlikely
Even within a community, the lines are not clear
Centralizes power, could turn into censorship
Would likely diminish the smaller voices (anti-web)

Out of scope 2

We won't build an AI to decide what's trustworthy

Because:

Unclear if it's possible without superhuman AI
Imperfect systems might make folks even more vulnerable

but:

Hybrid systems (humans in the loop) can be great
AI-based personal tools (eg spam filters) are good
We'll help ecosystem for the training data