Tabulation data set concept

From Trust The Vote

This sketch is one approach to managing information during tabulation. The key idea is that, instead of creating totals and subtotals, we accumulate a single data set with the totals reported by each of the various vote counters, annotated and documented.

The advantage of this scheme is that the 'makeup' of any total count for a contest can always be determined: every total can be traced back to the individual vote-counter rows that contribute to it.

Design Congress Note

At this writing, the stakeholder community, composed principally of states' elections directors, IT managers, and other related officials, is forming, with a planned work list of materials to review and comment on. This concept may well fit within that set of items. A protocol for identifying and flagging such content within the Wiki would be helpful.

Format of the data set

The data set consists of a set of rows, one for each vote counter. There are columns for _all_ the different contests that happen in any part of the jurisdiction.

Columns

There are two groups of columns:

  1. Identification columns - These are a set of columns that fully identify the place and time where the votes recorded in this row were counted.
  2. Vote columns - There is a group of columns for _each_ contest and question across all the precincts in the jurisdiction for this election. The number of columns in each group corresponds to the number of possible responses for that contest or question.

Example

There would be one row for each vote counting device, e.g. optical scanner, DRE, etc.

For a particular election in a particular jurisdiction, all tabulation data sets have the same set of columns. Here's the way one might look:

  • jurisdiction name
  • date and time of count readout
  • name of responsible vote personnel
  • # of votes in contest 1 for selection 1
  • # of votes in contest 1 for selection 2
  • # of votes in contest 2 for selection 1
  • # of votes in contest 2 for selection 2
  • # of votes in contest 2 for selection 3
  • # of votes in contest 3 for selection 1
  • # of votes in contest 3 for selection 2
  • # of votes in contest 4 for selection 1
  • # of votes in contest 4 for selection 2
  • # of votes in contest 4 for selection 3
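The column list above can be generated mechanically from the number of selections in each contest. The following is a minimal sketch, not a definitive implementation; the names `ID_COLUMNS`, `build_columns`, and the column label format are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Identification columns, copied from the example list above.
# The constant name ID_COLUMNS is hypothetical.
ID_COLUMNS = [
    "jurisdiction name",
    "date and time of count readout",
    "name of responsible vote personnel",
]


def build_columns(selections_per_contest: Dict[int, int]) -> List[str]:
    """Return the fixed column list for one election in one jurisdiction.

    selections_per_contest maps a contest number to the number of
    possible responses for that contest (an assumed input format).
    """
    columns = list(ID_COLUMNS)
    for contest, n_selections in sorted(selections_per_contest.items()):
        for selection in range(1, n_selections + 1):
            columns.append(f"contest {contest} selection {selection}")
    return columns


# Contest sizes matching the example: contests 1 and 3 have two
# selections, contests 2 and 4 have three.
EXAMPLE_COLUMNS = build_columns({1: 2, 2: 3, 3: 2, 4: 3})
```

Because the columns depend only on the election definition, every row from every counting device shares the same layout.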

Notes

  1. As more counting devices are 'read' and combined, the resulting tabulation data set gets more and more rows, but the columns are the same ones all along.
  2. The particular set of columns is determined by the Election Management System.
  3. N.B. No subtotals are computed as the counting devices are read. The tabulation data set maintains complete drill-down and audit information all through the process.
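The notes above can be sketched as a small append-only structure: rows are added as devices are read, nothing is summed at read time, and totals are derived on demand together with their makeup. This is an illustrative sketch under assumptions; the class name `TabulationDataSet`, its methods, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class TabulationDataSet:
    """Append-only set of rows, one per counting-device readout (hypothetical)."""

    def __init__(self, id_columns: List[str], vote_columns: List[str]) -> None:
        self.id_columns = list(id_columns)
        self.vote_columns = list(vote_columns)
        self.rows: List[Row] = []

    def add_readout(self, row: Row) -> None:
        """Append one device readout; no subtotal is computed here."""
        expected = set(self.id_columns) | set(self.vote_columns)
        if set(row) != expected:
            raise ValueError("row does not match the fixed column set")
        self.rows.append(dict(row))

    def total(self, vote_column: str) -> int:
        """Derive a total on demand from the stored rows."""
        return sum(row[vote_column] for row in self.rows)

    def makeup(self, vote_column: str) -> List[Tuple[Row, int]]:
        """Return each row's identification columns and its contribution."""
        return [
            ({c: row[c] for c in self.id_columns}, row[vote_column])
            for row in self.rows
        ]


ids = ["jurisdiction name", "date and time of count readout",
       "name of responsible vote personnel"]
votes = ["contest 1 selection 1", "contest 1 selection 2"]
data_set = TabulationDataSet(ids, votes)

# Sample values are invented purely for illustration.
data_set.add_readout({
    "jurisdiction name": "Example County",
    "date and time of count readout": "2008-11-04T20:15",
    "name of responsible vote personnel": "A. Clerk",
    "contest 1 selection 1": 120,
    "contest 1 selection 2": 95,
})
data_set.add_readout({
    "jurisdiction name": "Example County",
    "date and time of count readout": "2008-11-04T20:40",
    "name of responsible vote personnel": "B. Clerk",
    "contest 1 selection 1": 80,
    "contest 1 selection 2": 101,
})
```

Here `data_set.total("contest 1 selection 1")` sums the two stored rows, while `data_set.makeup(...)` lists which readout contributed which count, so the drill-down information is never lost.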

Open issues

  • Including counts of undervotes, overvotes, and other invalid votes for each contest is critical in order to be able to audit the results, to flag possible problems with the voting system, and to provide important insights into the interests of the electorate.
  • Handling of write-ins. The problem is that write-ins introduce additional 'data' for a response. I would say that it is necessary that (with human intervention, see EJB below) this information is captured in the Tabulation data set, but it does break the simplicity of an n by m table. More thought needed.

[From EJB: Write-ins are mainly handled by central count activities. (To enable that, ballots with write-ins are segregated in the polling place from ballots without.) Part of the central count process is a machine identifying that a particular contest on a particular ballot has a write-in selection, and a person deciding whether the write-in is legitimate, which is a very state-specific thing. In any case, for legitimate write-ins, there is human data entry of the written-in choice. That entered data should become part of the ballot dataset, which should then flow through to the tabulator dataset.]
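For the first open issue (undervotes and overvotes), one possible shape is to add two extra columns to each contest's group and use them for a per-row audit check. The sketch below makes strong assumptions that are not part of this page: each contest is vote-for-one, every ballot read by the device includes the contest, and the row carries a hypothetical "ballots counted" column. The function names are also hypothetical. Write-ins are not addressed here.

```python
from typing import Dict, List


def contest_columns(contest: int, n_selections: int) -> List[str]:
    """Column group for one contest, extended with undervote/overvote counts."""
    columns = [
        f"contest {contest} selection {s}" for s in range(1, n_selections + 1)
    ]
    columns.append(f"contest {contest} undervotes")
    columns.append(f"contest {contest} overvotes")
    return columns


def reconciles(row: Dict[str, int], contest: int, n_selections: int,
               ballots_column: str = "ballots counted") -> bool:
    """Check that a vote-for-one contest accounts for every ballot in the row.

    Assumes each ballot contributes exactly one of: a selection,
    an undervote, or an overvote.
    """
    accounted = sum(row[c] for c in contest_columns(contest, n_selections))
    return accounted == row[ballots_column]


# Invented sample row for illustration only.
sample_row = {
    "ballots counted": 220,
    "contest 1 selection 1": 120,
    "contest 1 selection 2": 95,
    "contest 1 undervotes": 4,
    "contest 1 overvotes": 1,
}
```

With these sample numbers, `reconciles(sample_row, 1, 2)` holds because 120 + 95 + 4 + 1 equals the 220 ballots counted; a mismatch would flag a row for closer inspection.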
