User Manual for DUDE - DUplicate text DEtection

A joint project of ACM SIGDA and IEEE CEDA

Hello, Ms./Mrs./Mr. Conference Chair! Here is how to use the DUDE system to perform duplication checks on submissions to your conference.

Contents

  1. Get the software and the password
  2. Create plaintext versions of your pdf files
  3. Get Hash Codes from DUDE server
  4. Generate Reports for your reviewers
  5. Create hash codes from your own submissions
  6. A paper is withdrawn
  7. Camera Ready papers come in
  8. At long last, you hold the conference
               

Contacts

  • IEEE CEDA: Dr. Lou Scheffer, Cadence
    lou bat cadence bought com
  • ACM SIGDA: Prof. Igor Markov, University of Michigan
    imarkov vat umich caught edu
  • University of Michigan: Stephen Hufnagel
    shuf cat umich got edu



Get the software and the password

You will need:

Please create a writeable directory for all DUDE work. In this directory, create

Now you are ready to begin!


Creating .txt files

DUDE works with .txt files, so you need a plain-text file containing the contents of each paper. To create these, run the DUDE program and give it the command 'T' (for text). It will ask you for a directory name, then attempt to convert each .PDF file in that directory to text and store the result in a parallel .txt file.

If conversion fails for any file, you will get a message. You can then try to convert it with another program (perhaps the full Adobe Acrobat), ask the author for a plain-text version, or simply continue without that paper. In any case, please let Igor and Lou know about any conversion problems.
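
DUDE reports conversion failures itself; as an optional cross-check, the Python sketch below lists the PDFs that still lack usable text. It is not part of DUDE. It assumes that "parallel" means a .txt file with the same base name in the same directory, and the function name missing_text_files is hypothetical.

```python
from pathlib import Path


def missing_text_files(directory):
    """Return names of PDFs in `directory` that lack a usable parallel .txt file.

    "Parallel" is assumed to mean the same base name with a .txt extension,
    e.g. paper17.pdf -> paper17.txt. An empty .txt counts as missing,
    because it usually means text extraction produced nothing.
    """
    missing = []
    for pdf in sorted(Path(directory).iterdir()):
        # Accept both .pdf and .PDF extensions.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            txt = pdf.with_suffix(".txt")
            if not txt.is_file() or txt.stat().st_size == 0:
                missing.append(pdf.name)
    return missing
```

For example, missing_text_files("Submissions") returns the names of the papers you may want to convert by other means or request as plain text.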


Getting the Hash Codes

In this step, the DUDE program contacts the University of Michigan server and downloads the hash codes of all papers known to the server. It retrieves them as a compressed tar file and then unpacks it. The unpacked files occupy about 65 megabytes and are stored in a directory named "HashCodes".

You can also do this manually with the command
  wget --http-user=user --http-passwd=passwd http://sigda.eecs.umich.edu/DUDE/WellKept/HashCodes.tar.gz
where user and passwd are the user name and password you were given, and then unpack the archive with
  tar xzf HashCodes.tar.gz
Please copy and paste these commands rather than retyping them, because two dashes are easily mistaken for one. (Newer versions of wget spell the password option --http-password.)
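
If wget is not available, the same two steps can be scripted. The Python sketch below is an illustration under stated assumptions: it assumes the server uses HTTP Basic authentication (wget also supports other schemes), and the names HASH_URL, download and unpack are hypothetical. The URL is the one quoted in this manual and may no longer be served.

```python
import shutil
import tarfile
import urllib.request

# URL quoted in this manual; it may no longer be served.
HASH_URL = "http://sigda.eecs.umich.edu/DUDE/WellKept/HashCodes.tar.gz"


def download(url, user, password, dest):
    """Save `url` to `dest`, answering an HTTP Basic auth challenge (assumed scheme)."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passwords))
    with opener.open(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def unpack(archive, target="."):
    """Equivalent of `tar xzf archive` run inside `target`; returns the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safe 'data' extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return names
```

For example, unpack(download(HASH_URL, user, password, "HashCodes.tar.gz")) mirrors the wget and tar commands above.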


Generating Reports

Run the DUDE program, if it is not already running, and select the 'R' option (for report). This generates a report that lists, for each submission, the other papers with which it shares the most phrases. In addition, for each submission whose similarity exceeds the matching threshold (currently 5%), it generates a web page that highlights the identical phrases. These pages are stored in the 'Submissions' directory, alongside the original submissions.
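
This manual does not specify how DUDE measures shared phrases. Purely as an illustration, the Python sketch below assumes that a phrase is a run of n = 5 consecutive words and that a paper's score against another paper is the percentage of its distinct phrases that also occur in the other. The 5% default threshold comes from this manual; the phrase length and the names phrases, similarity and report are hypothetical.

```python
import re


def phrases(text, n=5):
    """Set of n-word phrases in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b, n=5):
    """Percentage of the distinct phrases of text `a` that also occur in text `b`."""
    pa, pb = phrases(a, n), phrases(b, n)
    return 100.0 * len(pa & pb) / len(pa) if pa else 0.0


def report(papers, threshold=5.0, n=5, top=3):
    """Map each paper ID to (its `top` closest papers, whether the closest exceeds `threshold`).

    `papers` maps paper IDs to plain text; scores are percentages as in `similarity`.
    """
    sets = {pid: phrases(text, n) for pid, text in papers.items()}
    result = {}
    for pid, pa in sets.items():
        scores = sorted(
            ((100.0 * len(pa & pb) / len(pa) if pa else 0.0, other)
             for other, pb in sets.items() if other != pid),
            reverse=True,
        )
        best = scores[:top]
        result[pid] = (best, bool(best) and best[0][0] > threshold)
    return result
```

Note that this score is asymmetric: a short paper copied into a long one scores high against the long paper, while the long paper scores low against it.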

For help reading the annotated web page, see the sample comparison page.

If the reports flag too many files, you may want to raise the similarity threshold to 15% or even 20%. To do so, run the command
  setenv THRESHOLD 15.0
before running the DUDE executable in the same session (this is csh/tcsh syntax; in sh or bash, use export THRESHOLD=15.0). If you open another window or start a new login session, you must run this command again.
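
The Python sketch below shows how a program might pick up such a setting and how raising it shortens the list of flagged papers. It assumes DUDE reads the THRESHOLD variable as a percentage and falls back to the 5% default; how the real program treats an invalid value is not documented, and the names threshold_from_env and flagged are hypothetical.

```python
import os

DEFAULT_THRESHOLD = 5.0  # percent, the default quoted in this manual


def threshold_from_env(env=None):
    """Similarity threshold in percent, taken from THRESHOLD if it is a valid percentage."""
    env = os.environ if env is None else env
    try:
        value = float(env.get("THRESHOLD", DEFAULT_THRESHOLD))
    except ValueError:
        return DEFAULT_THRESHOLD  # e.g. THRESHOLD=high (assumed fallback)
    return value if 0.0 <= value <= 100.0 else DEFAULT_THRESHOLD


def flagged(scores, threshold):
    """Paper IDs whose best similarity score (percent) exceeds the threshold."""
    return sorted(pid for pid, score in scores.items() if score > threshold)
```

For scores {"p1": 3.0, "p2": 12.0, "p3": 22.0}, flagged(scores, 5.0) returns ["p2", "p3"], while flagged(scores, 15.0) returns only ["p3"]. Because the value lives in the environment, a new shell that has not run setenv sees the default again.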

Creating Hash Codes and Sending Them to the Server

Run the DUDE program, if it is not already running, and select the 'H' option (for hash codes). This creates a hash digest file for all submissions. To send these manually, do this:

If you are more trusting, or have examined the source code to be sure we send only what we claim, simply run DUDE and select the 'S' option.
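
The manual does not describe the digest format. One plausible design, sketched here in Python under that assumption, stores only hashes of overlapping word phrases, so that shared phrases can be detected without the server receiving a paper's text. The .hash extension, the 5-word phrases, the truncated SHA-1 digests and the names phrase_hashes and write_digest are hypothetical choices, not DUDE's documented format.

```python
import hashlib
import re
from pathlib import Path


def phrase_hashes(text, n=5):
    """Sorted, de-duplicated hashes of every n-word phrase in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({
        # 16 hex digits (64 bits) of SHA-1 per phrase; the text itself is not kept.
        hashlib.sha1(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()[:16]
        for i in range(len(words) - n + 1)
    })


def write_digest(txt_path, n=5):
    """Write paper.hash next to paper.txt: one hex digest per line, no plain text."""
    txt = Path(txt_path)
    digests = phrase_hashes(txt.read_text(encoding="utf-8", errors="replace"), n)
    out = txt.with_suffix(".hash")
    out.write_text("".join(d + "\n" for d in digests), encoding="ascii")
    return out
```

A digest file of this kind can be inspected before sending: it contains only hexadecimal strings, one per line.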

Handling a withdrawn paper

If a paper is withdrawn, the system needs to forget it, so that it is no longer reported as a possible match for other papers. Run DUDE with the 'W' option, then enter the ID of the paper.
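
Conceptually, forgetting a paper means removing its hashes from whatever index is used for matching. The Python sketch below assumes an inverted index from phrase hash to paper IDs; the class HashIndex and its methods are hypothetical and do not describe the server's actual storage. Re-adding a paper ID replaces its old hashes, which is also the operation needed when camera-ready copies replace the original submissions.

```python
from collections import defaultdict


class HashIndex:
    """Inverted index: phrase hash -> IDs of the papers containing that phrase."""

    def __init__(self):
        self.papers = defaultdict(set)  # hash -> paper IDs
        self.by_paper = {}              # paper ID -> that paper's hashes

    def add(self, paper_id, hashes):
        """Index a paper; re-adding an ID replaces its previous hashes."""
        self.withdraw(paper_id)
        self.by_paper[paper_id] = set(hashes)
        for h in self.by_paper[paper_id]:
            self.papers[h].add(paper_id)

    def withdraw(self, paper_id):
        """Forget a paper so it is never reported as a match again."""
        for h in self.by_paper.pop(paper_id, set()):
            self.papers[h].discard(paper_id)
            if not self.papers[h]:
                del self.papers[h]  # drop hashes that no remaining paper uses

    def matches(self, hashes):
        """Number of shared hashes with each indexed paper."""
        counts = defaultdict(int)
        for h in set(hashes):
            for pid in self.papers.get(h, ()):
                counts[pid] += 1
        return dict(counts)
```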

Submitting camera copies to our database

It is very important to submit camera-ready copies to the DUDE database, so that they replace the hash digests of the original submissions. This removes information about rejected papers from the database and thus reduces spurious overlaps with later conferences. While the submission process can be automated, we currently prefer that you post an archive (tar.gz or zip) of the PDF files for us to download. We discourage large email attachments, but are willing to consider them if all else fails.
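
If you assemble the archive by script, the Python sketch below packs every PDF in a directory into a flat tar.gz file (roughly what tar czf does on the command line). The function name archive_pdfs and the default archive name are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_pdfs(directory, archive="camera_ready.tar.gz"):
    """Pack every PDF in `directory` into a gzip-compressed tar archive.

    Returns the list of file names that were added.
    """
    directory = Path(directory)
    # Collect the PDFs first, so an archive written into the same directory is not included.
    pdfs = sorted(p for p in directory.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    with tarfile.open(archive, "w:gz") as tar:
        for pdf in pdfs:
            tar.add(pdf, arcname=pdf.name)  # flat archive, no directory prefix
    return [p.name for p in pdfs]
```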

Conference organizers may want to compare each camera copy against the original submission. Substantially different camera copies may warrant more careful checks (e.g., if the authors remove key results and resubmit to another conference, which has happened previously).
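
One simple way to quantify "substantially different" is to measure how many of the original submission's phrases are missing from the camera copy, and how much of the camera copy is new. The Python sketch below assumes 5-word phrases as the unit of comparison, which is an illustrative choice rather than DUDE's method; the function name camera_copy_change is hypothetical. This manual gives no cut-off, so the function only reports percentages and leaves the judgment to the organizers.

```python
import re


def phrases(text, n=5):
    """Set of n-word phrases in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def camera_copy_change(original, camera, n=5):
    """Return (percent of original phrases dropped, percent of camera-copy phrases that are new)."""
    po, pc = phrases(original, n), phrases(camera, n)
    dropped = 100.0 * len(po - pc) / len(po) if po else 0.0
    added = 100.0 * len(pc - po) / len(pc) if pc else 0.0
    return dropped, added
```

A high "dropped" percentage is the case the paragraph above warns about: material removed from the camera copy may reappear in a submission elsewhere.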


Hold the Conference

At this point the accepted papers become public.

Run the DUDE program, if not already running, and select the 'F' option (for Finally holding the Conference).


Frequently Asked Questions

Please see a separate document.

Project Information

The DUDE project is currently in its experimental stage. Most of the software has been written and works. We are evaluating it with several pilot conferences, and hope to have it approved for use in all SIGDA- and CEDA-sponsored conferences within a year. If you have questions, comments or suggestions, please email us (see the addresses under Contacts at the top of this page).


Igor Markov and Lou Scheffer