Archive for the 'Software' Category

ArcGIS Python Script Debugging as It Oughta Be

Setting up debugging of ArcGIS Python Geoprocessing Scripts used to be quite an exercise in frustration. 

However, with the Komodo IDE (for scripting languages like Python and Perl), I was pleasantly surprised to find that this is now as simple as:

  1. Download and install the Komodo IDE from ActiveState ($300 to use after the 21-day free trial)
  2. In the IDE settings, set the User Environment Variable:
    PYTHONPATH=C:\Program Files\ArcGIS\bin
  3. Copy/paste any arguments/parameters you want to test with from the ArcCatalog/ArcToolbox command line output into the Debug Command Line Arguments

And it just works.  How come this doesn’t happen more often with software? 
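
Step 3 works because the debugger hands the script the same command-line arguments the toolbox would. Here is a minimal sketch of the receiving side, assuming the script reads its parameters from sys.argv; the parameter names and the buffer-style tool are hypothetical, made up purely for illustration:

```python
import sys

# Hypothetical parameter names, in the order the tool passes them.
PARAM_NAMES = ("input_features", "output_features", "buffer_distance")


def read_parameters(argv):
    """Map positional command-line arguments onto named parameters."""
    args = argv[1:]  # argv[0] is the script path itself
    if len(args) != len(PARAM_NAMES):
        raise SystemExit("expected %d arguments, got %d"
                         % (len(PARAM_NAMES), len(args)))
    return dict(zip(PARAM_NAMES, args))
```

Calling read_parameters(sys.argv) at the top of the script and setting a breakpoint on the next line is a quick way to confirm that the values pasted into the Debug Command Line Arguments arrived intact.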

Your Cheatin’ File Formats

A privacy problem that seems to go largely unnoticed is the issue of personal data that is hidden away in computer documents without their creators’ knowledge.  In fact, nearly all of the most common and popular document formats use such metadata to tuck away all sorts of nifty descriptive information about the document.  Here are just a few examples:

  • When it was created/changed
  • Who made the changes based on User Name or other Operating System-captured name
  • Applications used – including watermarking or similar identifying information tying a document directly back to the exact copy of software or hardware that created it
  • And on and on

Unless you use only text (.txt) files to store data, odds are pretty good that your documents (MS Word, PDF, JPEG, etc.) have gobs of this type of extra information attached.  And in most cases, while perhaps overdone by complex document formats, this additional document information is intended to be useful and is not stored for any nefarious, privacy-intruding purpose.

However, privacy issues can quickly arise when these documents are published to the web.  In this scenario, their metadata can reveal personal information that their authors never intended to publish.

A perfect example that has entered the annals of Web lore is the cropping wardrobe malfunction of Cat Schwartz (of circa-2000 TechTV fame).  A topless original was cropped to an innocuous head shot and posted to her blog, but oops, the thumbnail embedded in the metadata still contained the uncropped photo.  It is a small yet shocking example of hidden metadata in just one complex and ubiquitous Internet data exchange format, in this case a JPEG with EXIF metadata.

So what are users to do who want to “scrub” all personal information and metadata from their documents before posting to the web?  Unfortunately, there appear to be no easy, one-size-fits-all solutions to this problem.  Application vendors have little to gain and much to lose by stripping out such metadata.  Their applications need this metadata to provide increased functionality, and the market appears to make it clear that users value this functionality over privacy.  Even when vendors do provide mechanisms to eliminate such data, they make them cumbersome and onerous.  Third party solutions often work on only one specific complex data format.

Windows Vista surprisingly does provide a mechanism for doing this (Properties | Advanced | “Remove Properties and Personal Information”), but this only removes some of the obvious metadata that Windows can identify and does nothing with vendor-specific data.  Also, you have to select the file(s) manually – it can’t recursively cleanse subfolders.

Take the simplest of examples: How do I remove personal data from my JPEGs before I post to public photos sites?

The Windows Vista “Remove Properties” tool doesn’t help because it only handles a few of the obvious EXIF data items (like Title, Author, Tags, etc.), but there are literally hundreds of others unhandled (even the very obvious ones like “Taken On” date and editing application).  Thus for even this simplest example, the user is forced to turn to a third party tool like ExifTool - an impressive, but somewhat geeky and command-line driven EXIF metadata utility that includes a cleaner.  One could also save the JPEG to a different format that doesn’t support EXIF metadata like BMP or PNG, but get ready for some serious size bloat as the compression is lost.

To “quickly” achieve this, I just gave up and wrote my own (C# source code below; now how’s that for geeky?), but it is only a marginal success because it only handles the text metadata.  When I tried to just remove all metadata, I got some troublesome results (the compression was removed, or the changes were just ignored because they caused inconsistencies).  This is a worrisome example of how even someone who is actively committed to removing all of this information can be thwarted.  But I figured the text attributes included most of the information that someone might want to scrub anyway (like dates, programs, etc.).

So there is one complex data format partially down, thousands more to go.  Privacy really shouldn’t be this hard folks…

  
// Disclaimer: Use of this code is done so entirely at your own risk.
// This software is provided "as is" without warranty of any kind.
// C# class to remove text metadata from a JPEG file.
// Note: removing non-text metadata can have the undesired effect of
// altering the compression or other image characteristics.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Text;

class ExifTextCleanser
{
    // PropertyItem.Type value for a null-terminated ASCII string
    private const short TextType = 2;

    public static void RemoveImageTextPropertyItems(Image image)
    {
        foreach (PropertyItem pi in image.PropertyItems)
        {
            // if it's text, remove it
            if (pi.Type == TextType)
            {
                image.RemovePropertyItem(pi.Id);
            }
        }
    }

    public static void PrintImageTextProperties(Image image)
    {
        Console.WriteLine("properties id count=" + image.PropertyIdList.Length);

        // Print all text PropertyItems
        foreach (PropertyItem pi in image.PropertyItems)
        {
            if (pi.Type == TextType)
            {
                // drop the trailing null terminator before printing
                string textProperty = Encoding.ASCII.GetString(pi.Value).TrimEnd('\0');
                Console.WriteLine("Property, ID=" + pi.Id + ", value=" + textProperty);
            }
        }
    }

    public static void CleanseJpeg(string originalFileName, string newFileName)
    {
        if (!(originalFileName.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase) ||
              originalFileName.EndsWith(".jpeg", StringComparison.OrdinalIgnoreCase)))
        {
            Console.WriteLine(originalFileName + " not a JPEG.");
            return;
        }

        // The source file stays locked while the Bitmap is open, so
        // newFileName must differ from originalFileName.
        using (Bitmap bitmap = new Bitmap(originalFileName))
        {
            PrintImageTextProperties(bitmap); // take a peek at this metadata info

            RemoveImageTextPropertyItems(bitmap); // then nuke it

            // save the cleansed version, explicitly as a JPEG (this re-encodes the image)
            bitmap.Save(newFileName, ImageFormat.Jpeg);
        }
    }
}
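
For comparison, here is a rough Python 3 sketch of a different technique from the C# class above: instead of decoding the image and removing properties one at a time, it walks the JPEG file's segments and simply leaves out the ones that carry metadata, copying the compressed image data byte for byte so nothing is recompressed. The choice of segments to drop (APP1 for EXIF/XMP, APP13 for Photoshop/IPTC, COM for comments) is my assumption about what is worth scrubbing, the function names are made up, and it is not a guarantee that every trace is gone.

```python
import struct

# Assumption: these are the segments worth dropping. APP1 holds EXIF/XMP
# (including the embedded thumbnail), APP13 holds Photoshop/IPTC data,
# and COM holds free-text comments.
DROP_MARKERS = {0xE1, 0xED, 0xFE}
SOS = 0xDA  # Start of Scan: compressed image data follows


def strip_jpeg_metadata(data):
    """Return JPEG bytes with the DROP_MARKERS segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker; skip it
            continue
        if marker == SOS:
            # Copy the scan data and everything after it byte for byte,
            # so the image is never decoded or recompressed.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in DROP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no Start of Scan marker found")


def cleanse_jpeg(original_path, new_path):
    """Read a JPEG, strip the metadata segments, write the result."""
    with open(original_path, "rb") as src:
        cleaned = strip_jpeg_metadata(src.read())
    with open(new_path, "wb") as dst:
        dst.write(cleaned)
```

Usage would be something like cleanse_jpeg("photo.jpg", "photo-clean.jpg").  The sketch keeps the other application segments (such as a color profile in APP2) on purpose, since dropping those can change how the picture looks.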

Fighting with Windows Vista to Register COM Components

With great trepidation, I started developing on my Windows Vista machine this month.  Although I had gotten the Vista computer last August, it was just too unreliable for me to risk my daily productivity on.  This unreliability included being unable to perform the very basic functions of an Operating System, such as file management.  Just a few examples of things I’ve encountered:

  1. Couldn’t copy files because of an “Out of Memory” error (description and Vista Hotfix)
  2. Frequent File Explorer and Windows Photo Gallery crashes and lock ups when you rename a file (no workaround)
  3. Copying and extracting zip files is excruciatingly slow (description and workaround)

I generally like new technologies and I definitely like to see when User Interfaces are redesigned to make them easier to use.  Some of that happened with Vista but in general it is unreliable bloatware that will only serve to further diminish Microsoft’s reputation and market share.

But while I could carp about the technology failure that is Vista all day, the primary point of this post is to offer a few hints and “Gotchas” for those unfortunate souls who still need to do ATL/COM development.  Vista adds User Account Control security which adds some requirements for registering COM components.   So here is the advice:

  1. Create a shortcut to Visual Studio.NET that is set to “Run as Administrator” (Properties | Shortcut Tab | Advanced | Run As Administrator Checkbox)
    • Even though your account is an “Administrator” account in Vista, unless you select the “Run as Administrator” option your regsvr32 calls made during the build will fail.
    • Creating a shortcut and setting this option is the easiest way to do this so you won’t forget
  2. If, like me, you create batch files to regsvr32 a bunch of COM dlls/components, you may suddenly notice that these batch scripts don’t work in Vista (even when you Run as Administrator).  To make these work in Vista:
    • Follow the same steps as above for creating a shortcut to the batch file and setting it to Run as Administrator
    • When you run a batch file as Administrator, Vista sets the current directory to C:\Windows\system32 (versus running from the current directory like it used to in the good ol’ days), so you need to change the current directory back in your batch file
    • So here is a sample modified regsvr32 batch:

REM Stuff Added to get regsvr32 batch to run in Vista

REM Change the directory back to where this ran from (instead of System32)
REM /d also switches drives; the quotes protect paths containing spaces
set INSTALL_PATH=C:\MyFiles\PATH-TO-COM-Dlls\bin
cd /d "%INSTALL_PATH%"

REM Original Stuff that ran pre-Vista

regsvr32 MyCOMComponent.dll
pause
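
The same idea can be sketched in Python for anyone who would rather not maintain a list of DLL names by hand. This is only an illustration under a few assumptions: the function names are made up, every .dll in the folder is assumed to be a registrable COM server (often not true), and the script still has to be launched from an elevated (Run as Administrator) prompt. Passing full paths and setting the working directory explicitly sidesteps the System32 current-directory surprise described above.

```python
import os
import subprocess


def regsvr32_commands(dll_dir, silent=True):
    """Build one regsvr32 command per .dll found in dll_dir, sorted by name."""
    dlls = sorted(f for f in os.listdir(dll_dir) if f.lower().endswith(".dll"))
    flags = ["/s"] if silent else []  # /s suppresses regsvr32's message boxes
    return [["regsvr32"] + flags + [os.path.join(dll_dir, name)] for name in dlls]


def register_all(dll_dir):
    """Run each command with dll_dir as the working directory."""
    for cmd in regsvr32_commands(dll_dir):
        # check_call raises CalledProcessError if regsvr32 reports a failure
        subprocess.check_call(cmd, cwd=dll_dir)
```

Calling register_all(r"C:\MyFiles\PATH-TO-COM-Dlls\bin") would then do the work of the batch file above.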

Government Inaction and the Google “Airbrush” Conspiracy

The threshold for what exactly rises to the level of “conspiracy” seems to be getting lower and lower.

Last week, the House Committee on Science and Technology, Investigations and Oversight Subcommittee (yes, that is the real name) chairman Brad Miller (D-North Carolina) accused Google of “airbrushing history.”  He wrote (in an open letter to Google):

 “Google’s use of old imagery appears to be doing the victims of Hurricane Katrina a great injustice by airbrushing history.”

Now, the real scandal and conspiracy here involves the huge sums of money the government spends on acquiring and processing map data (satellite imagery, street/road maps, and other geospatially descriptive data).  If the government were doing its job, it would have provided a similar easy-to-use satellite imagery service to the public many years ago.

The government has spent many thousands of times more than even uber-rich Google ever could, yet let’s see the federal government’s version of Google Maps – http://www.geodata.gov.  Huh?  What am I supposed to do with this site again?

Now one could argue that letting people see satellite pictures of their house is not necessarily a key mission of the federal government.  But how about providing satellite imagery for the national Defense?  Are they doing any better of a job here?

This guy doesn’t seem to think so:

“Google Earth’s major problem was not it’s ease-of-use, but the manner in which it showcased the shortcomings of the American NGA (National Geospatial Intelligence Agency).” 

If a national web map had followed the GPS model, first, it would actually work and provide a valuable service, and second, there would be two levels of imagery available to everyone in the world: one for those with a national defense (or verified commercial) need and one for the rest of us.

So by not providing these services, the government creates a commercial need which then actually imperils us all.  For proof of this, read “Terrorists ‘use Google maps to hit UK troops’”.

The “conspiracy” has been abated, by the way: Google has buckled under Congressional pressure and restored the post-Katrina imagery.  Wow, that was a lot quicker and easier than getting the government to do its job – can we outsource all of the other government functions to Google as well?