
NASA almost kills Mars Robot



STANFORD, CALIF. -- A software glitch that paralyzed the Mars "Spirit" rover earlier this year was caused by an unanticipated characteristic of a DOS file system, a NASA scientist said Monday.

The flaw, since fixed, was only discovered after days of agonizingly slow tests complicated by the limited "windows" of communication allowed by the rotation of Mars, said Robert Denise, a member of the Flight Software Development Team at NASA's Jet Propulsion Laboratory.

On Jan. 21, the Spirit rover stopped communicating with the teams on Earth, beginning a cycle where the rover would reboot itself, over and over. After days of tests, the team finally discovered on Jan. 26 that the issue was tied to what was originally reported as corruption inside the rover's onboard flash memory.

In a presentation at the Hot Chips conference here, Denise said that the real issue was an embedded DOS file system whose directory structure kept growing and growing. When the rover's embedded operating system then told the flash memory to mirror the data structure in RAM, the unexpectedly large file caused a fatal error and an almost continuous reboot cycle, he said.

Aside from the flash memory error, the recent voyages of Spirit and Opportunity have gone far better than expected. The mission was originally funded to last 90 sols, the equivalent of 90 Mars days, and come to an end last April. (One sol equals 24.65 hours.) Since both rovers have managed to stay "alive" far longer than anticipated, Denise said, the current funding will run out on Sept. 13, the beginning of the "solar conjunction," when Mars disappears behind the Sun and out of radio range. The lifespan of both rovers is really not known, he said.

On Sol 18, the mood among the JPL ground team was nothing short of "euphoric," Denise said. "Life was good," he said. "And then we missed a comms pass," a window in which the JPL team and the rover were supposed to exchange information.

The team didn't worry, at least initially. It rechecked that its instruments were calibrated and awaited the next pass a few hours later. Over the next few days, however, nothing went right, Denise said. The team determined the rover was functional; it could emit a status "beep", proving it was online. Other passes, however, generated just pseudorandom noise, indicating that the rover was online and functioning but that no data was passing through the antenna. The rover, meanwhile, was rebooting hundreds of times a day.

The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.
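The behavior Denise describes can be sketched with a toy model. This is not the rover's file system: the `Directory` class, the `DELETED` marker and the file names below are hypothetical, chosen only to show that marking entries as deleted frees slots for reuse without ever shrinking the directory file.

```python
# Toy model of a DOS-style directory stored as a flat list of entries.
# DELETED stands in for the "special characters" that mark a freed slot.
DELETED = None


class Directory:
    def __init__(self):
        self.entries = []

    def create(self, name):
        # Reuse the first freed slot if there is one; otherwise grow the file.
        for i, entry in enumerate(self.entries):
            if entry is DELETED:
                self.entries[i] = name
                return
        self.entries.append(name)

    def delete(self, name):
        # Mark the slot as deleted; the directory file itself never shrinks.
        self.entries[self.entries.index(name)] = DELETED

    def size(self):
        # Number of slots the directory file occupies, live or deleted.
        return len(self.entries)

    def live(self):
        # Number of slots that still refer to an existing file.
        return sum(1 for entry in self.entries if entry is not DELETED)


d = Directory()
for i in range(1000):
    d.create(f"file{i}.dat")
for i in range(1000):
    d.delete(f"file{i}.dat")

print(d.size(), d.live())  # 1000 0
```

After a thousand creations and a thousand deletions the directory holds no live files, yet it still occupies a thousand slots, and anything that loads the whole directory has to load all of them.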

By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.
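The arithmetic behind that remark, using only the memory figures quoted in the article. The `mirror_fits` helper is hypothetical, and treating the mirror as a single block that must fit in RAM is a simplification, not a description of how the vendor's software actually worked.

```python
# Memory sizes as quoted in the article, in Mbytes.
FLASH_MB = 256
DRAM_MB = 128
DRAM_DATA_MB = 96
DRAM_CODE_MB = 32


def mirror_fits(mirrored_mb, available_mb):
    """Return True if a RAM mirror of the given size fits in the space given."""
    return mirrored_mb <= available_mb


# The data and code regions account for all of the DRAM.
assert DRAM_DATA_MB + DRAM_CODE_MB == DRAM_MB

print(FLASH_MB / DRAM_MB)                   # 2.0
print(mirror_fits(FLASH_MB, DRAM_MB))       # False
print(mirror_fits(FLASH_MB, DRAM_DATA_MB))  # False
```

A mirror of everything the flash can hold would not fit in RAM, so the scheme only works while the mirrored structures stay small.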

Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.

Dynamic allocation of memory is considered a no-no in embedded systems, precisely because of the possibility of a system crash, attendees said. Denise acknowledged that JPL's tests only allowed for the addition of a small number of data files, and that the exception slipped by. "We made an exception and got bit by it," he admitted.

The team finally got the rover up and running by essentially using the system RAM as simulated flash, discovered the error, and disabled the dynamic allocation feature, Denise said. The flash memory was erased, and the JPL engineers installed a utility that monitors the file system and treats the memory heap as a consumable resource.
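One way to picture treating the heap as a consumable resource is a budget that is checked before every allocation, so that running low becomes a reported condition rather than a crash. The sketch below is purely illustrative: the `HeapBudget` class, its method names and the 80 percent warning threshold are assumptions and do not describe the utility JPL actually installed.

```python
class HeapBudget:
    """Hypothetical tracker that treats heap space as a finite budget."""

    def __init__(self, capacity_bytes, warn_fraction=0.8):
        self.capacity = capacity_bytes
        self.warn_fraction = warn_fraction  # assumed threshold, not from the article
        self.used = 0

    def request(self, nbytes):
        # Refuse the allocation instead of letting the heap overflow.
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def release(self, nbytes):
        # Return space to the budget, never dropping below zero.
        self.used = max(0, self.used - nbytes)

    def low(self):
        # True once usage reaches the warning threshold.
        return self.used >= self.warn_fraction * self.capacity


budget = HeapBudget(capacity_bytes=100)
print(budget.request(70), budget.low())  # True False
print(budget.request(20), budget.low())  # True True
print(budget.request(20))                # False
```

The design choice is that a refused request is a recoverable event the caller can act on, for example by deleting files or deferring work, while an unchecked allocation is not.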

Denise's keynote address to the Hot Chips audience lasted about an hour, with twenty minutes or so dedicated to the flash-memory issue. At the end, he summed up the issue for the small percentage of the audience who weren't engineers: "The Spirit was willing, but the flash was weak."



  • 16 years later...

The rover operated far longer than its planned 90 sols (Martian solar days). Because the Martian wind cleaned its solar panels, power generation increased significantly, which let Spirit keep working long past its planned lifetime. Spirit traveled 7.73 km instead of the planned 600 m, which allowed more extensive analyses of Martian rocks.
