DreamObjects data (maybe) corrupted, support can't tell me more

dreamobjects

#1

TL;DR: My DreamObjects migration was failing; I was told some files had been corrupted on upload; it’s been 3 weeks with neither a resolution nor any details.

Hi there.

I think I’m generally a patient person, but I am fed up with the (lack of) response from Dreamhost Support.

Due to the data center change, I was told to migrate my DreamObjects data. Why this was not automatic, I cannot guess. But OK, I started migrating my data. I had four users, each with one bucket. How hard could it be?

Of these four migrations, 2 got stuck.

I opened a support ticket on the 5th of September. I heard nothing, so I wrote back every few days asking for updates. Finally, on the 18th, they identified three files as the source of the trouble:

Our Cloud Engineers believe that those files are corrupted, most likely
from when they were first uploaded.

And they assured me:

You are safe to start using the new DreamObjects endpoint
(objects-us-east-1.dream.io) in your applications, in spite of that
issue.

I wasn’t so confident about that. I explained that I needed more information. What does “corrupted” mean, and what exactly went wrong in the process? Do I need to submit a bug report to the developers of the uploading application? And how can a corrupted file sit on their system undetected for months? Why wasn’t it flagged on upload?

In short, what are we doing to prevent this from happening again?

I sent them my checksums for those files (yes, they’re big files, sent as multi-part uploads), and this was their reply:

Hmm. Curious. Those md5sums don’t match the md5sums currently in the
US-West 1 cluster version of the bucket, but they do match what we get
when we download from the US-West 1 cluster to a Linux server. Since
these objects are greater than 5 GB, a multi-part upload would have been
required, so maybe the md5sums aren’t going to match and we can just
force the migration anyways, but we’re still working with our
DreamObjects engineers to verify that fact.

I also verified that I could download these files from the old cluster and that they were not corrupted. But still no answer to my questions, so I repeated them (along with a few others, e.g., “What do you mean by md5sums in the case of multi-part uploads?”).
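For context on the md5sum question: S3-compatible services commonly report a multi-part object’s checksum (its ETag) not as the MD5 of the whole file, but as the MD5 of the concatenated binary MD5 digests of each part, followed by a dash and the part count. A local `md5sum` will therefore never match that value directly. The minimal Python sketch below assumes DreamObjects (Ceph’s S3-compatible RADOS Gateway) follows that convention, and uses a hypothetical fixed 8 MiB part size; the real part size depends on whichever application did the upload, and the ETag only matches if you use the same one.

```python
import hashlib
import io
from typing import BinaryIO


def multipart_etag(stream: BinaryIO, part_size: int) -> str:
    """S3-style multipart ETag (assumed convention): MD5 of the concatenated
    per-part MD5 digests, followed by '-<number of parts>'."""
    part_digests = []
    while True:
        chunk = stream.read(part_size)  # one part; the last may be shorter
        if not chunk:
            break
        part_digests.append(hashlib.md5(chunk).digest())  # raw 16-byte digest
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


# Hypothetical example: 10 MiB of dummy data uploaded in 8 MiB parts.
PART_SIZE = 8 * 1024 * 1024
data = b"x" * (10 * 1024 * 1024)

whole_file_md5 = hashlib.md5(data).hexdigest()  # what md5sum prints locally
etag = multipart_etag(io.BytesIO(data), PART_SIZE)  # what the server may report
print(whole_file_md5)
print(etag)  # a different hex string, ending in "-2" (two parts)
```

If the two values differ only because of this scheme, the file itself can be perfectly intact; comparing against the server’s value requires recomputing with the original part size.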

Yesterday (almost 3 weeks since the start of this), I got this reply [edited for brevity]:

Unfortunately at this moment I don’t have details on why our “Cloud
Engineers believe that those files are corrupted”, …
I have to check with the support representative who
worked with the engineers on this particular case.

I’ve forwarded this newest support ticket / thread to him now, but as I
understand it he’s out of the office for a few days, so it will take a
little bit to get a response.

At this point I’m out of patience and don’t know what to do to get this escalated. And really, the support person is out of the office? How am I supposed to react to that? (“Oh, ok, I’ll wait. Hope he’s having a great time.”) How many other tickets are on hold?

Is Dreamhost serious about data storage, or is it just a side project? OK, I admit I’m a small user, but what if my business depended on this data every single day? How do they expect to keep customers when they can’t quickly fix a problem, or even give me enough information to feel confident continuing to use their storage?

Exactly what is the point of using a cloud storage provider if they can’t give me immediate support? Why not just buy a NAS?

Thanks for listening!


#2

Hi there!

We apologize for any frustration! If you can supply us with a support ticket number or the domain name on the account, we can look into the status and escalate it to a manager.

Thanks!
Matt C


#3

There are now several ticket numbers:

8343861
8346583
8348612

[It appears that new ticket numbers are created every time I reply to an email. I would suggest that’s not the best system.]

Thanks.


#4

Thank you for those ticket numbers. I was able to pass them to a support manager to look into, and they were also sent to our Cloud team to investigate further. They will update you by email accordingly.

Thanks!
Matt C


#5

Got a reply from support (not a manager nor an engineer).

Our DreamObjects engineers have finally been able to finish their
investigation into why those three objects were not migrated. But first a
brief discussion of terms. DreamObjects is built on Ceph which utilizes a
RADOS Gateway as the in/out access point for all data. Due to a various
assortment of RADOS Gateway bugs, the initial upload of those three
objects did not actually complete successfully, but the aforementioned
bugs combined in a unique way that caused Ceph to incorrectly catalog the
objects in the bucket index. Unfortunately due to those bugs, we are not
able to recover those objects.

Needless to say, this does not inspire confidence in the system. :wink:


#7

I have a similar failing-migration issue with one of my websites. It says files are corrupted, and I could not find a way to migrate them properly. Any ideas?



#8

Do you know exactly which files are corrupted? And if so, are they particularly large (e.g., > 5 GB)?

Unfortunately, the problem hasn’t gone away with the new cluster. I re-uploaded my failed files to the new cluster and asked the support team to verify them. Their reply:

I’m getting inconsistent results when verifying those new uploads.

Again, I wish the support team wouldn’t speak to me in vague terms. What on earth does “inconsistent results” mean? I very much hope their servers are not giving inconsistent results.